Generative AI has changed the world.
Since ChatGPT’s debut in November 2022, the application has passed the bar exam, been used by students in school, and automated mundane administrative tasks at work (and failed to write articles for DCD). Hyperscale companies have since entered the generative AI race: Google with Bard, Microsoft with its $10bn investment in OpenAI and the integration of its technology into Bing, and Amazon with its announcement of generative AI services on AWS. The conversation has shifted from “if” we will use AI to “how” we will use it – and how much of the population will harness its capabilities.
Goldman Sachs predicts that generative AI technology could raise global GDP by seven percent, and 300 million jobs could be changed or diminished due to its implementation. However, the potential of these AI applications is limited by the data center infrastructure that supports them.
To unlock its full potential, the industry must invest in bringing a customized approach to high-speed connectivity – namely through chiplets and custom silicon solutions – before it is too late.
Generalized tech won't work for AI
A generalized approach to technology has worked for a long time, but it won’t work for AI. The widespread adoption and success of new technologies has always been predicated on accessibility and utility. A custom approach to these technologies is required to ensure their survival and maximize their benefits to society. To understand why, we need to examine the current limitations of data center infrastructure.
Today's data centers weren’t built to support the ever-changing needs of generative AI. They rely on monolithic servers that are limited in their number of CPUs and memory modules and are not flexible enough to support the demands of AI applications. There are two primary workloads critical to AI: training and inference. Training workloads feed vast amounts of data through the model to tune its parameters, while inference workloads are where we ask systems like ChatGPT a question and the model applies what it learned during training to generate an answer.
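The split between the two workloads can be sketched with a toy model (the linear model, sizes, and learning rate here are illustrative – real LLM training works at vastly larger scale, but the shape of the two workloads is the same):

```python
import numpy as np

# Toy model: a single linear layer, y = x @ w, standing in for a neural network.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1))               # model parameters ("weights")

# --- Training workload: stream batches of data through the model over and
# --- over, updating the parameters each pass (heavy, sustained data movement).
x_train = rng.normal(size=(256, 4))
y_train = x_train @ np.array([[1.0], [-2.0], [0.5], [3.0]])  # "ground truth"
for _ in range(500):
    pred = x_train @ w
    grad = 2 * x_train.T @ (pred - y_train) / len(x_train)
    w -= 0.01 * grad                      # gradient-descent parameter update

# --- Inference workload: a single forward pass through the trained
# --- parameters; the training data is no longer touched at all.
x_query = rng.normal(size=(1, 4))
answer = x_query @ w
```

The asymmetry matters for infrastructure: training saturates memory and interconnect bandwidth for weeks at a time, while inference is many small, latency-sensitive passes over a fixed set of parameters.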
To add context to how fast our data centers need to work for generative AI to succeed, GPT-3 has 175 billion parameters compared to GPT-2’s 1.5 billion, and it’s rumored that GPT-4 has 10 trillion (or even 100 trillion) parameters. The depth of these neural networks has never been seen before. Ensuring the rapid transfer of data is critical to allowing these applications to thrive, but it can also come at a tremendous cost in power consumption and environmental impact: it’s estimated that training GPT-3 released over 552 tonnes of CO2 into the atmosphere.
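A back-of-the-envelope calculation shows why those parameter counts strain today’s hardware (the 2-bytes-per-parameter figure assumes 16-bit precision; deployed systems vary):

```python
# Rough memory footprint of GPT-3's 175 billion parameters,
# assuming 2 bytes per parameter (fp16/bf16 precision).
params = 175e9
bytes_per_param = 2
footprint_gb = params * bytes_per_param / 1e9
print(f"{footprint_gb:.0f} GB")   # 350 GB: far more than any single accelerator holds
```

Just holding the model in memory requires sharding it across many devices, which is exactly why high-speed chip-to-chip connectivity becomes the bottleneck.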
Chiplets are at the forefront of our industry and, thanks to their highly customizable nature, provide the most efficient and cost-effective solution for the world’s data infrastructure. Because their dies are small, they have higher yields, which lowers manufacturing costs and power consumption. They also provide a “more than Moore” ability to address the compute needs of AI applications compared with the traditional GPUs used to train AI models, along with more flexible product configurations. Custom silicon solutions allow hyperscalers and data center operators to optimize each chip for a specific workload – including different AI workloads – while bringing greater efficiency to manufacturing and considerably reducing power consumption.
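The small-die yield advantage can be illustrated with the classic Poisson defect-density yield model (the defect density and die areas below are illustrative, not figures for any real process):

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float = 0.1) -> float:
    """Fraction of good dies under the Poisson yield model: Y = exp(-D * A)."""
    return math.exp(-defects_per_cm2 * area_cm2)

# One large monolithic 8 cm^2 die vs. a 2 cm^2 chiplet:
monolithic = poisson_yield(8.0)   # ~0.45 of large dies are defect-free
chiplet    = poisson_yield(2.0)   # ~0.82 of chiplets are defect-free
```

The point is not that four chiplets always beat one big die mathematically – it is that each defect scraps only 2 cm² of silicon instead of 8 cm², and chiplets can be tested individually (“known good die”) before being assembled into a package, so far less silicon is wasted per defect.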
As an industry, we could not be better positioned for the ramp-up of chiplets and the rollout of custom silicon: 3nm process nodes are coming online, offering a 15 percent improvement in performance and a 30 percent reduction in power consumption.
The challenge ahead of us is vast yet simple: to fully realize the promise of generative AI applications, we must act now to implement wholesale upgrades to our global data infrastructure. These upgrades will incur large upfront costs, but failure to act will leave data centers constantly catching up to future data demands and will threaten future technological development.