Mark Zuckerberg confirmed that energy constraints have become the largest bottleneck to building out AI data centers.

Speaking on the Dwarkesh Podcast, the Meta CEO echoed industry commentary on the challenges of building larger and larger data centers.


"Over the last few years, there's been this issue of GPU production," he said. "Even companies that had the money to pay for the GPUs couldn't necessarily get as many as they wanted because of all these supply constraints.

"Now I think that's getting less, and you're seeing a bunch of companies thinking 'wow we should really just invest a lot of money in building out these things.' I think that will go [on] for some period of time."

With AI models requiring ever greater investment to improve, "there is a capital question of at what point it stops being worth it to put the capital in," Zuckerberg said. "But I actually think that, before we run into that, we're going to run into energy constraints."

He said that while software is only "somewhat regulated," energy is a heavily regulated sector. "When you're talking about building large new power plants or large build-outs and building transmission lines that cross public or private land... you're talking about many years of lead time."

He added: "If we wanted to stand up some massive facility, to power that is a very long-term project. I think [some people will] do it, but I don't think this is something that can be quite as magical as 'you get a level of AI, get a bunch of capital, and put it in [a big data center].'"

These constraints have held back Meta's own data center buildout. In late 2022, the company scrapped in-development facilities for a new AI design and is now developing a number of upgraded facilities.

"I think we would probably build out bigger clusters than we currently can if we could get the energy to do it," Zuckerberg said.

"No one has built a 1GW data center yet. I think it will happen. This is only a matter of time but it's not going to be next year."

Unmentioned in the podcast is Microsoft's potential plan to build a 5GW data center for OpenAI by 2030.

"Some of these things will take some number of years - I don't know how many - to build out. Just to put this in perspective, I think a gigawatt would be the size of a meaningful nuclear power plant only going towards training a model."

In a different part of the podcast, Zuckerberg discussed the Meta Training and Inference Accelerator and other custom silicon, and when the company's own chips could be used to train its models.

"The approach that we took is first we basically built custom silicon that could handle inference for our ranking and recommendation type stuff, so Reels, News Feed, ads," he said.

"That was consuming a lot of GPUs. When we were able to move that to our own silicon, we're now able to use the more expensive Nvidia GPUs only for training. At some point we will hopefully have silicon ourselves that we can be using at first for training some of the simpler things, then eventually training these really large models."

The company this week announced that its AI assistant Llama 3 would roll out across its platforms, with a 400bn-parameter version on the way.