As the insatiable demand for ever faster and more powerful computing shows no sign of abating, chip designers have come up with a simple way to squeeze more out of aging semiconductor architectures: Push more power through them.
"CPUs are going to 200-300W per socket, GPUs are now well into the 500-600W range," HPE's CTO of high-performance computing, Nicolas Dube, said during DCD>Critical Power. "And they keep rising. I mean, there's roadmaps that are pointing to GPUs that will be close to a kilowatt each."
Such mind-bogglingly high thermal design power (TDP) is pushing rack densities up to 100kW and beyond. "I don't want to brag, but we can do up to 400kW in a rack," Dube said. "Now, they're double-wide racks, though, so one might say we're cheating, but we're still at 200kW per equivalent IT rack."
This article appeared in our Critical Power Supplement.
Racking up the challenges
Even if server footprint were not an issue, data centers can't deal with higher TDP (thermal design power) chips by simply making racks less dense, especially for HPC. "We're running tightly coupled workloads, where every single compute node and every single server needs to communicate not only with its neighbor, but with the rest of the system when we're running, with millions of threads of parallelism, all together," Dube explained.
"For many of the exascale systems, we're actually piping in 200 gigabits per GPU, and each node has one CPU and four GPUs so every one of those has 800 gigabits piping in and out to transmit data."
To handle all that fabric connectivity, "what we try to do is to have the last connection between the leaf switch and the node to still remain on copper and we're able to do that at 200 gigabit." Of course, that only works over short distances, so high density is "driven by the high connectivity we're looking for."
That brings the next challenge: How do you cool a 400kW rack? "This is what's pushing us to put liquid cooling in there, there's just no way you could air cool that - you'd literally be putting a wind tunnel in front of the rack," Dube said.
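To put rough numbers on the "wind tunnel" remark, the airflow needed to carry away a heat load can be estimated from the sensible heat relation: flow = power / (air density x specific heat x temperature rise). The sketch below is a back-of-the-envelope illustration, not a figure from Dube or HPE. The air properties are standard approximations, and the 15 K inlet-to-outlet temperature rise and the 10kW comparison rack are assumptions chosen only for illustration.

```python
# Back-of-the-envelope airflow estimate for air cooling a rack.
# Assumptions (not from the article): approximate air properties near
# room temperature at sea level, and a 15 K inlet-to-outlet air rise.
AIR_DENSITY_KG_M3 = 1.2      # approximate density of air
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat of air
M3S_TO_CFM = 2118.88         # 1 cubic meter/second in cubic feet/minute


def required_airflow_m3s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove heat_load_w with a delta_t_k air rise."""
    return heat_load_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)


if __name__ == "__main__":
    # 10 kW is a hypothetical lower-density rack for comparison;
    # 100 kW and 400 kW are the densities mentioned in the article.
    for rack_kw in (10, 100, 400):
        flow = required_airflow_m3s(rack_kw * 1000, delta_t_k=15.0)
        print(f"{rack_kw:>4} kW rack: {flow:6.2f} m^3/s "
              f"(~{flow * M3S_TO_CFM:,.0f} CFM)")
```

Under these assumptions, a 400kW rack would need roughly 22 cubic meters of air per second, and the requirement scales linearly with power, so halving the allowed temperature rise doubles the airflow.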
HPC is at the forefront of this shift, but Dube argues heavier workloads are coming to all industries, particularly as companies across the spectrum embrace machine learning and data analytics. "This is not just the high-end Department of Energy labs kind of places, this is coming to pretty much every data center, and to colos as well," he said.
Tate Cantrell, CTO of Icelandic HPC and colo business Verne Global, agreed. "When we're taking investors' money, we're putting it into a 15-30 year asset depending on how it's built," he said. "And I would say right now, anyone who's building a data center that is only able to accommodate air cooling, you might want to think twice."
Cantrell envisions a hybrid future that mixes smaller, less-intensive workloads with high-density deployments that require careful cooling consideration.
This is going to be a tough transition for the industry, he warned. "I was sitting in a meeting - I can't talk about the meeting, or who was running it or what the topic was - and a hyperscaler piped up and said 'the last thing in the world we want to see right now is forcing us to go to liquid cooling.'"
He added: "They said 'yes, you the technologists say it's possible and say it's the greatest thing since sliced bread, but we don't want to go there.' Because, you know, there's an economical challenge to make their business models work in a situation where he has to go back and retrofit a bunch of infrastructure."
Dube concurred that it was important to consider the infrastructure needed, and the water supplies. "There's even the weight: The load-bearing of the floor might be quite a bit more than you're used to."
As high density becomes more commonplace, there are opportunities for other innovations.
"The amount of power that's being wasted in the actual power supply units, to get that incoming AC power to the chip level DC voltage really takes an efficiency cut," VP of electrical engineering at consultancy Morrison Hershfield, Mike Mosman, said.
"The problem is that you're trying to get a lot of people who have existing infrastructure out there to adopt something new."
Dube concurred: "You should plan for high voltage distribution in the data center. But is that high voltage AC or high voltage DC? And where do you make the transformation between AC and DC?"
Again, HPC is leading the way on such trials, "but ultimately it really comes down to where the hyperscalers and wider industry are going. In order to have the right cost structure HPC cannot drive the volume alone, it needs to be taken on by a lot bigger market."
Mosman was also insistent that the broader market needed to help define the future of denser compute. "Here's what we electrical engineers say about high-density computing: It's a mechanical problem... but [as for how that problem] is solved, we have to observe my three favorite words which are 'follow the money.'"