Arm has always had a troubled relationship with the server space.

Licensees of its chip designs have made bold promises about what their products could bring to the enterprise data center, the cloud, or high-performance computing. But while there has been some limited success, most efforts have ended in failure.

This time, however, the company argues that things are different. The stars have finally aligned, with competitive products available across the data center, cloud and supercomputing markets.

With many understandably cautious, Arm and HPE set out to prove the chip architecture’s viability in one of those spaces: the HPC field. The companies teamed up with Suse and the UK universities of Edinburgh, Bristol and Leicester to test Arm HPC systems on scientific workloads and evaluate whether they are worth investing in.

One year into the three-year project, the Catalyst UK team believes the results so far are promising.

Arming up


“The aim of the Catalyst program is to expand the Arm software ecosystem to ensure that the high-performance computing software and applications that currently run on x86 will run on the Arm environment,” Ben Bennett, director of HPC Strategy for EMEA at HPE, told DCD.

“We're trying to squeeze 25 years of cluster knowledge into a three-year program.”
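
Much of that ecosystem work begins with simply recompiling: portable C, C++ and Fortran should build unchanged on AArch64. As a minimal illustrative sketch (not Catalyst code), the C program below uses the compilers' predefined macros to report which architecture it was built for - the same source compiles on both x86 and Arm.

```c
#include <stdio.h>

int main(void) {
    /* GCC and Clang predefine these macros per target, so the same
     * source file builds unmodified on x86 and Arm machines. */
#if defined(__aarch64__)
    printf("Built for 64-bit Arm (AArch64)\n");
#elif defined(__x86_64__)
    printf("Built for x86-64\n");
#else
    printf("Built for another architecture\n");
#endif
    return 0;
}
```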

Speaking to DCD at their latest status meeting, Catalyst UK members were positive about the progress they were making, but noted that much remained to be learned over the remaining two years.

Each of the three universities was given a system consisting of 64 HPE Apollo 70 nodes, each with two 32-core Marvell ThunderX2 processors. The systems run Suse Linux Enterprise Server for HPC and Suse Enterprise Storage, as well as software tools from the Arm Allinea Studio.
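
At full scale that is 64 nodes × 64 cores, or 4,096 cores per system. The MPI “hello world” below is a minimal sketch (the file name and launch line are illustrative, not taken from the project) of the kind of smoke test that exercises every node:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal cluster smoke test: every rank reports which node it landed on.
 * Illustrative build/run (names are hypothetical):
 *   mpicc hello_mpi.c -o hello_mpi
 *   mpirun -np 4096 ./hello_mpi   # 64 nodes x 64 cores */
int main(int argc, char **argv) {
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```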

“The ThunderX2 from Marvell is one of the first Arm processors that has the memory bandwidth and the floating point to make it a serious contender for high-performance computing workloads,” Bennett said.
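
Memory bandwidth claims like this are usually checked with STREAM-style kernels. The sketch below is an illustrative stand-in (not the Catalyst benchmark suite): it times the classic triad with OpenMP and reports sustained bandwidth in GB/s.

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 26)  /* 64M doubles per array, ~512 MB each */

/* STREAM-style triad: a[i] = b[i] + s*c[i]. Bandwidth-bound, since each
 * iteration moves three doubles but does only one fused multiply-add.
 * Illustrative build: gcc -O3 -fopenmp triad.c -o triad */
int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];
    double t1 = omp_get_wtime();

    /* Three arrays of N doubles cross the memory bus once each. */
    double gb = 3.0 * N * sizeof(double) / 1e9;
    printf("triad: %.1f GB/s (check a[0] = %.1f)\n", gb / (t1 - t0), a[0]);

    free(a); free(b); free(c);
    return 0;
}
```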

“I expect many of us have looked longingly at Calxeda and some of the early 32-bit variants,” he added, referencing an Arm processor company that ultimately failed, “but it has taken Marvell to actually build and deploy something that lets us do some of this work.”

Other notable flops include Qualcomm's Centriq line, a promising product that was killed when the company was forced to rapidly cut costs as it battled a hostile takeover attempt by Broadcom. AMD's Seattle project also bore little fruit, while Broadcom's Vulcan proved unsuccessful (although the technology was sold on and ultimately ended up in the ThunderX2). Another potential player, Huawei’s Kunpeng 920 server CPU, could be in trouble, with Arm pulling support from the company.

The tide does appear to be turning, however, with the ThunderX2 sold in supercomputers built by HPE, Cray and Atos - including the world’s largest Arm supercomputer, Astra. “The Astra machine was being developed and built at the same time as the Catalyst machines; it uses the same nodes, and some of the learnings from building it have gone into Catalyst,” Bennett said.

The future will also see the release of Fujitsu’s A64FX Arm CPU, set to power Japan’s Post-K exascale supercomputer, also known as Fugaku.

With all that in mind, how have the Catalyst deployments held up? “We got our system in late November, and within three days we were running a course on it with 20 people - it basically just worked. And that's maybe not the most exciting message, but it is certainly very interesting,” Dr Michèle Weiland, project manager at Edinburgh’s EPCC, told DCD.

“We've been able to put codes on to the system, run them without really any problems at all. We were prepared for anything, but we didn't have any issues.”

The teams are trying various scientific workloads and testing the performance. “We're currently porting code that already exists, but we will in time be developing code [for the Arm architecture],” Dr Mark Wilkinson, director of the DiRAC HPC facility at the University of Leicester, said.

“And so we can learn, initially, which of our existing codes benefit the most and why. And then we can feed that back into the design of next generation algorithms that can actually take more advantage of the specific architectural features of the chip.”
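
What “specific architectural features” can mean in practice: ThunderX2 implements Arm's 128-bit NEON SIMD unit, so vectorized code processes four single-precision values per instruction. The kernel below is purely illustrative (not one of DiRAC's codes) and would normally sit behind portability guards or be left to compiler auto-vectorization.

```c
#include <arm_neon.h>
#include <stdio.h>

/* Illustrative only: a[i] += s * b[i] using Arm NEON intrinsics,
 * four floats per fused multiply-add. */
void saxpy_neon(float *a, const float *b, float s, int n) {
    float32x4_t vs = vdupq_n_f32(s);          /* broadcast scalar */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);    /* load 4 floats */
        float32x4_t vb = vld1q_f32(b + i);
        va = vfmaq_f32(va, vs, vb);           /* va += vs * vb */
        vst1q_f32(a + i, va);                 /* store 4 floats */
    }
    for (; i < n; i++)                        /* scalar remainder */
        a[i] += s * b[i];
}

int main(void) {
    float a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    float b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    saxpy_neon(a, b, 2.0f, 8);
    printf("a[7] = %.1f\n", a[7]);  /* 1 + 2*8 = 17.0 */
    return 0;
}
```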

Simon Burbidge, director of advanced computing and HPC at the University of Bristol, added: “If you can squeeze out more performance on those systems than you can on the others, you can therefore get more science results. That's our driving force. So having a new platform to do that on is really exciting.”

With the UK planning to purchase numerous supercomputers over the next few years, Wilkinson said, “it's key that we know which kinds of algorithms can take advantage of Arm, so that when we're procuring we can decide which research areas benefit the most, and make sure we're deploying the right type of architecture.”