The US Department of Energy’s National Energy Research Scientific Computing Center (NERSC) and a consortium of five universities have partnered with Intel to launch a big data research center.

Its purpose is to determine whether current HPC systems can support a new generation of data-intensive workloads, involving data sets of 100 terabytes or more processed on upwards of 100,000 CPU cores, and to develop new, scalable computing algorithms.
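The algorithms in question are typically data-parallel: each core analyzes a shard of the data set and partial results are combined globally. As a rough illustration of that pattern (not actual BDC code, and with a hypothetical file layout), a minimal MPI sketch in Python might look like this:

```python
# Minimal sketch of a data-parallel reduction with mpi4py. This is an
# illustration of the general pattern, not BDC code; the shard path and
# the statistic computed are hypothetical.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank loads only its own shard of a data set far too large for one node.
shard = np.load(f"/scratch/dataset/shard_{rank:06d}.npy")  # hypothetical path

# Compute local partial statistics...
local_sum = shard.sum()
local_count = shard.size

# ...then combine them across all ranks; the same call scales to 100,000+ cores.
total_sum = comm.allreduce(local_sum, op=MPI.SUM)
total_count = comm.allreduce(local_count, op=MPI.SUM)

if rank == 0:
    print(f"Global mean over {total_count} samples: {total_sum / total_count}")
```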

Scaling up

The Big Data Center (BDC) is built around NERSC’s Cray XC40 supercomputer, Cori, which contains 2,388 Intel Xeon ‘Haswell’ processor nodes and 9,688 Intel Xeon Phi ‘Knights Landing’ nodes, as well as a 1.8 petabyte Burst Buffer, a layer of non-volatile storage that accelerates I/O, delivering as much as 1.7TB/s. Cori was ranked the sixth most powerful supercomputer in the world on the Top500 list published in June 2017.
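The Burst Buffer sits between the compute nodes and the slower parallel file system: applications write at the fast tier’s speed, and the data is drained to permanent storage in the background. A minimal sketch of that staging pattern follows; the paths are hypothetical, and on Cori the buffer space is actually allocated per job by the system software rather than mounted at a fixed location.

```python
# Sketch of the burst-buffer staging pattern: absorb the I/O burst on the
# fast non-volatile tier, then drain to the parallel file system
# asynchronously. Paths are hypothetical placeholders.
import shutil
import threading
from pathlib import Path

BURST_BUFFER = Path("/bb/job42")           # fast non-volatile tier (hypothetical)
SCRATCH = Path("/global/scratch/results")  # parallel file system (hypothetical)

def write_checkpoint(name: str, data: bytes) -> None:
    fast = BURST_BUFFER / name
    fast.write_bytes(data)  # fast write: the tier delivers up to ~1.7TB/s aggregate
    # Drain to permanent storage in the background so compute resumes immediately.
    threading.Thread(target=shutil.copy2, args=(fast, SCRATCH / name)).start()
```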

The five Intel Parallel Computing Centers, or IPCCs (the University of California, Berkeley; the University of California, Davis; New York University; Oxford University; and the University of Liverpool), are working on research projects to develop new analytics and programming techniques for processing vast data sets in fields including astronomy, high-energy physics, climatology and meteorology. Like the DOE, they will apply the HPC research results to advance their own work.

Additionally, Prabhat, director of the BDC and lead of NERSC’s data, analytics and services team, stated that all progress made at the BDC will be committed to open source and made available “to peer HPC centers as well as the broader HPC and data analytics communities.”

He said the research project aims to “solve DOE’s leading data-intensive science problems at scale.”

“The key is in developing algorithms in the context of the production stack. Our multi-disciplinary team consisting of performance optimization and scaling experts is well positioned to enable capability applications on Cori.”

Joseph Curley, director of Intel’s code modernization department, said that “the objective of the BDC comes from a common desire in the industry to have software stacks that can help the NERSC user base, using data driven methods, to solve their largest problems at scale on the Cori supercomputer.”

“So one of our main goals is to be able to use the supercomputer hardware to its fullest capability. Some underlying objectives at BDC are to build and harden the data analytics frameworks in the software stack so that developers and data scientists can use the Cori supercomputer in a productive way to get insights from their data. Our work with NERSC and the IPCCs will involve code modernization at scale as well as creating the software environment and software stack needed to meet these needs.”