The installation of Aurora’s 10,624th and final “blade” marked a major milestone for the highly anticipated exascale supercomputer at the U.S. Department of Energy’s (DOE) Argonne National Laboratory.
After years of diligent work and planning, the system now contains all the hardware that will make it one of the most powerful supercomputers in the world when it is opened up for scientific research. Built by Intel and Hewlett Packard Enterprise (HPE), Aurora will be theoretically capable of delivering more than two exaflops of computing power, or more than 2 billion billion calculations per second.
The Aurora team has been building the system piece by piece over the last year and a half, installing blades and other components as they were delivered to the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility.
“We have been living and breathing the Aurora installation since the first pieces were delivered in November of 2021,” said Susan Coghlan, ALCF project director for Aurora. “While we still have a lot of work to do before we can roll the system out to scientists worldwide, it is incredibly exciting to have the final hardware in place.”
As the backbone of the system, Aurora’s blades are sleek rectangular units that house its processors, memory, networking and cooling technologies. The machine gets its computational muscle from a combination of state-of-the-art Intel CPUs (central processing units) and GPUs (graphics processing units). Each blade is equipped with two Intel Xeon CPU Max Series processors and six Intel Data Center GPU Max Series processors.
With each blade weighing in at around 70 pounds, the team needed a specialized machine to delicately install the units vertically into Aurora’s refrigerator-sized racks. Each of the system’s 166 racks contains 64 blades. The racks are spread out across eight rows, occupying the space of two professional basketball courts in the ALCF data center.
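The hardware figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it uses the rack, blade, CPU and GPU counts given in this article, and the constant and function names are hypothetical, chosen here for readability rather than taken from any Argonne or Intel tooling.

```python
# Figures quoted in the article; names are illustrative assumptions.
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2   # Intel Xeon CPU Max Series processors per blade
GPUS_PER_BLADE = 6   # Intel Data Center GPU Max Series processors per blade


def aurora_totals():
    """Derive system-wide component counts from the per-rack and per-blade figures."""
    blades = RACKS * BLADES_PER_RACK
    return {
        "blades": blades,
        "cpus": blades * CPUS_PER_BLADE,
        "gpus": blades * GPUS_PER_BLADE,
    }


totals = aurora_totals()
print(totals)
```

The blade total matches the 10,624 blades cited at the top of the article, and the same per-blade figures imply 21,248 CPUs and 63,744 GPUs across the full system.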
Before the system could be installed, Argonne had to carry out some major facility upgrades. These included adding new data center space to provide enough room for the supercomputer and building mechanical rooms and equipment to provide increased power and cooling capacity.
Now that the machine is fully assembled, researchers from the ALCF’s Aurora Early Science Program and DOE’s Exascale Computing Project will move their work to Aurora to begin scaling their applications on the full system. For the past few months, they have been working on Sunspot, a test and development system that has the same architecture as Aurora but spans only two racks. These early users help stress-test the supercomputer and identify bugs that need to be resolved ahead of its deployment.
“We’re looking forward to putting Aurora through its paces to make sure everything works as intended before we turn the system over to the broader scientific community,” Coghlan said.