Blue Genes for Sequoia

The IBM BlueGene/Q supercomputer named Sequoia, installed at Lawrence Livermore National Laboratory, recently regained the title of the world’s fastest supercomputer for IBM and the US. The title had previously been held by China and then Japan. The rankings are updated every six months and published as the TOP500 list. Sequoia took the top spot with a sustained 16.32 petaFLOPS on the Linpack benchmark, in a run lasting 23 straight hours without a single core failure. That is about 81% of its theoretical peak of roughly 20 petaFLOPS. FLOPS, or floating-point operations per second, is a measure of processing performance, and one petaFLOPS is a quadrillion (10^15) floating-point operations per second. That computational power is “equivalent to the 6.7 billion people on earth using hand calculators and working together on a calculation 24 hours per day, 365 days a year, for 320 years…to do what Sequoia will do in one hour.”
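The hand-calculator comparison can be checked with a little arithmetic. The quote does not say how fast each person calculates, so the sketch below assumes one calculation per person per second and a 365-day year. It also assumes Sequoia runs at its theoretical peak of about 20 petaFLOPS for the full hour. The helper function names are illustrative only.

```python
# Back-of-the-envelope check of the hand-calculator comparison.
# Assumptions (not stated in the quote): each person completes one
# calculation per second, a year is 365 days, and Sequoia runs at its
# ~20 petaFLOPS theoretical peak for the whole hour.
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR

PEOPLE = 6.7e9       # world population used in the quote
YEARS = 320          # duration used in the quote
PEAK_FLOPS = 20e15   # ~20 petaFLOPS theoretical peak


def human_operations(people: float, years: float) -> float:
    """Total calculations if every person does one per second, nonstop."""
    return people * years * SECONDS_PER_YEAR


def machine_operations(flops: float, hours: float) -> float:
    """Total floating-point operations at a constant rate of `flops`."""
    return flops * hours * SECONDS_PER_HOUR


humans = human_operations(PEOPLE, YEARS)
sequoia = machine_operations(PEAK_FLOPS, 1)

print(f"People with calculators: {humans:.2e} operations")
print(f"Sequoia, one hour:       {sequoia:.2e} operations")
print(f"Ratio (Sequoia / people): {sequoia / humans:.2f}")
```

Under these assumptions the two totals come to about 6.8 × 10^19 and 7.2 × 10^19 operations. The quote is therefore consistent with roughly one calculation per person per second.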

Image courtesy Lawrence Livermore National Laboratory

Even more impressive, Sequoia is not only the most powerful supercomputer in the world but also one of the most energy-efficient. Efficiency matters because electricity is a major part of a supercomputer’s operating and maintenance costs. Sequoia draws about 7,890 kW while running its 1,572,864 cores. Most supercomputers are air cooled, but about 91% of Sequoia’s cooling is done with water, circulated through copper pipes that wind around each node card. The remaining 9% is done with air. For perspective, the previous top supercomputer, Japan’s K computer, is estimated to cost over $10 million per year to operate. It draws just under 10,000 kW and delivers about 10.51 petaFLOPS. Sequoia therefore achieves more than 50% higher performance while consuming about 20% less electricity than the K computer.
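Energy efficiency is usually compared as performance per watt. The sketch below applies that measure to the figures above. Sequoia’s performance is taken as the 16.32 petaFLOPS Linpack result reported on the June 2012 TOP500 list. The K computer’s “just under 10,000 kW” is rounded up to an assumed 10,000 kW, which makes its efficiency here a slight underestimate. The function `gflops_per_watt` is a hypothetical helper written for this example.

```python
# Performance-per-watt comparison using the figures quoted in the text.
# Assumption: K_POWER_KW rounds "just under 10,000 kW" up to 10,000 kW,
# so the K computer's efficiency below is a slight underestimate.
SEQUOIA_PFLOPS = 16.32
SEQUOIA_POWER_KW = 7_890
K_PFLOPS = 10.51
K_POWER_KW = 10_000


def gflops_per_watt(pflops: float, power_kw: float) -> float:
    """Convert petaFLOPS and kilowatts into gigaFLOPS per watt."""
    gflops = pflops * 1e6   # 1 petaFLOPS = 1e6 gigaFLOPS
    watts = power_kw * 1e3  # 1 kW = 1e3 W
    return gflops / watts


seq_eff = gflops_per_watt(SEQUOIA_PFLOPS, SEQUOIA_POWER_KW)
k_eff = gflops_per_watt(K_PFLOPS, K_POWER_KW)
speedup = SEQUOIA_PFLOPS / K_PFLOPS                # ~1.55, i.e. over 50% faster
power_saving = 1 - SEQUOIA_POWER_KW / K_POWER_KW   # ~0.21 with the 10,000 kW bound

print(f"Sequoia: {seq_eff:.2f} GFLOPS/W")
print(f"K:       {k_eff:.2f} GFLOPS/W")
print(f"Performance ratio: {speedup:.2f}x, power saving: {power_saving:.0%}")
```

With these inputs Sequoia comes out at roughly 2.07 gigaFLOPS per watt and the K computer at about 1.05. In other words, Sequoia does roughly twice the work for each watt consumed.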

Sequoia is built as a scalable hierarchy of modules. The system as a whole consists of 96 refrigerator-sized racks, each weighing about 4,500 pounds. Each rack holds 2 midplanes plus 1, 2, or 4 I/O drawers. Each midplane contains 16 node cards, and each node card holds 32 compute cards along with optical modules and link chips. A compute card carries one processor chip and 16 GB of DDR3 memory.
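Multiplying through the levels of the hierarchy shows how the system-wide totals arise. The sketch below models the counts from the paragraph above. The class name `Hierarchy` and its field names are illustrative only. The 16 cores per chip count only the cores used for computation, as described in the processor discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hierarchy:
    """Illustrative model of Sequoia's packaging hierarchy."""
    racks: int = 96
    midplanes_per_rack: int = 2
    node_cards_per_midplane: int = 16
    compute_cards_per_node_card: int = 32
    cores_per_chip: int = 16        # compute cores only (spare/OS cores excluded)
    memory_gb_per_card: int = 16

    @property
    def chips(self) -> int:
        # One chip per compute card, so chips == compute cards.
        return (self.racks
                * self.midplanes_per_rack
                * self.node_cards_per_midplane
                * self.compute_cards_per_node_card)

    @property
    def cores(self) -> int:
        return self.chips * self.cores_per_chip

    @property
    def memory_gb(self) -> int:
        return self.chips * self.memory_gb_per_card


sequoia = Hierarchy()
print(f"Chips:  {sequoia.chips:,}")
print(f"Cores:  {sequoia.cores:,}")
print(f"Memory: {sequoia.memory_gb:,} GB")
```

The chip and core counts produced this way agree with the 98,304 chips and 1,572,864 cores quoted in the article.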

Image courtesy Lawrence Livermore National Laboratory

This is where it gets interesting. The chip on each compute card is the BlueGene/Q compute chip, built from IBM PowerPC A2 processor cores, 16 of which are used for computation. At a clock speed of 1.6 GHz, each chip delivers a peak of 204.8 (roughly 205) gigaFLOPS. It is manufactured in IBM’s 45 nm copper silicon-on-insulator (SOI) process. The die measures roughly 19 × 19 mm (359.5 mm²), contains 1.47 billion transistors, and consumes about 55 W. Sequoia contains 98,304 of these chips.
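The per-chip and system peak figures follow from cores × clock × floating-point operations per cycle. The sketch below assumes 8 operations per core per cycle. That corresponds to a four-wide double-precision SIMD unit with fused multiply-add counted as 2 operations, and it is the value that reproduces the 204.8 gigaFLOPS figure. Scaling by 98,304 chips then gives the system’s theoretical peak.

```python
# Peak-performance arithmetic for the BlueGene/Q compute chip.
# Assumption: 8 FLOPs per core per cycle (4-wide SIMD x fused multiply-add).
CORES_PER_CHIP = 16       # compute cores per chip
CLOCK_HZ = 1.6e9          # 1.6 GHz
FLOPS_PER_CYCLE = 8       # assumed, see above
CHIPS = 98_304
CHIP_POWER_W = 55

chip_peak = CORES_PER_CHIP * CLOCK_HZ * FLOPS_PER_CYCLE   # FLOPS per chip
system_peak = chip_peak * CHIPS                           # FLOPS for the whole system
chip_power_total_kw = CHIPS * CHIP_POWER_W / 1e3          # processors only, in kW

print(f"Per-chip peak: {chip_peak / 1e9:.1f} GFLOPS")
print(f"System peak:   {system_peak / 1e15:.2f} PFLOPS")
print(f"Chip power:    {chip_power_total_kw:,.0f} kW")
```

The system peak comes out at about 20.13 petaFLOPS, which matches the roughly 20 petaFLOPS theoretical limit. The chips alone account for about 5,400 kW. The rest of the 7,890 kW figure presumably goes to memory, networking, and other components.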

Each chip actually contains 18 cores (PU0 to PU17), but only 16 are used for computing. One runs operating system services, and one is held in reserve as a redundant spare. The spare can replace a core found faulty at manufacture, or it can take over for another core while the chip is in service.
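The core-sparing idea can be illustrated with a toy model. This is not IBM’s firmware logic. The function `assign_roles` and its policy are hypothetical: the lowest-numbered healthy cores compute, the next healthy core serves the operating system, and any leftover core stays as the spare. The model only shows how one failed core can be absorbed without losing any of the 16 compute cores.

```python
# Toy model of spare-core remapping on an 18-core chip.
# Hypothetical policy, not IBM's actual mechanism.
PHYSICAL_CORES = [f"PU{i}" for i in range(18)]   # PU0 .. PU17
COMPUTE_CORES = 16


def assign_roles(faulty: set[str]) -> dict[str, str]:
    """Map each physical core to 'compute', 'os', 'spare', or 'disabled'."""
    healthy = [pu for pu in PHYSICAL_CORES if pu not in faulty]
    if len(healthy) < COMPUTE_CORES + 1:
        raise ValueError("need at least 16 compute cores plus 1 OS core")
    roles = {pu: "disabled" for pu in PHYSICAL_CORES if pu in faulty}
    for pu in healthy[:COMPUTE_CORES]:
        roles[pu] = "compute"
    roles[healthy[COMPUTE_CORES]] = "os"
    for pu in healthy[COMPUTE_CORES + 1:]:
        roles[pu] = "spare"
    return roles


roles = assign_roles(set())                  # all 18 cores working
roles_after_fault = assign_roles({"PU3"})    # one core has failed
print(roles["PU16"], roles["PU17"])
print(sorted(set(roles_after_fault.values())))
```

With all cores working, this policy leaves PU17 as the spare. If PU3 fails, the remaining 17 healthy cores still cover 16 compute roles plus the OS role, and no spare is left.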

1,572,864 cores running together provide incredible hardware power, but programming a machine with that many cores is another story. Sequoia’s login and I/O nodes run Red Hat Enterprise Linux, while its compute nodes run IBM’s lightweight Compute Node Kernel (CNK). The system currently runs on an open network that lets scientists and institutions run experiments using its resources. Sometime next year, however, it is expected to move to a classified network and focus exclusively on calculations aimed at extending the lifespan of nuclear weapons.