Fujitsu’s New Supercomputer CPU Prototype

In the world of high-performance computing (HPC), Fujitsu’s K computer has become a legacy supercomputer used across global industry and research in many areas, including manufacturing. It resides at the RIKEN Advanced Institute for Computational Science in Kobe, Japan, and its hardware specifications remain impressive in scale. It was named “K” after the Japanese word “kei,” which means “10 quadrillion” (10,000,000,000,000,000).

The K computer has 705,024 cores from 88,128 2.0 GHz eight-core SPARC64 VIIIfx processors. Each of the 864 cabinets that house the processors holds 96 computing nodes, and each computing node has a single processor and 16 GB of memory. (Image courtesy of Fujitsu and RIKEN.)
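The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the constant and function names are illustrative and not taken from any Fujitsu or RIKEN material.

```python
# Figures quoted in the paragraph above.
PROCESSORS = 88_128
CORES_PER_PROCESSOR = 8
CABINETS = 864
COMPUTE_NODES_PER_CABINET = 96
MEMORY_PER_NODE_GB = 16


def total_cores(processors, cores_per_processor):
    """Total core count across all processors."""
    return processors * cores_per_processor


def total_compute_nodes(cabinets, nodes_per_cabinet):
    """Total computing nodes across all cabinets."""
    return cabinets * nodes_per_cabinet


cores = total_cores(PROCESSORS, CORES_PER_PROCESSOR)
nodes = total_compute_nodes(CABINETS, COMPUTE_NODES_PER_CABINET)
compute_memory_gb = nodes * MEMORY_PER_NODE_GB

print(cores)               # 705024, matching the quoted core count
print(nodes)               # 82944 computing nodes
print(PROCESSORS - nodes)  # 5184 processors beyond the computing nodes
print(compute_memory_gb)   # 1327104 GB across the computing nodes
```

The core count matches the quoted 705,024. The processor count exceeds the computing-node count by 5,184; the text does not say what those remaining processors are used for.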

RIKEN and Fujitsu have created a Post-K CPU that is ARM-based rather than SPARC64-based, and they are building a custom ARM compiler for the system. The goal is to improve on the application performance of the K computer by a factor of 50 to 100.

Similarities and Differences Between the K Computer and the Post-K Computer

To maximize communication efficiency between processing nodes, Fujitsu uses a switchless network topology that combines a torus interconnect with a mesh network. This lets each computing node connect directly to as many other nodes as possible. The Post-K CPU uses the third generation (Tofu3) of the same mesh/torus interconnect architecture found in the K computer.
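To illustrate why a torus gives each node more direct links than a plain mesh, here is a minimal sketch of neighbor lookup in a generic N-dimensional grid. The `direct_neighbors` function and the 4 x 4 x 4 dimensions are hypothetical choices for illustration; they do not describe the actual Tofu layout.

```python
def direct_neighbors(coord, dims, wrap=True):
    """Return the nodes one hop away from coord in a grid of shape dims.

    wrap=True models a torus (edges wrap around);
    wrap=False models a plain mesh (edges are dead ends).
    """
    found = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            position = coord[axis] + step
            if wrap:
                position %= size
            elif not 0 <= position < size:
                continue  # stepped off the edge of a mesh
            candidate = coord[:axis] + (position,) + coord[axis + 1:]
            if candidate != coord:
                found.add(candidate)
    return sorted(found)


corner = (0, 0, 0)
shape = (4, 4, 4)  # illustrative size only

torus_links = direct_neighbors(corner, shape, wrap=True)
mesh_links = direct_neighbors(corner, shape, wrap=False)

print(len(torus_links))  # 6
print(len(mesh_links))   # 3
```

In this toy grid a corner node has only three direct links in a mesh, while the wraparound links of a torus give every node the same six, with no switch in between.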

Both the K computer and the Post-K system use a three-level storage hierarchy. The main difference between the two is the Post-K CPU’s custom-designed multi-core microarchitecture, which implements the ARMv8-A ISA. Each Post-K CPU has 48 cores and supports 512-bit-wide Single Instruction, Multiple Data (SIMD) operations through the ARM Scalable Vector Extension (SVE).
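As a rough sense of what a 512-bit SIMD width means in practice, the sketch below computes how many elements fit in one vector and how many vector operations a single pass over an array would need. The function names are hypothetical, and the model deliberately ignores everything except vector width (pipelines, memory bandwidth, and so on).

```python
import math

VECTOR_BITS = 512  # SIMD width quoted in the text


def lanes(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements of the given size that fit in one vector."""
    if vector_bits % element_bits != 0:
        raise ValueError("element size must divide the vector width")
    return vector_bits // element_bits


def vector_ops(element_count, element_bits, vector_bits=VECTOR_BITS):
    """Vector operations needed for one pass over element_count elements."""
    return math.ceil(element_count / lanes(element_bits, vector_bits))


print(lanes(64))             # 8 double-precision values per vector
print(lanes(32))             # 16 single-precision values per vector
print(vector_ops(1000, 64))  # 125
print(vector_ops(1000, 32))  # 63
```

Under this simplified model, a loop over 1,000 double-precision values takes 125 vector operations instead of 1,000 scalar ones.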

Specifications for Post-K

Bottom Line

The Post-K processors will yield an intensely powerful supercomputer capable of better performance than many general-purpose server processors, and the adoption of the ARM instruction set should significantly improve its usability.