There is an Easy and Affordable Way to Improve HPC Performance

AMD has sponsored this post.

Goldratt’s Theory of Constraints teaches that to optimize any system, an engineer needs to maximize the usage of that system’s bottleneck. Due to their static nature, traditional HPC setups for simulation make this difficult: accelerators such as graphics processing units (GPUs) and field programmable gate arrays (FPGAs) are physically coupled to the servers they operate with, so if there is no simulation to run on a given server, its GPU or FPGA sits idle.

Even if another simulation is running on another server and could benefit from the computational power of the accelerators, it is too laborious and impractical to rip them out of the idle server and plug them into the operational server. As a result, the idle hardware limits return on investment while continuing to produce heat and use electricity.

Composable infrastructure works a lot like an orchestra. The conductor directs which musicians will play within each ensemble to perform a certain melody. In this metaphor, the conductor is software logic, the musicians are server hardware (GPUs and FPGAs), and each melody is a simulation that needs to be computed.

This is where composable infrastructure can be beneficial. In this setup, all servers and accelerators are connected into one large ecosystem, and the software running it, like the conductor, directs and redirects individual servers, GPUs and FPGAs to compute concurrent simulations as the need arises. As a result, accelerators are less likely to remain idle, which means simulations run faster and more cost-efficiently.

For instance, if you have four GPUs and you are computing a single simulation, all four can be paired with one server to get results quickly—like an ensemble playing a single melody. If you have four simulations going, each can run on a separate server paired with a single GPU—like different ensembles playing different melodies in harmony. With this setup, engineers can maximize the utilization of the accelerators within an HPC system.
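To make the idea concrete, here is a minimal scheduling sketch in Python. It is not any vendor’s API, just an illustration of the policy described above: a shared pool of four GPUs is split across however many simulations are queued.

    from dataclasses import dataclass, field

    @dataclass
    class GpuPool:
        # Hypothetical pool of four accelerators, per the example above.
        free: list = field(default_factory=lambda: ["gpu0", "gpu1", "gpu2", "gpu3"])

        def allocate(self, jobs):
            # Split the free GPUs evenly across the queued simulations.
            share = max(1, len(self.free) // max(1, len(jobs)))
            plan = {}
            for job in jobs:
                plan[job], self.free = self.free[:share], self.free[share:]
            return plan

    print(GpuPool().allocate(["crash_sim"]))
    # one simulation gets all four GPUs
    print(GpuPool().allocate(["sim_a", "sim_b", "sim_c", "sim_d"]))
    # four simulations get one GPU each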

The concept of composable infrastructure is not new; however, it used to be hard to implement. “In the past, it was clunky, was not user friendly and it didn’t always work right,” says Brady Black, HPC Solution Engineer at AMD. “With today’s technology, you have the high-performance network, software and tooling to back it up. It’s a real solution—it’s not experimental.”

If you know where to look, composable infrastructure is no longer a complex, bespoke solution. Almost any organization that can afford HPC resources can make those systems composable.

What Are the Benefits of Composable Infrastructure?

Composable infrastructure offers simulation users greater flexibility than they have had in the past. Traditionally, the only choice was to send a simulation to a pre-constructed server. If the queue for that server was too long, the user could redirect the simulation to a cloud-based computational service.

However, by making the servers composable, large numbers of simulations can move through the queue in parallel—potentially at a higher rate overall. Alternatively, if results from a single simulation are needed quickly, all those same computational resources could be pulled into computing results in the shortest possible time.
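The arithmetic behind that trade-off is simple. Here is a back-of-the-envelope sketch with made-up numbers, assuming near-linear GPU scaling for illustration:

    single_gpu_hours = 8.0   # assumed runtime of one simulation on one GPU
    gpus = 4

    # Policy A: four simulations in parallel, one pooled GPU each.
    parallel_finish = single_gpu_hours          # all four done after ~8 hours

    # Policy B: one urgent simulation gets the whole pool.
    urgent_finish = single_gpu_hours / gpus     # that one result in ~2 hours

    print(f"Four jobs in parallel: all done in ~{parallel_finish:.0f} h")
    print(f"One urgent job on all {gpus} GPUs: done in ~{urgent_finish:.0f} h")

The point is not the exact numbers (real scaling is rarely perfectly linear) but that the same pooled hardware can be carved up either way without touching a screwdriver.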

Another benefit is that different types of hardware are not locked inside individual servers; they are grouped into centralized resource boxes—or accelerator pools. This makes expanding the system much simpler, and neater, than expanding multiple servers. Instead of opening four servers to add one GPU to each, all four GPUs can be added to a single GPU pool, and new FPGAs can be added to a centralized FPGA pool in the same way.

“You don’t have to make those hardware decisions up front,” Black says. “They can grow as software and hardware continue to be enhanced.”

What Has Changed to Make Composable Infrastructure Easier?

The major difference between traditional HPC and composable infrastructure is that instead of sharing only data between servers, the latter also shares hardware resources. “Usually when the HPC work finishes, the hardware configuration doesn’t change,” Black says. “In composable infrastructure, you’re adding compute capability to servers and removing it, via software, as needed.”

By using PCIe cables in composable infrastructure, the system sees all the hardware it is connected to as local.

Composable infrastructure can be based on PCIe—the same connectors you might see when opening a server, desktop or laptop—extended beyond the chassis into a physical network. Advancements in software, operating systems and network fabrics have made it easier to gain access to this orchestrated flexibility.

The benefit of using PCIe is that whenever devices are paired, the connection behaves like a local one. “The server itself sees the device as local to itself,” Black says. “But in reality, it’s on an external PCIe bus. There is some latency—due to physics—but it behaves locally.”
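One way to see this for yourself: on a Linux host, devices reached over an external PCIe fabric enumerate under sysfs exactly like boards plugged into the chassis. This snippet reads only standard sysfs paths (no vendor tooling assumed) and lists any display-class devices, which is where GPUs appear:

    import pathlib

    # PCI class codes beginning 0x03 denote display controllers (GPUs).
    for dev in sorted(pathlib.Path("/sys/bus/pci/devices").iterdir()):
        pci_class = (dev / "class").read_text().strip()
        if pci_class.startswith("0x03"):
            vendor = (dev / "vendor").read_text().strip()
            print(dev.name, "vendor:", vendor, "class:", pci_class)

A GPU attached through a pooling appliance shows up in this list with an ordinary bus address, which is exactly the “local” behavior Black describes.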

Composable infrastructure connects all the available server hardware using physical PCIe connections. Accelerators are pooled together into appliances that let you plug in eight PCIe devices at a time. Multiple pooling appliances can be added to the PCIe network, and software reassigns the pooled hardware as needed. “The result is the ability to manipulate the configuration of a server or set of servers to add or remove GPUs based on your application’s needs,” says Black.
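From the administrator’s side, a reassignment might look something like the sketch below. The FabricManager class, its methods and the device names are all invented for illustration; actual composable-fabric software exposes its own management API, but this attach/detach flow is the essence of what it does:

    class FabricManager:
        def __init__(self):
            # One hypothetical eight-slot pooling appliance, as in the text.
            self.pool = {f"gpu{i}": None for i in range(8)}  # device -> server

        def attach(self, device, server):
            assert self.pool[device] is None, f"{device} is already attached"
            self.pool[device] = server  # fabric routes the PCIe lanes to the server
            print(f"{device} now appears local to {server}")

        def detach(self, device):
            self.pool[device] = None    # device returns to the free pool

    fm = FabricManager()
    fm.attach("gpu0", "server-a")  # server-a sees one more local GPU
    fm.detach("gpu0")              # gpu0 goes back to the pool for the next job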

Is Composable Infrastructure Secure?

The fact that composable infrastructure is designed to easily connect and disconnect hardware—almost on a whim—might make some wonder whether such a setup is more vulnerable to hacking. However, just because composable infrastructure is better at communicating between the tools in its ecosystem does not mean it is easier for outside systems to communicate with it.

In fact, switching to a pure PCIe system can improve security. Traditionally, setups that span multiple network types start with PCIe and bridge onto something such as Ethernet and/or InfiniBand. For that communication to take place, the data must be temporarily stored, processed and forwarded onto the new network. Each of these stopping points represents a potential security risk.

“PCIe, on the other hand, was designed for direct point-to-point communication—a time-based protocol that, as a network, is not designed to store and forward,” says Black.

In other words, by switching to a pure PCIe network, composable infrastructure won’t just improve communication between hardware, it will also improve security within the HPC ecosystem.

To hear about how much composable infrastructure improved the throughput of simulation computations in the real world, check out the AMD HPC and AI Solutions Hub.