Meet Ada Lovelace, NVIDIA’s Beautiful New GPU

Overview of the new Ada Lovelace GPU architecture. (Source: NVIDIA.)

When NVIDIA announced its newest graphics card architecture last month—Ada Lovelace, named after the pioneering computer scientist—it didn’t temper expectations. Describing the new architecture as “a thing of beauty,” NVIDIA promised bold improvements for its upcoming Ada-based graphics cards, claiming they would provide up to four times the performance of the previous-generation Ampere-based cards.

How does Ada enable such a big performance bump? In a white paper outlining the new architecture, NVIDIA explained how improved power efficiency, faster ray tracing, and a new feature called Shader Execution Reordering (SER) are among the enhancements that make Ada so beautiful.

“The highly optimized NVIDIA Ada Lovelace GPU architecture combines third-generation RT Cores, fourth-generation Tensor cores, and next-generation CUDA cores for previously unattainable rendering, AI, graphics, and compute performance,” said Carl Flygare, Professional Graphics and Data Center product marketing manager at PNY.

Let’s break down these details to see what Ada can do for engineers.

New RT, Tensor and CUDA Cores

NVIDIA GPUs are built on three main types of cores: RT Cores for real-time ray tracing, Tensor Cores for machine learning acceleration, and CUDA Cores for general-purpose shading and compute work. Ada introduces a new generation of each of these cores.

RT Cores, now in their third generation, have twice the ray-triangle intersection throughput of their Ampere predecessors. This lets Ada-based cards trace more rays in the same amount of time, so they can render reflections and shadows more accurately and hand cleaner images to denoising, the step that removes random variations in brightness and color to make images smoother and more consistent.
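The ray-triangle intersection test is the operation whose throughput doubled. RT Cores run it in fixed-function hardware, and NVIDIA has not published exactly how. As a software stand-in, here is a minimal Python sketch of the well-known Möller–Trumbore algorithm (our choice of algorithm, not necessarily what the hardware does); the function and helper names are illustrative.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin: Vec3, direction: Vec3,
                 v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: return the distance t along the ray to the hit, or None."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:               # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det         # distance along the ray
    return t if t > eps else None    # ignore hits behind the ray origin

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri))
```

A ray fired straight at the triangle from one unit away reports a distance of 1.0, while a ray aimed outside the triangle returns `None`. An RT Core performs tests like this, along with bounding-box tests while traversing the scene's acceleration structure, for millions of rays per frame.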

Block diagram of third-generation RT Cores. (Source: NVIDIA.)

Ada’s RT Cores also contain two new hardware units: the Opacity Micromap (OMM) engine and the Displaced Micro-Mesh (DMM) engine. The OMM engine allows faster ray tracing of translucent or irregularly shaped objects like fern leaves or tinted car windows. The DMM engine generates micro-triangles on the fly from a new type of graphics primitive called a micro-mesh, which allows ray tracing of increasingly complex geometries without a corresponding increase in storage and processing time.
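The Opacity Micromap idea can be illustrated with a small simulation. In this minimal sketch (the names and the three-state encoding are simplifications we chose, not NVIDIA's data format), each micro-triangle of a leaf's base triangle is tagged opaque, transparent, or unknown. Hits on the first two can be accepted or ignored outright, and only hits on unknown micro-triangles fall back to a programmable any-hit shader that samples the alpha texture. Both opacity micromaps and micro-meshes subdivide a base triangle uniformly, so subdivision level n yields 4^n micro-triangles.

```python
from enum import Enum
from typing import Dict, List

class Opacity(Enum):
    TRANSPARENT = 0  # hit ignored without running a shader; the ray continues
    OPAQUE = 1       # hit accepted without running a shader
    UNKNOWN = 2      # partly covered micro-triangle: defer to an any-hit shader

def micro_triangle_count(level: int) -> int:
    """Each subdivision level splits every triangle into four, so level n gives 4**n."""
    return 4 ** level

def classify_hits(hit_ids: List[int], micromap: List[Opacity]) -> Dict[str, int]:
    """Tally how each hit on a micro-triangle would be resolved with an opacity micromap."""
    tally = {"accepted": 0, "skipped": 0, "shader_calls": 0}
    for micro_id in hit_ids:
        state = micromap[micro_id]
        if state is Opacity.OPAQUE:
            tally["accepted"] += 1
        elif state is Opacity.TRANSPARENT:
            tally["skipped"] += 1
        else:
            tally["shader_calls"] += 1
    return tally

# A level-1 micromap (4 micro-triangles) for one hypothetical leaf triangle.
leaf = [Opacity.OPAQUE, Opacity.TRANSPARENT, Opacity.UNKNOWN, Opacity.OPAQUE]
print(classify_hits([0, 1, 2, 3], leaf))
```

Without a micromap, every one of those hits on alpha-tested geometry would invoke the any-hit shader; here only the single unknown micro-triangle does.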

(Source: NVIDIA.)

“Ada’s third-generation RT Cores deliver tremendous improvements for use cases involving photorealistic rendering of movie content, architectural design reviews, and in-silico prototyping—or even design reviews—of product designs, with hardware innovations that also reduce GPU memory requirements, allowing even more sophisticated and complex designs to be viewed and manipulated photorealistically,” Flygare said.

Ensuring high ray tracing framerates is a problem that NVIDIA says cannot be solved with RT Core horsepower alone. That’s why Ada includes a new scheduling system, Shader Execution Reordering, which is designed to reduce ray tracing bottlenecks. NVIDIA says it spent years of research and development on SER, whose primary purpose is to increase shader efficiency on the fly: as rays bounce in different directions, neighboring threads end up running different shaders and reading scattered memory, and SER regroups that work so threads executing together follow the same code path.
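Why reordering helps can be sketched with a toy cost model. NVIDIA GPUs execute threads in groups of 32 called warps, and when threads in one warp need different hit shaders, the warp effectively runs each shader in turn. The model below, an assumption for illustration only, charges each warp one pass per distinct shader it contains, and uses a plain sort by material ID as a hypothetical stand-in for SER. The real mechanism reorders threads in hardware based on what each ray hit, optionally guided by hints from the application.

```python
import random
from typing import List

WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA GPUs

def shader_passes(material_ids: List[int]) -> int:
    """Toy divergence cost: each warp pays one pass per distinct shader among its threads."""
    passes = 0
    for start in range(0, len(material_ids), WARP_SIZE):
        warp = material_ids[start:start + WARP_SIZE]
        passes += len(set(warp))
    return passes

def reorder_by_shader(material_ids: List[int]) -> List[int]:
    """Hypothetical SER stand-in: return thread indices grouped by shader key.

    Returning indices rather than sorted IDs keeps track of which ray,
    and therefore which pixel, each reordered thread belongs to.
    """
    return sorted(range(len(material_ids)), key=lambda i: material_ids[i])

# 1,024 rays whose hits land on 8 materials in no particular order.
random.seed(0)
hits = [random.randrange(8) for _ in range(1024)]
order = reorder_by_shader(hits)
reordered = [hits[i] for i in order]
print("before:", shader_passes(hits), "after:", shader_passes(reordered))
```

After reordering, most warps contain a single material, so the pass count drops to roughly one per warp. How much that helps in practice depends on how divergent the workload is to begin with.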

“Shader Execution Reordering is as big of an innovation for GPUs as out-of-order execution was for CPUs back in the 1990s, offering 2-3x speedups for some RT workloads,” states the Ada white paper.

The Shader Execution Reordering pipeline in Ada GPUs. (Source: NVIDIA.)

Ada’s fourth-generation Tensor Cores heighten machine learning acceleration, delivering more than twice the FP16, BF16, TF32, INT8 and INT4 performance of Ampere, according to NVIDIA. The new Tensor Cores also include the FP8 Transformer Engine that NVIDIA introduced with its latest data center GPU, Hopper (H100).

In addition, Flygare says, “Ada’s new Tensor cores support acceleration of a new FP8 precision data type and provide independent floating-point and integer data paths to further accelerate the use of mixed floating point and integer calculations.”
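FP8 trades precision for throughput. As a rough illustration, this sketch emulates in software the E4M3 variant of FP8 (1 sign bit, 4 exponent bits with a bias of 7, 3 mantissa bits), one of the two FP8 layouts, alongside E5M2, used by NVIDIA's Transformer Engine; the Tensor Cores perform such conversions in hardware. Saturating out-of-range values at ±448 and breaking rounding ties toward the smaller magnitude are simplifying assumptions, not a description of NVIDIA's conversion rules.

```python
import math
from typing import List

def e4m3_values() -> List[float]:
    """All finite non-negative FP8 E4M3 values, in ascending order."""
    values = []
    for exp_field in range(16):
        for mantissa in range(8):
            if exp_field == 15 and mantissa == 7:
                continue  # S.1111.111 encodes NaN; E4M3 has no infinities
            if exp_field == 0:
                values.append(mantissa / 8 * 2.0 ** -6)  # subnormals
            else:
                values.append((1 + mantissa / 8) * 2.0 ** (exp_field - 7))
    return values

E4M3_VALUES = e4m3_values()
E4M3_MAX = max(E4M3_VALUES)  # 448.0

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating magnitudes above 448.

    Ties resolve to the smaller magnitude (min() keeps the first of equal
    candidates), which differs from IEEE round-half-to-even.
    """
    if math.isnan(x):
        return x
    magnitude = min(abs(x), E4M3_MAX)
    nearest = min(E4M3_VALUES, key=lambda v: abs(v - magnitude))
    return math.copysign(nearest, x)

for x in (1.0, 0.3, -0.3, 500.0):
    print(x, "->", quantize_e4m3(x))
```

With only three mantissa bits, 0.3 lands on 0.3125, and anything beyond 448 is clipped, which is why FP8 workloads typically pair the format with scaling factors that keep tensors within range.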

Lastly, Ada includes the next generation of CUDA Cores, which are responsible for lighting, shading, physics and other graphics computations. The biggest change to CUDA Cores is their number: the AD102 GPU contains 18,432 of them, roughly 70 percent more than its Ampere counterpart, the GA102.
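The 70 percent figure follows from the chips' configurations. Assuming NVIDIA's published full-chip specs (144 streaming multiprocessors, or SMs, for AD102 and 84 for GA102, each SM with 128 FP32 CUDA Cores; these numbers are not from this article), the arithmetic works out as follows:

```python
# Full-chip configurations; shipping cards may enable fewer SMs.
CORES_PER_SM = 128  # FP32 CUDA Cores per SM in both Ampere GA10x and Ada
AD102_SMS = 144
GA102_SMS = 84

ad102_cores = AD102_SMS * CORES_PER_SM  # 18,432
ga102_cores = GA102_SMS * CORES_PER_SM  # 10,752
increase = ad102_cores / ga102_cores - 1

print(f"{ad102_cores:,} vs {ga102_cores:,}: {increase:.0%} more")
```

The exact ratio is about 1.71, so "70 percent more" is a rounded figure.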

More Memory and the RTX 6000 Ada Graphics Card

Another of Ada’s innovations is more on-chip memory. The AD102 GPU includes 18MB of L1 cache (1.7 times that of Ampere’s GA102), 96MB of L2 cache (16 times the GA102’s 6MB), and a 36MB register file. NVIDIA says this increase will benefit all applications, though complex operations such as ray tracing will see the most improvement.
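The L1 and register-file totals scale with the number of SMs. Assuming NVIDIA's published per-SM figures for both Ampere GA10x and Ada (128KB of combined L1/shared memory and a 256KB register file per SM), SM counts of 144 for AD102 and 84 for GA102, and GA102's 6MB L2 cache (none of these inputs are stated in this article), the quoted numbers can be reproduced:

```python
KB_PER_MB = 1024
AD102_SMS = 144
GA102_SMS = 84
L1_KB_PER_SM = 128             # combined L1 cache / shared memory per SM
REGISTER_FILE_KB_PER_SM = 256  # 65,536 32-bit registers per SM
AD102_L2_MB = 96
GA102_L2_MB = 6

ad102_l1_mb = AD102_SMS * L1_KB_PER_SM / KB_PER_MB             # 18.0
ga102_l1_mb = GA102_SMS * L1_KB_PER_SM / KB_PER_MB             # 10.5
ad102_rf_mb = AD102_SMS * REGISTER_FILE_KB_PER_SM / KB_PER_MB  # 36.0
l2_ratio = AD102_L2_MB / GA102_L2_MB                           # 16.0

print(f"L1: {ad102_l1_mb} MB vs {ga102_l1_mb} MB ({ad102_l1_mb / ga102_l1_mb:.2f}x)")
print(f"L2: {l2_ratio:.0f}x, register file: {ad102_rf_mb} MB")
```

Unlike L1 and the register file, the L2 cache is shared across the whole chip, so its 16x jump reflects a design change rather than the higher SM count alone.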

The NVIDIA RTX 6000 Ada Generation graphics card. (Source: NVIDIA.)

The NVIDIA RTX 6000 Ada Generation graphics card, based on the AD102 GPU and set for release in December, will be equipped with a hefty 48GB of GDDR6 memory.

“With geometries, textures, lighting, compute, simulation, and multiple AIs all contending for GPU memory, the 48GB of GDDR6 ECC memory available with NVIDIA RTX 6000 Ada gives engineers and associated creative professions the large memory capacity necessary to work with massive datasets and workloads like rendering and simulation,” Flygare said.