Design and Engineering Performance and Flexibility: The NVIDIA L4 Tensor Core GPU

PNY has submitted this article.

Deep learning, generative AI, simulation, ray-traced rendering, demanding graphics and virtualized desktops all require GPU-accelerated computing to realize the full potential of current CAD and CAE applications.

Producing photorealistic images directly from CAD files via ray tracing is becoming common, driven by design reviews and by marketing and sales teams' need for faster access to content. At the same time, organizations are reducing the power used in data centers, in the cloud and at the edge to lower TCO and implement sustainability strategies.

The NVIDIA L4 Tensor Core GPU is NVIDIA's most efficient accelerator for mainstream servers. It is compatible with a vast number of currently installed servers and delivers a highly parallel computing platform designed specifically for HPC, AI and visualization workloads. This article explores the NVIDIA L4 GPU, based on the Ada Lovelace architecture, from a performance, energy efficiency and installed-base compatibility perspective.

Capabilities modern GPUs must deliver

Modern GPUs must handle workloads across data centers, including AI, big data analytics, data science, simulation and professional visualization. The NVIDIA L4 features 7,424 NVIDIA CUDA cores, delivering outstanding design and engineering performance for tackling complex problems productively. A wide range of software supports the L4, including NVIDIA AI Enterprise, which allows organizations to deploy AI broadly.

NVIDIA L4 GPU. (Image: PNY.)

Small form factor GPU works in almost any server

Organizations require GPUs that work across their server fleets. The L4's low-profile form factor and 72W power envelope suit essentially any existing or planned data center server, making it an efficient, cost-effective solution for server, cloud instance or edge deployments.

The L4 makes data centers more flexible, powerful and capable of supporting a wider array of workflows. GPU virtualization allows workers to be onboarded or offboarded as projects evolve, while enabling more efficient use of data center resources by allocating tasks or users exactly the GPU acceleration they require. By moving to L4-equipped servers, data centers can reduce the number of servers, the corresponding square footage and networking infrastructure, and HVAC requirements.

Importance of larger GPU memory capacity

AI, ray-traced rendering and CAE software require compute power that only GPUs can deliver to keep highly skilled professionals productive and innovative. For best performance, data needs to reside in GPU memory, and as task complexity increases, so does the memory capacity required. The L4 GPU combines fourth-generation Tensor Core technology with FP8 precision support and 24 GB of GPU memory, 1.5x more than its predecessor, the NVIDIA T4. FP8 reduces memory pressure compared to higher precisions and dramatically accelerates AI throughput.
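
To make the memory-pressure point concrete, here is a minimal arithmetic sketch in Python. It is illustrative only: the 7-billion-parameter model size is a hypothetical example, not a figure from this article.

```python
# Approximate GPU memory needed just to hold model weights at different
# precisions. The 7B-parameter model is a hypothetical example.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "FP8": 1}

def weight_footprint_gb(num_params: int, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3

params = 7_000_000_000  # hypothetical 7B-parameter model
for precision in ("FP32", "FP16", "FP8"):
    gb = weight_footprint_gb(params, precision)
    fits = "fits in" if gb <= 24 else "exceeds"
    print(f"{precision}: ~{gb:.1f} GB ({fits} the L4's 24 GB)")
```

At FP32 such a model's weights alone (~26 GB) would not fit in the L4's 24 GB, while at FP8 (~6.5 GB) they fit with room to spare for activations and batching.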

Ray tracing and Tensor Cores

Ray-traced rendering simulates the physical behavior of light and materials. NVIDIA's invention of RT Cores, now in their third generation, made real-time ray tracing a reality. The Ada Lovelace architecture's fourth-generation Tensor Cores accelerate transformative AI technologies, including intelligent chatbots, generative AI, natural language processing (NLP) and computer vision. For graphics and rendering, NVIDIA Deep Learning Super Sampling 3 (DLSS 3), running on those fourth-generation Tensor Cores and exploiting fine-grained structured sparsity and FP8 precision, can deliver up to a 4x performance improvement.
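
As an illustration of the fine-grained structured sparsity pattern the Tensor Cores exploit, the following NumPy sketch (an illustrative reconstruction, not NVIDIA code) applies 2:4 pruning, keeping only the two largest-magnitude values in each group of four weights:

```python
# Illustrative sketch of 2:4 fine-grained structured sparsity: in every
# group of four weights, only the two largest-magnitude values are kept.
# Hardware Tensor Cores exploit this pattern; this sketch only shows the
# pruning pattern itself.
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the two smallest-magnitude values in each group of four."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]  # two smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 8)).astype(np.float32)
print(prune_2_4(w))  # exactly half of each 4-element group is zero
```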

Advanced video analytics, compression and machine vision acceleration

Video analytics, transcoding or compression and machine vision require high-performance, real-time processing of data. L4 GPUs provide video transcoding and compression support with NVIDIA's optimized AV1 stack (all popular legacy codecs are also supported). On the L4, DLSS 3 and the NVIDIA Optical Flow Accelerator (OFA) use AI to generate additional high-quality frames when required. The L4 is also ideal for virtualized augmented reality (AR) and virtual reality (VR). JPEG decoders in the L4 speed up applications needing computer vision throughput. L4-based servers can host over 1,000 concurrent video streams and provide over 120x more end-to-end AI video pipeline performance than CPU solutions. The L4 can stream in multiple resolutions and formats to multiple platforms while enabling simultaneous broadcasting on more channels, including social media platforms.
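
As a concrete example of a GPU transcoding workflow, here is a minimal Python sketch, assuming an FFmpeg build with NVIDIA NVDEC/NVENC support is installed; the file names are placeholders. It decodes on the GPU and re-encodes to AV1 with the L4's hardware encoder:

```python
# A minimal sketch of fully GPU-resident transcoding to AV1, assuming an
# FFmpeg build with NVDEC/NVENC support. File names are placeholders.
import subprocess

def transcode_to_av1(src: str, dst: str) -> None:
    """Transcode a clip to AV1 entirely on the GPU (decode + encode)."""
    subprocess.run(
        [
            "ffmpeg",
            "-hwaccel", "cuda",                # hardware-accelerated decode (NVDEC)
            "-hwaccel_output_format", "cuda",  # keep frames in GPU memory
            "-i", src,
            "-c:v", "av1_nvenc",               # Ada-generation AV1 hardware encoder
            dst,
        ],
        check=True,
    )

transcode_to_av1("input.mp4", "output_av1.mp4")
```

Keeping decoded frames in GPU memory (`-hwaccel_output_format cuda`) avoids round-trips over PCIe, which is what makes hosting many concurrent streams on a single L4 practical.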

Energy efficiency

Tightening energy mandates, rising energy costs and the reality of climate change mean organizations require energy efficiency in their data centers. Energy-efficient L4 GPUs lower TCO and reduce a site's carbon footprint, a win-win for enterprises, users and the planet. Compared to traditional CPU-based infrastructure, L4 GPUs deliver up to 120x better AI video performance and up to 99 percent better energy efficiency while dramatically reducing TCO. L4 GPUs let enterprises shrink rack space and significantly lower greenhouse gas emissions, or let smaller solar, wind or other sustainable energy installations power even the most advanced GPU-equipped facilities. Energy efficiency and GPU virtualization reinforce a virtuous cycle of energy reduction while allowing GPU-enhanced data centers to scale to support more users.

The energy saved by switching from CPUs to L4 GPUs in a 2MW data center can power over 2,000 homes for one year, or equals the carbon offset of 172,000 trees grown over 10 years.
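
As a rough, back-of-the-envelope check of that figure, the Python sketch below expresses a 2MW facility's annual energy in home-years. The assumed average household consumption of about 10,500 kWh per year is our assumption, not a figure from this article.

```python
# Order-of-magnitude check: annual energy of a 2 MW facility expressed
# in home-years. HOME_KWH_PER_YEAR is an assumed average household
# consumption, not a figure from the article.
DATA_CENTER_MW = 2.0
HOURS_PER_YEAR = 8760
HOME_KWH_PER_YEAR = 10_500  # assumed average annual household consumption

annual_kwh = DATA_CENTER_MW * 1_000 * HOURS_PER_YEAR  # 17,520,000 kWh
home_years = annual_kwh / HOME_KWH_PER_YEAR
print(f"2 MW running all year: {annual_kwh:,.0f} kWh")
print(f"Roughly {home_years:,.0f} home-years of electricity")
```

If GPU acceleration eliminated most of that 2MW load, the savings would be on the order of 1,700 to 2,000 home-years, the same ballpark as the quoted figure; the exact number depends on the assumed per-home consumption.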

Support for remote work is a business continuity imperative

The L4 GPU is ideal for virtualized, GPU-enabled collaborative workflows for geographically dispersed teams. NVIDIA virtual GPU (vGPU) software running on the L4 increases workstation performance by 50 percent for mid- to high-end design workflows. The L4 fully supports NVIDIA RTX Virtual Workstation (vWS) for high-end professional software. Over 90 percent of productivity applications utilize GPU acceleration, an ideal scenario for NVIDIA virtual PC (vPC).

Use case example: L4 GPU optimizes conversational AI pipeline

Conversational AI applications are now mainstream. Speech generates billions of minutes of data every day:

  • Online meetings generate 200 million minutes daily.
  • Contact, call and customer service centers generate 500 million minutes daily.
  • Consumer applications generate 1.8 billion minutes daily.

The L4 is optimized for inference at scale across a broad range of AI applications, including recommender systems, voice-based AI avatar assistants, generative AI, visual search and contact center automation. Rapid advances in large language model (LLM) technology also benefit from the NVIDIA L4. For NLP, the L4 GPU is up to 28x faster than a CPU.
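
Below is a minimal sketch of the kind of NLP inference workload described here, using the Hugging Face Transformers library; the framework and model choice are assumptions, as the article does not prescribe them.

```python
# A minimal sketch of GPU-accelerated NLP inference using Hugging Face
# Transformers (a framework assumption; the article names no framework).
# device=0 selects the first CUDA GPU, e.g., an L4.
from transformers import pipeline

# Small sentiment-analysis model as a stand-in for a production NLP service.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # run on the GPU
)

results = classifier([
    "The new rendering pipeline cut our review cycle in half.",
    "The simulation queue is still far too slow.",
])
for r in results:
    print(r["label"], round(r["score"], 3))
```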

Use case example: AI and HPC scientific modeling and simulation

AI is widely used in areas such as life sciences, radiology, genomics, weather and climate modeling, and particle physics.

AI and scientific model examples. (Image: PNY.)

L4 simulation performance is significantly faster than CPU performance:

  • Molecular Dynamics – AMBER software simulates and analyzes biomolecular interactions, and one of its key features is the ability to use GPUs to massively accelerate those simulations: the L4 is up to 46x faster versus a CPU node.
  • Molecular Dynamics – NAMD (Nanoscale Molecular Dynamics) performs high-performance simulation of large biomolecular systems: the L4 is up to 13x faster versus a CPU.
  • Fusion Physics – GTC (Gyrokinetic Toroidal Code): the L4 is up to 14x faster versus a CPU.

NVIDIA L4 GPU meets performance, space and energy needs

Organizations require servers that increase system performance for CAD and CAE workflows in conjunction with deep learning, generative AI, simulation, advanced rendering and graphics, and virtualized desktops. Enterprises also need smaller-footprint, cost-effective, energy-efficient infrastructure that can support any workflow running on servers, cloud instances or edge deployments.

The NVIDIA L4 GPU is the most efficient and adaptable NVIDIA data center accelerator for mainstream servers. The L4 is compatible with a vast number of currently installed servers and delivers a highly parallel computing platform designed specifically for design and engineering, HPC, AI, visualization and virtualization workloads.


PNY provides support for a wide range of NVIDIA professional GPUs and NVIDIA networking products. For more information on how PNY can help you choose the right NVIDIA RTX GPU for CAD and CAE engineering workstations, visit PNY.