Accelerating Data Science with Graphics Cards

An NVIDIA Quadro-powered Data Science Workstation. (Image courtesy of NVIDIA.)

Last September, graphics juggernaut NVIDIA announced its latest GPU microarchitecture, Turing, along with the NVIDIA Quadro and GeForce RTX graphics cards built on it. RTX is shorthand for real-time ray tracing, one of the headline features of the Turing architecture, with applications in professional visualization, animation, gaming and more.

But despite the name, RTX graphics cards offer more than just powerful graphics. They’re also a powerful tool for data science, and NVIDIA and its partners are putting that power at the forefront of the new Quadro-Powered Data Science Workstations.

Quadro-Powered Data Science Workstations

One of the upgrades in the Turing microarchitecture was a more efficient generation of Tensor Cores, a type of processing component first introduced in NVIDIA’s previous Volta microarchitecture. Tensor Cores were added to NVIDIA GPUs to accelerate data science techniques such as machine learning, and they capitalize on the fact that graphics rendering and machine learning workloads both reduce largely to dense matrix arithmetic, which Tensor Cores execute as fast multiply-accumulate operations.
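
To see why the same silicon serves both graphics and machine learning, consider that the forward pass of a fully connected neural network layer is just a matrix multiplication. A minimal NumPy sketch (all shapes here are invented for illustration, not drawn from any NVIDIA material):

```python
import numpy as np

# A fully connected layer is, at its core, a matrix multiplication --
# exactly the kind of operation Tensor Cores accelerate in hardware.
batch_size, in_features, out_features = 32, 512, 256

x = np.random.rand(batch_size, in_features).astype(np.float32)   # input batch
w = np.random.rand(in_features, out_features).astype(np.float32) # layer weights
b = np.zeros(out_features, dtype=np.float32)                     # biases

y = x @ w + b   # one forward pass: a single dense matrix multiply
print(y.shape)  # (32, 256)
```

On a GPU, the same multiply would be dispatched to Tensor Cores rather than computed on the CPU, but the mathematical structure is identical.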

The data science functionality built directly into RTX graphics cards makes them a natural choice for desktop workstations made for data scientists. To ensure that they provide the best performance for such users, NVIDIA has developed a set of guidelines for Quadro RTX-based data science workstations. The graphics company has distributed these guidelines to its substantial network of hardware partners to deliver certified Quadro-Powered Data Science Workstations.

First and foremost in the guidelines are the graphics cards themselves. NVIDIA recommends the two highest-end Quadro RTX cards available, the Quadro RTX 8000 and Quadro RTX 6000. The workstations can be configured with a single Quadro card, but the machines can be made even more powerful with two cards connected via NVIDIA’s NVLink bridge. With NVLink, two Quadros can be linked to share a collective graphics memory. For two Quadro RTX 8000s, that adds up to 96GB.

Other than the two RTX cards, there’s a third graphics card option: the Volta-based Quadro GV100, which uses the first generation of Tensor Cores. Although a generation behind the RTX cards, the GV100 is “a compute monster,” according to Carl Flygare, PNY’s Product Marketing Manager for NVIDIA Quadro. That’s because it provides hardware support for 64-bit floating-point (FP64) math, while the RTX cards only offer hardware support up to 32-bit (FP32). For the applications that need it, this higher precision can make a huge difference.

“If you're in a situation where you need FP64, you want to go with the GV100, because the RTX boards have to do that through multiple operations in software and provide about 1/32 of the performance of a hardware implementation,” Flygare explained. “It’s valuable for some high-end engineering simulation applications, or science and technical applications, but data science doesn't really require FP64.”
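
The precision gap Flygare describes is easy to demonstrate: FP32 carries roughly 7 significant decimal digits, FP64 roughly 16. A small illustrative sketch (the magnitudes are arbitrary, chosen only to expose the difference):

```python
import numpy as np

# At a magnitude of 100,000,000, the spacing between adjacent FP32
# values is about 8, so adding 1.0 is silently lost. FP64 resolves it.
a32 = np.float32(1e8)
print(a32 + np.float32(1.0) == a32)  # True: the +1 vanishes in FP32

a64 = np.float64(1e8)
print(a64 + np.float64(1.0) == a64)  # False: FP64 resolves the +1
```

This is why simulation workloads that accumulate tiny increments into large totals want the GV100’s hardware FP64, while most data science workloads are comfortable in FP32.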

Users looking for the best data science performance should seek out workstations with a Quadro RTX 8000—or better yet, two. If cost is an issue, the RTX 6000 is a much more affordable card. If you know you need FP64, the GV100 is the best option.

The graphics cards available in NVIDIA Quadro Data Science Workstations. (Image courtesy of NVIDIA.)
Besides its own Quadro GPUs, NVIDIA also specifies the CPUs for Quadro Data Science Workstations. For the single-GPU variant, NVIDIA recommends the six-core Intel Xeon W-2135 CPU with a base clock speed of 3.7GHz and turbo frequency of 4.5GHz. For dual GPUs, NVIDIA recommends dual Intel Xeon Silver 4110 CPUs, each with eight cores, a base clock speed of 2.1GHz, and a turbo frequency of 3GHz. NVIDIA also recommends at least 128GB of RAM for a single GPU and 192GB for dual GPUs. The workstations can run either Ubuntu 18.04.1 LTS or Red Hat Enterprise Linux (RHEL) 7.5.


CUDA-X AI

That’s just the hardware. The other main component of Quadro-Powered Data Science Workstations is the pre-configured data science software stack called CUDA-X AI. CUDA-X AI incorporates five major subsystems, each optimized to utilize Quadro RTX Tensor Cores:

  • Data Analytics
  • Graph (for visualization)
  • Machine Learning
  • Deep Learning Training
  • Deep Learning Inference

CUDA-X AI also incorporates an optimized version of RAPIDS, an open-source suite of libraries and APIs for GPU-accelerated data science. In addition, CUDA-X AI interacts with major third-party frameworks and services such as Apache Arrow, Anaconda, Chainer, PyTorch, TensorFlow, Microsoft Azure, Databricks and more. Though these don’t come pre-configured on top of CUDA-X AI, users can install the software necessary for their workflow knowing it’s optimized for their graphics hardware.
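
One reason RAPIDS lowers the barrier to GPU acceleration is that its cuDF library mirrors the familiar pandas DataFrame API, so much CPU-bound code can move to the GPU with little more than an import swap. A hedged sketch (the dataset and column names are invented, and since cuDF requires a compatible NVIDIA GPU, this falls back to pandas where cuDF is unavailable):

```python
try:
    import cudf as xdf   # GPU-accelerated, pandas-like DataFrames (RAPIDS)
except ImportError:
    import pandas as xdf # CPU fallback exposing the same API surface

# Hypothetical sales data -- values are invented for illustration.
df = xdf.DataFrame({
    "region": ["east", "west", "east", "west"],
    "sales":  [100, 250, 175, 325],
})

# The same groupby/aggregate call runs on the GPU under cuDF
# and on the CPU under pandas, with no code changes.
totals = df.groupby("region").sales.sum()
print(totals.to_dict())  # {'east': 275, 'west': 575}
```

The payoff is that existing pandas pipelines don’t need to be rewritten to benefit from the hardware; they mostly need to be re-pointed at it.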

CUDA-X AI sits in the middle of hardware and software/services for data science application. (Image courtesy of NVIDIA.)

Quick Data Science with Quadro

Data science is often computationally cumbersome. Even with the power of today’s processors, when you’re working with massive datasets, it’s going to take some time. GPU hardware acceleration with RTX Tensor Cores helps shave off some of this time, but another big time saver is relatively simple: make sure you have enough memory.

“The more GPU memory you have, the bigger batch size you can bring in as you're training your deep neural networks,” Flygare said. “So more GPU memory capacity is the trump card."

The Quadro RTX 8000 is equipped with 48GB of graphics memory, far more than most users would ever need for graphics alone. But for data science, as Flygare pointed out, this memory enables large training batch sizes, which cut training time significantly. And if you opt for dual Quadro RTX 8000s paired with NVLink, you effectively double your graphics memory to a massive 96GB.
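
As a back-of-the-envelope illustration of why capacity matters: the largest batch that fits is roughly the memory left over after the model’s own footprint, divided by the per-sample cost of activations and gradients. Every figure below is invented for illustration, not a measured NVIDIA number:

```python
# Rough upper bound on training batch size for a given GPU memory budget.
# All numbers are hypothetical, chosen only to show the relationship.

def max_batch_size(gpu_mem_gb: float, bytes_per_sample: int,
                   model_overhead_gb: float = 4.0) -> int:
    """Memory remaining after weights/optimizer state, divided by per-sample cost."""
    usable = (gpu_mem_gb - model_overhead_gb) * 1024**3
    return int(usable // bytes_per_sample)

per_sample = 24 * 1024**2  # assume ~24 MB of activations per training sample

single = max_batch_size(48.0, per_sample)  # one Quadro RTX 8000 (48 GB)
paired = max_batch_size(96.0, per_sample)  # two RTX 8000s pooled via NVLink

print(single, paired)  # 1877 3925
```

Under these assumed numbers, pooling two cards over NVLink roughly doubles the feasible batch size, which is the mechanism behind the training speedups discussed below.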

This memory pays off substantially, enabling what NVIDIA claims is up to 8 times faster model training. And that’s not even the biggest time saver. With RAPIDS and the CUDA-X AI stack, data preparation can be up to 30 times faster than on a CPU alone. End-to-end, NVIDIA claims that its Quadro Data Science Workstations can achieve a 10x speedup for data science workflows.

GPU acceleration can increase the efficiency of time-consuming data preparation and training, according to NVIDIA. (Image courtesy of NVIDIA.)

With such significant efficiency increases, it’s no wonder NVIDIA decided to target data science users with custom hardware. And that customization is yet more time saved for the end user. Quadro Data Science Workstations are built by NVIDIA’s hardware partners and benchmarked by NVIDIA to ensure a turnkey data science solution.

“PNY, with NVIDIA, provide [hardware partners] with the software stack, and we turn them loose,” Flygare said. “They go forth and configure a system and test it. If it's running the software correctly, great. If for some reason they're encountering an issue, we work with them to iron it out and make sure the software stack is turnkey.”

For the end user, this translates to a data science workstation that’s ready when they are.

“Someone could take one of these systems out of the box, plug it into the wall, boot it up, and start doing useful data science work. They don't need to spend hours, days, or potentially even weeks going in and debugging the system to the point where it can start doing useful analytics,” commented Flygare.

To learn more about NVIDIA’s Quadro-Powered Data Science Workstations, and how GPU-acceleration fits in with data science, tune into the PNY webinar on October 24, 2019 at 10:00am PST: NVIDIA Quadro Powered Data Science Workstations and OmniSci – Powering the Transition to AI and BigData Analytics.

For more information on data science workstations, visit PNY.

PNY has sponsored this post.  All opinions are mine.  --Michael Alba