CPUs Annihilate GPUs in Deep Learning Training

(Image courtesy of Intel.)

A collaboration between Rice University and Intel compared a dual-socket (2P) system housing two 22-core Xeon CPUs, with hyperthreading disabled, against a single NVIDIA Tesla V100 GPU.

With a total of 44 Intel cores, the researchers demonstrated that the CPU setup trained a deep neural network up to 3.5 times faster than the NVIDIA Tesla V100 GPU. That result is strikingly unusual, because deep neural network training is the GPU's specialty: GPUs like the Tesla V100 are built for exactly this workload thanks to their parallel architecture, executing many small computations simultaneously, whereas CPUs (generally speaking) process computations in sequence.

The researchers achieved this somewhat confounding speedup on Intel Xeon CPUs using an algorithm called SLIDE (Sub-Linear Deep Learning Engine).

The SLIDE algorithm is designed to eliminate the need for GPUs by replacing the fundamental mechanism of training deep neural networks: backpropagation. Backpropagation relies on the matrix multiplication that GPUs excel at computing. The researchers replaced that matrix multiplication and, through SLIDE, recast training as a search problem solved with hash tables, a workload at which CPUs excel.
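To make the idea concrete, here is a minimal Python sketch of the hashing trick this kind of approach builds on: locality-sensitive hashing via signed random projections (SimHash), used to retrieve only the neurons likely to have large activations instead of multiplying the full weight matrix. The layer sizes, table counts, and function names below are hypothetical illustrations, not the authors' implementation (the real SLIDE engine is a heavily optimized C++ codebase).

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

DIM = 256        # input dimensionality (hypothetical)
NEURONS = 10_000 # neurons in one wide layer (hypothetical)
BITS = 12        # hash bits per table
TABLES = 4       # number of independent hash tables

# One weight vector per neuron; a dense forward pass would be weights @ x.
weights = rng.standard_normal((NEURONS, DIM))

# Random hyperplanes for SimHash: the sign pattern of a vector's
# projections onto these planes is its bucket key.
planes = rng.standard_normal((TABLES, BITS, DIM))

def hash_key(t, vec):
    """Bucket key for `vec` in table `t`: BITS sign bits packed as bytes."""
    return (planes[t] @ vec > 0).tobytes()

# Index every neuron's weight vector into each table (done once up front;
# a training system would rehash incrementally as weights change).
tables = [defaultdict(list) for _ in range(TABLES)]
for t in range(TABLES):
    for n in range(NEURONS):
        tables[t][hash_key(t, weights[n])].append(n)

def active_neurons(x):
    """Collect neurons that collide with input `x` in any hash table.

    Colliding under SimHash means a neuron's weight vector points in
    roughly the same direction as x, so its activation is likely large.
    """
    candidates = set()
    for t in range(TABLES):
        candidates.update(tables[t].get(hash_key(t, x), []))
    return np.fromiter(candidates, dtype=np.int64, count=len(candidates))

x = rng.standard_normal(DIM)
idx = active_neurons(x)
# The sparse forward pass touches only the retrieved rows, not all NEURONS.
sparse_out = weights[idx] @ x
print(f"computed {idx.size} of {NEURONS} activations")
```

The key point of the sketch is that retrieval and updates become hash lookups and pointer-chasing over small candidate sets rather than dense linear algebra, which is the kind of irregular, cache-sensitive work large-core CPUs handle well and GPUs do not.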

The SLIDE algorithm is potentially a game-changer for both Intel and anyone with a vested interest in deep learning. Mainstream adoption of SLIDE could rapidly disrupt the use of GPUs for deep learning training. Intel could see an increase in demand (as could AMD, if the research can be replicated on its CPUs), while NVIDIA and other GPU makers (AMD again, on the GPU side) could see a stark drop in demand.

Bottom Line

Financially, two Intel Xeon CPUs are less expensive than two NVIDIA Tesla V100 GPUs, and if this research can be validated, it could shift market conditions in Intel's favor and to NVIDIA's detriment.