Optimization of Microbiome Tool for GPUs Speeds Up Research Capabilities

(Image courtesy of techexplorist.com.)

Computing projects aimed at making hardware more effective for research tasks have taken on new significance because of the COVID-19 pandemic. With so much urgent global research underway to understand the SARS-CoV-2 virus (which causes COVID-19) and produce a vaccine, any gain in the efficiency of research tools is welcome news.

High-performance computing researchers at the University of California San Diego (UCSD) have ported the UniFrac microbiome tool to GPUs. UniFrac is a distance metric (a mathematical function describing the distance between pairs of elements in a set) that has been widely used in microbiology since 2005. By making the tool run faster on GPUs, the researchers are in effect speeding up the computing toolset used for scientific research and development. These efforts are welcomed by the scientific community, which is struggling to pin down the inner workings of SARS-CoV-2 to better understand how it attacks our ACE2 receptors and replicates itself.

UniFrac compares microbiome samples by referencing an evolutionary tree that relates the DNA sequences found in those samples to one another. Part of the research objective is to use data gathered through the Human Microbiome Project and the Earth Microbiome Project to understand which characteristics of human microbiomes make people more or less susceptible to COVID-19 infection.
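To make the idea of a tree-based distance concrete, here is a minimal sketch of the unweighted form of UniFrac: the total length of branches that lead to only one of two samples, divided by the total length of branches that lead to either. The toy tree, its branch lengths, the sample names and the function name are all hypothetical, and this is not the Striped UniFrac code discussed in this article.

```python
# Hypothetical toy tree, flattened to one entry per branch:
# (branch length, set of samples that have a sequence below that branch).
BRANCHES = [
    (0.5, {"A"}),       # leaf sequence seen only in sample A
    (0.2, {"A", "B"}),  # leaf sequence seen in samples A and B
    (0.3, {"A", "B"}),  # internal branch above the two leaves listed above
    (0.4, {"B", "C"}),  # leaf sequence seen in samples B and C
    (0.6, {"C"}),       # leaf sequence seen only in sample C
]


def unweighted_unifrac(branches, sample_1, sample_2):
    """Fraction of observed branch length that is unique to one sample."""
    unique = 0.0  # branch length leading to exactly one of the two samples
    total = 0.0   # branch length leading to at least one of the two samples
    for length, samples in branches:
        in_1 = sample_1 in samples
        in_2 = sample_2 in samples
        if in_1 or in_2:
            total += length
            if in_1 != in_2:
                unique += length
    # Two samples with no observed branches are treated as identical here.
    return unique / total if total else 0.0


if __name__ == "__main__":
    for pair in (("A", "B"), ("A", "C"), ("B", "C")):
        print(pair, round(unweighted_unifrac(BRANCHES, *pair), 4))
```

In this toy tree, samples A and C share no branches, so their distance is the maximum of 1, while a sample compared with itself has a distance of 0. A real analysis repeats this calculation for every pair of samples over a tree with a very large number of branches, which is why the workload benefits from GPUs.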

Using the OpenACC parallel programming model, lead scientific software developer Igor Sfiligoi ported the Striped UniFrac implementation to GPUs. This allowed researchers to have both their CPUs and GPUs draw from a single codebase. Some tweaks were made to local cache functions to give the ported system a boost. As a result, an analysis that required 900 hours to complete with server-class CPUs was finished in just 8 hours on one NVIDIA Tesla V100 GPU. The team reduced the runtime further by exploring the use of lower-precision floating-point math in UniFrac implementations.

Floating-point math is the format computers use to represent numbers with fractional parts in a fixed number of bits, which forces a trade-off between range and precision. UniFrac was developed with 64-bit, double-precision floating-point math (known as the fp64 code path) to give scientists and researchers the best quality outcomes. The researchers at UCSD implemented a lower-precision, 32-bit (fp32) code path in UniFrac and tested it against the higher-precision fp64 code path. The results were nearly identical, but the computation took significantly less time with the fp32 implementation.
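The precision trade-off can be illustrated with a small sketch that computes the same ratio of summed branch lengths twice: once in Python's native 64-bit floats and once with every intermediate value rounded to 32 bits. The branch lengths and function names are hypothetical, the fp32 arithmetic is emulated with the standard `struct` module rather than run on a GPU, and the comparison only shows the size of the rounding difference, not the UCSD team's test procedure or any speed gain.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def distance_fp64(unique_lengths, shared_lengths):
    """Unique branch length over total branch length, in fp64."""
    unique = sum(unique_lengths)
    total = unique + sum(shared_lengths)
    return unique / total


def distance_fp32(unique_lengths, shared_lengths):
    """Same ratio, rounding every intermediate result to fp32."""
    unique = 0.0
    for value in unique_lengths:
        unique = to_fp32(unique + to_fp32(value))
    total = unique
    for value in shared_lengths:
        total = to_fp32(total + to_fp32(value))
    return to_fp32(unique / total)


# Hypothetical branch lengths, chosen only for illustration.
unique_lengths = [0.1 * (i % 7 + 1) for i in range(1000)]
shared_lengths = [0.05 * (i % 5 + 1) for i in range(1000)]

d64 = distance_fp64(unique_lengths, shared_lengths)
d32 = distance_fp32(unique_lengths, shared_lengths)
rel_err = abs(d64 - d32) / d64

if __name__ == "__main__":
    print(f"fp64 result:         {d64!r}")
    print(f"fp32 result:         {d32!r}")
    print(f"relative difference: {rel_err:.2e}")
```

For inputs like these, the two results differ only by a small rounding error, consistent with the reported finding that the fp32 and fp64 outputs were nearly identical. Whether that difference is acceptable for a given dataset is a judgment the researchers made by testing against the fp64 results.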

Bottom Line

The GPU port of UniFrac allows a laptop with a mobile NVIDIA GTX 1050 GPU to crunch through an Earth Microbiome Project dataset in one hour instead of 13. That speedup should help COVID-19 researchers use UniFrac (a crucial part of SARS-CoV-2 research) in a far more efficient manner.