How Machine Learning Is Helping in the Fight Against COVID-19

A novel machine learning model suggests that mutations to the SARS-CoV-2 genome have made the virus more infectious. (Stock image)

The Michigan State University team led by Professor Guo-Wei Wei, which previously used machine learning to help predict future mutations of SARS-CoV-2 (the virus that causes COVID-19), is now working on a deep-learning model for researchers targeting the main protease of the virus.

This machine-learning model is an application of research that Wei and his team were already pursuing, with the primary goal of helping drug developers save time and money. The models are trained on datasets containing information about the proteins that developers want to target with their products, and then make predictions that guide the development and application of pharmaceuticals.

Wei’s primary target in SARS-CoV-2 is the main protease, a component of the coronavirus’s machinery that is crucial to how the pathogen replicates. Drugs that inhibit this protease can prevent the virus from reproducing.

Labeled regions of the SARS-CoV-2 main protease where drug treatments may bind, as predicted by the deep-learning model prepared by the MSU research team. (Image courtesy of Michigan State University.)

The main protease is an even more attractive target because it is distinct from all known human proteases, which isn’t always the case with viral proteases. Targeting it is therefore less likely to disrupt the natural biochemistry of the virus’s host.

Machine learning requires a dataset from which the system “learns” before it can apply that knowledge to new cases. Thankfully, the MSU research team did not have to start from scratch. The main protease of SARS-CoV-2 is nearly indistinguishable from that of the coronavirus responsible for the 2003 SARS outbreak. Information about the structure of that protease and about previously identified protease inhibitors was incorporated into the dataset for their study.

The researchers used a reformulated version of their MathDL model, which they had previously used with great success in the Drug Design Data Resource (D3R) Grand Challenges, a competition on computer-aided drug design. The innovation is a multitask version of MathDL that handles the main protease (Mpro) inhibitor dataset and predicts several properties of a possible future drug, including toxicity and binding affinity.
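To make the idea of a multitask model concrete, here is a minimal sketch in Python. It is not MathDL and does not reflect the team’s architecture: it only illustrates the general pattern of one shared layer feeding a separate output head per property. The class name, layer sizes, input features, and task names are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix of small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class MultitaskModel:
    """Hypothetical toy model: one shared hidden layer, one linear head per task."""

    def __init__(self, n_features, n_hidden, tasks, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        self.shared = make_layer(n_features, n_hidden, rng)
        self.heads = {task: make_layer(n_hidden, 1, rng) for task in tasks}

    def predict(self, features):
        # The shared representation is computed once and reused by every head.
        hidden = [math.tanh(h) for h in matvec(self.shared, features)]
        return {task: matvec(head, hidden)[0] for task, head in self.heads.items()}


# Illustrative input only: four made-up molecular descriptors.
model = MultitaskModel(n_features=4, n_hidden=3,
                       tasks=["binding_affinity", "toxicity"])
predictions = model.predict([0.2, -1.0, 0.5, 0.0])
print(predictions)
```

Because both heads read the same hidden layer, training on one property can in principle improve the representation used for the other, which is the usual motivation for multitask learning.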

The framework of the MathDL energy prediction model used by Wei’s research team. The model integrates advanced mathematical representations with convolutional neural networks. (Image courtesy of Michigan State University.)

Wei’s team combined previous findings on coronaviruses with their existing deep-learning models to identify where and how tightly protease inhibitors might bind to the main protease. They predicted binding details for over 100 known protease inhibitors and ranked them, highlighting the most promising candidates for further research. That data should help expedite the search for a drug that might bring the pandemic to an end.
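The ranking step can be illustrated with a short Python sketch. The compound names and energy values below are placeholders, not results from the study, and the sketch assumes that predictions are expressed as binding free energies in kcal/mol, where a more negative value indicates tighter binding.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """Hypothetical record pairing an inhibitor with its predicted binding energy."""
    name: str
    binding_energy_kcal_mol: float


def rank_by_affinity(predictions):
    """Sort candidates from tightest predicted binding (most negative) to weakest."""
    return sorted(predictions, key=lambda p: p.binding_energy_kcal_mol)


# Placeholder values for illustration only; these are not data from the paper.
candidates = [
    Prediction("compound_A", -7.2),
    Prediction("compound_B", -9.1),
    Prediction("compound_C", -6.4),
]

ranked = rank_by_affinity(candidates)
for position, candidate in enumerate(ranked, start=1):
    print(position, candidate.name, candidate.binding_energy_kcal_mol)
```

A ranking like this only prioritizes candidates for follow-up; it says nothing about safety or efficacy on its own.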

The researchers have published their findings in Chemical Science so that others may build on the work they have begun. They noted that although this work expedites the search for a treatment, it does not replace the need for experimental and clinical validation of any treatments proposed therein.