Open Source Data Hubs Could Help Flatten the Curve

The U.S. White House Office of Science and Technology Policy recently joined the National Library of Medicine, National Institutes of Health, Chan Zuckerberg Initiative, Georgetown University, Allen Institute for Artificial Intelligence, Microsoft Research, and Cold Spring Harbor Laboratory to launch the COVID-19 Open Research Dataset. The free, open source resource is a compendium of 44,000 peer-reviewed and non-peer-reviewed research papers related to COVID-19 and other coronaviruses. It’s the latest effort to make research machine-readable so that machine learning techniques can be used to develop insights into the pandemic and how it can be stopped.

“Sharing vital information across scientific and medical communities is key to accelerating our ability to respond to the coronavirus pandemic,” said CZI Head of Science, Cori Bargmann. “The new COVID-19 Open Research Dataset will help researchers worldwide to access important information faster.”

Since the novel coronavirus emerged in Wuhan, China in December 2019, governments have been scrambling to enact a hodgepodge of responses to save lives and prevent healthcare systems from becoming overburdened. Because this is a new disease with lingering questions about where it came from, its mortality rate, and how the pandemic will evolve, there is an elevated risk of policy makers acting blindly, without reliable information. That’s where data science comes in. Whether employed by tech companies, universities, or governments, data scientists have been trying to narrow the COVID-19 knowledge gap by contributing to a growing collection of information.

Since January 22, the Johns Hopkins Center for Systems Science and Engineering has shared with the public an interactive web-based dashboard that visualizes COVID-19 data in real time. The dashboard, whose data is also available in a GitHub repository, provides the latest number of cases, deaths, and recoveries for affected countries, giving authorities a way to track how the disease is unfolding. The World Health Organization has a similar COVID-19 situation dashboard.

One company, MicroStrategy, has used Johns Hopkins data to build an interactive dossier that lets users track the change in the COVID-19 death rate over time, among other visualizations. It also offers datasets broken down by country, which policy makers could potentially use to gauge whether the measures they enact are actually flattening the infection curve. Tableau has a COVID-19 Data Hub that contains a side-by-side comparison of the disease’s progression in different countries. DOMO has a tracker showing testing by U.S. state as well as comparisons of death rates with influenza.
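To give a rough sense of what tracking a death rate over time involves, here is a minimal Python sketch. It assumes the simplest possible definition, cumulative reported deaths divided by cumulative reported cases on each date, which is not necessarily how MicroStrategy or any other vendor computes it. The function name and the sample figures are hypothetical and purely illustrative; they are not real COVID-19 data.

```python
from typing import List, Optional, Tuple


def case_fatality_series(
    dates: List[str],
    cumulative_cases: List[int],
    cumulative_deaths: List[int],
) -> List[Tuple[str, Optional[float]]]:
    """Return (date, ratio) pairs, where ratio = deaths / cases.

    Assumption: this is a naive ratio of reported counts. It ignores
    reporting lag and untested cases, so it is only a rough indicator.
    """
    if not (len(dates) == len(cumulative_cases) == len(cumulative_deaths)):
        raise ValueError("all input lists must have the same length")

    series = []
    for date, cases, deaths in zip(dates, cumulative_cases, cumulative_deaths):
        # Avoid dividing by zero on dates before any case was reported.
        ratio = deaths / cases if cases > 0 else None
        series.append((date, ratio))
    return series


# Hypothetical sample values, made up for illustration only.
sample_dates = ["day 1", "day 2", "day 3"]
sample_cases = [100, 200, 400]
sample_deaths = [2, 5, 12]

for date, ratio in case_fatality_series(sample_dates, sample_cases, sample_deaths):
    label = "n/a" if ratio is None else f"{ratio:.1%}"
    print(f"{date}: {label}")
```

A rising ratio in output like this does not by itself mean the disease is becoming deadlier; it can also reflect changes in how widely a country is testing.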

“Anytime you have the emergence of a novel disease—in this case it’s a novel virus—it’s important to understand not only the detailed genetic makeup of the disease and where the outbreak is occurring, but also how transmission is occurring, how the virus is acting, what types of people are being affected, and what their characteristics and symptoms are,” said Barry Chaiken, Clinical Lead at Tableau Healthcare. “The bottom line is we have to keep collecting data.”