Best Practices for COVID-19 Data Visualizations


The internet has been abuzz this week about New York State’s new poster depicting the battle against a “mountain” of COVID-19. Governor Andrew Cuomo’s image, which features a floating nasal swab, “clouds of confusion,” and even an octopus, brought new attention to the intersection of data visualization and the pandemic. Inadvertently or not, many data visualizations twist a mathematical fact into a misleading image or statistic. And with misinformation about the novel coronavirus spreading almost as insidiously as the virus itself, it’s crucial to review visualization best practices so that governments present the most accurate and helpful information to residents.

Image from the United Nations COVID-19 “myth-busting” campaign on Unsplash

What are the best practices?

Some best practices apply to all data visualizations; others are specific to the COVID-19 pandemic.

  • Clearly explain labels and terms. This is always important, but during a pandemic, people are more likely to understand and rely on your visualization if it’s clear and precise. Defining scientific terms and public health jargon will be helpful for lay readers, and can help combat misinformation.
  • Be consistent with how things are counted. This is a big issue with COVID-19 data in general; for example, mixing different types of tests in a single count is misleading. Some data visualizations have also manipulated axes to create more extreme graphs. Others have compared COVID-19 with influenza or the bubonic plague over mismatched timelines (for example, don’t compare influenza deaths from a 12-month period to six months of COVID-19 data).
  • Show trends over time instead of just a snapshot of daily infections. Without the additional context, a daily toll means very little and can be wildly misinterpreted. A one-day number won’t help viewers understand the bigger picture.
  • Include population totals to show whether one demographic is being infected at a rate significantly higher than its share of the population. This helps reveal racial or gender disparities, informs populations that may be more vulnerable, and can increase accountability.
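
The point about trends over time can be made concrete with a trailing average, one common way to smooth daily counts before charting them. The sketch below is only an illustration: the function name, the seven-day window, and the daily case counts are all assumptions invented for this example, not figures from any dashboard mentioned here.

```python
from collections import deque


def rolling_average(daily_counts, window=7):
    """Return the trailing-window mean for each day.

    Days that do not yet have a full window behind them get None,
    so a chart does not show a misleading partial average.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # holds the most recent `window` counts
    running_total = 0
    averages = []
    for count in daily_counts:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            running_total -= recent[0]
        recent.append(count)
        running_total += count
        averages.append(running_total / window if len(recent) == window else None)
    return averages


# Hypothetical daily case counts, invented for illustration only.
daily_cases = [10, 12, 9, 30, 11, 13, 12, 14, 15, 13]
smoothed = rolling_average(daily_cases)
```

In this made-up series, the one-day jump to 30 looks alarming on its own, while the smoothed values rise gradually, which is the kind of context a single daily number cannot provide.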

Examples of best practices:

The New York State Department of Health COVID Tracker is a dense trove of information about the virus (in both the state and in New York City). The Fatality by Race/Ethnicity chart (shown below) does a great job of not only breaking down novel coronavirus deaths, but also comparing those numbers to each group’s percentage of the total population. This allows users to easily see how out of proportion COVID-19 death rates are for different racial demographics: for instance, New York State (excluding NYC) is only 9 percent Black, but that population accounts for nearly 20 percent of the state’s COVID-19-related deaths.
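
The comparison the chart makes can be reduced to a single ratio: a group’s share of deaths divided by its share of the population. The sketch below uses the figures quoted above, rounding “nearly 20 percent” to 20 as an approximation; the function name is hypothetical and this is not the tracker’s own method.

```python
def representation_ratio(share_of_deaths, share_of_population):
    """Ratio of a group's share of deaths to its share of the population.

    A value of 1.0 means deaths are proportional to population;
    values above 1.0 mean the group is overrepresented among deaths.
    Both arguments are percentages on the same scale.
    """
    if share_of_population <= 0:
        raise ValueError("share_of_population must be positive")
    return share_of_deaths / share_of_population


# Figures from the text: about 9 percent of the population and
# nearly 20 percent of deaths (20 is used here as an approximation).
ratio = representation_ratio(20.0, 9.0)
print(f"Share of deaths is about {ratio:.1f} times the share of population")
```

Showing both percentages side by side, as the chart does, lets readers see this ratio at a glance without doing the division themselves.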

Allegheny County, Pennsylvania, has a public COVID-19 dashboard to keep residents informed about the virus in their area. One issue with visualizing cases is the lack of testing; experts agree that there have probably been more cases and deaths than have actually been confirmed with testing. To show this likely discrepancy, Allegheny County displays case information over time with both probable and confirmed cases shown in different colors.

Another issue with COVID-19 data visualizations is inconsistency in showing all tests versus just positives. Positivity rate is the percentage of tests that come back positive, and it is an important indicator of whether enough testing is being done; a high positivity rate isn’t good, and generally means that only the sickest patients are getting tested. Displaying all tests, not just positives, in an open data visualization lets viewers calculate positivity rates and understand that spikes in cases are usually not simply the result of increased testing. While many places are just reporting positives, King County, Washington (which includes Seattle), has all testing information in its COVID-19 Data Dashboard.
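
As a minimal sketch of the calculation, positivity rate is simply positive tests divided by total tests, expressed as a percentage. The weekly numbers below are hypothetical and invented for illustration; they are not King County data, and the function name is an assumption.

```python
def positivity_rate(positive_tests, total_tests):
    """Percentage of all tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= positive_tests <= total_tests:
        raise ValueError("positive_tests must be between 0 and total_tests")
    return 100.0 * positive_tests / total_tests


# Hypothetical weekly figures: (label, positive tests, total tests).
weeks = [
    ("week 1", 50, 1000),
    ("week 2", 120, 2000),
]

rates = {label: positivity_rate(pos, total) for label, pos, total in weeks}
for label, rate in rates.items():
    print(f"{label}: {rate:.1f}% positive")
```

In this invented example, testing doubled but positives more than doubled, so the positivity rate rose. A chart showing only the positives could not distinguish that situation from one where cases rose purely because more people were tested.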

Of course, the ultimate quality of these visualizations is determined by the underlying data; insufficient COVID testing, incomplete racial impact data, and a lack of data standardization have plagued the United States. However, it’s still important to display the data we do have in the most complete, accurate, and transparent way possible, in order to keep residents informed and try to combat the spread of the virus.

Additional resources:

Harvard University: The Coronavirus Visualization Team

Stanford University: Effective Data Visualization in the Era of COVID-19

Tableau: Coronavirus (COVID-19) Data Hub | Case Tracker, Starter Dashboard, Visualizations