Tracking the COVID-19 Outbreak
In just a few short months, the Coronavirus (COVID-19) has gone from the first identified case on December 31st, 2019, to over 500,000 cases worldwide. The World Health Organization (WHO) declared the outbreak a pandemic a few weeks ago.
Over three months, we have seen the virus spread throughout the globe, with almost all countries affected. Not only has the pandemic impacted the health of people, it has also impacted economies and travel in an unprecedented manner.
Global confirmed cases have passed half a million and continue to rise. With the recent increase in testing in the United States, more cases have been identified locally, so much so that the United States now has the largest number of confirmed cases of any individual country worldwide.
However, looking at the number of fatalities, we see that Italy and Spain have been hit hardest, suffering the highest number and percentage of deaths among confirmed cases.
The Global Coronavirus Outbreak
To view this dashboard as a live dashboard in a separate window, click here.
Let's take a deeper dive and look at the critical numbers for the global outbreak of COVID-19. Starting at the top of the dashboard, we see the aggregate confirmed cases, deaths, and recovered cases.
Number-based KPIs are a good objective measure of the magnitude of the current outbreak. Comparison-based KPIs can also be valuable as relative indicators, for example, the percentage of fatalities and recoveries among confirmed cases.
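As a rough illustration of a comparison-based KPI, the sketch below derives percentages from raw counts. The function name and the counts are hypothetical, chosen only for the example; they are not the dashboard's figures or the calculation Wyn performs internally.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


# Hypothetical counts, for illustration only (not real outbreak data).
confirmed, deaths, recovered = 1000, 45, 230

# Comparison-based KPIs: each count relative to confirmed cases.
fatality_pct = percentage(deaths, confirmed)
recovery_pct = percentage(recovered, confirmed)

print(f"Fatalities: {fatality_pct}% | Recovered: {recovery_pct}%")
```

A relative KPI like this makes it possible to compare countries with very different case counts, which an absolute number alone cannot do.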
Further down, we can see at a glance that what started with China is now a global outbreak. The most affected areas are currently in Europe and the United States.
Below, we have more precise numbers showing how the spread has affected individual countries (data as of March 28th). The United States now has the most confirmed cases, but still has a low fatality rate from the disease.
The data here is sourced from the WHO Situation Reports and the CDC website through the JHU-managed data files available in their GitHub repository. A reliable data source is the basis of a useful dashboard. We will get more into the data mashup and transformation later in this blog.
The Progress of Coronavirus in the United States
To view this dashboard as a live dashboard in a separate window, click here.
Looking at global numbers gives us a good view of how the outbreak is spreading worldwide. However, I am interested in looking at numbers closer to home. As preparations and responses to the spread of COVID-19 in the United States gain traction, we have more granular data for the states and how we are doing with testing for the disease.
The KPI strip in this dashboard again gives us a quick top-level overview. Two key inferences are that testing is steadily growing and that we are identifying more confirmed cases. This could be the reason for the increase in numbers over recent weeks that we saw in the earlier dashboard.
Another interesting point is that only 13.93% of those tested (as of March 28th, 2020) are testing positive as confirmed cases. This statistic suggests that, as a population, we are being more cautious and aware of the potential for infection, and are conducting tests based on even slight symptoms.
Staying on these numbers, we can also see there are a lot of people waiting for results. This confirms some of the recent news we have heard about tests taking longer and specimens waiting for testing.
Right below the KPI strip, the maps show us the concentration of the confirmed cases. These maps offer a more accessible, more visual approach than the table on the left of the dashboard. Though the table shows more precise numbers, the maps give us an excellent top-level visual of the outbreak. You can quickly tell that most confirmed cases and deaths are concentrated on both coasts, while the central states have lower numbers.
While the other numbers and visuals on this dashboard give us information about the current status of the outbreak and our response to it, the National Outbreak chart at the bottom shows us the trend of confirmed cases and deaths. We see that around March 15th, the pattern of confirmed cases started an upswing and has been growing since. This direction could be attributed to the increase in testing that started around the same time, resulting in the identification of more cases.
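A change in trend like this is often easier to spot in day-over-day increases than in a cumulative line. The sketch below shows one way to derive daily increases from a cumulative series; the function name and the sample values are hypothetical and are not taken from the dashboard's data.

```python
from typing import List


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert a cumulative series into day-over-day increases."""
    if not cumulative:
        return []
    # The first day has no predecessor, so its increase is its own value.
    increases = [cumulative[0]]
    for prev, curr in zip(cumulative, cumulative[1:]):
        # Clamp at zero in case a source revises a cumulative total downward.
        increases.append(max(curr - prev, 0))
    return increases


# Hypothetical cumulative confirmed cases, one value per day.
cumulative_cases = [10, 15, 15, 30, 28, 50]
new_cases = daily_new(cumulative_cases)
print(new_cases)
```

Clamping negative differences to zero is a design choice for charting; a stricter pipeline might flag those days for review instead.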
The Coronavirus Numbers: Why the Difference?
We can see that the numbers in both of the dashboards are not the same. There are many reasons for these differences, the main one being that the sources of the data are different. While the first one is dependent on the WHO Situation Reports, the US-specific dashboard sources data primarily from the COVID Tracking Project which collects information from the states.
As you might expect, the COVID Tracking Project is closer to the actual testing and identification of confirmed cases, so it would be expected to be more accurate. However, we see that the WHO is reporting higher numbers. Looking further into the testing data, the COVID Tracking Project site states that data from some states could be less accurate than data from others. This is a typical obstacle when working with multiple data sources and combining them.
When building dashboards, it is always essential to identify the level of accuracy of the data that you are using. This is especially important when the data comes from a variety of sources and organizational perspectives.
Using Wyn Enterprise Dataset Designer
When working with external data sources, as a data scientist, you will see the data in a variety of structures. Some data may have been prepared for a different set of visualizations, or some data may be too raw. In this case, a DataOps pipeline comes in handy.
In the global dashboard, I used a data pipeline to transform the prepared data from the JHU GitHub repository into a more normalized schema. I then used the Wyn Enterprise Dataset Designer to join the different data sources and provide the aggregated structures for the dashboards.
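To give a feel for this kind of normalization, here is a minimal sketch that reshapes a wide, one-column-per-date file into one record per location and date. It assumes the layout of the JHU time series files (identifier columns followed by date columns); the sample rows, function name, and output field names are hypothetical, and this is not the actual pipeline used for the dashboard.

```python
import csv
import io
from typing import Dict, Iterator

# Leading identifier columns in the wide file; every other column is a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(wide_csv: str) -> Iterator[Dict[str, object]]:
    """Yield one record per (location, date) from a date-per-column CSV."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    for row in reader:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # Skip identifiers; only date columns hold counts.
            yield {
                "province_state": row["Province/State"],
                "country_region": row["Country/Region"],
                "date": column,
                "confirmed": int(value),
            }


# Hypothetical sample in the wide layout (made-up places and counts).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Exampleland,10.0,20.0,1,3
North,Otherland,30.0,40.0,0,2
"""

records = list(wide_to_long(SAMPLE))
for record in records:
    print(record)
```

The long form is easier to filter, join, and aggregate by date, which is what a dashboard dataset typically needs.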
The COVID Tracking Project, on the other hand, provides a data API but does not offer some of the master data. So I needed a mashup that combines the data from the external API with master data for state names.
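Conceptually, that mashup is a lookup join between the API rows and a small master table. The sketch below illustrates the idea in plain Python; the field names ("state", "positive"), the abbreviated lookup table, and the sample rows are assumptions made for the example, not the API's documented schema or the dataset built in Wyn.

```python
from typing import Dict, List

# Hypothetical master data: state code -> display name (abbreviated list).
STATE_NAMES: Dict[str, str] = {
    "NY": "New York",
    "WA": "Washington",
    "CA": "California",
}


def add_state_names(rows: List[dict], names: Dict[str, str]) -> List[dict]:
    """Return copies of rows with a 'state_name' field from the master data."""
    enriched = []
    for row in rows:
        merged = dict(row)  # Copy so the source rows stay untouched.
        # Codes missing from the master data are labeled rather than dropped.
        merged["state_name"] = names.get(row["state"], "Unknown")
        enriched.append(merged)
    return enriched


# Hypothetical rows shaped like a per-state API response.
api_rows = [
    {"state": "NY", "positive": 100},
    {"state": "ZZ", "positive": 5},
]

enriched = add_state_names(api_rows, STATE_NAMES)
print(enriched)
```

Keeping unmatched codes with an "Unknown" label, instead of discarding them, makes gaps in the master data visible on the dashboard.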
Analyzing Pandemics: COVID-19 and Beyond
In today's data-driven economy, there is a lot of data that we collect and make available. We have to be aware of the process behind each of the data sets we look at and track.
As we go through this Coronavirus (COVID-19) outbreak, the numbers still show that while in some countries the outbreak seems to have slowed down, other countries are still in the growth phase of the epidemic. At this stage, we need to be aware and quick to respond to guidance from our respective health agencies.
For the United States, a consolidated list of do's and don'ts for this pandemic is available on the CDC website.
These datasets are available to use in the Wyn Enterprise online trial, as well as on request from our technical support team for use with your Wyn Enterprise installation.
Understand the Story Behind Your Data
Wyn is a web-based BI and data analytics platform that provides greater insight into your data.
Wyn offers built-in tools for report and dashboard creation, data governance, security integration, embedded BI, automated document distribution, and a business-user friendly interface for self-service business intelligence.
Test drive on your own data with our free evaluation.
You can also request a personalized demo to address your individual BI needs.