Around the globe, measures are underway to control the spread of COVID-19 and to slow the growth of the infected population. Looking at dashboards that show exponential growth of cases, it is hard to tell whether these measures have any effect.

Humans are terrible at making sense of exponential growth. So, as a data scientist and as someone who feels passionate about data-driven storytelling, I started to look at the data from that perspective: the rate of growth.

The first step is to look at the current number of active cases. Most dashboards today still display the cumulative number of cases reported so far. This made sense during the early stages, but as time progresses and patients recover or die, the number of active cases falls below the cumulative number of reported cases.
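As a minimal sketch of this first step, the snippet below derives active cases from cumulative series. It assumes that active cases are defined as confirmed minus recovered minus deaths, which is a common convention but is an assumption here, and the function name and the numbers are made up for illustration.

```python
def active_cases(confirmed, recovered, deaths):
    """Return the number of active cases per day.

    Assumption (not taken from the article): active = confirmed - recovered - deaths,
    where all three inputs are cumulative daily counts covering the same days.
    """
    if not (len(confirmed) == len(recovered) == len(deaths)):
        raise ValueError("all series must cover the same days")
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]


# Made-up illustrative numbers, not real data.
confirmed = [100, 150, 220, 300]
recovered = [0, 10, 40, 90]
deaths = [0, 2, 5, 8]

active = active_cases(confirmed, recovered, deaths)
print(active)  # [100, 138, 175, 202]
```

In this toy example the cumulative count triples while the active count only doubles, which is the gap a cumulative dashboard hides.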

The second step is to look at the increase in these case numbers. How many new active cases are reported daily? Each of them is the consequence of an infection event that must have happened up to two weeks before symptoms manifested themselves. Shortly before this article was written, the German Chancellor Angela Merkel had announced that social distancing measures would not be relaxed until the number of new cases took more than ten days to double.(1)(2)
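To make the second step and the doubling criterion concrete, here is a rough sketch. It assumes constant exponential growth between the two observations, which real data will only approximate. The function names and the sample values are hypothetical.

```python
import math


def daily_new_cases(cumulative):
    """Day-over-day increase of a cumulative series.

    The result is one element shorter than the input, because the first day
    has no predecessor to compare against.
    """
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def doubling_time(value_then, value_now, days_between):
    """Days needed for a value to double, assuming constant exponential growth.

    Returns math.inf when the value is flat or shrinking, since it never doubles.
    """
    if value_then <= 0 or value_now <= 0:
        raise ValueError("values must be positive")
    if value_now <= value_then:
        return math.inf
    return days_between * math.log(2) / math.log(value_now / value_then)


# Made-up illustrative numbers, not real data.
print(daily_new_cases([100, 150, 220, 300]))  # [50, 70, 80]
print(round(doubling_time(100, 400, 10), 2))  # 5.0 (two doublings in ten days)
```

Under this assumption, a doubling time above ten days means the value grows by less than a factor of two over any ten-day window.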

The third step we propose here is to look at the ratio of the number of active cases today to the number of active cases seven or 14 days ago. The time window is based on what could be typical periods during which symptom-free patients are able to infect others. This ratio is not the same as, but is somewhat related to, the basic reproduction number R0. (R0 itself does not reflect social distancing; it is an indicator of how infectious a disease is.)
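The ratio described above can be sketched in a few lines. This is an illustration only, not necessarily how the accompanying notebook computes it. The function name and the sample series are made up, and a short window of two days is used so the example stays readable.

```python
def window_ratio(active, window):
    """Ratio of each day's active cases to the value `window` days earlier.

    Days that have no value `window` days back, or whose earlier value is
    zero, yield None instead of a ratio.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    ratios = []
    for i, today in enumerate(active):
        if i < window or active[i - window] == 0:
            ratios.append(None)
        else:
            ratios.append(today / active[i - window])
    return ratios


# Made-up illustrative numbers, not real data.
active = [10, 20, 40, 80, 80, 60]
print(window_ratio(active, window=2))  # [None, None, 4.0, 4.0, 2.0, 0.75]
```

A ratio above 1 means there are more active cases than one window ago, and a ratio below 1 means the number of active cases is shrinking.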

Plotting the number of new cases (as bar charts, since these are individual, discrete data points), we get:

Computing the ratio of active cases to the number of active cases seven or 14 days ago, we get:

For Germany on 2020-04-01, using data from the Johns Hopkins data repository, we get a value of 0.9 for the 7-day ratio and 5.0 for the 14-day ratio.

Let us compare these numbers with a few known cases: China, which reports that it has stopped the spread of the virus; South Korea, which has put very efficient and effective measures in place; and Italy, which is undergoing a major crisis and put the country on lockdown on 2020-03-09.

From the above one can see, qualitatively, how the measures undertaken have an impact on the growth rate. Also qualitatively, one begins to see that it may be possible to model the time between certain measures being applied and the spread of the disease slowing down.
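One very simple way to put a number on that delay is to count the days from the date of a measure until a growth ratio first drops below a chosen threshold. The sketch below is only a starting point under that assumption. The threshold of 1.0, the function name, and the data are all hypothetical, and it is not a model of the underlying dynamics.

```python
from datetime import date, timedelta


def days_until_below(dates, ratios, measure_date, threshold=1.0):
    """Days from `measure_date` to the first day on or after it whose ratio
    is below `threshold`. Returns None if that never happens in the data.

    Entries with a ratio of None (no value available) are skipped.
    """
    for day, ratio in zip(dates, ratios):
        if day >= measure_date and ratio is not None and ratio < threshold:
            return (day - measure_date).days
    return None


# Made-up illustrative numbers, not real data.
start = date(2020, 3, 1)
dates = [start + timedelta(days=i) for i in range(6)]
ratios = [None, 3.0, 2.0, 1.2, 0.9, 0.8]

print(days_until_below(dates, ratios, measure_date=date(2020, 3, 2)))  # 3
```

Comparing this lag across countries and measures would be one crude way to start, before moving to a proper time series model.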

A look at Singapore is interesting. Singapore may have either not been affected by COVID-19 cases so far, or it may see the first signs of a second wave.

The above is a first attempt by the R²Data Labs team to understand the data with a view to identifying possibilities for modelling. Our background is technical, but forecasting effects from time series data with high uncertainty about unknown outside effects is part of our job. We plan to trial approaches to forecast the effectiveness of measures taken globally, hoping this will be useful for better understanding the situation.

Comments and views are welcome! You can get a copy of a Jupyter notebook on our GitHub page at .

Disclaimer: The purpose of this analysis is primarily academic: to foster discussion and to encourage a data-driven understanding of what is clearly a challenging global situation. No political or organizational endorsement is intended.

Dr. Klaus Paul, R²Data Labs, Berlin AI Hub