With the current fashion in Machine Learning leaning towards Deep Learning, which relies on sheer computational machinery to do analytics, there is something elegant about Causal Inference. Forcing Data Scientists to commit to their assumptions about the data by drawing graphs of causal effects lends a certain transparency to the insights. It is relevant to policy makers and business leaders who are left to ponder the question “Now that I have the data, how do I see the impact of my policies?”
We at Project Emer2gent are passionate about turning data into insights about the Covid 19 situation, for society and businesses at large. To that end, we are scouting for open source data sets, such as those from Kaggle and Johns Hopkins. One pretty cool data set that we came across is the Oxford Covid 19 Government Response Tracker (https://www.bsg.ox.ac.uk/news/worlds-first-covid-19-government-response-tracker-launched-today). The data records confirmed cases along with the countermeasures launched by governments, plus a neat Stringency Index which enables comparing the responses of different countries, if only approximately.
This is exactly the starting point for any kind of causal analysis. Consider the question: now that we have different measures in place in different countries, how do we check the effectiveness of those measures? It is a good playground for causal inference. Quite curiously, as has been pointed out (https://projecteuclid.org/euclid.ss/1009212409), causal models and inference have been effectively applied to epidemiology in the past.
Looking at the Government Response Tracker data, one fact quickly becomes clear: so far, only a few nations (China, South Korea) have managed to ride out the S-curve (see the number of infections in China beginning from Jan 1 and maxing out at about 80000, figure below). Going forward, I will be using the data for China for the following analysis, because of the number of data points available.
Consider the different countermeasures put in place by the Chinese Government (each step change denotes the introduction of new measures).
Now, assessing the effectiveness of different measures using causal inference requires a causal graph. I am going to derive one, although I am hardly an epidemiologist. In fact, I would be thankful for any insights or critique here from those of you who are better informed on this topic than I am.
An intuitive analysis: each of the implemented measures influences the rate of change of new infections over a 7 day period. The 7 day window is a point where I would be really happy to get more insights, as my choice is based simply on a layman’s understanding of the incubation time for new infections. Stringency measures encode our belief in how stringent the measures are. Let us draw them as a simple causal graph.
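To make the 7 day target concrete, here is a minimal sketch of one possible reading of it: the change in cumulative confirmed cases over the following 7 days. The function name `delta7`, the forward-looking direction of the window, and the toy numbers are all assumptions for illustration; they are not taken from the tracker data or from the analysis itself.

```python
def delta7(cumulative_cases, window=7):
    """Change in cumulative confirmed cases over the next `window` days.

    Assumption: the window looks forward from each day. Days without a
    full window ahead of them get None instead of a partial value.
    """
    n = len(cumulative_cases)
    return [
        cumulative_cases[i + window] - cumulative_cases[i] if i + window < n else None
        for i in range(n)
    ]


# Made-up toy series of cumulative confirmed cases (not real data).
toy_cases = [10, 20, 40, 80, 160, 320, 640, 1280, 1500, 1600]
toy_delta7 = delta7(toy_cases)
print(toy_delta7)
```

A backward-looking window (cases today minus cases 7 days ago) would be an equally plausible choice; which one is right depends on how the delay between a measure and its effect is modelled.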
We are almost there. With the data and the causal model in hand, we can start crunching the numbers. For the causal calculus, I am using the dowhy package (https://github.com/microsoft/dowhy/blob/master/dowhy/), written in Python. Using a simple linear regressor, blocking backdoor paths, and defining the rate of new infections over a 7 day window (Delta7) as the target gives the following estimates along with their respective p values:
| Measure | Estimated effect on Delta7 | p value |
|---|---|---|
| S1 School closure | -220 | 0.07 |
| S3 Cancel public events | -2300 | 0.07 |
| S4 Close public transport | -1833 | 0.137 |
| S6 Restrictions on Internal Movement | -460 | 0.036 |
If one were to take the usual p value cutoff of 0.05, it appears that only the estimated decrease in infections due to Restrictions on Internal Movement is statistically significant, while we need more data points to establish the efficacy of the other measures.
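The cutoff check is simple enough to spell out. The sketch below copies the estimates and p values reported in this post and keeps the measures whose p value falls below 0.05; the variable names are my own.

```python
# (estimate, p value) per measure, as reported in this post.
estimates = {
    "S1 School closure": (-220, 0.07),
    "S3 Cancel public events": (-2300, 0.07),
    "S4 Close public transport": (-1833, 0.137),
    "S6 Restrictions on Internal Movement": (-460, 0.036),
}

ALPHA = 0.05  # the usual significance cutoff

significant = [name for name, (_, p_value) in estimates.items() if p_value < ALPHA]
print(significant)  # ['S6 Restrictions on Internal Movement']
```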
What do you think? Critique and conversation welcome!
Disclaimer: The purpose of this analysis is primarily academic: to foster discussion and to encourage a data driven understanding of what is clearly a pretty challenging global situation. No political or organizational endorsement is intended.