It has been a long time since I wrote my last article. Given the current circumstances with the covid-19 pandemic, I felt that I should drop a new article in “Unveiling the reality” to remain true to the primary goal of this blog: being a laboratory of ideas attempting to better understand our universe.
Hence here I am, trying to provide, from my confinement in Madrid, Spain, a humble view on the dynamics of this war between covid-19 and the human race.
Everything started the other day, while listening to the Teskey Brothers, a very interesting Australian band I have been following recently. At some point a daring idea crossed my mind: what if chaos theory has something to say about this war? Then I had three choices: going to the cinema, going jogging or starting to develop this idea about chaos theory and covid-19. Given that I am confined, the decision became clear: think deeper and share with others, just in case someone out there might find my ideas interesting.
Given that everything started with the Teskey Brothers, let’s make their music the soundtrack for this post. Here you can find a sample of their lovely music: Teskey Brothers.
PROBABLY THE MOST FAMOUS EQUATION IN CHAOS THEORY
In 2013 the mathematician Ian Stewart published his book “17 Equations That Changed the World”, giving a chronological enumeration of the most important equations in the history of science. You can see the complete list here: List of the 17 equations.
The 16th in the list (it is in chronological order, not in order of importance) is the “logistic map”, the equation behind a new discipline and way of thinking: chaos theory. In 1976 the scientist Robert May published his famous article “Simple Mathematical Models With Very Complicated Dynamics” in the journal Nature, causing a great stir in the scientific community.
This equation is a model describing the dynamics of an animal population when a certain maximum size cannot be exceeded, due to a limited amount of resources (food, space, etc.). For that reason the variable in the equation is expressed as a fraction of the maximum possible population, rather than in terms of the absolute number of animals. So the variable only ranges from 0 (no animals at all) to 1 (maximum number of animals).
The equation has the form x(t) = r * x(t-1) * (1 - x(t-1)), where:
- x(t) is the value of the animal population at time t (as a fraction of the maximum size)
- x(t-1) is the value of the population in the previous period of time (t-1)
- r is the parameter controlling how fast the animal population replicates.
To put the equation in a less scary shape, here is its practical meaning: the population today is the product of two factors, a replication factor (r * population yesterday) and another factor measuring how far the population yesterday was from the maximum size (1 - x(t-1)).
When I say “yesterday” I could equally be saying “a year ago”, since the problem can be expressed with any time scale you like.
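As a minimal sketch of the equation above, here is one replication step and a short trajectory in Python. The function names `logistic_step` and `iterate`, as well as the starting values in the example, are illustrative choices of mine and do not come from May's article.

```python
def logistic_step(x_prev: float, r: float) -> float:
    """One step of the logistic map: x(t) = r * x(t-1) * (1 - x(t-1))."""
    return r * x_prev * (1.0 - x_prev)


def iterate(x0: float, r: float, steps: int) -> list[float]:
    """Return [x(0), x(1), ..., x(steps)] starting from the fraction x0."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(logistic_step(trajectory[-1], r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical example: the population starts at 10% of the maximum, r = 1.5.
    for t, x in enumerate(iterate(0.10, 1.5, 5)):
        print(f"t={t}: x={x:.4f}")
```

Each call to `logistic_step` is one "yesterday to today" update, whatever time scale is chosen.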
What Robert May did was to study in depth the behaviour of the equation while changing the value of the system parameter r. He found that there are distinct intervals of r with radically different behaviours. The next figure shows these behaviours.
The values of the population (fraction of the maximum population) in the figure must be read as the final values that the dynamics reaches after a series of iterations. That is to say, it takes several replication steps to reach the limit values in the figure, which indicates the final destination of the animal population.
The reader can find a deep explanation of these behaviours in this link: Logistic map. Basically, when r < 1 the population drops to zero, whatever the initial population is. Between 1 and 2 the population ends up stabilizing at (r-1)/r. In the interval between 2 and 3 the final population is also (r-1)/r, but now it oscillates around that value for a while, so reaching it takes longer. From r=3 upwards we get a series of bifurcations and in the end a really complex chaotic behaviour, in which every replication can produce strange new values for the population that mostly defy common sense.
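The regimes described above can be explored numerically. The following is a rough sketch, not a reproduction of May's figure: it iterates the map many times, throws away the transient, and prints the values the population keeps visiting. The function name `long_run_values`, the starting fraction 0.2, and the sample values of r are all assumptions chosen for illustration.

```python
def long_run_values(r: float, x0: float = 0.2,
                    burn_in: int = 2000, keep: int = 8) -> list[float]:
    """Iterate the logistic map, discard the first burn_in steps,
    and return the next `keep` values rounded to 4 decimals."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    tail = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        tail.append(round(x, 4))
    return tail


if __name__ == "__main__":
    # One sample r per regime: extinction, fixed point (twice),
    # a two-value cycle after the first bifurcation, and chaos.
    for r in (0.8, 1.5, 2.8, 3.2, 3.9):
        print(f"r={r}: {long_run_values(r)}")
```

For r below 3 the tail collapses to a single repeated value (zero or (r-1)/r), for r = 3.2 it alternates between two values, and for r = 3.9 it does not settle at all.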
LET’S TREAT COVID-19 AS AN ANIMAL
Now that we have laid out the chaos theory framework I can be a bit clearer about the idea that crossed my mind when listening to the Teskey Brothers. What if covid-19 infecting humans follows a model similar to the logistic map?
To try this out, let’s build an analogy between an animal population and a virus infecting humans. Most of the news and statistics nowadays talk about three different figures: the cumulative number of infected people, the cumulative number of recovered people and the cumulative number of deaths. I think that to build a good analogy we need to take active cases = animal population. So we calculate active cases by subtracting the cumulative recoveries and deaths from the total cumulative confirmed cases. Given that these figures are updated daily, we get daily numbers of active cases. Following the analogy, every single infected person is a unit within the population of infected people. The next step is very important: here we also have a maximum number of infected people, which is the whole population of the country. That is the catastrophic limit the infection can reach.
If we take these considerations we have x(t)= Number of active cases/total population of the country and the logistic map for the infection will be:
Percentage of the population infected (today)=r* percentage of the population infected (yesterday)* (1-percentage of the population infected yesterday)
We can simplify a bit by using %(today) and %(yesterday) for the percentages of population infected today and yesterday respectively.
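A small sketch of how the variable x(t) can be obtained from the three published cumulative figures. The function names and the numbers in the example are hypothetical and only serve to show the arithmetic.

```python
def active_cases(confirmed: int, recovered: int, deaths: int) -> int:
    """Active cases = cumulative confirmed - cumulative recovered - cumulative deaths."""
    return confirmed - recovered - deaths


def active_fraction(confirmed: int, recovered: int, deaths: int,
                    population: int) -> float:
    """x(t): active cases as a fraction of the country's whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return active_cases(confirmed, recovered, deaths) / population


if __name__ == "__main__":
    # Made-up figures for one day in a country of 1,000,000 people.
    x_today = active_fraction(confirmed=1000, recovered=200, deaths=50,
                              population=1_000_000)
    print(f"x(today) = {x_today:.6f}")
```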
Given that we have the values for the infections today and yesterday, we can try to predict the percentage of infection tomorrow by first calculating the value of r that we have today:
r(today) = %(today) / [%(yesterday) * (1 - %(yesterday))]
and then using it to predict the active cases tomorrow:
%(tomorrow) = r(today) * %(today) * (1 - %(today))
It is quite simple, so I wondered if that simple approach could match what is happening in the different countries.
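The two steps described above (solving the logistic map for r, then applying it one day forward) can be sketched as follows. The names `estimate_r` and `predict_tomorrow` are my own, and the input fractions in the example are invented.

```python
def estimate_r(x_today: float, x_yesterday: float) -> float:
    """Solve x_today = r * x_yesterday * (1 - x_yesterday) for r."""
    if not 0.0 < x_yesterday < 1.0:
        raise ValueError("x_yesterday must be strictly between 0 and 1")
    return x_today / (x_yesterday * (1.0 - x_yesterday))


def predict_tomorrow(x_today: float, x_yesterday: float) -> float:
    """Apply today's r one step forward to estimate tomorrow's fraction."""
    r = estimate_r(x_today, x_yesterday)
    return r * x_today * (1.0 - x_today)


if __name__ == "__main__":
    # Invented fractions of the population with an active infection.
    x_yesterday, x_today = 0.00100, 0.00118
    print(f"r(today)    = {estimate_r(x_today, x_yesterday):.4f}")
    print(f"x(tomorrow) = {predict_tomorrow(x_today, x_yesterday):.6f}")
```

The prediction only uses two consecutive days, so it implicitly assumes that r does not change between today and tomorrow.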
LOGISTIC MAP IS A REASONABLE ESTIMATOR FOR THE DAILY ACTIVE CASES
I have applied the approach described above to make predictions on a daily basis using the data collected by Johns Hopkins University. The specific data can be found in the following repository: GitHub data repository. The available dates range from the 22nd of January 2020 to the 14th of March 2020.
The results for different countries are shown next:
It can be seen that China crossed the threshold r=1 on the 17th of February. We have already explained that values smaller than 1 lead to the extinction of the infection (the population of virus infecting people).
On the one hand we can see that the error of the prediction in China is an overestimation of the cases by 2%, which is a reasonable error. On the other hand we see that the maximum value of the curve corresponds to the day on which r crossed the threshold r=1 (17th of February).
We all know that our brothers and sisters in Italy are going through a critical situation. From the available data, the calculation gives Italy a value of r=1.19 on the 14th of March. Let’s remember that a value of r between 1 and 2 leads quite quickly to a limit of infection (r-1)/r, which for the Italian case would be 0.19/1.19=16% of the whole population infected.
The prediction overestimates infections in Italy by 4% on average.
r=1.01 is the value for South Korea on the 14th of March. That value still needs to improve slightly in order to extinguish the infection.
We get for South Korea the same overestimation as for Italy (4%).
The evolution of r in Spain is quite erratic, probably because the infection started later than in Italy, China and South Korea and hence we have less data available to get a stable r value. Probably at this stage the real value of r has not stabilized yet. r is 1.16 on the 14th of March, which projects to 14% of the population infected.
The error now is an overestimation of 8% above the real figures. Given that r does not look stabilized, we get less accurate predictions here, although the model is still valid as an estimation.
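The projected limits quoted above follow directly from (r-1)/r. A short sketch that reproduces the arithmetic for the r values given in the text for the 14th of March; the function name `limit_fraction` is my own, and the sketch deliberately refuses r > 3, where the map no longer settles on a single value.

```python
def limit_fraction(r: float) -> float:
    """Long-run fraction of active cases for a fixed r.

    r <= 1     -> the infection dies out (limit 0)
    1 < r <= 3 -> the map settles at (r - 1) / r
    r > 3      -> no single limit (bifurcations and chaos), so we refuse
    """
    if r <= 1.0:
        return 0.0
    if r <= 3.0:
        return (r - 1.0) / r
    raise ValueError("for r > 3 the logistic map has no single limit value")


if __name__ == "__main__":
    # r values taken from the text above.
    for country, r in (("Italy", 1.19), ("South Korea", 1.01), ("Spain", 1.16)):
        print(f"{country}: r={r} -> limit = {limit_fraction(r):.1%}")
```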
Next I show the average errors (error = prediction - actual value) for the daily prediction of the active cases in different countries:
|Country|Average error (prediction - actual value)|
Two relevant conclusions can be derived from the table above:
- The error is always positive, which means that the model always overestimates the real number of active infections to a greater or lesser extent.
- However, the predictions, even if overestimated, provide a reasonably good estimation of the daily active infections.
If we combine the two conclusions, we can derive that the logistic map still needs an additional tweak in order to capture the complete behaviour of covid-19, but also that the logistic map in its current shape can capture some truth about the way the covid-19 infection is behaving.
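For readers who want to repeat this kind of check on their own data, here is a hedged sketch of a one-day-ahead backtest. It is not the code used for the results above. In particular, expressing the error relative to the actual value is an assumption of mine, since the post does not state how the percentages were normalised, and the names `one_step_backtest` and `mean_error` are hypothetical.

```python
def one_step_backtest(x: list[float]) -> list[float]:
    """For each day t >= 2, estimate r from days t-2 and t-1, predict day t,
    and return the signed relative errors (prediction - actual) / actual.

    x is the daily series of active-case fractions (all values in (0, 1)).
    """
    errors = []
    for t in range(2, len(x)):
        r = x[t - 1] / (x[t - 2] * (1.0 - x[t - 2]))
        prediction = r * x[t - 1] * (1.0 - x[t - 1])
        errors.append((prediction - x[t]) / x[t])
    return errors


def mean_error(errors: list[float]) -> float:
    """Average signed error; positive means the model overestimates."""
    if not errors:
        raise ValueError("need at least three days of data")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Invented series whose growth slows down day after day.
    series = [0.0010, 0.0020, 0.0030, 0.0038, 0.0044]
    print(f"mean error = {mean_error(one_step_backtest(series)):+.1%}")
```

In a series like the invented one, where growth slows from day to day, yesterday's r is too high for today, so the signed error comes out positive.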
PREDICTING THE OUTCOME OF THE COVID-19 PANDEMIC
Given that the logistic map turns out to be a reasonable predictor of daily active cases (with a slight overestimation), I think we can use it to build predictions into the future, taking advantage of the fact that we know the mathematical limit of this equation once r is known.
Next I show the future projections for different countries based on the r calculated on the 14th of March.
We need to take some factors into consideration here:
- The limit is not always made up of the same people. The limits in fig. 10 show the number of active infections. Reaching a limit does not mean that the same people are infected all the time: different people keep getting infected while others recover and others sadly die. However, the number of active infections stays the same. A stable number of infections implies an endless stream of deaths, which is not an option for a society.
- Estimation. The model slightly overestimates the active cases, so the limits at which the infection stabilizes need to be considered as estimations rather than accurate figures.
- We do not have a vaccine yet. This means that the whole population is potentially exposed and the upper limit of total active cases is the whole population.
- We do not know yet whether recovered subjects get immunity. There is no proof of this yet. As in the point above, we are considering that the whole population can be infected, without removing those subjects who recovered from a previous infection. If we had either a vaccine or proof that recovered people are immune, then the upper limit of the logistic map would not be the whole population, since we would have to subtract immune people (immune through recovery or through a vaccine). If immunity needed to be introduced in the model, the graphics in fig. 10 would show the percentage of people infected within the group of vulnerable people.
- There is not a fixed future. In previous sections we have seen that the value of r changes over time due to the measures taken by governments, people’s behaviour, etc. My point here is that every day we measure r we get a future that would be the final outcome if we were not able to change the value of r any further. So the real battle must focus on reducing the value of r again and again until it crosses the safety threshold r=1.
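The immunity adjustment mentioned in the list above amounts to shrinking the denominator of x(t). A tentative sketch follows, assuming that "vulnerable people" means the whole population minus the immune, which is my reading of the text rather than something it spells out. The function name and the numbers are hypothetical.

```python
def fraction_of_vulnerable(active: int, population: int, immune: int = 0) -> float:
    """Active cases as a fraction of the people who can still be infected.

    With immune = 0 this is the x(t) used throughout the post.
    Assumption: vulnerable = population - immune (vaccinated or recovered-and-immune).
    """
    vulnerable = population - immune
    if vulnerable <= 0:
        raise ValueError("no vulnerable people left")
    return active / vulnerable


if __name__ == "__main__":
    # Made-up numbers: same active cases, with and without an immune group.
    print(fraction_of_vulnerable(1000, 1_000_000))
    print(fraction_of_vulnerable(1000, 1_000_000, immune=500_000))
```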
MONITORING AND FIGHTING COVID-19
I would like to reflect here (I hope the reader has not fallen asleep yet) on the actual meaning of r. My interpretation is that this parameter measures empirically the aggregation of several factors:
- Contagion rate of the virus
- Interactions across the subjects in the population
- Transportation networks allowing the virus to spread
- Hygiene measures
- Etc. etc.
Typically the standard models describing the propagation of diseases use these factors to build complex propagation networks that explain how a potential threat would spread. The approach in this post, however, is based on measuring the aggregation of all the involved factors empirically, once the outbreak has taken place.
Certainly the measures taken in terms of confinement, closing public transportation, closing airports, etc. are modifying the value of r over time. So monitoring r in our countries every day and using that value to project into the future (let’s remember that r <= 1 leads the active cases to zero and 1 < r <= 3 leads to the limit (r-1)/r) is the way to estimate which phase of the pandemic we are in.
Remember that calculating r every day is easy:
r(today) = %(today) / [%(yesterday) * (1 - %(yesterday))]
where % is the number of active cases divided by the population of your country (assuming no vaccine and no immunity in the recovered subjects).
I have created an application deployed on Google Cloud Platform (GCP) which monitors the infection worldwide. You can see the output of this application in the link below. It is a Data Studio report that updates automatically on a daily basis. I hope you find it useful.
MY BEST WISHES
That’s it for now. I hope someone out there finds this post useful. I also hope that we all make the most of our spare time during confinement. Do not forget to enjoy the Teskey Brothers’ music and stay safe, for yourself and for others.
See you soon.
3 thoughts on “CAN CHAOS THEORY PREDICT THE OUTCOME OF COVID-19 PANDEMIC?”
Very interesting article.
However, I think you have a flaw in your reasoning, assuming that the number of infected people is only the number published by the governments. At least this is what I understood you are using for estimating the %today. Probably, the numbers they give depend heavily on the number of tests and the locations where they are performed. It’s a number too small compared to the total population to take it as an indicator.
If you take for instance Italy, they publish the total number of tests (tamponi). With this number, you can better estimate the variation of active cases every day, as the ratio of new cases divided by the total number of new tests. It gives you a completely different graph. Probably the same will happen with the rest of the countries.
With data published up to 23 March, and discarding data prior to 10 March, when the data was too noisy, probably due to the lack of samples, you get a coefficient r fitted by the equation r = -0.0012x^2 + 0.0257x + 1.1712 (R^2 = 0.8571), where x is the number of days since 10 March. Solving this equation gives r=1 on 6 April (x=27, i.e. 10 March + 27 days).
Sorry, can’t post graphs in my reply, only ascii, but you can get Italy data from https://github.com/pcm-dpc/COVID-19/tree/master/schede-riepilogative/regioni
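The crossing date quoted above can be checked by solving the fitted quadratic for r = 1. This is only a sketch of that arithmetic: the coefficients are the ones given in the comment, the year 2020 is assumed from the dates of the post, and the function name `day_when_r_is_one` is hypothetical.

```python
import math
from datetime import date, timedelta


def day_when_r_is_one(a: float, b: float, c: float) -> float:
    """Largest real x solving a*x**2 + b*x + c = 1 (assumes a != 0)."""
    disc = b * b - 4.0 * a * (c - 1.0)
    if disc < 0:
        raise ValueError("the fitted curve never reaches r = 1")
    sqrt_disc = math.sqrt(disc)
    return max((-b + sqrt_disc) / (2.0 * a), (-b - sqrt_disc) / (2.0 * a))


if __name__ == "__main__":
    # Coefficients of the fit r = -0.0012x^2 + 0.0257x + 1.1712,
    # with x counted in days from 10 March (year 2020 assumed).
    x_cross = day_when_r_is_one(-0.0012, 0.0257, 1.1712)
    crossing = date(2020, 3, 10) + timedelta(days=round(x_cross))
    print(f"x = {x_cross:.2f} days -> {crossing.isoformat()}")
```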
Thank you for taking the time to go through the post and contributing with your comments.
Several points from my side:
1) Yes, I used the official figures reported by the governments. I am aware that they are only a fraction of the real cases, but they are still relevant data. On average the underestimation in the official figures can be written as x_official = alpha * x_actual, where alpha is a number smaller than 1 and x is the number of active cases. That is to say, x_official is only a fraction of x_actual. When it comes to calculating r (following the approach in the post):
r_official = (x_official_today/population) / [(x_official_yesterday/population) * (1 - x_official_yesterday/population)]
The expression above simplifies to:
r_official = x_official_today / [x_official_yesterday * (1 - x_official_yesterday/population)]
Using the expression x_official = alpha * x_actual:
r_official = (alpha * x_actual_today) / [alpha * x_actual_yesterday * (1 - alpha * x_actual_yesterday/population)]
We can simplify again:
r_official = x_actual_today / [x_actual_yesterday * (1 - alpha * x_actual_yesterday/population)]
It simplifies because the detection rate will be, on average, pretty much the same for two consecutive days. Hence we have two factors: the first, in which the detection underestimation disappears, and the second, in which it remains. However, given that the number of active cases in that second factor is divided by the total population, the term alpha * x_actual_yesterday/population is very small compared with 1 as long as the fraction of infected people is far from 100%, so the second factor stays close to 1. Fortunately this is the situation we are going through nowadays.
So in my opinion the calculation with the official figures is still relevant and not that far from the real one as long as the fraction of infected people in the population is small.
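A quick numerical illustration of the argument in point 1. All the numbers here (population, actual cases, detection rate alpha) are invented purely to show the size of the effect, and `estimate_r` is the same hypothetical helper as in the formula of the post.

```python
def estimate_r(x_today: float, x_yesterday: float) -> float:
    """r = x_today / (x_yesterday * (1 - x_yesterday)), with x as population fractions."""
    return x_today / (x_yesterday * (1.0 - x_yesterday))


if __name__ == "__main__":
    population = 60_000_000        # invented country size
    actual_yesterday = 200_000     # invented true active cases
    actual_today = 230_000
    alpha = 0.25                   # invented: only 25% of cases are detected

    r_actual = estimate_r(actual_today / population,
                          actual_yesterday / population)
    r_official = estimate_r(alpha * actual_today / population,
                            alpha * actual_yesterday / population)

    print(f"r from actual cases:   {r_actual:.4f}")
    print(f"r from official cases: {r_official:.4f}")
    print(f"relative difference:   {abs(r_official - r_actual) / r_actual:.2%}")
```

With these invented inputs the two values of r differ by well under 1%, because alpha cancels in the ratio and survives only inside the (1 - ...) factor.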
2) Having said that, I completely agree that considering the rate over the total sample might give better accuracy. I say “might” because we also need to consider the sampling approach. I think this is quite similar to what happens when designing a survey: you need to test a subgroup within the whole universe, taking care to keep the same characteristics in your testing group as in the whole population. I do not know the testing approach in Italy, but the case I know best is Spain, where for a long period of time (now it looks like it is going to change) only people with symptoms were tested. With this approach we have a biased testing group, which leads to overestimation when the rate of positive cases in the tested group is taken as the actual rate of active cases.
However, as I said, with a correct sampling approach the rate of positives over the total tested group would be a much better approach than using the official figures. I am not sure whether we can find testing figures for several countries in any database. I think the one I used for this post does not keep a record of such information.
3) Alistair, thanks again for dropping your thoughts here. I appreciate that and I think it is helpful to put this complex problem in a realistic context. Feel free to use my email address (firstname.lastname@example.org) to share figures and any other relevant info. I am happy to discuss this important topic further.
Congratulations Juan! As an outsider to maths….A difficult task thoroughly put together.