1 Introduction

In 2020, the world faced an unprecedented challenge in the form of a new coronavirus outbreak. One of the characterizing features of the outbreak is the difficulty in forecasting its size, and how to link the pandemic to containment measures. The pandemic became a trigger which activated the need to predict the dynamics of infections and their impact on social and economic processes and conditions. However, the available approaches to model these aspects of epidemics do not appear to be plausible solutions to the current situation.

Among the classical epidemiological models, there are three main types of deterministic (parametric) models for infectious diseases that spread as a result of contact with a person in a population, which are well discussed in Hethcote (1989). The acronym SIR stands for a proportion of susceptible (or healthy) “S” people in the population, the proportion of infected “I”, and the percentage of recovered or dead “R” who are supposed to be immune, so that S + I + R = 100%. SIR is used for diseases where the infection provides permanent immunity. The second approach, SIS (susceptible-infected-susceptible), does not take into account immunity, so that S + I = 100%. The third type of model takes into account the natural birth and mortality rates in the population, so this type is focused on long-term modeling of epidemics. When a SIR outbreak happens over a relatively short time (less than 1 year), then this outbreak is called an epidemic. Since an epidemic is relatively fast, the model does not include birth and death dynamics. Epidemics are common in diseases such as flu, measles, rubella, and chickenpox. A good overview of non-parametric models is presented in Choisy et al. (2007).

Classical parametric models cannot be used to describe the COVID-19 pandemic, as they have serious drawbacks, e.g. they do not take into account active administrative restrictions to prevent the spread of the epidemic. On the contrary, the parameters of infection distribution are assumed constant, which implies that diseases are left to spread on their own. According to empirical observations, modeling attempts are based on a set of assumptions, e.g. defining the epidemic as a time period when the number of confirmed cases increases exponentially, and defining the starting date of the epidemic at a date when a certain artificially chosen threshold (“zero”) level has been overcome, which in many cases is set at the first 100 infected people. The number of infected people at the end of the outbreak can ensure the slow speed of the increase in the total number of infected. The total share of infected people is a small proportion of the population, so S ≫ I + R, hence there is no sense to include immunity in the model (and thus SIR would not work properly). For the SIS approach there are solutions to differential equations in finite functions (the Bernoulli equation), yet it follows exponential growth at the beginning of outbreak only if the number of infected people does not tend to zero after the peak of the disease.

The purpose of the paper is to build a model which would help in predicting the length and severity of COVID outbreaks in different countries on the basis of the relevant theoretical background, and to discuss reasons for deviations in the actual dynamics from the predicted path. Such a model and a discussion would provide valuable support to instruments of containment of this and other outbreaks, and would help to minimize the negative impact such outlying events might have on economies and lives.

In Sect. 2, we propose a theoretical approach to modeling the number of those suffering from COVID-19 on the basis of the model developed in Volterra (1976) (which has never been applied to this type of problem) and substantially amended in Pomazanov (2020) to include the peculiarities of epidemic situations. In Sect. 3, we present a parametric contagion model that is being successfully tested for the Ebola virus epidemic. Data sources are described in Sect. 4. In Sect. 5, we consider the deviation from the static model. In Sect. 6, we compare and contrast the findings at the country level and try to identify the nexus between the easing of administrative restrictions and the chances of a second wave of the pandemic. In the last section, we develop an approach which could show how to deal with situations with high numbers of residual infections.

2 Model of contagion and peak (distribution) parameters

A good explanation for the dynamics of a short-term epidemic can be obtained with an amended version of the Lotka–Volterra modelFootnote 1 (L–V). In its classical form, it describes the interaction of two species, one of which is a predator and the other a prey.

To apply the model to an epidemic, the prey could be defined as the share of the population available for infection S(t), which at the time \(\hat{t}\) equals to one, \(S\left( {\hat{t}} \right) = 1\). It is assumed that the population takes measures to dissociate the “prey” according to the simplest law

$$S^{\prime}\left( t \right) = - \alpha \, \cdot \,S\left( t \right),$$
(1)

where \(\alpha\) is a constant reflecting the efficiency of the containment measures.

The predator in this model is the share of infected N(t). A predator, similar to the L-V model, reproduces according to the law,

$$N^{\prime}\left( t \right) = \gamma \, \cdot \,\alpha \, \cdot \,S\left( t \right)\, \cdot \,N\left( t \right) + \tilde{\beta },$$
(2)

where \(\gamma\) is a constant reflecting the scale of the outbreak; \(\tilde{\beta }\) reflects the level of residual infection, which can be identified only at a late stage of the epidemic.Footnote 2

The solution to Eqs. (1) and (2) is the function of the absolute number of infected people, which takes the form:

$$N\left( t \right) = \hat{N}\, \cdot \,e^{{\gamma \left( {1 - e^{{ - \alpha \left( {t - \hat{t}~} \right)}} } \right)}} ,$$
(3)

where \(\hat{N}\) is the number of infected at time \(\hat{t}\) when the epidemic starts.

Both variables (\(\hat{N}~,~\hat{t}~\)) are dependent near the conditional start of the epidemic, but they can change without changing the type of function (3) by renormalizing, taking into account the parameters \(\alpha ,\gamma\). For \(\alpha \left( {t - \hat{t}~} \right) \ll 1\), Eq. (3) would be distributed exponentially \(N\left( t \right) \cong \hat{N}\, \cdot \,e^{{ - \gamma \alpha \left( {t - \hat{t}~} \right)}}\), where \(\hat{N}\, \cdot \,e^{{\gamma \alpha \hat{t}}}\) is invariant. To differentiate epidemics from one-offs it is worth considering \(\hat{N} = 100\), while \(\hat{t}\) define as \(t_{{100}}\). Therefore, for a normalized function of the number of infected people (3) \(N\left( t \right) = 100\, \cdot \,exp\left( {\gamma \left( {1 - e^{{ - \alpha \left( {t - t_{{100}} ~} \right)}} } \right)} \right)\) the three contagion parameters should be solved: \(\alpha ,\gamma ,~t_{{100}}\). These constants, which describe the development of the epidemic, should be determined in line with the dynamics of infection by minimizing the distance function, which will be discussed below.

The next reasonable stage is to define the peak outbreak parameters. The first important parameter is the moment of the peak contagion rate for (3), i.e. time \(t^{*}\), where \(N^{\prime}\left( t \right)\) is at maximum. This can be solved by \(N^{\prime } \left( t \right) = 0\). So,

$$t^{*} = \hat{t} + \frac{{\ln \left( \gamma \right)}}{\alpha },$$
(4)

which at its peak results in a maximum increase in the number of new infections per day.

$$\dot{N}^{*} = \hat{N}\, \cdot \,\alpha \, \cdot \,e^{{\gamma - 1}} ,$$
(5)

where \(\dot{N}^{*}\) allows us to evaluate the maximum load on the healthcare system.

Assuming \(\tilde{\beta } = 0\), from (2) one can forecast the maximum number of infected over the course of the epidemic:

$$N_{{max}} = \hat{N}\, \cdot \,e^{\gamma } .$$
(6)

The timing of the exit from containment measures (which should be highly correlated with the stages of epidemic ending) can be set up based on practical considerations, e.g., on the percentile p of the remaining infections until \(N_{{max}}\) is reached, so time \(T_{p}\) is calculated from the definition of the percentile,

$$p = \frac{{N_{{max}} - \hat{N}\, \cdot \,e^{{\gamma \left( {1 - e^{{ - \alpha \left( {T_{p} - \hat{t}~} \right)}} } \right)}} }}{{N_{{max}} }}.$$
(7)

From (7) for small p one could easily infer a simple formula to define the date for easing restrictions:

$$T_{p} = t^{*} + \frac{1}{\alpha }\ln \left( {\frac{1}{p}} \right),$$
(8)

where \(t^{*}\) denotes the peak of the epidemic (4).

As a reasonable option, one could begin the relaxation of the regime of restrictions when p = 10% (the end of the “high security regime”), and the second stage of easing when p = 1% (the end of the self-isolation regime).

3 Validation of a parametric contagion model on Ebola outbreaks

Equation (3) was validated on data of the Ebola virus epidemic (WHO, 2016). Calculations are performed on a daily basis for each date T during the course of the epidemic, without relying on the information known after date T. The result is shown in Fig. 1. The graphs are aligned along the horizontal axis (identical for all graphs). In chart (A) the current forecast value of the peak infection date for the model is calculated on a daily basis, the dashed line is the actual peak, the straight dotted line is the current date of the forecast T. Chart (B) shows the forecast of the maximum number of infected (Cmax) in all African countries, and the actual number of those who got the disease (dotted). Chart (C) shows the predicted and actual number of infections per day when all information is available.

Fig. 1
figure 1

Modeling the Ebola virus epidemic

The dynamics of the Ebola virus outbreak are well approximated by (3). Forecasts of the peaks of infections per day begin to approach their true values when the fluctuations subside. Approaching the values which explain the total number of infected (in the proximity of 10,000 people, or up to 30% of the true, ex-post measured number of infected) begins as the epidemic approaches the peak of infection. Before that, the forecasts are highly volatile and not significant.

One of the peculiarities of the approach is that it works well for outbreaks where the peak is over (see the Chart A). Earlier than that, the date of the epidemic peak might be less reliable.

Based on empirical observations of COVID-19 outbreaks, when the model starts to indicate that the peak is over, the true peak date should be already over for 7 days or more, and so a steady trend of the decreasing intensity of the infection rate begins.

4 Database description

For the purpose of this paper, we relied on data provided by the World Health Organization (2020), which monitors the outbreak on a daily basis. Country-wise information was obtained for the whole set of member countries available during the preparation of the paper. The data set consists of the total number of infected people in each country (for some countries the data was available for individual provinces or regions, e.g. China, the US) on a daily basis starting from January 22nd and ending June 29th, which are the first 160 days of observations of the COVID pandemic.

In addition, we used information on restrictions imposed by governments in different countries, which was collected by the Government Response tracker by Oxford Blavatnik School of Government (2020). This tracker is calculated by a team of 100 Oxford community members who have continuously updated a database of 17 indicators of government responses in 177 countries. These indicators examine containment policies such as school and workplace closings, restrictions on public events and public transport, and stay-at-home policies.

5 Indicators of deviation from the static model

Since the purpose of this study is to analyze the spread of the epidemic in different countries, it is necessary to propose indicators of the deviations of the actual dynamics from the static model. These indicators should show in which countries the observed dynamics differ significantly from the model, so that it is possible to give an interpretation of the systemic and non-systemic causes of the deviation from the “ideal” dynamics. Having identified these reasons, it is necessary to draw conclusions about the factors that lead to the unpredictability of the dynamics. In itself, such unpredictability is a significant cause of uncertainty both for the global economy and for individual countries.

We consider a daily increase in the number of infected people C’(t) and its static model N′(t), where \(t \in \left[ {\hat{t},T} \right]\) is a date in a specified region or country. \(\hat{t}\) is defined as a model starting date when the first \(\hat{N}\) cases are registered (it might be 10, 100 or 1,000 depending on definition in the calculations). In fact, the starting date would not have any effect on the calculation result, it should be chosen in accordance with the scale of the region, country, city, etc. Details of model calibration are in “Appendix”.

The Hodrick-Prescott deviation measures the relative difference of the model \(N^{\prime } \left( t \right)\) from its HP-filtered curve \(C^{\prime}_{\lambda } \left( t \right)\) based on actual data of daily cases.

To calculate this deviation, data for C′(t) is filtered from WHO (2020) with the Hodrick-Prescott methodology with the parameter \(\lambda = 1600\, \cdot \,90^{4}\) for the period \(t \in \left[ {\hat{t},T} \right]\) (as discussed in Hodrick & Prescott, 1997), yielding a time series \(C^{\prime}_{\lambda } \left( t \right)\). The deviation metric from the level model λ is given by the function:

  • over the entire observation interval:

    $$HP2\left( \lambda \right) = \frac{1}{{C_{{max}} }}\mathop \sum \limits_{{t = \hat{t}}}^{T} \left| {^{\prime}_{\lambda } \left( t \right) - N^{\prime}(t)} \right|,$$
    (9)
  • over the observation period after the peak of contagion is over:

    $$HP3\left( \lambda \right) = \frac{{\mathop \sum \nolimits_{{t = Peak}}^{T} \left| {^{\prime}_{\lambda } \left( t \right) - N^{\prime } \left( t \right)} \right|}}{{C\left( T \right) - C\left( {Peak} \right)}},$$
    (10)

    where Peak is the date of the peak of infection (4), \(C_{{max}}\) is the forecast of the maximal number of people infected according to model (6).

In Fig. 2 there are two countries with very different dynamics for the infection outbreak. One of the examples, Iran (chart A), describes a significant deviation of the modeled curve from the HP-filtered average, while the other case, Italy (chart B), describes an “ideal” coincidence of the model with filtered and actual data: the number of people actually infected is aligned with the modeled number.

Fig. 2
figure 2

An example of a country with a significant deviation from the model (Iran, A) and a country with an almost perfectly modeled dynamic fit (Italy, B)

The statistical data quality Q metric is necessary to define those countries in which the statistical data of the number of COVID-19 infected is collected improperly, which leads to unexplained statistical fluctuations in the number of new infections per day. When analyzing the deviations between the actual and model dynamics, such countries should be ignored (or, at least, treated with caution when judgments and conclusions are made) since such deviations are probably associated with significant statistical errors or local random outbreaks (as in China).

The metric shows the difference in the deviation of the actual number of cases per day C′(t) from the HP-filtered curve, adjusted for statistical error in relative terms (%):

$$Q\left( \lambda \right) = \sqrt {max\left( {0,\frac{{\mathop \sum \nolimits_{{t = \hat{t}}}^{T} \left[ {\left( {C^{\prime } \left( t \right) - C_{\lambda }^{\prime } \left( t \right)} \right)^{2} - C_{\lambda }^{\prime } \left( t \right)} \right]}}{{\mathop \sum \nolimits_{{t = \hat{t}}}^{T} C_{\lambda }^{\prime } \left( t \right)^{2} }}} \right).}$$
(11)

The problem of statistical data quality is very important, as it looks very poor even in such European countries as Spain and France (Fig. 3, charts A and B, respectively).

Fig. 3
figure 3

Unexplainable data errors in measuring the number of people with COVID-19 in Spain and France

The Stringency Index metric shows how intensively the restrictions (self-isolation, quarantine measures, etc.) were relaxed following the peak of the epidemic in countries (which had passed an infection peak at the time of the study):

$$SI = \frac{{\mathop \sum \nolimits_{{t = Peak}}^{T} IS\left( t \right)}}{{T - Peak + 1}},$$
(12)
$$SI\% = \frac{1}{{max_{{t \in \left[ {\hat{t},T} \right]}} IS\left( t \right)}}\, \cdot \,SI.$$
(13)

The lower the SI percentage, the faster the relaxation of the restrictions. For the sake of the analysis, we concentrated on countries with a large number of infected and where one peak of infection has already passed. The metric is based on daily data about the Stringency Index (from Oxford, 2020), which is among the metrics used by the Oxford COVID-19 Government Response Tracker.

The Stringency Index is defined in a range between 0 and 100 reflecting these indicators. The higher the index score, the higher the level of restrictions.

The Stringency Index provides a gauge that allows us to consider the timing of enforcing and relaxing epidemic containment measures.Footnote 3

6 The static infection model of COVID-19 and its results

In a setup of the static model with zero residual infection rate (\(\tilde{\beta } = 0\)) we compared the predicted results and the actual dynamics for 90 countries in the WHO (out of 177). However, in the paper we present results for a lower number of countries, as we have applied a set of filters and criteria which excluded many of them from the applicability of the analysis. The first criterion was to exclude countries which might have not passed the first peak, this excluded 29 countries.

The average deviation of the forecasted peak on the basis of the static infection model from the HP-filtered model (which is generally very similar to the actual data) is 5 days (with a standard deviation of 18 days), and when countries with a strong second wave are excluded, the deviation between the HP-filtered peak and the actual peak falls to zero (with a standard deviation of 5 days). If the distances of a predicted peak from the HP-filtered data are taken at absolute values, the averages increase, while the accuracy improves. This highlights that the model reflects reality well, and therefore it is worth considering this model as a basis for future investigations into the topic.

For the analysis of events and trends that impact the dynamics of the COVID-19 spread over countries we applied a set of additional filters. For instance, we decided to exclude the countries with a low number of infected (less than 5000 people), which cut 11 other countries from the analysis, some of which are important for the future research (e.g. Hungary, which shows clear signals of a second wave—in line with the many other CEE countries). The idea behind that was to avoid situations in which the outbreaks are too small to make reasonable conclusions about.

At the next stage, we excluded countries with a low quality of statistics (high Q metric)—another nine locations. Some of them are worth monitoring in the future, such as France, China, Sweden, South Korea. Finally, we choose 20 countries with relatively a high HP metric (see Table 1).

Table 1 Countries analyzed and their key metrics

The countries from Table 1 were analyzed to identify whether there is any relationship between the strength of the 2nd wave of the outbreak and the restrictions imposed by governments. The basic hypothesis was that easing restrictions would be directly related to the strength of the second wave of the coronavirus. Summarizing the findings of, for countries where the second wave is observed, the idea of a nexus between the speed of lifting restrictions and the speed of epidemic spread is supported in most cases (see Table 2).

Table 2 Comparison of the outbreak dynamics in selected countries with the Index of Stringency (summary)

For Afghanistan and Qatar, the evidence on the absence of a second wave might be not convincing, as the peaks of infections in those countries are too close to date when we stopped collecting the data (June 29th, 2020). Hopefully, those countries avoid the second wave, as the speed of lifting restrictions is low, but the situation should be monitored.

For countries with a second wave, the analysis suggests that the faster the speed of lifting restrictions, the higher the strength of the second wave. Moreover, the countries in the table can be additionally subdivided according to the following approach. Some countries with a weak or very weak second wave (these are the countries where the number of infected in the second wave is substantially less than the number of infected at the first peak)—Spain, Australia and Austria—do not fit into the overall picture well. However, for the last two countries the speed of the second wave compared to the minimal number of infections after the peak is substantially growing, so it might be the case that the second wave would be “strong”, and so they will fit the hypothesis.

In countries with a strong second wave (i.e. the number of maximal infected in the last two weeks of June is more than 40% higher than the previous peak), e.g. Iran, Israel, Turkey and the US, the link between lifting restrictions and the probability of the second wave might be weak. However, these cases seem to be very special for the following reasons: Middle-Eastern countries might have a somewhat different pattern of behavior in personal communication between people, the density of the population in cities, and the large proportion of small street retail businesses. Those two aspects might have impacted the link under investigation, although future research into formal relations between social and economic factors which might undermine the relation is needed. In the US, the situation is very different from county to county, and deeper research is required, yet, between the first and the second wave the country faced a series of mass public events when thousands of people met in the streets while the epidemic outbreak might not have been over.

The link between substantially eased restrictions and the increase in the number of infected during the second wave is specifically strong in some CEE economies (Fig. 4): when the restrictions are eased rapidly, the outbreak increases within a few weeks, and the strength of the second wave seems to be very powerful. This is even more important as the modeled dynamic of the outbreak implies that the number of infected should approach zero instead of a second wave occurring.

Fig. 4
figure 4

Outbreak waves and Index of Stringency dynamics in CEE countries

7 The residual infection model of COVID-19 and its results

We solve differential Eqs. (1) and (2) for the normalized \(\tilde{\beta } = \beta \, \cdot \,\hat{N}\, \cdot \,\alpha \, \cdot \,e^{{\gamma - 1}} > 0\). To obtain an absolute number of infected \(N\left( t \right)~\) and the speed of the daily contagion \(N^{\prime}\left( t \right)\) one uses the following equations:

$$\begin{aligned} N\left( t \right) & = \hat{N}\, \cdot \,e^{{\gamma \left( {1 - e^{{ - \alpha \left( {t - \hat{t}~} \right)}} } \right)}} \, \cdot \,\left[ {1 + \frac{\beta }{e}\left( {Ei\left( \gamma \right) - Ei(\gamma )\, \cdot \,e^{{ - \alpha \left( {t - \hat{t}~} \right)}} } \right)} \right]~N^{\prime}\left( t \right) \\ & = \gamma \, \cdot \,\alpha \, \cdot \,e^{{ - \alpha \left( {t - \hat{t}~} \right)}} \, \cdot \,N\left( t \right) + \beta \, \cdot \,\hat{N}\, \cdot \,\alpha \, \cdot \,e^{{\gamma - 1}} \\ Ei\left( x \right) & = \ln \,\ln ~\left( x \right)~ + \mathop \sum \limits_{{k = 1}}^{\infty } \frac{{x^{k} }}{{k \cdot k!}}~ \\ \end{aligned}$$
(14)

Equation (14) contains a vector of four unknown parameters \(\vec{p} = (~\alpha ,\beta ,\gamma ,\hat{t})\), which are defined by the optimization (minimization) of a metric function \(F\left( {\vec{p}} \right)\) (see “Appendix”) under the initial approximation of \(\beta = 0.\)

Our calculations during the preparation of this paper suggest that \(\beta\) depends significantly on the current phase of the pandemic. Therefore, for some countries \(\beta\) may still not be zero, which would result in a predicted increase in the number of cases (which, in theory, is unlimited if previously infected people do not have immunity) after the successful completion of the main phase. From a practical point of view, this should indicate caution in relaxing restrictions in this country. For example, at the time of writing, the international tourist season was open in Turkey, yet our model implies one of the world’s highest residual infection levels, which might result in new outbreaks in other countries (Fig. 5).

Fig. 5
figure 5

The case of Turkey: a country with constant residual infection rate, calculated and forecasted as of 19th July 2020

As of July 19th, the formulae (14) suggest that high residual infection rates will be observed in countries like Turkey, Ecuador, Romania, Portugal, Bosnia and Herzegovina, Belarus (ranked descending vs. the peak).

The level of residual infection is needed to serve as a starting value to implement this model for calculating the second and subsequent waves of the outbreak. This can be done by simply inputting \(\hat{N}~,~\hat{t}\) into (2) and (3) as the initial conditions of the new wave.

8 Conclusions and directions for future research

The paper proposed a COVID-19 impact dynamic model, whose result is a high quality prediction of the dates and scales of peaks as compared to actual data about infections all over the world for countries with reliable data.

It is the first wave of a pandemic which should attract a special fundamental scientific interest, because it is usually free from impacts of previous socio-economic measures to contain outbreaks, and from re-infection from other regions and countries, vaccinations, and other effects. The importance of such analysis is usually underestimated by the scientific community despite it connecting the efficiency of containment measures and the scale of outbreaks. Our approach to modeling the spread of a pandemic proposed and substantiated in this article has the value of providing a rapid and continual measure of whether restrictions should be lifted, as opposed to classical models (SIR and SIS) which require a much larger data set with revealed peaks.

The research provides convincing evidence that the proposed model predicts the dates of peaks and the scale of outbreaks well, and so might help avoiding the earlier-than-needed lifting of restrictions. Most important is that the model has been developed before the end of the pandemic, so although future data might impact both the results and conclusions, this continual modeling is a solid basis for future research in epidemiological studies.

The model has revealed a nexus between the appearance of the second wave of the pandemic and the easing of restrictions. The paper can also serve as a basis for hypothesizing about the quantitative analysis of this nexus. Such an analysis might help to design policy regarding the timing and speed of the easing of restrictions against such pandemics. We leave the testing of the quantitative hypotheses for future investigations.

As this modeling is done while the statistics have not been collected for all countries, it is difficult to judge whether any nation has already passed the peak of infection, the conclusions might change with the new information.

The results for specific countries obtained in this work at the end of June 2020 may not be relevant at the time of its publication, but forecasting and warning methods based on the presented model will be relevant when choosing the intensity and socio-economic methods of suppressing possible future pandemics.

The generalization of the model on the basis of residual infection would help in identifying effective criteria to recognize the appearance of the second and subsequent waves of COVID-19 infections. It seems that the waves would follow the same law of contagion as the first wave, but with different parameters.

Despite this, the model requires future development in several ways, e.g., the implementation of updated empirical information as well as new socio-economic and medical instruments for pandemic containment. It also needs more careful verification for a wider exploration for policy advice. One possible direction could be research into the links between the proposed model and risk-management models including systemic risk models (Springer, 2021). In addition to methods referred to or presented in this paper, it is worth considering a comparison with machine learning and artificial intelligence-based approaches in a manner similar to that conducted in Karminsky and Burekhin (2019).