Introduction

Viral disease transmission depends on factors such as human movement and contact patterns and on the underlying biological dynamics (Kraemer et al. 2019). Although human-to-human contact drives transmission, a viral disease can originate in animals (Barker and Reisen 2019). Rarely, through chance or genetic mutation, a virus is transmitted from its original host to another species, replicates there, and infects hosts other than its main host, as in the emergence of a novel human coronavirus. The mutated coronavirus can then spread from person to person and cause a worldwide outbreak (Fung et al. 2020). Most human coronaviruses cause infections, and the epidemic types belong to the beta-coronavirus family, which caused the severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) outbreaks that infected more than 10,000 people worldwide in the last two decades. Typical epidemic estimation models classify people according to infection-related factors such as dynamic patterns (Arino et al. 2007; Riley 2007; Sattenspiel 2009). These logistic models usually ignore the peak and the length of the transmission period (Chowell et al. 2016; Chowell 2017; Pell et al. 2018). Moreover, they estimate only a single wave of the epidemic incidence curve. Sub-epidemic models can group people into clusters and thereby describe the incidence curve as a combination of multiple curves (Chowell et al. 2019; Dorosti et al. 2020; Hassantabar et al. 2020).

Studies have confirmed that COVID-19 outbreaks in other countries, including Germany, France, Thailand, Japan, South Korea, Vietnam, Canada, the USA, Hong Kong, and Taiwan, were directly related to infected people traveling from China; restricting travel can therefore significantly limit the spread of the virus (Ralph et al. 2020). An analysis of the impact of travel restrictions on national and international COVID-19 outbreaks showed that most Chinese cities had already received many infected travelers by the time the travel ban began in Wuhan on January 23, 2020. The Wuhan travel quarantine delayed epidemic progression in China by only 3–5 days but had a more significant impact internationally (Chinazzi et al. 2020). A study of travelers from Wuhan to Thailand who were infected with COVID-19 showed that the cases were not directly linked to the Huanan Seafood Market, yet their genomes were identical (Anzai et al. 2020). According to one analysis, the suspension of travel from China to other countries in January and February 2020 because of COVID-19 reduced the probability of a major epidemic in Japan by 7 to 20% (Okada et al. 2020). An assessment of the risk of case exportation by air from the affected regions of China to Europe showed that the risk remained high in Europe, with Britain, Germany, and France most at risk, and that importations from Beijing and Shanghai would pose a greater and wider threat to Europe (Pullano et al. 2020). With international travel increasing around the world, new tools have been developed to help identify areas where diseases may become prevalent, with the aim of taking preventive measures.

In this paper, we predict the COVID-19 outbreak in Iran using a sub-epidemic model. We then estimate the risk of traveler infection based on inter- and intra-provincial movement. The main objectives of the paper are:

  • SIR method for incident prediction with a single peak

  • Generalized logistic growth model (GLM) for prediction of incidents with multiple peaks and division to single-peak pandemics

  • A probabilistic model for assessment of travel-related risk

The rest of the paper is organized as follows. “Literature review” assesses recent COVID-19 research on prediction, modeling, and risk assessment. “Methods and materials” presents the proposed methods with their mathematical formulation. “Results and discussion” illustrates the findings graphically and interprets them. Finally, “Conclusion” summarizes the results.

Literature review

Several studies have addressed the prediction and forecasting of the COVID-19 pandemic with different types of mathematical models. For instance, Alzahrani et al. used the Autoregressive Integrated Moving Average (ARIMA) model to analyze COVID-19 incidence in Saudi Arabia. They used ARIMA (2,1,1) to forecast the daily number of infections in the following week. The ARIMA model predicts time series with increasing or decreasing trends better than oscillating ones (Alzahrani et al. 2020). Sarkar et al. presented a SIR-based model called SARIIqSq to model the COVID-19 pandemic in India. They extended the SIR model by adding isolated infected and quarantined susceptible groups. Their findings show the importance of quarantine for preventing further infections and the prolongation of the pandemic. The results reveal that the timing and quality of patient quarantine and isolation are important factors for prevention (Sarkar et al. 2020). Singhal et al. used a Gaussian mixture model to analyze the COVID-19 outbreak in the USA, Italy, and India. The proposed model extended the data-driven method to segregate trends in the daily incidence of COVID-19 (Singhal et al. 2020). Wang et al. presented a logistic growth model and a machine learning approach for predicting the COVID-19 outbreak. They used time series data in their model; however, the model cannot account for travel-related risk or holiday effects (Wang et al. 2020).

Researchers have also examined environmental effects on COVID-19 outbreaks and incidence, not merely time series analysis. Fareed et al. studied the effect of climate on the mortality rate of COVID-19. They investigated humidity and air quality to predict new COVID-19 incidence, and their findings can help policymakers control incidence (Fareed et al. 2020). Moreover, Shakoor et al. discussed the influence of air pollution on the COVID-19 outbreak in the USA and China. Their results reveal that the global spread of the disease and the resulting shutdowns reduced air pollution (Shakoor et al. 2020). Shahzad et al. studied the impact of daily temperature and PM2.5 air particles on the spread of COVID-19 in Spain. The results showed a direct relationship between air temperature and the spread of the virus (Shahzad et al. 2020b), while other research found that temperature led to a decrease in the infection rate (Iqbal et al. 2020). In some investigations, the effect differed between provinces; for instance, the temperature effect was positive in Hubei province but negative in Jiangsu (Shahzad et al. 2020a). In a study of climatological parameters and virus spread in the USA, Doğan et al. observed that the effect of temperature is negative and that air humidity has a direct relationship with the speed of the outbreak (Doğan et al. 2020). Finally, a study of Iranian provinces showed that daily temperature has no significant effect on the COVID-19 outbreak; the most important factors in outbreak growth are provincial population density and intra-provincial movement (Ahmadi et al. 2020).

Methods and materials

SIR model for incidence estimation

The SIR model is a system of differential equations that does not involve demographic features. It is defined by Eqs. (1)–(3):

$$ \frac{dS(t)}{dt}=-\frac{\beta S(t)I(t)}{N},\kern0.5em \frac{dR(t)}{dt}=\gamma I(t),\kern2em \frac{dI}{dt}=-\left(\frac{dS(t)}{dt}+\frac{dR(t)}{dt}\right)\kern0.5em $$
(1)
$$ S(t)+I(t)+R(t)=N. $$
(2)
$$ {R}_0=\frac{\beta }{\gamma } $$
(3)

where S, I, and R are the numbers of susceptible, infected, and removed (dead or recovered) people in a population of size N. The sum of these three variables is constant over time. Moreover, the reproduction number R0 is defined by Eq. (3) (Bjørnstad et al. 2002).
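As a minimal illustrative sketch (not the implementation used in this study), Eqs. (1)–(3) can be integrated numerically in Python. The fixed-step Runge-Kutta integrator, the function names, and all parameter values below are assumptions made for illustration; only the reproduction number of 1.09 is taken from the results reported later in the text.

```python
def basic_reproduction_number(beta, gamma):
    """Eq. (3): R0 = beta / gamma."""
    return beta / gamma


def sir_derivatives(s, i, beta, gamma, n):
    """Right-hand sides of Eq. (1), returned as (dS/dt, dI/dt, dR/dt)."""
    new_infections = beta * s * i / n
    removals = gamma * i
    return -new_infections, new_infections - removals, removals


def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate Eq. (1) with fixed-step RK4; returns one (S, I, R) per day."""
    n = s0 + i0 + r0  # Eq. (2): the total population stays constant
    h = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = sir_derivatives(s, i, beta, gamma, n)
            k2 = sir_derivatives(s + 0.5 * h * k1[0], i + 0.5 * h * k1[1],
                                 beta, gamma, n)
            k3 = sir_derivatives(s + 0.5 * h * k2[0], i + 0.5 * h * k2[1],
                                 beta, gamma, n)
            k4 = sir_derivatives(s + h * k3[0], i + h * k3[1], beta, gamma, n)
            s += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
            i += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
            r += h * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters chosen only for illustration.
GAMMA = 0.1          # assumed removal rate (per day)
BETA = 1.09 * GAMMA  # gives R0 = 1.09, the value reported in the text
trajectory = simulate_sir(BETA, GAMMA, s0=999_000, i0=1_000, r0=0, days=200)
print("R0 =", basic_reproduction_number(BETA, GAMMA))
print("state after 200 days (S, I, R):", trajectory[-1])
```

Because the three derivatives sum to zero at every stage, the scheme preserves S(t) + I(t) + R(t) = N up to rounding, which gives a quick sanity check on any run.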

Generalized logistic growth model for sub-epidemic decomposition

The GLM for sub-epidemic estimation is defined by Eq. (4):

$$ \frac{dC(t)}{dt}= rC{(t)}^p\left(1-\frac{C(t)}{K_0}\right) $$
(4)

where C(t) is the cumulative number of infections at time t, so that \( \frac{dC(t)}{dt} \) is the incidence; r is the growth rate, p is the scaling parameter, and K0 is the final epidemic size. Each sub-epidemic wave is then defined by the coupled differential equations in Eqs. (5) and (6) (Chowell et al. 2019):

$$ \frac{d{C}_i(t)}{dt}=r{C}_i{(t)}^p\left(1-\frac{C_i(t)}{K_i}\right){A}_{i-1}(t) $$
(5)
$$ {A}_i(t)=\left\{\begin{array}{c}1\kern1.5em {C}_i(t)>{C}_{\mathrm{thr}}\\ {}0\kern4.75em \mathrm{else}\end{array}\right. $$
(6)

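Here i indexes the sub-epidemics, and Ai(t) is an indicator that controls their onset: by Eqs. (5) and (6), sub-epidemic i grows only while the cumulative size of the preceding sub-epidemic, Ci−1(t), exceeds the threshold Cthr.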
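The coupling in Eqs. (5) and (6) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the fitting procedure used in this study: the forward Euler integrator, the convention that the first wave is always active, the initial value of 1 for every wave, and all numerical values are hypothetical choices.

```python
def simulate_sub_epidemics(r, p, k_sizes, c_thr, days,
                           steps_per_day=20, c_init=1.0):
    """Integrate Eq. (5) with forward Euler; returns cumulative sizes per day.

    Assumptions (not stated in the text): the first wave is always active,
    and every wave starts from c_init cases so that C_i^p is nonzero.
    """
    n_waves = len(k_sizes)
    c = [float(c_init)] * n_waves
    h = 1.0 / steps_per_day
    history = [list(c)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Eq. (6): wave i is switched on once wave i-1 exceeds C_thr.
            active = [1.0] + [1.0 if c[i] > c_thr else 0.0
                              for i in range(n_waves - 1)]
            rates = [r * c[i] ** p * (1.0 - c[i] / k_sizes[i]) * active[i]
                     for i in range(n_waves)]
            # Clamp at K_i so the Euler step never overshoots the final size.
            c = [min(c[i] + h * rates[i], k_sizes[i]) for i in range(n_waves)]
        history.append(list(c))
    return history


def daily_incidence(history):
    """Daily new cases summed over all sub-epidemics."""
    totals = [sum(row) for row in history]
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Hypothetical parameter values, for illustration only.
history = simulate_sub_epidemics(r=0.25, p=0.9, k_sizes=[30000.0, 20000.0],
                                 c_thr=15000.0, days=300)
incidence = daily_incidence(history)
print("largest simulated daily incidence:", round(max(incidence)))
```

With two waves, the summed incidence shows two peaks whenever the threshold is reached late enough in the first wave, which is the qualitative behavior the sub-epidemic decomposition is meant to capture.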

Travel-related risk estimation

The epiflows R package was introduced to assess the risk of travel-related disease outbreaks. It was used to assess the risk of yellow fever in southeastern Brazil from December 2016 through May 2017 (Moraga et al. 2018). The epicontacts R package provides a data structure that combines case line lists and contact data into a single object to facilitate more efficient visualization and analysis. It includes interactive visualization and network analysis functionality and is developed and maintained as part of the R Epidemics Consortium (RECON) (Nagraj et al. 2018). The risk assessment presented here estimates the expected number of infections under usual travel between provinces. Inter- and intra-provincial movement describes the exportation and importation of infected people, with uncertainty measures. Let \( {C}_{S,W} \) denote the cumulative number of infections in location S over the time window W, \( \mathrm{pop}_s \) the population of province S, and \( {T}_{S,D}^W \) the number of travelers from province S to province D during W. The per capita probability of traveling from S to D is then defined as follows (Dorigatti et al. 2017):

$$ {P}_D=\frac{T_{S,D}^W}{\mathrm{pop}_s} $$
(7)

The incubation period DE and the infectious period DI follow distributions that are determined by the nature of the epidemic virus. The probability that an infected person is in the incubation or infectious period during the time window W is therefore defined as follows:

$$ {p}_I=\min \left(\frac{D_E+{D}_I}{W},1\right) $$
(8)
$$ {E}_{S,D}={C}_{S,W}\cdot {P}_D\cdot {p}_I $$
(9)

where \( {E}_{S,D} \) is the cumulative number of possibly infected people traveling from province S to province D. Furthermore, the risk of infection is given by (Dorigatti et al. 2017):

$$ {\lambda}_s=\frac{C_{S,W}\cdot {L}_0}{\mathrm{pop}_s\cdot W} $$
(10)

where L0 is the average length of stay at the destination, which we assume to be a single day. Finally, the expected number of infections among travelers is calculated as:

$$ {I}_{S,O}={T}_{S,O}^W\cdot {\lambda}_s\cdot {p}_I $$
(11)

To conclude, the total number of importations and exportations, \( {T}_{S,O} \), can be estimated by summing \( {I}_{S,O} \) and \( {E}_{S,O} \) (Dorigatti et al. 2017).
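The arithmetic of Eqs. (7)–(11) can be sketched in a few lines of Python. The function names and every input value below are hypothetical and serve only to show how the quantities combine; they are not the data or the software used in this study, which does not state the window length W.

```python
def travel_probability(travelers, population):
    """Eq. (7): travelers from S to D divided by the population of S."""
    return travelers / population


def incubating_or_infectious_probability(d_e, d_i, window):
    """Eq. (8): min((D_E + D_I) / W, 1)."""
    return min((d_e + d_i) / window, 1.0)


def expected_exportations(cases, p_d, p_i):
    """Eq. (9): cumulative cases times travel and infection-stage probabilities."""
    return cases * p_d * p_i


def infection_risk(cases, length_of_stay, population, window):
    """Eq. (10): risk of infection per visitor over the length of stay."""
    return cases * length_of_stay / (population * window)


def expected_importations(visitors, risk, p_i):
    """Eq. (11): visitors times infection risk and infection-stage probability."""
    return visitors * risk * p_i


# Hypothetical inputs, for illustration only.
cases = 10_000           # cumulative cases in province S during the window
population = 13_000_000  # population of province S
travelers_out = 500_000  # travelers leaving S for D during the window
visitors_in = 400_000    # visitors arriving in S during the window
window = 30.0            # assumed window length in days
length_of_stay = 1.0     # one-day stay, as assumed in the text

p_i = incubating_or_infectious_probability(5.1, 11.5, window)
exported = expected_exportations(
    cases, travel_probability(travelers_out, population), p_i)
imported = expected_importations(
    visitors_in, infection_risk(cases, length_of_stay, population, window), p_i)
print("expected exportations:", round(exported, 1))
print("expected importations:", round(imported, 1))
print("total:", round(exported + imported, 1))
```

Keeping each equation in its own function makes it easy to swap point values for samples from the incubation and infectious period distributions when uncertainty intervals are needed.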

Results and discussion

Data resource

In this paper, data are drawn from two sources. The first covers COVID-19 infections and deaths in Iran; these data are available from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Dong et al. 2020). The second is the inter-provincial population movement of Iran over one year, extracted from official Road Maintenance and Transportation Organization (RMTO) information.

Study of the COVID-19 outbreak in Iran

The spread of viral diseases causes considerable suffering and mortality in the affected population, threatens public health and social and economic well-being, and imposes significant economic costs on local and national governments. The current COVID-19 outbreak started in China and quickly spread across countries. Iran, as one of the affected countries, needs risk assessment and forecasting to better control the pandemic. The first COVID-19 cases in Iran were reported on February 19, 2020, in Qom province, with the death of two patients; the disease then gradually spread across the whole country, and Iran was nearly shut down.

The number of COVID-19 cases in Iran has risen steeply, following a trend fitted with 99% accuracy, as shown in Fig. 1a. In the meantime, a number of patients have recovered, which is promising for the health of this group, while others have unfortunately died. Concurrent with the rising trend of the disease, reports indicate continued shutdowns of public centers, reduced office and market hours, and closures of some cities by law enforcement or local residents. The rate of spread is directly related to a province’s population and to inter-provincial transportation, and it can be greatly reduced by quarantining the provinces. Qom is known as the center of the outbreak because the first case of COVID-19 was detected there. However, Tehran, the most populous province, has the highest rates of infection and death from COVID-19. Results show that provinces with large populations have the highest rates of infection (Ahmadi et al. 2020). In many countries, COVID-19 incidence has a single peak; in Iran, however, the situation is different. Factors such as social distancing, travel, and the economic crisis (the need to work) affect the incidence pattern. The results of incidence estimation using SIR up to 11 May 2020 are shown in Fig. 1. Based on the number of daily infections in Fig. 1b, we estimated that the outbreak in Iran should have ended in early June; however, the second peak changed the governing dynamics of the problem. According to the SIR model, the reproduction number equals 1.09, and the total of susceptible, infected, recovered, and dead individuals is 601,895 as of 11 May 2020, with 505,875 susceptible and 96,020 removed (recovered or dead). Figure 1c shows that the slope of the infection curve in Iran was decreasing, that is, the growth rate had declined.

Fig. 1 Results of epidemic analysis of the COVID-19 pandemic

As of 13 August 2020, the situation has changed: several peaks of incidence have occurred. In other words, several epidemics have started and stopped over successive time intervals. The incidence plot in Fig. 2 shows several oscillations; for example, the shape of the curve differs between the first and second months of infection. Other types of analysis are therefore needed to estimate these changes, and the sub-epidemic method is a better choice. Prediction models based on the GLM are illustrated in Fig. 2. The model is run with two, three, and four epidemic waves, and the sum of all sub-epidemics (red line) should reproduce the overall daily incidence. The first epidemic wave had almost stopped after about 100 days of infection, by early June 2020, which is similar to the SIR result presented in Fig. 1. On this basis, we assume a series of sub-epidemics: a second, third, fourth, and possibly further waves have started. For instance, according to Fig. 2a, an epidemic with two peaks should have ended after almost 200 days of infection. However, the observations show the effects of further sub-epidemics: the observed incidence follows an almost horizontal line, so additional sub-epidemics are possible. These waves could arise for different reasons; for example, they may come from different provinces or climates, or from different control behaviors. Finally, we forecast the incidence for future days. Based on the behavior of the epidemic curve, an autoregression model can estimate the incidence for upcoming days. The result is shown in Fig. 3: the red curve shows the actual incidence, the blue line the smoothed incidence, and the green line the forecast of the future behavior of the COVID-19 outbreak in Iran.
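The smoothing and autoregressive forecasting step can be sketched as follows. The text does not state the smoothing method or the order of the autoregression, so the trailing moving average, the AR(2) order, the least-squares fit, and the synthetic case counts below are all assumptions made for illustration, not the procedure behind Fig. 3.

```python
def moving_average(series, window=7):
    """Trailing moving average; shorter windows are used at the start."""
    smoothed = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Fails with ZeroDivisionError if the system is singular, for example
    when the input series is constant.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ar(series, order):
    """Least-squares fit of y_t = c + a_1 y_{t-1} + ... + a_p y_{t-p}.

    Returns [c, a_1, ..., a_p].
    """
    rows = [[1.0] + [series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = [series[t] for t in range(order, len(series))]
    dim = order + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(dim)]
           for i in range(dim)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(dim)]
    return solve_linear(xtx, xty)


def forecast_ar(series, coeffs, horizon):
    """Iterate the fitted recursion forward for `horizon` steps."""
    order = len(coeffs) - 1
    history = list(series)
    for _ in range(horizon):
        nxt = coeffs[0] + sum(coeffs[k] * history[-k]
                              for k in range(1, order + 1))
        history.append(nxt)
    return history[len(series):]


# Synthetic daily counts, for illustration only (not the study data).
daily_cases = [2100, 2250, 2180, 2300, 2450, 2400, 2390, 2500, 2620, 2550,
               2480, 2600, 2700, 2650, 2580, 2500, 2450, 2380, 2400, 2320]
smoothed = moving_average(daily_cases, window=7)
coefficients = fit_ar(smoothed, order=2)
print("7-day forecast:",
      [round(v) for v in forecast_ar(smoothed, coefficients, horizon=7)])
```

Fitting on the smoothed series rather than the raw counts reduces the influence of day-to-day reporting noise, at the cost of a lag in detecting turning points.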

Fig. 2 Results of estimation of sub-epidemic waves

Fig. 3 Results of estimation of sub-epidemic waves

A comparison of the presented method with other approaches for predicting COVID-19 outbreaks in other countries is given in Table 1. One important property of the presented method is its support for multiple-peak epidemics. Other models assume a single extremum and provide only the maximum incidence, the epidemic end time, and the total number of infections. The proposed method, however, can flexibly handle single- or multiple-peak outbreaks.

Table 1 The comparison of some COVID-19 outbreak prediction models used in the literature

Study of travel-related outbreak risk

While COVID-19 is widespread throughout Iran and infections are seen around the country, travel increases the risk of an outbreak. With inter- and intra-provincial trips across Iran increasing during the pandemic, new surveillance tools are needed to help identify locations where the disease may spread or where preventive measures are needed. This article aims to assess the risk of a travel-related outbreak. We estimate the expected number of symptomatic and asymptomatic infections that can be transmitted to other parts of the country. The estimates are obtained by integrating data on the cumulative number of reported cases, population movements, length of stay, and the distributions of the incubation and infectious periods. Data on COVID-19 infections were extracted from official Iranian Ministry of Health data, and the inter-provincial population movement of Iran over one year was extracted from official RMTO information.

In this study, we consider Tehran, the most populous province of Iran with a population of more than thirteen million according to the 2016 census, because it has the highest numbers of infections and deaths. We therefore examine the risk of inter-provincial travel to and from Tehran. Travelers within the country are likely to be infected while staying in a province and then to increase the spread of the disease in their home province. Such a study is necessary to limit the outbreak in different regions of Iran. Figure 5 shows the population flow as a spatial network; the origin and destination of each trip are shown on the y- and x-axes, respectively.

According to Lauer et al. (2020), the incubation period of COVID-19 is 5.1 days and the infectious period is 11.5 days, with a 95% confidence interval. These periods are modeled with normal distributions. Figure 4 shows the estimated risk of outbreaks in the provinces of Iran due to travel to Tehran province and vice versa. This estimate disregards existing patients within each province. Because of frequent and large population movements, most of the patients will be from Isfahan, Gilan, Mazandaran, West Azerbaijan, and Tehran province itself. These figures assume a 1-day stay in the destination province. Other provinces could also each see up to 100 new patients. Note that these estimates assume only a 1-day stay in Tehran; since actual trips may last longer, the number of infections can be greater than our results (Fig. 5).
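As a worked illustration of Eq. (8) with these period lengths, and assuming purely for illustration a window of W = 30 days (the window length is not stated in the text), the probability that a case is incubating or infectious within the window is:

$$ {p}_I=\min \left(\frac{5.1+11.5}{30},1\right)=\min \left(\frac{16.6}{30},1\right)\approx 0.55 $$

For any window of 16.6 days or shorter, the same expression gives a probability of 1.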

Fig. 4 Risk of outbreaks in Iran provinces if traveling to Tehran province and vice versa

Fig. 5 The network graph of internal movement between provinces

Conclusion

COVID-19, caused by SARS-CoV-2, is a viral pandemic that affected the whole world in 2019–2020. People and business managers need to plan their work around the time at which the infection is expected to stop, so forecasting the end time is important. Moreover, studying trends in incidence not only helps governments control the disease but also helps identify the factors that matter for infection control. Modeling the infection is helpful in this regard. In this paper, we therefore present sub-epidemic models to find trends and behavior of infection, with Iran as a case study. We used the GLM technique to simulate COVID-19 incidence with single, double, triple, and quadruple sub-waves of infection, which can help decision-makers identify possible sub-epidemics. Single-wave epidemic modeling using the SIR technique forecast that the infection in Iran would end around the first of June 2020. However, subsequent observations deviated from this forecast and produced further epidemic waves. The presented sub-epidemic model can estimate epidemics with several peaks of incidence. It is noteworthy that many citizens travel and leave quarantine areas, disregarding the virus, which increases the spread of the disease from one province to another. We therefore evaluated this risk by examining Tehran, the most populous province in Iran. The results show that the provinces of Isfahan, Gilan, Mazandaran, West Azerbaijan, and Tehran are at higher risk than other provinces. The government should therefore prevent major disasters by restricting entry to and exit from these provinces.