1 Introduction

An epidemic is a disease that spreads rapidly to a large number of people in a given population within a short period. Many epidemics occur in the world. Covid-19 and Ebola are recent prominent examples.

People have tried many methods to study epidemics. The susceptible, infected, and recovered (SIR) model is considered one of the seminal models of epidemics [1]. A recent work [2] gives a comprehensive review of the methods used to model and analyze Covid-19. Of these methods, compartmental models are the most relevant to our model. They are prominent methods for the analysis and prediction of Covid-19 dynamics [2,3,4]. However, these works consider only person-to-person infection within the population and do not explicitly consider any external source of infection.

The World Health Organization (WHO) has identified external transmission as one of the three modes of transmission [5]. According to the WHO, infection within the population is called local transmission and community transmission, and infection from outside the population is called imported cases. We call infection from a source within the population the endogenous spread of infection, and infection from a source external to the population the exogenous spread of infection. Human migration is one of the prime reasons behind the exogenous spread of infection.

Governments can intervene to curb the spread of the disease by introducing policies that restrict human mobility [6]. However, implementing such intervention policies poses many challenges. Social disagreement is one example: people do not abide by the government’s directives. The Tablighi Jamaat religious congregation that happened in India (Footnote 1), and the human mobility that resulted from it, is an example of social disagreement.

As part of an intervention, governments can restrict human mobility. However, they cannot completely prevent all human mobility and migration. For example, in the Indian subcontinent, people migrate to metropolitan cities for work. Due to the risk of Covid-19 exposure in these overpopulated cities, people migrate back to their homes [7]. This is also known as reverse migration (Footnote 2).

The government cannot deny one’s right to go home. However, the government can allow necessary movement in a controlled manner. For example, when people move from one state to another, the state governments can issue passes to anyone who is allowed to travel to that state, similar to what was practiced by the State of Kerala (Footnote 3). They can identify the incoming people and ensure that they correctly follow the procedures advised by the respective governments.

These movements will increase the exogenous spread of infection compared to the ideal condition of sealed borders. To estimate the amount of infection during this movement and when the peak occurs, authorities need an explicit model that can predict infection through exogenous means [8]. Our model extends the SIR model and explicitly accounts for both exogenous and endogenous infections. In the spread of an epidemic, even a small increase in the number of infected people has an impact that grows exponentially with time. Hence, it is important to consider exogenous infections while studying the dynamics of epidemics [9]. Doing so gives governments pertinent information about the possible exogenous infections and gives the authorities time to prepare their medical resources accordingly.

There are challenges even if people do not migrate. For example, frontline workers like doctors and nurses are exposed to the virus more frequently than the general public. Accordingly, we need to be able to model different rates of infection for different groups of people. As a measure of intervention, governments will have to make all the necessary safety equipment available to frontline workers and monitor their health constantly to control the infection.

In this context, we address the following research questions, which significantly modify the current, well-studied SIR model by infusing external knowledge related to the pandemic [10]:

  1. How to quantify the exogenous spread of infection?

  2. What is the interplay between the exogenous and endogenous spread of infection concerning the following:

    (a) In the presence of social disagreement.

    (b) In the presence of controlled migration.

    (c) In the presence of n communities that have different rates of infection, e.g., frontline workers such as healthcare workers or hospitality workers.

  3. What is the change in the peak position (the most significant number of people infected in a unit of time) in the presence of exogenous infection?

  4. What is the change in the height of the peak in the presence of exogenous infection?

The following are our contributions in this work. We study the impact of external sources of infection, such as cross-border mobility, on Covid-19 infection by introducing a novel SIR-like compartmental model called Exo-SIR.

We study three variants of the model applicable to special scenarios such as the presence of social disagreement and the presence of groups, such as frontline workers, that differ in risk and infectiousness.

We analyze the interplay between endogenous and exogenous infections during the Covid-19 and Ebola pandemics in the following ways.

  1. Analytically.

  2. By simulating the Exo-SIR model with and without assuming a contact network for the population.

  3. By implementing the Exo-SIR model on real datasets regarding Covid-19 and Ebola.

We compare the predictions of Exo-SIR with the SIR model using real data on the recent spread of the Covid-19 in India and the USA and the spread of Ebola in Africa as the ground truth.

This paper is structured as follows. Section 2 discusses related works and preliminaries. Then, we formulate the Exo-SIR model by extending the SIR model and discuss the different variants of the model (in Sect. 3). We analyze our model by comparing it with the SIR model and study the behavior of the infected population in the presence and absence of exogenous infection (in Sect. 4). Then, we describe the simulation study where we simulated the SIR model and Exo-SIR model and compared them (in Sect. 5). Finally, we study the real data of Covid-19 and Ebola epidemics (in Sect. 6).

2 Related works

Here, we discuss the works related to the idea of exogenous influence on the population under study.

The work in [11] considers exogenous infections for malaria at the China-Myanmar border. However, the model is not deterministic. In a deterministic model, individuals in the population are assigned to different subgroups or compartments, each representing a specific epidemic stage. Deterministic models often provide useful ways of gaining sufficient understanding of the dynamics of populations whenever they are large enough [12]. Deterministic models are also simpler and more popular [13, 14]. A mobility-based SIR model [15] is a deterministic model. Our model is also deterministic.

2.1 Models of external influence on online social networks

Information diffusion in online social networks is similar to the way a virus spreads in a population [16]. A few recent works in the literature attempt to model the external influence in information diffusion in online social networks [17]. In particular, [17] and [18] propose information diffusion models on a network. These works assume that the information flows through an underlying network. They also consider links from other websites, such as the mainstream media, as external sources of information. Internal diffusion is when the shared messages do not have any external links.

The work described in [17] uses very specific parameters like the following:

  • Probability of any node receiving exposure at time t

  • The random amount of time it takes an infected node to expose its neighbors

  • How the probability of infection changes with each exposure

  • The probability that a node i has received n exposures by time t

The work described in [18] traces the information cascade and thereby tries to reconstruct the underlying graph structure as much as possible. Also, they conclude that external influence has a bigger impact on the network when compared to the influence of social media influencers.

The model that is closest to our work is Yang et al.’s model [19]. This model extends the SIR model (explained in Sect. 2.4) by including the external influence on the network. The state transition diagram of the diffusion mechanisms of this model is given in Fig. 1.

Fig. 1 State transition diagram of the diffusion mechanisms in Yang et al.’s model. Diagram taken from [19]

This model is defined in the following way.

$$\begin{aligned}&s+i+r = 1 \end{aligned}$$
(1)
$$\begin{aligned}&\frac{\mathrm{d}s}{\mathrm{d}t} = -p_{1}ksi-((1-p_1)p_3+p_4)\theta s \end{aligned}$$
(2)
$$\begin{aligned}&\frac{\mathrm{d}s}{\mathrm{d}t} = -p_{1}ksi-((1-p_1)p_3+p_4)\theta s \nonumber \\&\qquad \quad -(1-p_1)p_5ksi \end{aligned}$$
(3)
$$\begin{aligned}&\frac{\mathrm{d}i}{\mathrm{d}t} = p_{1}ksi+p_4\theta s-p_2i \end{aligned}$$
(4)
$$\begin{aligned}&\frac{\mathrm{d}r}{\mathrm{d}t} = p_2i+(1-p_4)p_3\theta s+(1-p_1)p_5ksi \end{aligned}$$
(5)

In Fig. 1, there are two possible transitions from the state S to I. One path is the normal endogenous path, and the second is due to external influence. These transitions have probabilities \(p_1\) and \(p_4\), respectively. Similarly, there are two possible transitions from the state S to R, one through endogenous and the other through external influence. Their probabilities are \(p_5\) and \(p_3\), respectively. However, the transition from the state I to R is not affected by external influence.

Although Yang et al.’s model includes exogenous infection, it fails to capture the dynamics between endogenous and exogenous infections. This is because it does not differentiate infections due to exogenous factors from those due to endogenous factors.

2.2 Other studies of endogenous and exogenous information diffusion

The dual nature of message flow over the online social network is studied and verified in [20]. Here, the dual nature refers to the injection of exogenous opinions to the network and the endogenous influence-based dynamics. In [21], the authors propose a method for extracting the relative contributions of exogenous and endogenous contents. In [22], the authors postulate that the nature of the information plays a crucial role in the way it spreads through the network. They quantify two properties of the information—endogeneity and exogeneity. Endogeneity refers to its tendency to spread primarily through the connections between nodes, and exogeneity refers to its tendency to spread to the nodes, independently of the underlying network. In [23], the authors study the bursts that originate from endogenous and exogenous sources and their temporal relationship with baseline fluctuations in the volume of tweets. The study reported in [24] classifies the bursts into endogenous and exogenous. According to this study, those bursts that reach the peak almost instantaneously after the diffusion starts and then go down slowly are exogenous bursts. Also, those bursts that gradually increase and slowly decrease are endogenous.

2.3 Compartmental models for Covid-19 modeling

Compartmental models are prominent methods for the analysis and prediction of Covid-19 dynamics. The SIR model is one of the seminal compartmental models, and many compartmental models have been proposed recently to improve on it. The QSIR model [25, 26] adds an extra state to the standard SIR model that represents the number of people in quarantine. The SPCIRD model [27] adds three extra states: P, C, and D. P represents the number of susceptible people who are partially controlled, that is, people who do not conform to all the restrictions of the quarantine. C represents the number of susceptible people who are controlled, that is, people who conform to all the restrictions of the quarantine. D represents the number of people who died. The multiple epidemic wave model [28], as its name suggests, models the multiple waves of infection that could occur. The time-dependent SIR model [29, 30] treats the SIR parameters \(\beta \) and \(\gamma \) as varying with time. However, none of these models considers infections arising from outside the population, which are mostly due to the cross-border mobility of infected people. Hence, we introduce the Exo-SIR model to address this particular issue.

2.4 SIR model

This section briefly reviews the SIR epidemiological model, which describes how epidemics spread through a population. The SIR model is also often used to study information diffusion, which it approximates as a process of epidemic spread.

In this model, the population is classified into three groups: susceptible (who are prone to infection), infected (who carry the infection), and recovered (who no longer have the infection and its associated symptoms). In the limit of a large total population N that does not change over time, the following equations model the dynamics of the spread [31]:

$$\begin{aligned}&s(t)+i(t)+r(t) = 1 \end{aligned}$$
(6)
$$\begin{aligned}&\frac{\mathrm{d}s}{\mathrm{d}t} = -\beta si \end{aligned}$$
(7)
$$\begin{aligned}&\frac{\mathrm{d}i}{\mathrm{d}t} = \beta si - \gamma i \end{aligned}$$
(8)
$$\begin{aligned}&\frac{\mathrm{d}r}{\mathrm{d}t} = \gamma i \end{aligned}$$
(9)

where the fraction of susceptible, infected, and recovered people at time t are represented by s(t), i(t), and r(t), respectively. \(\beta \) is the rate of infection, and \(\gamma \) is the rate of recovery.
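As a worked consequence of Eq. 8 (a standard property of the SIR model, added here for reference rather than taken from the cited works), factoring out i shows when the infected fraction stops growing:

$$\begin{aligned} \frac{\mathrm{d}i}{\mathrm{d}t} = (\beta s - \gamma )\, i = 0 \quad \Longrightarrow \quad s = \frac{\gamma }{\beta } \quad \text {for } i > 0 \end{aligned}$$

Thus i(t) grows while \(s > \gamma /\beta \) and declines once s falls below this value, so the peak of the infected curve occurs when \(s = \gamma /\beta \).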

3 The model

In this section, we propose the Exo-SIR model. It differs from the SIR model in the following ways. It classifies infected nodes into two types: infected from an exogenous source and infected from an endogenous source. It also differentiates between the spread from endogenous and exogenous sources.

Susceptible nodes become infected with a certain probability called the rate of infection. This rate could be different for endogenous and exogenous infections. The nodes infected by endogenous and exogenous sources move into different states. We assume that a susceptible node gets infected by only one of these sources and never by both, even when it is susceptible to both endogenous and exogenous infection. The infected nodes recover with a certain probability called the recovery rate and then move into the recovered state. The advantage of the Exo-SIR model over the SIR model is that we can observe the endogenous and exogenous diffusion separately.

We use the following notations:

  • S: state of susceptible nodes

  • \(I_x\): state of nodes infected from an exogenous source

  • \(I_e\): state of nodes infected from an endogenous source

  • R: state of recovered nodes

  • s: fraction of nodes that are susceptible

  • \(i_x\): fraction of nodes that are infected from an exogenous source

  • \(i_e\): fraction of nodes that are infected from an endogenous source

  • i: fraction of nodes that are infected from either source

  • r: fraction of nodes that are recovered

  • \(\beta _x\): rate at which the exogenous source infects the nodes

  • \(\beta _e\): rate at which the nodes infect other nodes

  • \(\gamma \): rate at which the nodes recover

We use the words infection, diffusion, and spread interchangeably according to the context.

The state transition diagram of the Exo-SIR model is given in Fig. 2.

Fig. 2 State transition diagram of the nodes in the Exo-SIR model

We classify infected nodes into two different types—infected from exogenous source \(i_x\) and infected from endogenous source \(i_e\).

$$\begin{aligned} i_e+i_x = i \end{aligned}$$
(10)

We assume that the total population remains constant.

$$\begin{aligned} s+i+r = 1 \end{aligned}$$
(11)

A fraction of the susceptible people s gets infected by exogenous sources, and another fraction of s gets infected by endogenous sources. For endogenous infection, the population that is infected plays a big role. Hence, we have

$$\begin{aligned} \frac{\mathrm{d}s}{\mathrm{d}t} = -\beta _x s -\beta _e s i \end{aligned}$$
(12)

The increase in \(i_{x}\) is determined by the number of susceptible nodes, and the decrease in \(i_x\) is determined by \(i_x\) itself. This gives

$$\begin{aligned} \frac{\mathrm{d}i_x}{\mathrm{d}t} = \beta _x s - \gamma i_x \end{aligned}$$
(13)

The increase in \(i_{e}\) is determined by the number of susceptible nodes and the number of infected nodes, and the decrease in \(i_e\) is determined by \(i_e\) itself. This gives

$$\begin{aligned} \frac{\mathrm{d}i_e}{\mathrm{d}t} = \beta _e s i - \gamma i_e \end{aligned}$$
(14)

Increase in r is determined by the number of infected people in the network. This gives

$$\begin{aligned} \frac{\mathrm{d}r}{\mathrm{d}t} = \gamma i \end{aligned}$$
(15)
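Equations 10 to 15 can be integrated numerically to see how the four fractions evolve. The following is a minimal sketch, assuming a forward-Euler scheme with a fixed step `dt`; the function names, the step size, and the parameter values in the usage lines are illustrative assumptions and are not taken from the paper.

```python
def exo_sir_step(state, beta_x, beta_e, gamma, dt):
    """Advance (s, i_x, i_e, r) by one forward-Euler step of Eqs. 12-15."""
    s, i_x, i_e, r = state
    i = i_x + i_e                      # Eq. 10
    new_x = beta_x * s                 # exogenous infections per unit time
    new_e = beta_e * s * i             # endogenous infections per unit time
    ds = -new_x - new_e                # Eq. 12
    di_x = new_x - gamma * i_x         # Eq. 13
    di_e = new_e - gamma * i_e         # Eq. 14
    dr = gamma * i                     # Eq. 15
    return (s + dt * ds, i_x + dt * di_x, i_e + dt * di_e, r + dt * dr)


def simulate_exo_sir(beta_x, beta_e, gamma, s0=1.0, ix0=0.0, ie0=0.0,
                     t_max=200.0, dt=0.01):
    """Return a list of (t, s, i_x, i_e, r) samples; r starts at 1 - s0 - ix0 - ie0."""
    state = (s0, ix0, ie0, 1.0 - s0 - ix0 - ie0)
    steps = int(round(t_max / dt))
    trajectory = [(0.0, *state)]
    for n in range(1, steps + 1):
        state = exo_sir_step(state, beta_x, beta_e, gamma, dt)
        trajectory.append((n * dt, *state))
    return trajectory


if __name__ == "__main__":
    # Illustrative parameter values only (assumed, not fitted to any data).
    traj = simulate_exo_sir(beta_x=0.001, beta_e=0.3, gamma=0.1)
    t_peak, _, _, ie_peak, _ = max(traj, key=lambda row: row[3])
    print(f"i_e peaks at t={t_peak:.1f} with value {ie_peak:.3f}")
```

Because the four derivatives sum to zero, the scheme preserves \(s+i+r=1\) up to floating-point rounding. A higher-order integrator could be swapped in without changing the model.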

3.1 Variants of the model to address specific situations

In this section, we discuss how the Exo-SIR model may be used in different situations.

3.1.1 Exo-SIR model with social disagreement

This scenario occurs when people do not abide by the government’s orders, for example, not wearing masks, not following social distancing, etc. As a result, more people contract the virus, and hence, the infectiousness of the disease will go up. This can be represented in the Exo-SIR model by increasing the \(\beta _e\) value.

3.1.2 Exo-SIR model with people migrating with the permission of the government

This scenario can be studied using the Exo-SIR model. Here, we assume that when the government allows people to travel, the government makes sure that these people are isolated and given treatment. Change in \(i_{x}\) is influenced by the action of the government that allowed people to travel across their border. Hence, planning and execution efficiency to minimize the impact are essential. This is captured in \(\beta _x\). If the government efficiently contains the infection from these people, then the value of \(\beta _x\) goes down.

3.1.3 Exo-SIR model with multiple groups that have different risk of infection

Fig. 3 State transition diagram of the model

This case is depicted in Fig. 3. Here, there are n different groups of susceptible people with varying levels of infection risk, so we sum over the groups wherever s appears in the equations of the Exo-SIR model. The value of each rate parameter also differs from group to group, so \(\beta _{xk}\) and \(\beta _{ek}\) denote the exogenous and endogenous rates of infection for group k. The resulting equations are given below.

$$\begin{aligned}&i_e+i_x = i \end{aligned}$$
(16)
$$\begin{aligned}&s = \sum _{k=1}^{n}s_k \end{aligned}$$
(17)
$$\begin{aligned}&s+i+r = 1 \end{aligned}$$
(18)
$$\begin{aligned}&\frac{\mathrm{d}s}{\mathrm{d}t} = -\sum _{k=1}^{n}\beta _{xk} s_k - \sum _{k=1}^{n}\beta _{ek} s_k i \end{aligned}$$
(19)
$$\begin{aligned}&\frac{\mathrm{d}i_x}{\mathrm{d}t} = \sum _{k=1}^{n}\beta _{xk} s_k - \gamma i_x \end{aligned}$$
(20)
$$\begin{aligned}&\frac{\mathrm{d}i_e}{\mathrm{d}t} = \sum _{k=1}^{n}\beta _{ek} s_k i - \gamma i_e \end{aligned}$$
(21)
$$\begin{aligned}&\frac{\mathrm{d}r}{\mathrm{d}t} = \gamma i \end{aligned}$$
(22)
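The multi-group equations can be explored numerically in the same way as the basic model. The sketch below assumes that each group k follows \(\mathrm{d}s_k/\mathrm{d}t = -\beta _{xk}s_k - \beta _{ek}s_k i\), which is the per-group reading of Eq. 19, and uses forward Euler. The function name, the step size, and all numeric values are illustrative assumptions.

```python
def simulate_multigroup_exo_sir(s0, beta_x, beta_e, gamma,
                                t_max=200.0, dt=0.01):
    """Forward-Euler integration of Eqs. 16-22, starting with nobody infected.

    s0, beta_x and beta_e are equal-length lists with one entry per group k.
    Returns (s_list, i_x, i_e, r, peak of i_e, time of that peak).
    """
    s = list(s0)
    i_x = i_e = 0.0
    r = 1.0 - sum(s)
    peak_ie, peak_t = 0.0, 0.0
    for n in range(1, int(round(t_max / dt)) + 1):
        i = i_x + i_e                                        # Eq. 16
        new_x = [bx * sk for bx, sk in zip(beta_x, s)]       # exogenous flow per group
        new_e = [be * sk * i for be, sk in zip(beta_e, s)]   # endogenous flow per group
        # Per-group form of Eq. 19: each group loses its own two flows.
        s = [sk - dt * (nx + ne) for sk, nx, ne in zip(s, new_x, new_e)]
        i_x, i_e, r = (i_x + dt * (sum(new_x) - gamma * i_x),   # Eq. 20
                       i_e + dt * (sum(new_e) - gamma * i_e),   # Eq. 21
                       r + dt * gamma * i)                      # Eq. 22
        if i_e > peak_ie:
            peak_ie, peak_t = i_e, n * dt
    return s, i_x, i_e, r, peak_ie, peak_t


if __name__ == "__main__":
    # Assumed example: 5% frontline workers with higher rates, 95% general public.
    result = simulate_multigroup_exo_sir(
        s0=[0.05, 0.95], beta_x=[0.01, 0.001], beta_e=[0.6, 0.3], gamma=0.1)
    print("remaining susceptible per group:", result[0])
    print("peak i_e and its time:", result[4], result[5])
```

In this sketch the group with the larger rates is depleted faster in relative terms, which is the behavior the frontline-worker scenario is meant to capture.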

4 Analysis

In this section, we compare our model with the SIR model and analyze the dynamics of exogenous spread and endogenous spread.

4.1 Comparison with SIR model

Mirroring the rate of change of \(s(t),\ i(t),\) and r(t) in the SIR model (Sect. 2), we find the expressions for the rate of change of \(s(t),\ i(t),\) and r(t) for the Exo-SIR model.

Rate of change of s is given by

$$\begin{aligned} \frac{\mathrm{d}s}{\mathrm{d}t} = -\beta _x s-\beta _e s i \end{aligned}$$
(23)

Rate of change of r is given by

$$\begin{aligned} \frac{\mathrm{d}r}{\mathrm{d}t} = \gamma i \end{aligned}$$
(24)

Differentiating Eq. 10 with respect to time, we get

$$\begin{aligned}&\frac{\mathrm{d}i}{\mathrm{d}t} = \frac{\mathrm{d}i_e}{\mathrm{d}t}+\frac{\mathrm{d}i_x}{\mathrm{d}t} \end{aligned}$$
(25)
$$\begin{aligned}&\frac{\mathrm{d}i}{\mathrm{d}t} = \beta _e s i - \gamma i_e + \beta _x s - \gamma i_x \end{aligned}$$
(26)
$$\begin{aligned}&\frac{\mathrm{d}i}{\mathrm{d}t} = \beta _e s i + \beta _x s - \gamma (i_x + i_e) \end{aligned}$$
(27)

Substituting Eq. 10 into Eq. 27, we get

$$\begin{aligned} \frac{\mathrm{d}i}{\mathrm{d}t} = \beta _e s (i_x + i_e) + \beta _x s - \gamma (i_x + i_e) \end{aligned}$$
(28)

Here, even if we assume that there are no infected people in the beginning—i.e., \(i_e = 0\) and \(i_x = 0\), we get the following.

$$\begin{aligned} \frac{\mathrm{d}i}{\mathrm{d}t} = \beta _x s \end{aligned}$$
(29)

This shows that, unlike the SIR model, the Exo-SIR model explains how an infection starts spreading from a state in which no one is infected. The SIR model assumes that there is an initial outbreak size \(i_0\). This means \(i_0\) people are infected at the beginning, with \(i_0 > 0\) [32]. Our work addresses this limitation of the SIR model. Note that the Exo-SIR model behaves the same way as the SIR model if we assume that \(i_x = 0\) and \(\beta _x = 0\).
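The difference between the two models at \(i = 0\) can be checked numerically. The following minimal sketch integrates Eqs. 12 and 27 with forward Euler, once with \(\beta _x = 0\) (the SIR case) and once with a small positive \(\beta _x\). The function name and all parameter values are assumptions chosen only for illustration.

```python
def peak_infected(beta_x, beta_e, gamma, i0=0.0, t_max=300.0, dt=0.01):
    """Integrate Eqs. 12 and 27 with forward Euler; return (peak i, time of peak)."""
    s, i = 1.0 - i0, i0
    peak, t_peak = i, 0.0
    for n in range(1, int(round(t_max / dt)) + 1):
        ds = -beta_x * s - beta_e * s * i               # Eq. 12
        di = beta_e * s * i + beta_x * s - gamma * i    # Eq. 27
        s, i = s + dt * ds, i + dt * di
        if i > peak:
            peak, t_peak = i, n * dt
    return peak, t_peak


# SIR case: beta_x = 0 and nobody infected initially, so i never leaves 0.
print(peak_infected(beta_x=0.0, beta_e=0.3, gamma=0.1))    # -> (0.0, 0.0)

# Exo-SIR case: the exogenous term beta_x * s seeds the outbreak (Eq. 29).
print(peak_infected(beta_x=0.001, beta_e=0.3, gamma=0.1))
```

With \(\beta _x = 0\) and \(i_0 = 0\) every term of Eq. 27 vanishes, so the infected fraction stays at zero, whereas any positive \(\beta _x\) makes the first step positive.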

4.2 Dynamics of exogenous spread and endogenous spread

In this section, we find the relationship between the cumulative exogenous infections (\(i_x\)) and the daily endogenous infections (\(\frac{\mathrm{d}i_e}{\mathrm{d}t}\)).

Substituting Eq. 10 into Eq. 14, we get

$$\begin{aligned}&\frac{\mathrm{d}i_e}{\mathrm{d}t} = \beta _e s(i_e+i_x) - \gamma i_e \end{aligned}$$
(30)
$$\begin{aligned}&\frac{\mathrm{d}i_e}{\mathrm{d}t}\bigg |_{i_x>0} = \beta _e s(i_e+i_x) - \gamma i_e \end{aligned}$$
(31)

At \(i_x = 0\),

$$\begin{aligned} \frac{\mathrm{d}i_e}{\mathrm{d}t}\bigg |_{i_x=0} = \beta _e si_e - \gamma i_e \end{aligned}$$
(32)

Since \(\beta _e\), \(s\), and \(i_x\) are positive, the right-hand side of Eq. 31 exceeds that of Eq. 32 by \(\beta _e s i_x > 0\). Hence,

$$\begin{aligned} \frac{\mathrm{d}i_e}{\mathrm{d}t}\bigg |_{i_x=0} < \frac{\mathrm{d}i_e}{\mathrm{d}t}\bigg |_{i_x>0} \end{aligned}$$
(33)

This shows that \(\frac{\mathrm{d}i_e}{\mathrm{d}t}\) increases in the presence of \(i_x\). In other words, this shows that the presence of exogenous diffusion causes endogenous diffusion to increase.

5 Simulation

We simulate the Exo-SIR model to determine its behavior for various scenarios that are represented by the different values of its parameters. We simulated the model in two ways:

One, by assuming no network (well-mixed population). In this scenario, a susceptible node can get infected from any of the infected nodes in the population under consideration.

Two, by assuming that the contact network among people is a scale-free network. Within this network, a susceptible node can catch the infection only from infected nodes to which it is connected by an edge, i.e., its immediate neighbors. We chose a scale-free network because there is evidence that the human disease network could be scale-free [33]. The results of these simulations are discussed in the following sections.

5.1 Using scale-free network

The analysis presented in this section assumes a scale-free contact network for the population under study, specifically a Barabási-Albert network [34]. Under this scenario, a susceptible node can catch the infection only from infected nodes to which it is connected by an edge, i.e., its immediate neighbors. We predicted the values for various combinations of \(\beta _x\), \(\beta _e\), and \(\gamma \) using the Exo-SIR model on this network.

Next, we study the dependency of endogenous spread on the exogenous factors through simulation. The step-by-step methodology adopted to carry out the simulation and the analysis is given in Algorithm 1.

Algorithm 1 (figure a)

In the above algorithm, we carried out 50 simulations for each combination of the parameters and averaged the results to reduce the bias introduced by the network structure, since the network set up in step 3 of the algorithm is random each time.
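A simulation of this kind can be sketched as follows. The exact steps of Algorithm 1 are not restated in the text, so everything specific here is an assumption: the discrete-time update rule (an exogenous draw first, then independent transmission from each infected neighbor), the graph size, the attachment parameter m, the number of ticks, and all function names. The graph generator follows the usual preferential-attachment construction and uses only the Python standard library.

```python
import random


def barabasi_albert(n, m, rng):
    """Adjacency sets of a Barabási-Albert graph: each new node attaches to m nodes."""
    adj = {v: set() for v in range(n)}
    targets = list(range(m))      # the first new node links to the m seed nodes
    repeated = []                 # each node appears once per incident edge
    for v in range(m, n):
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        chosen = set()
        while len(chosen) < m:    # degree-proportional sampling, no repeats
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


def run_exo_sir(adj, beta_x, beta_e, gamma, ticks, rng):
    """One stochastic run; returns (peak tick, peak count) of endogenously infected nodes."""
    state = {v: "S" for v in adj}             # each node is S, Ix, Ie or R
    peak_tick, peak_value = 0, 0
    for tick in range(1, ticks + 1):
        infected = {v for v, st in state.items() if st in ("Ix", "Ie")}
        new_state = dict(state)
        for v, st in state.items():
            if st == "S":
                if rng.random() < beta_x:                 # infected by the exogenous source
                    new_state[v] = "Ix"
                else:
                    k = sum(1 for u in adj[v] if u in infected)
                    if rng.random() < 1.0 - (1.0 - beta_e) ** k:   # by an infected neighbor
                        new_state[v] = "Ie"
            elif st in ("Ix", "Ie") and rng.random() < gamma:
                new_state[v] = "R"
        state = new_state
        n_ie = sum(1 for st in state.values() if st == "Ie")
        if n_ie > peak_value:
            peak_tick, peak_value = tick, n_ie
    return peak_tick, peak_value


def average_peak(beta_x, beta_e, gamma, n=300, m=3, ticks=100, runs=50, seed=0):
    """Average peak tick and peak value of I_e over `runs` freshly generated networks."""
    rng = random.Random(seed)
    results = [run_exo_sir(barabasi_albert(n, m, rng), beta_x, beta_e, gamma, ticks, rng)
               for _ in range(runs)]
    return (sum(t for t, _ in results) / runs, sum(v for _, v in results) / runs)
```

Calling `average_peak` for a range of \(\beta _x\) values, with \(\beta _e\) and \(\gamma \) held fixed, gives one averaged point per value for the endogenous peak tick and peak value.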

Sample simulation results are shown in Figs. 4 and 5. Figure 4 shows the simulation results of the SIR model with no exogenous influence, and Fig. 5 shows the simulation results with exogenous influence. Here, we can see that the peak of the infected curve changes when exogenous factors are considered.

Fig. 4 Plot of susceptible, infected, and recovered with no exogenous source

Fig. 5 Plot of susceptible, infected, and recovered with exogenous source

Figures 6 and 7 result from the simulation and analysis described in Algorithm 1 and provide the following insights. Figure 6 shows that the endogenous peak tick decreases as \(\beta _x\) increases; that is, the peak occurs earlier. Figure 7 shows that \(\beta _x\) (the exogenous factor) influences the peak value of endogenous infections: the endogenous peak value increases as \(\beta _x\) increases.

We can conclude that the exogenous source of infection impacts the endogenous spread in the network by advancing the peak and increasing its height.

Fig. 6 Impact of \(\beta _x\) on peak tick of \(i_e\)

Fig. 7 Impact of \(\beta _x\) on peak value of \(i_e\)

5.2 With no network

In this section, we statistically determine the relative effects of \(\beta _x\), \(\beta _e\), and \(\gamma \) on the endogenous peak and measure the impact of \(\beta _x\) on endogenous infections; the findings are consistent with the results shown above. Here, we did not assume any network for our population. The objective of these simulations was to determine the impact of \(\beta _x\), \(\beta _e\), and \(\gamma \) on the endogenous peak value and peak tick (see Table 1). To achieve this, we took a sample of 27000 simulations and analyzed them as described in Algorithm 2.

Algorithm 2 (figure b)

The following inferences can be drawn from the results of the regression analysis. The p value for all three variables is less than 0.05. This means we reject the null hypothesis and adopt the alternative hypothesis that the impact of all three parameters on the peak of endogenous infections is statistically significant.

The adjusted R-squared value is highest (0.70) when all three parameters are considered while fitting the regression model. This means that we can better explain the variation in the dependent variable when considering all three, i.e., \(\beta _e\), \(\beta _x\), and \(\gamma \). Removing any one of them would decrease the adjusted R-squared value. The confidence interval of each parameter is given in Table 1.

Table 1 Impact of \(\beta _e\), \(\beta _x\), and \(\gamma \) on ln(ie_peak)

\(\beta _x\) impacts endogenous infections as much as \(\beta _e\) (the contributions of both are almost equal), which is an important observation. This means that exogenous factors also have a considerable impact on the endogenous infection, and ignoring the exogenous factors would not give an accurate estimate of the endogenous infections.
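A regression of this kind can be sketched with the standard library alone. Algorithm 2 is not restated in the text, so the parameter grid, the time horizon, the use of forward Euler, and the function names below are all assumptions. The sketch regresses \(\ln \) of the peak of \(i_e\) on \(\beta _x\), \(\beta _e\), and \(\gamma \) by ordinary least squares and reports the plain (not adjusted) R-squared. It does not compute p values or confidence intervals.

```python
import math
from itertools import product


def ie_peak(beta_x, beta_e, gamma, t_max=200.0, dt=0.05):
    """Peak of i_e from a forward-Euler run of Eqs. 12-14 with nobody infected at t = 0."""
    s = 1.0
    i_x = i_e = peak = 0.0
    for _ in range(int(round(t_max / dt))):
        new_x, new_e = beta_x * s, beta_e * s * (i_x + i_e)
        s -= dt * (new_x + new_e)
        i_x += dt * (new_x - gamma * i_x)
        i_e += dt * (new_e - gamma * i_e)
        peak = max(peak, i_e)
    return peak


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) and R-squared via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


grid = [0.05, 0.15, 0.25, 0.35, 0.45]        # assumed sweep values, 5^3 = 125 runs
rows = list(product(grid, grid, grid))       # each row is (beta_x, beta_e, gamma)
y = [math.log(ie_peak(bx, be, g)) for bx, be, g in rows]
coef, r2 = fit_ols(rows, y)
print("intercept, beta_x, beta_e, gamma:", [round(c, 2) for c in coef])
print("R-squared:", round(r2, 2))
```

The grid here is much smaller than the 27000 simulations reported above, so the printed numbers are not expected to reproduce Table 1.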

6 Analysis using real data

In this section, we describe the data and the analysis of the implementation of the SIR model and Exo-SIR model on the Covid-19 and Ebola epidemics.

6.1 Covid-19 infection in India

Covid-19 has caused large and persistent negative effects on the world economy (Footnote 4). India is one of the worst-affected countries. Many issues complicated the spread of Covid-19 in India. One of them was the migration of people from different parts of the country and from abroad.

Many sub-events in India involved the migration of people. Examples are a celebrity who came to India from the UK and socialized at many places even after testing positive for Covid-19 (Footnote 5), laborers working in other states or countries moving back to their native places [7], and large religious meetings with participants from many national and international locations.

A major sub-event was the Tablighi Jamaat religious congregation in Delhi from 1st March 2020 to 21st March 2020 (Footnote 6). Over 9000 people from various states of India participated in this event (Footnote 7). Nearly 4300 reported cases can be traced to the event (Footnote 8). As of 18th April 2020, \(30\%\) of the cases in India were due to this event (Footnote 9). The number of participants from each state differed widely. Hence, the impact of the event was significantly different for different states. However, it is reasonable to state that the mobility of people is a causative phenomenon that changed the dynamics of the spread of the virus.

We apply the Exo-SIR model to a real dataset regarding the spread of the Covid-19 pandemic in the Indian states of Rajasthan, Tamil Nadu, and Kerala from 14th March 2020 to 14th April 2020. Exogenous spread dominates endogenous spread in Tamil Nadu, whereas the contrary is true in the case of Rajasthan. The endogenous and exogenous spread in Kerala have roughly the same prevalence. The trends in the analytical study, the results of the simulations, and the analysis of the real dataset are consistent.

We analyzed the data of three states in India, namely Tamil Nadu, Rajasthan, and Kerala. The reason for choosing these states is that \(i_e \ll i_x\) in Tamil Nadu, \(i_e \gg i_x\) in Rajasthan, and \(i_e \approx i_x\) in Kerala.

We constructed our dataset from three different sources (Footnote 10): covid19india.org (Footnote 11), the government websites of the respective states, whose press releases give the daily number of Tablighi cases, and the Wikipedia page on state-wise daily data (Footnote 12).

covid19india.org is a publicly available, volunteer-driven dataset of Covid-19 statistics in India (Footnote 13). There are multiple files in this dataset. One of them, called raw data, captures the anonymized details of the patients. In the raw data, the columns of interest for our study are DateAnnounced, DetectedState, and TypeOfTransmission.

Another file from covid19india.org is called states_daily. In this file the columns of interest are states_daily/status, states_daily/kl, states_daily/rj, states_daily/tn, and states_daily/date. kl, rj and tn are the codes used in this dataset for the states of Kerala, Rajasthan, and Tamil Nadu, respectively.

Here, status can have the following values: infected, recovered, and deceased. From these columns, we prepared the time series dataset for each state. The columns available in the dataset we created are daily confirmed, daily deceased, daily recovered, date, total confirmed, total deceased, total recovered, and daily imported cases.

Another dataset that we used is the compilation of the press releases (news bulletins) from the governments of the states under study. It provides the daily number of cases due to a significant event that influenced the Covid-19 spread in India, the Tablighi Jamaat religious congregation. Since there was no ready-made data available, we manually went through the press releases and collected the data.

Now, we discuss how the values in the dataset are mapped onto the variables in the Exo-SIR model. On a particular day, say day k, by rearranging and differentiating Eq. 11, we get the following.

$$\begin{aligned} \frac{\mathrm{d}s}{\mathrm{d}t} = -\left( \frac{\mathrm{d}i}{\mathrm{d}t} + \frac{\mathrm{d}r}{\mathrm{d}t}\right) \end{aligned}$$
(34)

where \(\frac{\mathrm{d}r}{\mathrm{d}t} \) is the sum of the numbers of the daily recovered and the daily deceased cases on day k and

$$\begin{aligned} \frac{\mathrm{d}i}{\mathrm{d}t} = \frac{\mathrm{d}i_e}{\mathrm{d}t} + \frac{\mathrm{d}i_x}{\mathrm{d}t} \end{aligned}$$
(35)

where \(\frac{\mathrm{d}i_e}{\mathrm{d}t} \) is the number of daily confirmed cases on day k and \(\frac{\mathrm{d}i_x}{\mathrm{d}t}\) is the sum of the daily imported cases on day k and the daily cases due to the Tablighi event on day k.

The initial values of s, i and r are found as follows.

$$\begin{aligned} s = 1 - \frac{\text{ d(0) }}{N} \end{aligned}$$
(36)

where d(0) is the number of daily confirmed cases on day 0 and N is the total population prone to the infection.

i is the total number of confirmed cases on day 0 and r is the sum of the total numbers of the deceased and the recovered cases on day 0.
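The mapping in Eqs. 34 to 36 can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the `DayRecord` type, its field names, and the function names are hypothetical, and the counts are assumed to be expressed in the same units (raw or normalized by N) as the rest of the analysis.

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    """One day of the per-state time series (field names are hypothetical)."""
    daily_confirmed: float
    daily_recovered: float
    daily_deceased: float
    daily_imported: float
    daily_tablighi: float


def daily_derivatives(rec: DayRecord) -> dict:
    """Map one day's counts to the derivatives of Eqs. 34 and 35."""
    # dr/dt: daily recovered plus daily deceased cases.
    dr_dt = rec.daily_recovered + rec.daily_deceased
    # di_e/dt: daily confirmed cases.
    die_dt = rec.daily_confirmed
    # di_x/dt: daily imported cases plus daily cases due to the Tablighi event.
    dix_dt = rec.daily_imported + rec.daily_tablighi
    # Eq. 35: di/dt = di_e/dt + di_x/dt.
    di_dt = die_dt + dix_dt
    # Eq. 34: ds/dt = -(di/dt + dr/dt).
    ds_dt = -(di_dt + dr_dt)
    return {"ds_dt": ds_dt, "die_dt": die_dt, "dix_dt": dix_dt, "dr_dt": dr_dt}


def initial_susceptible(d0: float, n: float) -> float:
    """Eq. 36: s = 1 - d(0)/N, with d(0) the daily confirmed cases on day 0."""
    if n <= 0:
        raise ValueError("population N must be positive")
    return 1.0 - d0 / n
```

The numbers passed to these functions in any experiment would come from the per-state time series described above; none are hard-coded here.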


Next, we analyze the data from Tamil Nadu, Rajasthan, and Kerala. We compare the peak tick and peak value of the plot of \(i_e\) in the presence and absence of \(i_x\). This would give information about the impact of \(i_x\) on \(i_e\). For this purpose, we used Algorithm 3.
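The comparison described above reduces to locating the maximum of each \(i_e\) curve and comparing the two maxima. The following Python sketch shows that step only; it is not a reproduction of Algorithm 3, the function names are hypothetical, and the two input series are assumed to be the \(i_e\) values per tick produced by the model with and without \(i_x\).

```python
def peak_tick_and_value(series):
    """Return (tick, value) of the first maximum of a non-empty series."""
    if not series:
        raise ValueError("series must not be empty")
    # max() with a key returns the first index that attains the maximum.
    tick = max(range(len(series)), key=series.__getitem__)
    return tick, series[tick]


def compare_peaks(ie_with_ix, ie_without_ix):
    """Compare the peak of i_e with and without the exogenous term i_x."""
    tick_with, value_with = peak_tick_and_value(ie_with_ix)
    tick_without, value_without = peak_tick_and_value(ie_without_ix)
    return {
        "peak_tick_with": tick_with,
        "peak_value_with": value_with,
        "peak_tick_without": tick_without,
        "peak_value_without": value_without,
        # Negative shift: the peak arrives earlier when i_x is present.
        "tick_shift": tick_with - tick_without,
        "value_difference": value_with - value_without,
    }
```

A nonzero `tick_shift` or `value_difference` corresponds to the kind of difference reported in the peak tick and peak value tables.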

For the state of Tamil Nadu, \(I_e\) in the presence and absence of \(i_x\) is plotted in Figs. 8 and 9, respectively, and \(I_x\) is plotted in Fig. 10. For the state of Rajasthan, \(I_e\) in the presence and absence of \(i_x\) is plotted in Figs. 11 and 12, respectively, and \(I_x\) is plotted in Fig. 13. For the state of Kerala, \(I_e\) in the presence and absence of \(i_x\) is plotted in Figs. 14 and 15, respectively, and \(I_x\) is plotted in Fig. 16. In all these plots, \(i_x\) is very small compared to \(i_e\), which is why \(I_x\) is plotted separately in Figs. 10, 13, and 16. Yet, \(i_x\) has an impact on \(i_e\).

Fig. 8
figure 8

\(I_e\) in the presence of \(i_x\). The values of \(i_x\) are very small for the scale of this plot. Hence, \(i_x\) is plotted separately. Please refer to Fig. 10

Fig. 9
figure 9

\(I_e\) in the absence of \(i_x\)

Fig. 10
figure 10

\(i_x\) in Exo-SIR model. Please note that the y axis is in the scale of \(10^{-4}\)

Fig. 11
figure 11

\(I_e\) in the presence of \(i_x\). The values of \(i_x\) are very small for the scale of this plot. Hence, \(i_x\) is plotted separately. Please refer to Fig. 13

Fig. 12
figure 12

\(I_e\) in the absence of \(i_x\)

Fig. 13
figure 13

\(i_x\) in Exo-SIR model. Please note that the y axis is in the scale of \(10^{-6}\)

Fig. 14
figure 14

\(I_e\) in the presence of \(i_x\) in the state of Kerala. The values of \(i_x\) are very small for the scale of this plot. Hence, \(i_x\) is plotted separately. Please refer to Fig. 16

Fig. 15
figure 15

\(I_e\) in the absence of \(i_x\) in the state of Kerala

Fig. 16
figure 16

\(i_x\) in Exo-SIR model. Please note that the y axis is in the scale of \(10^{-5}\)

Table 2 Impact of \(I_x\) on \(I_e\) in the state of Tamil Nadu
Table 3 Impact of \(I_x\) on \(I_e\) in the state of Rajasthan
Table 4 Impact of \(I_x\) on \(I_e\) in the state of Kerala

The peak tick and peak values corresponding to the \(I_e\) of the Exo-SIR model in the presence and absence of \(i_x\) for Tamil Nadu, Rajasthan, and Kerala are given in Tables 2, 3, and 4, respectively. In all the tables, we can see that the peak value of \(i_e\) when \(i_x\) is present differs from the peak value when \(i_x\) is absent. The peak tick of \(i_e\) also differs when \(i_x\) is present.

Finally, we present the comparison of the predictions of Exo-SIR model and SIR model with the real data for the following cases:

  1. Covid-19 in Kerala (Fig. 17)

  2. Covid-19 in Tamil Nadu (Fig. 18)

  3. Covid-19 in Rajasthan (Fig. 19)

Fig. 17
figure 17

Comparison of the predictions of Exo-SIR and SIR models with real data for Covid-19 in Kerala

Fig. 18
figure 18

Comparison of the predictions of Exo-SIR and SIR models with real data for Covid-19 in Tamil Nadu

Fig. 19
figure 19

Comparison of the predictions of Exo-SIR and SIR models with real data for Covid-19 in Rajasthan

Here, the peak values are scaled down as they are very high for both the SIR and Exo-SIR predictions. This may be because both the SIR and Exo-SIR models assume that each infected person is equally likely to infect all the susceptible people. In real life, this is not true. However, we can see that in all three cases (shown in Figs. 17, 18, and 19), the peak of the Exo-SIR model is closer to the peak of the real data.

6.2 Covid-19 infection in the USA

In this section, we discuss the analysis that we carried out on the data of Covid-19 infection in the USA.

We constructed our dataset for this analysis from two different sourcesFootnote 14: kaggle.com and the incoming tourist travel data for the USA from the CEIC databaseFootnote 15.

Now, we discuss how the values in the dataset are mapped onto the variables in the Exo-SIR model. We calculated the number of endogenous infections \((I_E(t))\) from the following equation.

$$\begin{aligned} I_E(t)\ =\ I_E(t-1) + \mathrm{Daily}(t)-D(t-1) \end{aligned}$$
(37)

where \(\mathrm{Daily}(t)\) is the number of daily new cases at the time slice t and \(D(t-1)\) is the number of deaths from within the USA population at the time slice \(t-1\).

We estimated the number of deaths among infected tourists from the endogenous deaths in the following way. First, we calculated \(\gamma \) from the endogenous data by using the equation

$$\begin{aligned} \gamma = \frac{\mathrm{d}r/\mathrm{d}t}{i} \end{aligned}$$
(38)

Then, we applied the same \(\gamma \) to get the number of deaths from the data of exogenous infections using the equation

$$\begin{aligned} r(t) = r(t-1) + \mathrm{d}r/\mathrm{d}t \end{aligned}$$
(39)

where

$$\begin{aligned} \mathrm{d}r/\mathrm{d}t = \gamma *i(t-1) \end{aligned}$$
(40)
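Equations 38 to 40 can be illustrated with a short Python sketch. It is written under stated assumptions rather than taken from the study: the function names are hypothetical, and because the text does not say how the per-slice ratio in Eq. 38 is aggregated over time, the sketch simply averages it over the time slices with a nonzero infected count.

```python
def estimate_gamma(daily_removed, infected):
    """Estimate gamma from endogenous data via Eq. 38, gamma = (dr/dt) / i.

    Assumption: the per-slice ratios are averaged; slices with i = 0 are skipped.
    """
    ratios = [dr / i for dr, i in zip(daily_removed, infected) if i > 0]
    if not ratios:
        raise ValueError("need at least one time slice with infected > 0")
    return sum(ratios) / len(ratios)


def project_removed(gamma, infected, r0=0.0):
    """Apply Eqs. 39 and 40: r(t) = r(t-1) + gamma * i(t-1)."""
    removed = [r0]
    for t in range(1, len(infected)):
        dr_dt = gamma * infected[t - 1]       # Eq. 40
        removed.append(removed[t - 1] + dr_dt)  # Eq. 39
    return removed
```

In this reading, `estimate_gamma` is fed the endogenous series and `project_removed` is fed the exogenous infection series, so that the same \(\gamma \) is reused for the exogenous data.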

Then, we calculated the number of exogenous infections \((I_X(t))\) by using the equation:

$$\begin{aligned} I_X(t) = I_X(t-1)\ + \mathrm{Daily}(t) - D(t) \end{aligned}$$
(41)

where \(\mathrm{Daily}(t)\) is the number of daily new tourist cases at the time slice t and D(t) is the number of deaths at the time slice t.

Then, we calculated the number of susceptible people by using the following equation:

$$\begin{aligned} S(t) = N-I_E^c(t)-I_X^c(t) \end{aligned}$$
(42)

where \(I_E^c(t)\) is the cumulative value of \(I_E(t)\) and \(I_X^c(t)\) is the cumulative value of \(I_X(t)\).

Finally, we computed \( \frac{\mathrm{d}(i_e)}{\mathrm{d}t}\), \(\frac{\mathrm{d}(i_x)}{\mathrm{d}t}\), \(\frac{\mathrm{d}(r)}{\mathrm{d}t}\) and \(\frac{\mathrm{d}(s)}{\mathrm{d}t}\) values.
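The recurrences in Eqs. 37, 41, and 42 and the final differencing step can be sketched in Python as below. This is an illustrative sketch, not the study's code: the function names, the `death_lag` parameter, and the zero initial value are assumptions, and the cumulative series in Eq. 42 are taken as inputs because the text does not spell out how they are accumulated.

```python
def active_infections(new_cases, deaths, death_lag, initial=0.0):
    """I(t) = I(t-1) + new_cases(t) - deaths(t - death_lag).

    death_lag=1 reproduces Eq. 37 (endogenous); death_lag=0 reproduces Eq. 41
    (exogenous). The starting value I(0) is an assumption of this sketch.
    """
    if death_lag not in (0, 1):
        raise ValueError("death_lag must be 0 or 1")
    if len(new_cases) != len(deaths):
        raise ValueError("series must have the same length")
    infections = [initial]
    for t in range(1, len(new_cases)):
        infections.append(infections[t - 1] + new_cases[t] - deaths[t - death_lag])
    return infections


def susceptible(n, cumulative_e, cumulative_x):
    """Eq. 42: S(t) = N - I_E^c(t) - I_X^c(t)."""
    return [n - e - x for e, x in zip(cumulative_e, cumulative_x)]


def first_differences(series):
    """Discrete stand-in for d/dt: the change between consecutive time slices."""
    return [b - a for a, b in zip(series, series[1:])]
```

Applying `first_differences` to the resulting series gives one possible discrete reading of the derivative values mentioned above; the time slice can be a day or any other fixed interval.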

Next, we analyze the Covid-19 data from the USA by applying the Exo-SIR model. We compare the peak tick and peak value of the plot of \(i_e\) in the presence and absence of \(i_x\). This would give information about the impact of \(i_x\) on \(i_e\). For this purpose, we used Algorithm 3. The cases in the presence and absence of \(I_x\) are plotted in Figs. 20 and 21, respectively. \(I_x\) is plotted in Fig. 22.

In these plots, it can be observed that the peak tick and the height of the peak in the presence of \(i_x\) differ from the values in the absence of \(i_x\). The peak tick and peak values corresponding to the \(I_e\) of the Exo-SIR model in the presence and absence of \(i_x\) are given in Table 5.

Fig. 20
figure 20

\(i_e\) and \(i_x\) for Covid-19 in the USA

Fig. 21
figure 21

\(i_e\) in the absence of \(i_x\) for Covid-19 in the USA

Fig. 22
figure 22

\(i_x\) for Covid-19 in the USA

Table 5 Impact of \(I_x\) on \(I_e\) in Covid-19 in the USA
Fig. 23
figure 23

Comparison of the predictions of Exo-SIR and SIR models with real data for Covid-19 in the USA

Figure 23 shows the comparison of the predictions of the Exo-SIR model and the SIR model with the real data. Here, we can see that the peaks in the SIR and Exo-SIR plots have the same height and occur at nearly the same time. However, both of them are very different from the peak position in the real data.

6.3 Ebola infection in Guinea

Ebola, also known as EVD, was another severe, often fatal epidemic that hit the Western African countries from 2014 to 2016, particularly Guinea, Sierra Leone, and Liberia. Its fatality rateFootnote 16 varies from 25 to 90%. Like the case of Covid-19, there was migration of people from abroad, especially tourists traveling into these countries. The dataset regarding travel and tourism is publicly availableFootnote 17.

We compared peak tick and peak value of the plot of \(i_e\) in the presence and absence of \(i_x\), as per Algorithm 3. This gave us information and important insights on the impact of \(i_x\) on \(i_e\).

We constructed our dataset from two different sources: kaggle.com and incoming tourists travel data for Guinea from UNWTO DashboardFootnote 18.

Now, we discuss how the values in the dataset are mapped onto the variables in the Exo-SIR model. We calculated the number of endogenous infections \((I_E(t))\) from the following equation.

$$\begin{aligned} I_E(t)\ =\ I_E(t-1) + M(t)-D(t-1) \end{aligned}$$
(43)

where M(t) is the number of monthly new cases at the time slice t and \(D(t-1)\) is the number of deaths from within the Guinea population at the time slice \(t-1\).

We estimated the number of deaths among infected tourists from the endogenous deaths in the following way. First, we calculated \(\gamma \) from the endogenous data by using the equation

$$\begin{aligned} \gamma = \frac{\mathrm{d}r/\mathrm{d}t}{i} \end{aligned}$$
(44)

Then, we applied the same \(\gamma \) to get the number of deaths from the data of exogenous infections using the equation:

$$\begin{aligned} r(t) = r(t-1) + \mathrm{d}r/\mathrm{d}t \end{aligned}$$
(45)

where

$$\begin{aligned} \mathrm{d}r/\mathrm{d}t = \gamma *i(t-1) \end{aligned}$$
(46)

Then, we calculated the number of exogenous infections \((I_X(t))\) by using the equation:

$$\begin{aligned} I_X(t) = I_X(t-1)\ + M(t) - D(t) \end{aligned}$$
(47)

where M(t) is the monthly new tourist cases at the time slice t and D(t) is the number of deaths at the time slice t.

Then, we calculated the number of susceptible people by using the following equation:

$$\begin{aligned} S(t) = N-I_E^c(t)-I_X^c(t) \end{aligned}$$
(48)

where \(I_E^c(t)\) is the cumulative value of \(I_E(t)\) and \(I_X^c(t)\) is the cumulative value of \(I_X(t)\).

Finally, we computed \( \frac{\mathrm{d}(i_e)}{\mathrm{d}t}\), \(\frac{\mathrm{d}(i_x)}{\mathrm{d}t}\), \(\frac{\mathrm{d}(r)}{\mathrm{d}t}\) and \(\frac{\mathrm{d}(s)}{\mathrm{d}t}\) values.

Next, we analyze the data from Guinea. We compare the peak tick and peak value of the plot of \(i_e\) in the presence and absence of \(i_x\). This would give information about the impact of \(i_x\) on \(i_e\). For this purpose, we used Algorithm 3. The cases in the presence and absence of \(I_x\) are plotted in Figs. 24 and 25, respectively. \(I_x\) is shown in Fig. 26.

Figures 24, 25, and 26 show that the peak tick and the height of the peak in the presence of \(i_x\) differ from the values in the absence of \(i_x\). The peak tick and peak values corresponding to the \(I_e\) of the Exo-SIR model in the presence and absence of \(i_x\) are given in Table 6.

Fig. 24
figure 24

\(i_e\) and \(i_x\) for Ebola

Fig. 25
figure 25

\(i_e\) in the absence of \(i_x\) for Ebola

Fig. 26
figure 26

\(i_x\) for Ebola

Figure 27 shows the comparison of the predictions of the Exo-SIR model and the SIR model with the real data. Here, we can see that the peaks of the Exo-SIR and SIR models occur at different times and that both are far from the peak of the actual data.

Table 6 Impact of \(I_x\) on \(I_e\) in Guinea
Fig. 27
figure 27

Comparison of the predictions of Exo-SIR and SIR models with real data for Ebola in Guinea

6.4 Discussion

Both Covid-19 and Ebola satisfy our hypothesis that the endogenous spread changes in the presence of exogenous spread. Also, the results in the case of Covid-19 infection in India show that the Exo-SIR model predicts the epidemic’s peak tick better than the SIR model.

Covid-19 in the USA and Ebola in Guinea show less accurate predictions than Covid-19 in India. This may be because of the following reasons.

In these cases, we took the data from the beginning of the spread of the infection. As soon as the infections started growing, the governments began multiple interventions to curb the spread of the epidemics. If these efforts were successful, that would change the values of the constants that we calculated using the initial values. This will reflect in the curve of the real data primarily by delaying the peak and flattening the curve. This can be observed in the real data of Covid-19 in the USA and Ebola in Guinea. On the other hand, in the case of the data from India, we took the data when the migration of people after the Tablighi religious congregation happened. By this time, India was already on the alert, and the government had already intervened in the matter. Hence, our calculation of the constants was closer to the actual values.

We analyzed a sub-event in the case of Covid-19 in India, the Tablighi religious congregation, which had many participants from almost all the states in India. The number of these people who traveled back to their states was taken as \(I_x\). The probability of these people being infected was very high, as the event was a hot spot of the infection. However, in the case of Covid-19 in the USA and Ebola in Guinea, we took the tourist arrival data as \(I_x\). We made strong assumptions in these cases because the daily inflow of infected people into the population was unavailable. In the case of Covid-19 in the USA, we calculated the external infection as the tourist arrival data multiplied by the total infection in the world, normalized by the total population of the world. In the case of Ebola in Guinea, we calculated the external infection as the tourist arrival data multiplied by the total population of the three countries where the infection was the most prevalent, normalized by the world’s total population. In these cases, the probability that the people in the travel data are infected is comparatively low. This may be the reason for the difference. It is important to note that the SIR model performed equally badly in these cases, which also suggests that the issue might be with the data.
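The scaling used for the external infection in the USA and Guinea cases can be written as a one-line computation. The Python sketch below is only an illustration of that arithmetic: the function and parameter names are hypothetical, and `reference_count` stands for whichever quantity the text multiplies by (the total infection in the world for the USA case, the total population of the three most affected countries for the Guinea case).

```python
def external_infection(tourist_arrivals, reference_count, world_population):
    """Tourist arrivals multiplied by a reference count, normalized by world population."""
    if world_population <= 0:
        raise ValueError("world_population must be positive")
    return tourist_arrivals * reference_count / world_population


def external_infection_series(arrivals_per_slice, reference_per_slice, world_population):
    """Apply the same scaling to every time slice of the arrival data."""
    return [
        external_infection(a, ref, world_population)
        for a, ref in zip(arrivals_per_slice, reference_per_slice)
    ]
```

Because the ratio `reference_count / world_population` is small, the resulting values are a small fraction of the arrival counts, which is consistent with the small scale of the \(i_x\) plots.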

The peak value of the predictions of both SIR and Exo-SIR models was very high compared to the real values. The reason for this may be the following. In the case of SIR and Exo-SIR models, we assume that susceptible people are equally likely to get infected from each infected person in the population. This is not true in real life. In real life, people are likely to get infected only from those they contact. This number is much less than the assumption in both SIR and Exo-SIR models.

7 Conclusion

This study introduced the Exo-SIR model by extending the SIR model. Unlike the other epidemiological models, the Exo-SIR model differentiates between the endogenous and exogenous spread of virus/information. We studied the model in the following ways:

  1. Analytical study

  2. Simulation considering the presence of contact network of the population and assuming it to be a scale free network

  3. Simulation without considering the presence of contact network

  4. Implementation of the Exo-SIR model on real data about the spread of Covid-19 in India, Covid-19 in the USA, and the spread of Ebola in Guinea.

We found that all four analyses mentioned here converge to the same result: the peak differs in time and size when the exogenous source is present. We studied the impact of exogenous infection on endogenous diffusion and found that exogenous diffusion impacts the endogenous spread of infection. If there are exogenous sources of infection, as in the case of Covid-19 or Ebola, then the Exo-SIR model is more appropriate for estimating the scenario. This will help the government allocate its resources better, as the endogenous and exogenous spread need different sets of actions to stop them.

Limitations and future work: We used the SIR model for comparison as it is simple and widely used. Other models, like SEIR and SEYAR, could be used for a similar study, and there is scope for introducing an external source of infection into these models. Also, we have considered only one external source of infection. There may exist multiple external sources of infection like bats, pigs, birds, etc. Another possible scenario is the presence of multiple viruses. We propose to study these in the future.