# A network epidemic model for online community commissioning data


## Abstract

A statistical model assuming a preferential attachment network, which is generated by adding nodes sequentially according to a few simple rules, usually describes real-life networks better than a model assuming, for example, a Bernoulli random graph, in which any two nodes have the same probability of being connected. Therefore, to study the propagation of “infection” across a social network, we propose a network epidemic model by combining a stochastic epidemic model and a preferential attachment model. A simulation study based on the subsequent Markov Chain Monte Carlo algorithm reveals an identifiability issue with the model parameters. Finally, the network epidemic model is applied to a set of online commissioning data.

## Keywords

Stochastic epidemic models · MCMC · Random graphs · Preferential attachment · Community commissioning

## 1 Introduction

Social network analysis has been a popular research topic over the last couple of decades, thanks to the unprecedentedly large amount of internet data available, and the increasing power of computers to deal with such data, which details ties between people or objects all over the world. A lot of models have been developed to characterise and/or generate networks in various ways. One well-known class of models in the statistical literature is the exponential random graph model (ERGM), in which the probability mass function on the graph space is proportional to the exponential of a linear combination of graph statistics; see, for example, Snijders (2002). The Bernoulli random graph (BRG), in which any two nodes have the same probability of being connected, independent of any other pair of nodes, is a special case of an ERGM. Although the choice of graph statistics allows an ERGM to encompass networks with different characteristics, in general the ERGMs do not describe real-life networks well; see, for example, Snijders (2002) and Hunter et al. (2008).

Instead of characterising a network by graph statistics, such as the total number of degrees, the configuration model considers the sequence of the individual degrees; see, for example, Newman (2010, Chapter 13). Each node is assigned a number of half-edges according to its degree, and the half-edges are paired at random to connect the nodes. Despite its simple rule of network generation, the configuration model may contain multiple edges or self-connecting nodes, which might not occur in real-life networks. Also, the whole network is not guaranteed to be connected. Moreover, even though the individual degrees may be flexibly modelled by a degree distribution, they are not completely independent as they have to sum to an even integer.
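The half-edge pairing rule described above can be illustrated with a short Python sketch. It is not taken from any of the cited works; the function names and the example degree sequence are illustrative assumptions. It shows why the degrees must sum to an even integer, and how random pairing can produce self-loops and multiple edges.

```python
import random
from collections import Counter


def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random and return a list of (u, v) edges."""
    if sum(degrees) % 2 != 0:
        raise ValueError("degrees must sum to an even integer")
    # One stub (half-edge) per unit of degree, labelled by its node.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into an edge.
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]


def count_defects(edges):
    """Return (number of self-loops, number of surplus parallel edges)."""
    self_loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter(tuple(sorted(e)) for e in edges if e[0] != e[1])
    multi_edges = sum(c - 1 for c in multiplicity.values())
    return self_loops, multi_edges


rng = random.Random(1)
degrees = [5, 3, 3, 2, 2, 1, 1, 1]  # illustrative degree sequence, sum = 18
edges = configuration_model(degrees, rng)
self_loops, multi_edges = count_defects(edges)
```

Whatever the random pairing, each node ends up with exactly its prescribed number of half-edges used, but nothing in the rule prevents a node from being paired with itself or with the same partner twice.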

One prominent feature of social networks in real life is that they are scale-free, which means that the degree distribution follows a power law (approximately); see, for example, Albert et al. (1999, 2000), and Stumpf et al. (2005). The preferential attachment (PA) model by Barabási and Albert (1999) is one widely known model (Newman 2010, Chapter 14) that generates such a network with a few parameters and a simple rule. Other models also exist that characterise either the degree distribution, for instance the small-world model by Watts and Strogatz (1998), or other aspects such as how clustered the nodes in the network are (Vázquez et al. 2002).

While the majority of the network models focus on the topology of the network, some models are developed to describe the dynamics within the network, in particular how fast information spreads with respect to the structure of the network. As spreading rumours or computer viruses through connections in a social network is similar to spreading a disease through real-life contacts to create an epidemic, most of these models incorporate certain compartment models in epidemiology. For instance, the susceptible-infectious-recovered (SIR) model splits the population into three compartments according to the stage of the disease of each individual. A susceptible individual becomes infectious upon contact with an infectious individual and recovers after a random period. Traditionally, the infectious period and the contacts made by an infected individual are assumed to follow an exponential distribution and a homogeneous Poisson process, respectively. While these assumptions may be unrealistic for real-life data, they are useful because they make the epidemic process Markovian. The dynamics of compartment sizes over time can usually be characterised by a small number of parameters in the rate matrix, which is used to obtain the transition probabilities through Kolmogorov’s equations; see, for example, Wilkinson (2011, Section 5.4). While other kinds of compartment models can be formulated in a similar way, some models depart from the Markovian assumptions and will be discussed later. For more details on the SIR model and its variants, see, for example, Andersson and Britton (2000).

Often implicitly assumed in such compartment models is that the epidemic is homogeneous mixing, that is, each individual can interact uniformly with all other individuals in the community he/she belongs to. However, this is not the case when it comes to network epidemics, as one can only infect and be infected by their neighbours in the network, and the collection of neighbours differs from individual to individual. Therefore, modelling an epidemic on a structured population requires relaxing the homogeneous mixing assumption. Instead of assuming the same set of values for the parameters governing the dynamics, one approach is to apply a separate set of parameter values to, for example, each individual or all individuals with the same degree. Such an approach focuses on the modelling side and is dominant in the physics literature. A comprehensive review is provided by Pastor-Satorras et al. (2015).

^{1}, which is an online platform that enables communities to propose and design community-commissioned mobile applications (Garbett et al. 2016). The process of generating the application starts with a community creating a campaign page and sharing it via online social networks. If we view an individual having seen a campaign and in turn promoting it as being “infected” (and “infectious” simultaneously), then the process of sharing a campaign can be compared to spreading a real-life virus to create an epidemic. The main difference is that such an infectious individual can potentially infect only those connected to them on the social networks, rather than anyone in the population. For one campaign, the cumulative count of infected and the network of infected users are plotted in Figs. 1 and 2, respectively. The former deviates from the typical S shape of a homogeneous mixing epidemic, while the latter displays star-like structures and long paths, which are typical features in real-life networks. It should be noted that this does not represent the complete underlying network \(\mathcal {G}\), which is usually unknown.

Due to the difference in the data being applied to, as well as the inclination towards inference, epidemic models in the statistics literature provide a stark contrast from the classical compartment model, not only with respect to the network issue. First, to accommodate heterogeneities in mixing, Ball et al. (1997) and Britton et al. (2011) proposed models which incorporate two levels and three levels of mixing, respectively. Each individual belongs to both the global level and one or more local levels, such as household, school or workplace, and homogeneous mixing is assumed to take place at each level but with a separate rate. Such models are prompted by data with detailed information of these local level structures each individual belongs to, such as the 1861 Hagelloch measles outbreak data analysed by Britton et al. (2011). Second, some SIR models and their variants relax the assumption that the infectious period follows the exponential distribution, essentially rendering the epidemic process non-Markovian. For instance, Streftaris and Gibson (2002) used the Weibull distribution, while Neal and Roberts (2005) and Groendyke et al. (2012) used the Gamma distribution. In general, the compartment dynamics cannot be represented by a simple differential equation. Third, information is often missing in epidemic datasets, such as the infection times and, if a network structure is assumed, the actual network itself. Therefore, models are developed with a view to inferring these missing data, usually achieved by Markov Chain Monte Carlo (MCMC) algorithms. Examples of models which impose a network structure include Britton and O’Neill (2002), Neal and Roberts (2005), Ray and Marzouk (2008) and Groendyke et al. (2011). In the data considered by these authors, no covariates exist to inform if two individuals are neighbours in the network, and the edge inclusion probability parameter is assumed to be the same for any two individuals in the network. 
Essentially the underlying network is a BRG, which yields a Binomial (or approximately Poisson) degree distribution. Such a network model seems unrealistic for our App Movement data, compared to a model that generates a scale-free network or utilises a power law type degree distribution.

In view of the differences in objectives and applications shown above, we propose a network epidemic model as an attempt to narrow the gap in the literature. We focus on a susceptible-infectious (SI) model, in which the epidemic process takes place on a network which is assumed to be built from the PA model, thus deviating from a BRG. When it comes to inference, the data contain the infection times and potentially the transmission tree, while the underlying network is unknown and therefore treated as latent variables. We aim at simultaneously inferring the infection rate parameter, the parameters governing the degree distribution, and the latent structure of the network, in terms of the posterior edge inclusion probabilities, by using an MCMC algorithm. While the choice of the SI model is due to the data in hand, we believe the model structure and algorithm introduced can be extended to other compartment models.

The rest of the article is divided as follows. The latent network epidemic model is introduced in Sect. 2. Its likelihood and its associated MCMC algorithm are derived in Sect. 3. They are then applied to two sets of simulated data in Sect. 4, and a set of real online commissioning data in Sect. 5. Section 6 concludes the article.

## 2 Model

In this section, we introduce the latent network SI epidemic model. Describing the formation of the network and the epidemic separately will facilitate the derivation of the likelihood in the next section. The notation and definitions are kept similar to those in Britton and O’Neill (2002) and Groendyke et al. (2011).

Consider an epidemic in a closed population of size *m*. Let \(\mathbf {I} = (I_1, I_2, \ldots , I_m)\) denote the ordered vector of infection times, where \(I_i\) is the infection time of individual *i*, and \(I_i \le I_j\) for any \(i<j\). We assume that the first individual is the only initial infected individual. In order to have a temporal point of reference, only the times of \(m-1\) infections will be random, and so we define \({\tilde{\mathbf {I}}}=\mathbf {I}-I_1=({\tilde{I}}_1=0,{\tilde{I}}_2,\ldots ,{\tilde{I}}_m)\) for convenience. We also assume that the observation period is long enough to include all infections.

Next, consider the undirected random graph \(\mathcal {G}\) of *m* nodes which represents the social structure of the population, in which the node *i* represents the \(i{\text {th}}\) individual. Using the adjacency matrix representation, if individuals *i* and *j* are socially connected, we write \(\mathcal {G}_{ij}=1\) and call them neighbours of each other, \(\mathcal {G}_{ij}=0\) otherwise. In this sense, \(\mathcal {G}_{ij}\) can be interpreted as a potential edge of *i* and *j*. We also assume symmetry in social connections and that each individual is not self-connected, that is, \(\mathcal {G}_{ij} = \mathcal {G}_{ji}\) and \(\mathcal {G}_{ii}=0\), respectively, for \(1\le i,j\le m\).

To characterise \(\mathcal {G}\), we use a modified version of the PA model by Barabási and Albert (1999), which generates a network by sequentially adding nodes into it. This requires an order of how the nodes enter the network, which is not necessarily the same as the epidemic order. Therefore, we define a vector random variable of the network order, denoted by \(\varvec{\sigma }=(\sigma _1,\sigma _2,\ldots ,\sigma _m)\), whose support is all *m*! possible permutations of \(\{1,2,\ldots ,m\}\). Node \(\sigma _i~(1\le i\le m)\), labelled by the epidemic order, is the \(i{\text {th}}\) node that enters the network. Such order is mainly for the sake of characterisation using the PA model, and the network is assumed to have formed before the epidemic takes place, and remain unchanged throughout the course of the epidemic. Such an assumption is reasonable because the timescale of an epidemic is usually much smaller than that of network formation, the process of which is described next.

### 2.1 Sequence of new edges

*truncated* Poisson distribution in terms of identifying \(\mu \).

### 2.2 Attaching edges to nodes

*without* replacement from \(\{1,2,\ldots ,i-1\}\), with the weight assigned to node \(\sigma _j\) equal to \(w_j\), where

*not* reduce to a BRG, where the degree distribution is Binomial with parameters \((m-1,p)\), where *p* is the edge inclusion probability, but provides a crude approximation to it.
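The attachment step relies on sampling existing nodes without replacement, with probabilities proportional to their weights. A minimal Python sketch of that sampling scheme follows. The definition of \(w_j\) is not reproduced in this section, so the weights passed in below are placeholders, and the function name is hypothetical.

```python
import random


def weighted_sample_without_replacement(weights, k, rng):
    """Draw k distinct indices. At each draw, index j is picked with
    probability proportional to weights[j] among the indices not yet chosen."""
    if k > len(weights):
        raise ValueError("cannot draw more items than are available")
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        current = [weights[j] for j in remaining]
        pick = rng.choices(remaining, weights=current, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # a node cannot receive two edges from the same new node
    return chosen


rng = random.Random(7)
placeholder_weights = [3.0, 1.0, 1.0, 2.0, 1.0]  # assumption: stand-ins for w_j
neighbours = weighted_sample_without_replacement(placeholder_weights, 2, rng)
```

Drawing sequentially and removing each chosen node is what rules out multiple edges between the new node and any existing node.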

### 2.3 Constructing the epidemic

The Markovian epidemic process is constructed as follows. At time 0, the whole population is susceptible except individual 1, who is infected. Once infected at time \({\tilde{I}}_i\), individual *i* makes infectious contacts at points of a homogeneous Poisson process with rate \(\beta \sum _{j=1}^m\mathcal {G}_{ij}\) with its neighbours (according to \(\mathcal {G}\)), and stays infected until the end of the observation period. The random transmission tree \(\mathcal {P}\), with the same nodes as \(\mathcal {G}\) and whose root is the node labelled 1, can be constructed simultaneously. If individual *i* makes infectious contact at arbitrary \(t_0\) (governed by the aforementioned Poisson process) with susceptible neighbour *j*, we write \(\mathcal {P}_{ij}=1\), again using the adjacency matrix representation. This implies \({\tilde{I}}_j=t_0\), and \(\mathcal {P}_{ji}=0\) as individual *i* cannot be re-infected. Also, \(\mathcal {P}_{ij}=1\) implies that \(\mathcal {G}_{ij}=(\mathcal {G}_{ji}=)1\), as the epidemic can only spread through social connections, i.e. the edges in \(\mathcal {G}\). Finally, we assume \(\mathcal {P}_{ii}=0\) as no individual can be infected by themselves.
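The construction above can be sketched as a small simulation in Python. This is an illustration under stated assumptions rather than the authors' implementation: the network is supplied as a dictionary of neighbour sets, the function name is hypothetical, and the per-individual Poisson contact processes are replaced by the standard equivalent formulation in which each edge joining an infected and a susceptible node transmits at rate \(\beta \).

```python
import random


def simulate_si(adj, beta, rng, root=1):
    """Simulate an SI epidemic on an undirected graph.

    adj maps each node to the set of its neighbours. Returns
    (times, parent): infection times and the infector of each node,
    which together encode the transmission tree.
    """
    times = {root: 0.0}
    parent = {root: None}
    t = 0.0
    while len(times) < len(adj):
        # Edges along which an infection can currently occur.
        si_edges = [(i, j) for i in times for j in adj[i] if j not in times]
        if not si_edges:
            break  # remaining nodes are not reachable from the root
        # Competing exponential clocks: the next infection arrives at the total rate.
        t += rng.expovariate(beta * len(si_edges))
        i, j = rng.choice(si_edges)
        times[j] = t
        parent[j] = i
    return times, parent


# Illustrative connected graph on nodes 1..5 (an assumption, not real data).
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
times, parent = simulate_si(adj, beta=0.4, rng=random.Random(3))
```

Every recorded infector is a neighbour of the node it infects, which mirrors the statement that \(\mathcal {P}_{ij}=1\) implies \(\mathcal {G}_{ij}=1\).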

## 3 Likelihood and inference

*L*, as a function of \(\beta \), \(\mu \), \(\gamma \) and \(\varvec{\sigma }\). We assume both \(\mathcal {G}\) and \(\mathcal {P}\) are given because, as argued by Britton and O’Neill (2002) and Groendyke et al. (2011), it is easier to condition on \(\mathcal {G}\) and \(\mathcal {P}\) in order to calculate *L*, and, if they are unobserved, include them as latent variables in the inference procedure. Two conditional independence assumptions need to be noted. Because of the Markovian nature of the epidemic, \(\mathcal {P}\) and \({\tilde{\mathbf {I}}}\) are independent given \(\mathcal {G}\). It is also common that the data \((\{{\tilde{\mathbf {I}}},\mathcal {P}\})\) and (a subset of) the parameters \((\mu ,\gamma ,\varvec{\sigma })\) are independent *a priori*, given \(\mathcal {G}\), when models are formulated by centred parameterisations (Papaspiliopoulos et al. 2003). Therefore, the likelihood can be broken down into the following components:

*a priori*, which relate to the properties of the network. We assign the following independent and vaguely informative priors:

*a*/*b* is the mean of a random variable \(X\sim \text {Gamma}(a,b)\). By Bayes’ theorem, we have

## 4 Simulation study

A simulation study is carried out to examine if the inference algorithm in Appendix B can recover the true values of the parameters used to simulate from the model in Sect. 2. Specifically, we set \(m=\) 70 and consider all combinations of the following true values: \(\gamma = 0,0.2,0.5,0.8,1, \beta = 0.4\), and \(\mu = 4,6,8,10\). For each of the 20 combinations, we first simulate the PA network and then simulate the epidemic on the network. Because of how we construct and simulate from the model, we have complete information on the underlying graph \(\mathcal {G}\), the transmission tree \(\mathcal {P}\), and the infection times \({\tilde{\mathbf {I}}}\). When \(\mathcal {G}\) is given together with \(\mathcal {P}\) and \({\tilde{\mathbf {I}}}\), the MCMC algorithm only needs to be applied to \(\beta \), \(\mu \), \(\gamma \) and \(\varvec{\sigma }\), and it successfully recovers each of the three scalar parameters. Also, the posterior correlations between \(\beta \) and \(\mu \) and between \(\beta \) and \(\gamma \) are both close to zero, which makes sense because of the independence conditional on \(\mathcal {G}\), according to (9). However, we should focus on how good the algorithm is at inferring \(\mathcal {G}\) given \(\mathcal {P}\) and \({\tilde{\mathbf {I}}}\) only, because \(\mathcal {G}\) is usually unknown in real-life data, while \(\mathcal {P}\) being known is motivated by the data set in Sect. 5. Therefore, the complete MCMC algorithm for \(\beta \), \(\mu \), \(\gamma \), \(\varvec{\sigma }\) and \(\mathcal {G}\) is applied to the same set of simulated data for each parameter combination.

The identifiability issue prompts us to consider dropping or fixing at least one of \(\beta \), \(\mu \) and \(\gamma \). While \(\beta \) and \(\mu \) are essential to characterise the epidemic and the network, respectively, leaving \(\gamma \) out means we do not allow the degree of how preferentially attaching the network is to vary. Therefore, \(\gamma \) is fixed to be 0, which is equivalent to the network model being reduced to the original PA model, and will not be estimated.

*m* being allowed to take different values, namely \(m= 30,50,70\), combined with the following true values: \(\beta =\) 0.4 and \(\mu = 4,6,8,10\). For each parameter combination, we also allow different proportions of \(\mathcal {G}\) in the simulated data to be known in addition to \(\mathcal {P}\). A proportion of 0 means only \(\mathcal {P}\) is known, while a proportion of 1 means both \(\mathcal {P}\) and \(\mathcal {G}\) are given. The true value of \(\mu \) against its posterior distribution is plotted for different combinations of *m* and proportions of \(\mathcal {G}\) in Fig. 5. The posterior of \(\mu \) again shows no correlation with its true value in the first row, which corresponds to no \(\mathcal {G}\) given at all, but it converges towards its true value as the proportion goes to 1. Also, it is now possible to recover the true value of \(\mu \), with even, say, a quarter of the potential edges of \(\mathcal {G}\) additional to \(\mathcal {P}\).

Rather than looking at the identifiability of one parameter alone, we can investigate the product \(\alpha :=\beta \times \mu ^{*}\), the posterior of which can be obtained post-inference, where \(\mu ^{*}:=\mu +e^{-\mu }\). Plotting the true value of \(\alpha \) against its posterior (not shown) in a similar way to Fig. 5 reveals that it is identifiable regardless of its true value, *m*, or the proportion of \(\mathcal {G}\) given. The introduction of \(\mu ^{*}\) is due to the fact that the mean of the distribution in (1) is approximately \(\mu +e^{-\mu }\) when *i* is large. As \(\alpha \) is the product of the (unscaled) epidemic rate and the average network connectedness, we can interpret it as the *network scaled epidemic rate*. Epidemics on two different networks are comparable through this parameter if the two networks have similar values of \(\mu ^{*}\).
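The quantities \(\mu ^{*}\) and \(\alpha \) are simple to compute from posterior draws, as the following Python sketch shows. The function names are hypothetical. The sketch also checks numerically that a Poisson(\(\mu \)) count with zeros replaced by one has mean exactly \(\mu +e^{-\mu }\); whether this is the form of the distribution in (1) is an assumption made here for illustration, since (1) is not reproduced in this section.

```python
import math


def mu_star(mu):
    """Average network connectedness, mu + exp(-mu)."""
    return mu + math.exp(-mu)


def alpha(beta, mu):
    """Network scaled epidemic rate, beta * mu_star(mu)."""
    return beta * mu_star(mu)


def mean_poisson_zero_to_one(mu, terms=200):
    """E[max(Y, 1)] for Y ~ Poisson(mu), by direct summation of the series.

    Assumption: this 'zero replaced by one' count is used only to show
    where a mean of mu + exp(-mu) can come from.
    """
    total = 0.0
    pmf = math.exp(-mu)  # P(Y = 0)
    for y in range(terms):
        total += max(y, 1) * pmf
        pmf *= mu / (y + 1)  # recursion P(Y = y + 1) = P(Y = y) * mu / (y + 1)
    return total


# Values taken from the simulation settings: beta = 0.4 and mu = 4.
alpha_example = alpha(0.4, 4.0)
gap = abs(mean_poisson_zero_to_one(4.0) - mu_star(4.0))
```

In practice one would apply `alpha` to each pair of posterior draws of \(\beta \) and \(\mu \) to obtain the posterior of \(\alpha \), rather than plugging in posterior means.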

*p* in their BRG model and argued that “the model parameterisation permits different explanations of the same outcome”. This means that we cannot simultaneously identify the parameters that characterise the epidemic rate and the network connectedness, respectively, and being able to identify one relative to the other is as good as we can do.

## 5 Application

Before applying the proposed model to the data set introduced in Sect. 1, we shall describe App Movement in detail. This platform removes the resource constraints around mobile application development by providing an automated process of designing and developing mobile applications. The process begins with the support phase, whereby a community creates a campaign page in response to a community need and engages the community in supporting the concept through promoting and sharing the campaign on online social networks. When the target of 150 members supporting the campaign within 14 days has been met to ensure an active user base, the campaign proceeds to the design phase, in which ideas regarding the design of the mobile application are voted on. Once supporters have cast their votes, the platform incorporates the highest rated design decisions and automatically generates the mobile application. Since its launch in February 2015, App Movement has been adopted by over 50,000 users supporting 111 campaigns, 20 of which have been successful in reaching their target number of supporters, with 18 generated mobile applications currently available in the Google Play Store and Apple App Store, for Android and iOS devices, respectively.

*m* ranging from 334 to 402. Each campaign corresponds to a different proposed application. The inference algorithm is used with \(\gamma \) fixed to 0. For each epidemic, 5 chains of length 2000 (no thinning) are obtained, after the first 1000 iterations are discarded as burn-in, during which the proposal standard deviation for \(\mu \) is tuned. The traceplots and posterior densities of \(\beta \) and \(\mu \) are plotted in Fig. 8, for the model fit to the epidemic visualised in Figs. 1 and 2. The acceptance rate for \(\mu \) is 0.269 and is similar for the other 9 epidemics considered. The posterior means and standard deviations of \(\beta \), \(\mu \) and \(\alpha \) for all epidemics are reported in Table 1. Also reported is the correlation between \(\beta \) and \(\mu ^{*}=\mu +e^{-\mu }\), which is modest but consistently negative. For any parameter \(\theta \), we denote E\((\theta |\mathcal {P},{\tilde{\mathbf {I}}})\) as its posterior mean. We can see that E\((\alpha |\mathcal {P},{\tilde{\mathbf {I}}})\) is not dependent on *m* and is significantly different across the epidemics. Combined with the fact that the correlation with \(\mu \) (or \(\mu ^{*}\)) is modest (not shown), \(\alpha \) can be seen to be successfully identified.
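Tuning a proposal standard deviation during burn-in and then freezing it can be sketched as follows in Python. This is a generic random-walk Metropolis illustration, not the algorithm of Appendix B: the target density (an unnormalised Gamma(3, 1)), the batch-wise tuning rule, the target acceptance rate of 0.25 and all names are assumptions chosen for the example. Only the chain lengths (1000 burn-in iterations, 2000 retained) follow the text.

```python
import math
import random


def log_gamma_density(x):
    """Unnormalised log density of Gamma(shape 3, rate 1); stand-in target."""
    return 2.0 * math.log(x) - x if x > 0 else -math.inf


def rw_metropolis(log_post, x0, n_burn, n_keep, rng, sd=1.0, target=0.25, batch=50):
    """Random-walk Metropolis whose proposal sd is tuned during burn-in only."""
    x, lp = x0, log_post(x0)
    samples, kept_accepts, batch_accepts = [], 0, 0
    for it in range(n_burn + n_keep):
        proposal = x + rng.gauss(0.0, sd)
        lp_prop = log_post(proposal)
        accept = math.log(1.0 - rng.random()) < lp_prop - lp
        if accept:
            x, lp = proposal, lp_prop
        if it < n_burn:
            batch_accepts += accept
            if (it + 1) % batch == 0:
                # Widen the proposal if accepting too often, narrow it otherwise.
                sd *= math.exp(batch_accepts / batch - target)
                batch_accepts = 0
        else:
            kept_accepts += accept
            samples.append(x)
    return samples, kept_accepts / n_keep, sd


rng = random.Random(2024)
samples, acc_rate, tuned_sd = rw_metropolis(log_gamma_density, 1.0, 1000, 2000, rng)
posterior_mean = sum(samples) / len(samples)
```

Freezing the proposal after burn-in keeps the retained part of the chain a standard Metropolis sampler, so the tuning does not affect the validity of the retained draws.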

Table 1 Posterior mean (SD) and correlation of the scalar parameters in the PA model fitted to ten different campaign epidemics

Epidemic | *m* | \(\beta \) | \(\mu \) | Correlation | \(\alpha \) |
---|---|---|---|---|---|
1 | 402 | 0.091 (0.005) | 0.323 (0.051) | \(-\)0.041 | 0.096 (0.005) |
2 | 391 | 0.218 (0.012) | 0.775 (0.069) | \(-\)0.137 | 0.269 (0.016) |
3 | 390 | 0.384 (0.022) | 0.606 (0.057) | \(-\)0.147 | 0.442 (0.025) |
4 | 388 | 0.242 (0.013) | 0.689 (0.061) | \(-\)0.136 | 0.289 (0.017) |
5 | 387 | 0.315 (0.017) | 0.718 (0.061) | \(-\)0.071 | 0.38 (0.022) |
6 | 371 | 0.491 (0.028) | 0.59 (0.058) | \(-\)0.087 | 0.563 (0.033) |
7 | 363 | 0.373 (0.022) | 0.603 (0.061) | \(-\)0.115 | 0.43 (0.026) |
8 | 358 | 0.453 (0.026) | 0.708 (0.061) | \(-\)0.068 | 0.545 (0.033) |
9 | 335 | 0.208 (0.012) | 0.592 (0.063) | \(-\)0.100 | 0.238 (0.015) |
10 | 334 | 0.147 (0.009) | 0.601 (0.066) | \(-\)0.169 | 0.169 (0.01) |

*p* and simulate a BRG, the probability that one particular user is connected to at least 18 users is \(1.453\times 10^{-11}\). Combining these two quantities with the independence of potential edges, we can see that it is very unlikely a BRG generated in this way will be connected, let alone overlay \(\mathcal {P}\). On the other hand, the network construction described in Sect. 2 ensures that the PA network generated is always connected. Finally, contrary to the clear inverse relationship between \(\beta \) and *p* reported in Britton and O’Neill (2002) for both simulated and real-life data, the joint posterior of \((\beta ,p)\) can be well approximated by a bivariate Gaussian distribution, for all epidemics reported here. Combined with the fact that the correlations are small (last column of Table 2), \(\beta \) and *p* can be said to be close to independence *a posteriori*. This suggests that the presence of \(\mathcal {P}\) actually makes *p* (and \(\beta \)) identifiable, but the estimate of the successfully identified *p* now shows a poor fit of the BRG model to our data.
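The tail probability of a node's degree under a BRG follows from the Binomial\((m-1,p)\) degree distribution. A minimal Python sketch of that calculation is given below. The inputs are illustrative assumptions (the population size and posterior mean of *p* for epidemic 1 in Table 2), and the sketch is not intended to reproduce the figure quoted above, whose exact inputs are not stated in this section.

```python
import math


def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total


# Illustrative inputs: m = 402 users, so each node has m - 1 = 401 potential edges.
tail = binomial_tail(401, 0.0059, 18)
```

With an expected degree of only a few edges, a degree of 18 or more lies far in the tail, which is why star-like structures in the transmission tree are hard to reconcile with a BRG.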

Table 2 Posterior mean (SD) and correlation of the parameters in the BRG model fitted to ten different campaign epidemics

Epidemic | *m* | \(\beta \) | *p* | Correlation |
---|---|---|---|---|
1 | 402 | 0.087 (0.005) | 0.0059 (0.00029) | \(-\)0.085 |
2 | 391 | 0.232 (0.012) | 0.006 (0.00031) | \(-\)0.069 |
3 | 390 | 0.411 (0.022) | 0.0055 (0.00028) | \(-\)0.052 |
4 | 388 | 0.262 (0.014) | 0.0058 (0.00029) | \(-\)0.045 |
5 | 387 | 0.338 (0.018) | 0.0057 (0.00029) | \(-\)0.029 |
6 | 371 | 0.512 (0.028) | 0.0058 (0.0003) | \(-\)0.036 |
7 | 363 | 0.399 (0.022) | 0.0059 (0.00032) | \(-\)0.069 |
8 | 358 | 0.492 (0.027) | 0.0061 (0.00032) | \(-\)0.033 |
9 | 335 | 0.219 (0.013) | 0.0065 (0.00036) | \(-\)0.052 |
10 | 334 | 0.154 (0.009) | 0.0066 (0.00037) | \(-\)0.106 |

While the values of \(E(p|\mathcal {P},{\tilde{\mathbf {I}}})\) in Table 2 are low, those of E\((\beta |\mathcal {P},{\tilde{\mathbf {I}}})\) are similar to their PA counterparts in Table 1, but are unusually high compared to real-life epidemics. This is because, while real-life epidemics usually spanned days [see, for example, Britton and O’Neill (2002) and Neal and Roberts (2005)], the campaign epidemics spanned weeks (see the time scale of Fig. 1). Out of the ten epidemics reported here, epidemics 1 and 6 spanned the longest and shortest, with a period of 187.371 and 36.7 days, respectively, and this explains why their respective values of \(E(\beta |\mathcal {P},{\tilde{\mathbf {I}}})\) are on opposite extremities among those reported in Table 1.

## 6 Discussion

We have described a network epidemic model which combines the SI epidemic model with the PA network model, with the inference carried out by MCMC as the likelihood can be explicitly computed. The results of two simulation studies suggest dropping one parameter in order to make the *network scaled epidemic rate* parameter identifiable. The model and inference algorithm are successfully applied to ten different “sharing” epidemics of a set of online community commissioning data. The results suggest that the PA model is a better alternative for network generation than the BRG model, for data sets of epidemics taking place on social networks.

Several modifications can potentially make the model more useful. First of all, information on the average connectedness or the degree distribution on social networks can be solicited beforehand, so that an informative prior can be assigned to \(\mu \) and/or \(\gamma \), at least one of which can then be identified. Given the vast amount of data about social networks such as Facebook and Twitter freely available on the Internet, such information should be possible to obtain.

Another way of gaining information for the parameters, particularly for the App Movement data, is combining the epidemics in the modelling and inference. As users on the social network are usually involved in more than one epidemic, we can pull together several epidemics which have overlap in the users and build a larger underlying network \(\mathcal {G}\) comprising all the users involved, which is guaranteed to be connected. As a user may be infected in one epidemic but not another, each of the epidemics may then be incomplete. While the likelihood calculations, for example (5), may differ slightly, and each epidemic has a different rate parameter (no matter whether it is network scaled or not), the inference procedure is basically the same, and only one set of parameters \((\mu , \gamma )\) is used to govern the network generation. Borrowing strength from other epidemics in this way will utilise more information available in the data, and result in parameters being better identified or more precisely estimated.

That the epidemic is Markovian given the network is a simplistic assumption, which has been shown to be inadequate for real-life epidemics. To relax this assumption, one can use alternative distributions for the infectious period, such as the Gamma distribution, which is used by, for example, Xiang and Neal (2014). This is not relevant here because the SI model is used instead of the more popular SIR model, or any compartment model in which being infectious is not the final state. Instead of using one single epidemic rate \(\beta \) for all the infections, meaning that the interarrival times are exponentially distributed, one can use a different rate \(\beta _i\) for the infection of individual *i*, where \(\beta _i\) is drawn from a certain probability distribution. This approach to modelling non-Markovian processes, proposed by Masuda and Rocha (in press), can be applied to any kind of compartment model and can encompass a range of interarrival time distributions, simply by choosing a different probability distribution from which \(\beta _i\) is drawn. When it comes to the inference, the interarrival time distribution parameter(s) and each \(\beta _i\), given all other epidemic rate parameters, will be updated individually, on top of the existing parameters and latent variables. Given that most of the computational time lies in updating the potential edges of \(\mathcal {G}\) one by one, this should not add much to the computational burden. On the other hand, including users who have not seen or joined the campaign to correct for the potential overestimation of \(\beta \) will add to the computational burden, because the number of users *not* infected by a particular campaign is vast compared to the number of infected.

## Notes

### Acknowledgements

This research was funded by the EPSRC grant DERC: Digital Economy Research Centre (EP/M023001/1).

## References

- Albert, R., Jeong, H., Barabási, A.-L.: Diameter of the world-wide web. Nature **401**, 130–131 (1999)
- Albert, R., Jeong, H., Barabási, A.-L.: Error and attack tolerance of complex networks. Nature **406**, 378–382 (2000)
- Andersson, H., Britton, T.: Stochastic Epidemic Models and Their Statistical Analysis, Number 151 in ‘Lecture Notes in Statistics’. Springer, New York (2000)
- Ball, F., Mollison, D., Scalia-Tomba, G.: Epidemics with two levels of mixing. Ann. Appl. Probab. **7**, 46–89 (1997)
- Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science **286**(5439), 509–512 (1999)
- Bezáková, I., Kalai, A., Santhanam, R.: Graph model selection using maximum likelihood. In: Proceedings of the \(23^{rd}\) International Conference on Machine Learning, Pittsburgh, PA, 2006. International Machine Learning Society (2006)
- Britton, T., O’Neill, P.D.: Bayesian inference for stochastic epidemics in populations with random social structure. Scand. J. Stat. **29**(3), 375–390 (2002)
- Britton, T., Kypraios, T., O’Neill, P.D.: Inference for epidemics with three levels of mixing: methodology and application to a measles outbreak. Scand. J. Stat. **38**, 578–599 (2011)
- Garbett, A., Comber, R., Jenkins, E., Olivier, P.: App movement: A platform for community commissioning of mobile applications. In: CHI ’16 Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 26–37. ACM (2016). doi: 10.1145/2858036.2858094
- Groendyke, C., Welch, D., Hunter, D.R.: Bayesian inference for contact networks given epidemic data. Scand. J. Stat. **38**, 600–616 (2011)
- Groendyke, C., Welch, D., Hunter, D.R.: A network-based analysis of the 1861 Hagelloch measles data. Biometrics **68**, 755–765 (2012)
- Hunter, D.R., Goodreau, S.M., Handcock, M.S.: Goodness of fit of social network models. J. Am. Stat. Assoc. **103**(481), 248–258 (2008)
- Masuda, N., Rocha, L.E.C.: A Gillespie algorithm for non-Markovian stochastic processes. Soc. Ind. Appl. Math. Rev. (in press)
- Neal, P., Roberts, G.: A case study in non-centering for data augmentation: stochastic epidemics. Stat. Comput. **15**, 315–327 (2005)
- Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
- O’Neill, P.D.: A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods. Math. Biosci. **180**, 103–114 (2002)
- Papaspiliopoulos, O., Roberts, G.O., Sköld, M.: Non-centered parameterisations for hierarchical models and data augmentation. In: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (eds.) Bayesian Statistics 7, pp. 307–326. Oxford University Press, Oxford (2003)
- Pastor-Satorras, R., Castellano, C., Van Mieghem, P., Vespignani, A.: Epidemic processes in complex networks. Rev. Mod. Phys. **87**(3), 925–979 (2015)
- Ray, J., Marzouk, Y.M.: A Bayesian method for inferring transmission chains in a partially observed epidemic. In: Proceedings of the Joint Statistical Meetings: Conference Held in Denver, Colorado, 3–7 August 2008. American Statistical Association (2008)
- Snijders, T.A.B.: Markov chain Monte Carlo estimation of exponential random graph models. J. Soc. Struct. **3**, 1–40 (2002)
- Streftaris, G., Gibson, G.J.: Statistical inference for stochastic epidemic models. In: Proceedings of the 17th International Workshop on Statistical Modelling, pp. 609–616 (2002)
- Stumpf, M.P.H., Wiuf, C., May, R.M.: Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc. Natl. Acad. Sci. **102**(12), 4221–4224 (2005)
- Vázquez, A., Pastor-Satorras, R., Vespignani, A.: Large-scale topological and dynamical properties of the Internet. Phys. Rev. E **65**, 066130 (2002)
- Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature **393**, 440–442 (1998)
- Wilkinson, D.J.: Stochastic Modelling for Systems Biology, 2nd edn. Chapman & Hall, London (2011)
- Xiang, F., Neal, P.: Efficient MCMC for temporal epidemics via parameter reduction. Comput. Stat. Data Anal. **80**, 240–250 (2014)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.