1 Introduction

Spatiotemporal data are those that contain both spatial (location) and temporal (time) properties. In veterinary epidemiology, these may be the records of tests carried out at a particular location (e.g. a farm) and time, or simply the movements of animals, the recording of which is often mandated by governments and provides researchers with a wealth of data with which to analyse the outbreak and transmission of diseases (Moustakas and Evans 2016; Hong and Paik 2012). In many cases the volume of collected data poses a significant statistical and computational challenge to understanding outbreak patterns and transmission dynamics, and thus to controlling the epidemic.

Most often, the transmission processes of a disease are at least partially understood (Lowe et al. 2015; Reiczigel et al. 2010). For example, diseases such as foot and mouth disease (FMD) (Kao et al. 2007; Keeling et al. 2001), avian influenza (AI) (Gumel 2009) and classical swine fever (CSF) (González-Parraa et al. 2011) are spread by close contact with infected individuals with little or no latent stage, while diseases such as bovine tuberculosis (bTB) (Moustakas and Evans 2015; Biek et al. 2012) have long latent periods; in all cases transmission contains a temporal element in the form of animal movements. From a phenomenological perspective we can write simple compartmental models for these diseases and solve them on the network of farms (spatially) for a period of time (temporally), incorporating the movements of (potentially infected) animals into the model. This presupposes that we know the transmission parameters, which, in reality, are either unknown or must be estimated.

Several techniques can be used to extract useful information from these datasets or to infer disease transmission parameters. In this article we review some of these techniques for obtaining pertinent information from a large spatiotemporal dataset and, using generated datasets, provide examples of their use.

2 Network analysis of spatiotemporal data

Purely spatial models of disease spread are often confounded by occasional long-distance infectious contacts (e.g. livestock trading over long distances, which played an important role in the 2001 foot-and-mouth outbreak in Britain (Kiss et al. 2006; Kao et al. 2007)). When either a reasonable model or explicit records are available for these pairwise contacts, it may be appropriate to model them using a contact network. Doing so makes available a variety of network analysis methods, which have become very popular (e.g. Stärk et al. 2006; Dube et al. 2009; Martínez-López et al. 2009) and are often referred to as “social network analysis”.

Traditionally, it has been common to ignore the dynamic nature of many contact networks, and instead process known or modelled contacts into a static network prior to analysis: common approaches include aggregating contacts over some appropriate time frame into a “snapshot” network, or taking an average network over a longer period. More complex adaptations are possible to preserve more information (Holme 2013).
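
As a minimal sketch of this aggregation (assuming movement records shaped like the (source, destination, day) triples used later in this paper), a snapshot network over a chosen time frame might be built as follows:

```python
import networkx as nx

def snapshot(movements, start_day, end_day):
    """Aggregate all contacts within [start_day, end_day] into a static directed graph."""
    g = nx.DiGraph()
    for src, dst, day in movements:
        if start_day <= day <= end_day:
            g.add_edge(src, dst)
    return g

movements = [(1, 2, 0), (2, 3, 3), (3, 4, 10)]
week_one = snapshot(movements, 0, 6)   # a one-week "snapshot" network
print(week_one.edges())                # [(1, 2), (2, 3)]
```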

However, due in part to better data resolution and the increasing availability of analytic tools, it is becoming more common to include the dynamic nature of contact networks in epidemiological analyses. There is evidence that ignoring temporal information about contacts can give a deceptive picture. For example, consider two different orderings of infectious contacts: either a contact between A and B followed by a contact between B and C, or the alternative ordering of a contact between B and C followed by a contact between A and B. In the first case, there is potential for pathogen movement from A to C, but in the second, there is no such possibility. A static aggregation of either ordering would result in the same network, and would potentially identify pathogen flow from A to C as a possibility in both cases.

The size of a connected component (a maximal joined-up set of nodes) is often used as an upper bound on the maximum outbreak size in a static network (Dube et al. 2009). In a dynamic network, it is more appropriate to use the idea of an infection chain: the size of a set of nodes that could potentially be infected by temporally possible routes from a single starting point of an outbreak (Dube et al. 2009; Nöremark and Widgren 2014).
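
To make the contrast concrete, the following sketch (our own illustration, not the S2 code) computes both the static downstream set and a simple infection chain; same-day movements are treated in arbitrary order, a simplification of a full time-respecting analysis:

```python
import networkx as nx

def infection_chain(movements, seed):
    """Nodes reachable from `seed` along a time-ordered sequence of contacts.
    `movements` is an iterable of (source, destination, day) tuples."""
    infected = {seed}
    for src, dst, day in sorted(movements, key=lambda m: m[2]):
        if src in infected:
            infected.add(dst)
    return infected - {seed}

def static_downstream(movements, seed):
    """Nodes reachable from `seed` in the time-aggregated directed graph."""
    g = nx.DiGraph((src, dst) for src, dst, _ in movements)
    return nx.descendants(g, seed)

movements = [(2, 3, 5), (1, 2, 7)]       # B->C happens before A->B
print(static_downstream(movements, 1))   # {2, 3}: aggregation suggests 1 reaches 3
print(infection_chain(movements, 1))     # {2}: no time-respecting path from 1 to 3
```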

There are several pieces of software available to compute such measures on dynamic networks, including EpiContactTrace in R (Nöremark and Widgren 2014), Gephi (a standalone graphical interface for network analysis) (Bastian et al. 2009), ORA-LITE (part of the CASOS project), and the Python module networkx (Hagberg et al. 2008). For our example below we have used networkx, but any of the other available packages would have sufficed; in the subsequent parameter-estimation example we use a different open-source package, Broadwick, written in Java (O’Hare et al. 2016).

To demonstrate the importance of temporal information in understanding a dynamic network’s impact on disease spread, we have generated a simulated dataset of livestock trading amongst farms in a fictional island nation, which we will call Florin. We depict the locations of the fictional farms on a map of Florin in Fig. 1. We provide the fictional locations and trades, along with the Python code used in this example, as supplementary material, in the hope that it may serve as a basic tutorial. We will first inspect the network derived from our cattle trades and calculate some summary statistics. The data for our network are the cattle trades listed in movements.csv (part of S1), where each line contains an ID for a source farm, an ID for a destination farm, and the day number on which the movement took place (note that while we have used non-negative integers for the dates, most software is also capable of dealing with string-formatted dates).

In the code in S2, we first load the geographic locations and trade network into a networkx directed graph and then plot it, giving the directed, spatially embedded network in Fig. 2. The entire network is fairly dense, so we also plot the network composed of only the movements on the first day. We then compute the frequencies of out-degrees by node (Fig. 3), aggregating all edges together over time. Such degree distributions are widely used as an important characteristic of static networks (Dube et al. 2009), but adaptations are increasingly being made to the dynamic setting (Nöremark and Widgren 2014; Holme 2013). One simple adaptation requires us to define a time-window size and calculate the degrees of nodes within windows of that size. In the code in S2, we calculate the mean and maximum out-degree by node over a variety of time windows, and plot the result in Fig. 4.
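
The S2 code itself is part of the supplementary material; the following standalone fragment (a sketch, assuming movements.csv has no header row and the columns described above) shows the flavour of the degree computations:

```python
import csv
from collections import defaultdict
import networkx as nx

# Load movements.csv: each row is source_id, destination_id, day
g = nx.MultiDiGraph()
with open("movements.csv") as f:
    for src, dst, day in csv.reader(f):
        g.add_edge(src, dst, day=int(day))

# Aggregate out-degree by node, pooling all edges over time (cf. Fig. 3)
out_degrees = dict(g.out_degree())

# Out-degree by node restricted to a time window [start, start + window)
def windowed_out_degree(g, start, window):
    deg = defaultdict(int)
    for src, _, data in g.edges(data=True):
        if start <= data["day"] < start + window:
            deg[src] += 1
    return deg
```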

Fig. 1

A map of the fictional island nation Florin, with the locations of its cattle-trading farms shown as dots

Fig. 2

A spatial embedding of the entire fictional cattle trade network (left) and the fictional cattle trade network of only movements on the first day (right) given in movements.csv in S1, with edge directions shown by thicker rectangles at the destination of the edge

Fig. 3

Frequencies of out-degree by node in the fictional cattle trade network in Florin

Fig. 4

Node-wise maximum (left) and mean (right) out-degree over a variety of sizes of time windows in the fictional cattle trading network of Florin

Our out-degree distribution in Fig. 3 gives us some information about our network and allows us to compare it to other well-studied networks: a long-tailed distribution is common in networks derived from real data, and has important implications for spreading processes on the network (Estrada 2010). Our plots of changes in out-degree over differing time windows (Fig. 4) show simple linear growth: the expected mean and maximum out-degree are proportional to the time window, with no especially important window size. In real data there is often an important time window: for example, degrees in the Scottish cattle trading network increase dramatically at time windows that are multiples of seven days, due to the weekly timing of British cattle trading markets (Cattle Tracing System).

2.1 Importance of temporal information to maximum possible outbreak size

We now turn our attention to the maximum possible outbreak size on our network, furnishing an example of the importance of temporal information. We take two approaches to measure a possible outbreak size, and find very different answers. In the first approach, we ignore the timing of the movements, and create one aggregated static network with holdings as nodes and a directed link from one to the other if there has been a cattle trade between the holdings in that direction (as in Fig. 2).

We then calculate the number of holdings that are “downstream” of a holding, that is, those that could be reached by a directed path in the network. We find a maximum of 649 downstream holdings in the network, suggesting that in the worst-case scenario, in which every contact between an infected farm and a susceptible farm transmits disease, a (fictional) pathogen could spread to up to 649 farms. If this worst-case infection is seeded at random throughout the aggregated network, the mean outbreak size is 17 farms.
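
A sketch of this computation on a toy aggregated network (standing in for the full Florin network) is:

```python
import networkx as nx

# Toy time-aggregated trade network; nodes are holdings, edges are directed trades
g_static = nx.DiGraph([(5, 1), (1, 2), (2, 3), (2, 4)])

# Downstream set size for every possible seed holding
sizes = [len(nx.descendants(g_static, n)) for n in g_static]
print("worst-case outbreak bound:", max(sizes))              # 4, seeded at node 5
print("mean over random seeding:", sum(sizes) / len(sizes))  # 1.8
```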

In contrast, in our second approach, we include the timing of the cattle trades and use a dynamic network for our analysis, and we find a maximum infection chain size of 314, and a mean outbreak size of 5 farms.

Ignoring the timing of movements can also give a deceptive measure of the infectious distance between two nodes (here farms), where infectious distance is the number of links in the network that an infection would have to traverse to move from one node to the other (Fig. 5).
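
In the static case this distance is an ordinary shortest-path length, as in the sketch below; the temporal analogue must instead follow only contact sequences with non-decreasing dates, and is generally longer or undefined:

```python
import networkx as nx

g_static = nx.DiGraph([(1, 2), (2, 3), (3, 4)])
# Infectious distance (hops) from holding 1 to every reachable holding
print(nx.single_source_shortest_path_length(g_static, 1))
# {1: 0, 2: 1, 3: 2, 4: 3}
```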

Fig. 5

Holdings that are reachable from the farm shown with a red square in networks that aggregate and ignore time (on the left), and fully consider time (on the right). Farms shown by darker dots are farther in network distance. The fictional farm locations and movements available as S1 were used to create these figures, along with the Python code in S2

2.2 Real-data example

We give a real-data example of a similar difference using Scottish data in Example 1 and Fig. 6.

Example 1: Fig. 6 shows a single holding in Scotland and the network distance between it and other holdings in both the aggregated network and the fully dynamic network (ScotEID—Scottish EID livestock traceability research). We can see that many holdings that are reachable in the time-aggregating network are not reachable in the network that fully considers time, and that the distances for holdings that are reachable in both are not preserved. If we ignored the timing of cattle movements, we might think that all the holdings shown as blue dots on the left-hand side of the figure are close in the network to the starred holding, and therefore at risk in an outbreak involving the starred holding, but we can see from the right-hand side of the figure that this is not actually the case.

Fig. 6

Holdings that are reachable from the farm shown with a red star in networks that aggregate and ignore time (on the left), and fully consider time (on the right). Farms shown by darker dots are farther in network distance. Scottish cattle movements in January of 2010 have been used to create these networks

Thus far we have restricted ourselves to considering a known network without any spatial interactions or stochastic disease processes: our focus has been on examining the network itself. We now turn our attention to a realistic example of a disease and the Bayesian methods that can help us infer the parameters required to model it.

3 Parameter inference techniques

For disease outbreaks that are accompanied by spatiotemporal data, such as the case of an outbreak of a disease on a single farm with known cattle movements as in Example 1, we can construct a relatively simple agent-based model in which each agent has a disease state and a location. An important consideration in developing such models is how to obtain meaningful or realistic values for the parameters therein.

In this section we summarise some techniques for estimating the transmission parameters of a disease for which spatiotemporal data have been recorded. Broadly speaking, these techniques fall into two categories, depending on whether we can write out a likelihood function or whether the calculation of this function is computationally infeasible. These methods aim to find the parameters of a particular model that best describe an observed set of data, such as test results, by exploring the parameter space using a random walk. The region of parameter space that provides the best fit of the model to the data is recorded, and the distribution of each parameter value in this region (often referred to as the posterior distribution) provides an estimate of the parameters.

We will illustrate one of these techniques using an SIR model on the same data as was used in Sect. 2.

In the following sections we adopt the following notation, consistent with the current literature: \(\varvec{\theta }\) is a vector of unknown parameters that we wish to infer given some set of observations, \(\varvec{D}\). We denote by \(\eta (\cdot )\) the computational/mathematical model, which will produce a range of possible outcomes, written \(\varvec{X}\sim \eta (\varvec{\theta })\), when run repeatedly for the same set of inputs. Using this notation we can write the likelihood of the data under the model, given the parameters \(\varvec{\theta }\), as \(\pi (\varvec{D}\vert \varvec{\theta })\).

The Bayesian approach is to find the posterior distribution of \(\varvec{\theta }\) given \(\varvec{D}\) as

$$\begin{aligned} \pi (\varvec{\theta }\vert \varvec{D}) = \frac{\pi (\varvec{D}\vert \varvec{\theta })\pi (\varvec{\theta })}{\pi (\varvec{D})} \end{aligned}$$

where \(\pi (\varvec{\theta })\) is the prior distribution and reflects our assumptions about the parameters in the model, and \(\pi (\varvec{D})\) is the probability of the observed data (the evidence), which acts as a normalising constant.

3.1 Likelihood-free methods

Approximate Bayesian computation (ABC) refers to a collection of methods for performing Bayesian inference without the calculation of a likelihood function; these are sometimes called likelihood-free algorithms. Recently, these methods have become very popular in the biological sciences, most notably genetics (Tanaka et al. 2006; Beaumont et al. 2002) and population biology (Lopes and Beaumont 2010), because the likelihood function can be difficult or impossible to compute for some models. In this section we summarise how the method is used in practice; a fuller description of the technique is given in Csilléry et al. (2010).

The most basic form of the ABC algorithm is based on a rejection algorithm and is given as:

(0) Calculate a measure that characterises the system for the observed data \(\varvec{D}\).

(1) Draw \(\varvec{\theta }\) from \(\pi (\varvec{\theta })\).

(2) Simulate \(\varvec{X}\sim \eta (\varvec{\theta })\).

(3) Calculate the distance measure, \(\rho (\varvec{X},\varvec{D})\), and accept \(\varvec{\theta }\) if \(\rho (\varvec{X},\varvec{D})\le \delta\), where \(\delta\) is the tolerance (accuracy) of the estimation method.

(4) Repeat these steps until a sufficient number of accepted \(\varvec{\theta }\)s have been drawn.

The accepted values of \(\varvec{\theta }\) are not drawn from the posterior distribution but from an approximation to it, written \(\pi (\varvec{\theta }\vert \rho (\varvec{D},\varvec{X})\le \delta )\). When \(\delta =0\) this algorithm draws from the posterior distribution \(\pi (\varvec{\theta }\vert \varvec{D})\); the smaller the value of \(\delta\), the more accurate the approximation to the posterior distribution, but this comes at added computational cost. The distance measure, \(\rho (\varvec{X},\varvec{D})\), is usually taken to be the Euclidean distance \(\vert \vert \varvec{X}-\varvec{D}\vert \vert\). If \(\varvec{D}\) is large (i.e. the data are high dimensional) it is common to use a summary statistic to summarise the model output and data and thus reduce the dimensionality of the space. The choice of summary statistic is crucial for the quality of the approximation (Beaumont et al. 2009). In this scenario, step (3) above would be written:

(3) Accept \(\varvec{\theta }\) if \(\rho (S(\varvec{X}),S(\varvec{D}))\le \delta\), where \(S(\cdot )\) denotes a summary statistic.

Of course, a poor choice of summary statistic will add another layer of approximation to that already added by the use of a distance measure and tolerance.
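
As a concrete illustration, the following is a minimal sketch of the rejection scheme above applied to a toy binomial model (the model, prior range and tolerance are our own illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def model(theta, n=100):
    """Toy stochastic model: number of infections out of n trials at rate theta."""
    return rng.binomial(n, theta)

observed = 30                 # the observed data D (here a single count)
delta = 2                     # tolerance of the approximation
accepted = []
while len(accepted) < 1000:
    theta = rng.uniform(0.0, 1.0)        # (1) draw theta from the prior
    x = model(theta)                     # (2) simulate X ~ eta(theta)
    if abs(x - observed) <= delta:       # (3) accept if rho(X, D) <= delta
        accepted.append(theta)           # (4) repeat until enough acceptances

# `accepted` approximates the posterior pi(theta | rho(D, X) <= delta)
print(np.mean(accepted), np.percentile(accepted, [2.5, 97.5]))
```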

For a detailed explanation of the ABC algorithm and its variants applied to several models see Turner and Van Zandt (2012). This ABC algorithm has been extended recently to approximate Markov Chain Monte Carlo algorithms (Marjoram et al. 2003) and to approximate sequential Monte Carlo algorithms (Sisson et al. 2007).

3.2 Monte Carlo methods

If it is possible to calculate the likelihood function, i.e. the probability of observing \(\mathbf {D}\) given a set of parameters, \(\pi (\mathbf {D}|\varvec{\theta })\), in a computationally tractable manner, the goal is to find the parameters that maximise this function. If there is some a priori knowledge of the model parameters, this can be incorporated into the search, a method referred to as maximum a posteriori (MAP) estimation, which is often more appropriate for the models encountered in ecology and epidemiology. MAP estimation finds the mode of the posterior distribution (the value of \(\varvec{\theta }\) at which the posterior attains its maximum). The a priori knowledge (the prior distribution, or simply the priors) can range from a uniform distribution within some wide limits, for parameters that are not well known, to a specific distribution with a small spread for well-known parameters. The posterior distribution of the parameters given the observed data can now be written as

$$\begin{aligned} \pi (\varvec{\theta } |\mathbf {D}) = \frac{\pi (\mathbf {D}|\varvec{\theta })g(\varvec{\theta })}{\int _\vartheta \pi (\mathbf {D}|\vartheta )g(\vartheta )\mathrm {d}\vartheta } \end{aligned}$$
(1)

where the integral in the denominator is over the domain of g, the prior distribution of the parameters \(\varvec{\theta }\), and is usually evaluated numerically by sampling the parameters over the prior space. MAP estimation finds the model parameters, \(\hat{\varvec{\theta }}\), at which the posterior distribution attains its maximum (i.e. the mode of the distribution), written as

$$\begin{aligned} \hat{\varvec{\theta }}_{\mathrm {MAP}}= \mathrm {argmax}_{\varvec{\theta }} \pi (\varvec{\theta }|\mathbf {D}) = \mathrm {argmax}_{\varvec{\theta }} \frac{\pi (\mathbf {D}|\varvec{\theta })g(\varvec{\theta })}{\int _\vartheta \pi (\mathbf {D}|\vartheta )g(\vartheta )\mathrm {d}\vartheta } \end{aligned}$$
(2)

Thus our problem is to find the parameters, \(\varvec{\theta }\), that maximise the likelihood \(\pi (\mathbf {D}|\varvec{\theta })\). For complex models we need to explore the parameter space to find \(\hat{\varvec{\theta }}\), which can be achieved by simulating from this distribution using Markov chain Monte Carlo (MCMC). This technique gives us a distribution for the estimates of \(\hat{\varvec{\theta }}\) rather than the point estimates returned by maximum likelihood (ML).

Calculating the probability in (1) is intractable for most models, and it is often approximated using Monte Carlo methods, which perform the integration by sampling \(\varvec{\theta }\) from a distribution and ‘saving’ those samples that satisfy a condition. This (inefficient) Monte Carlo integration is improved by exploring the parameter space in a manner that homes in on the region of interest (i.e. the parameters that maximise the likelihood function, or that minimise a distance function in ABC). In many cases a Markov chain is used to perform this exploration.
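
For instance, the evidence in the denominator of Eq. (1) can be approximated by simple prior sampling, as in this sketch (the likelihood and prior here are illustrative placeholders, not the model of Sect. 4):

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(theta):
    # Placeholder pi(D | theta); in practice this is the model-specific likelihood
    return np.exp(-0.5 * ((theta - 0.3) / 0.1) ** 2)

# integral of pi(D|theta) g(theta) dtheta ~= mean of pi(D|theta_i), theta_i ~ g
theta_samples = rng.uniform(0.0, 1.0, size=100_000)  # g: uniform prior on [0, 1]
evidence = likelihood(theta_samples).mean()
```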

There have been many papers published on MCMC, for example Gilks et al. (1996); Brooks (1998); Berthelsen and Möller (2003); Doucet et al. (2000), which should be consulted for a more rigorous treatment as we will only give an algorithmic outline here.

Suppose we are able to calculate the likelihood function, \(\pi (\mathbf {D}|\varvec{\theta })\); the steps of the MCMC algorithm are:

(1) Select a starting point in the parameter space, i.e. draw \(\varvec{\theta }\) from \(\pi (\varvec{\theta })\).

(2) Calculate the likelihood for this \(\varvec{\theta }\). This is usually the most computationally intensive part of the algorithm.

(3) Take a trial step by selecting a new set of parameters \(\varvec{\theta }_{\mathrm {trial}}\) from \(q(\varvec{\theta }_{\mathrm {trial}} \mid \varvec{\theta }_{\mathrm {current}})\). There is no hard and fast rule about how to select these parameters: taking a large step means the parameter space is explored quickly but coarsely, while steps that are too small mean that the local area is explored in great detail but it takes longer to cover the whole space. In general it makes sense to select the trial step from a normal distribution, whose standard deviation can be used to ‘tune’ the step size.

(4) Compare the likelihood of this trial step to that of the previous step and accept or reject the trial according to a rejection algorithm; if the trial is accepted, the parameters are updated, \(\varvec{\theta } = \varvec{\theta }_{\mathrm {trial}}\), and a new trial step is sampled. The Metropolis–Hastings algorithm is commonly used to determine whether or not to accept the trial step. The basic algorithm is to accept the trial with a probability determined by the ratio of the likelihoods, \(\mathcal {L}_{\mathrm {trial}}/\mathcal {L}\). If \(\mathcal {L}_{\mathrm {trial}} \ge \mathcal {L}\) the trial step is always accepted (thus always moving towards the regions of parameter space that maximise the likelihood); conversely, if \(\mathcal {L}_{\mathrm {trial}} < \mathcal {L}\) the trial step has a high probability of being accepted if the trial likelihood is close to that of the previous step, allowing a chance of ‘going downhill’. This means that the walk does not get stuck in a local maximum and thus guarantees that the global maximum will eventually be found (though it makes no prediction as to how long this will take).

(5) Several such walks, or chains, are run, each with a different initial \(\varvec{\theta }\), until they all converge on the same region of the parameter space. This region defines the posterior distribution. The goal of any inference technique is to find this region and draw samples from it; the distribution of these sampled parameters makes up the posterior distribution of the parameters. We often refer to burn-in when talking about MCMC: this is simply the process of removing those steps in the Markov chain that are not in the region of the posterior distribution/maximum likelihood.

MCMC generates parameters while exploring the parameter space in a manner that spends most time in the important regions. In the parlance of inference methods, the samples (parameters) mimic samples drawn from the target distribution (i.e. the distribution of the parameters we are trying to find).
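
A minimal Metropolis sketch of steps (1)–(5), using an illustrative one-parameter Gaussian log-likelihood and a step size of our own choosing, is:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(theta):
    # Placeholder log pi(D | theta); replace with the model's own likelihood
    return -0.5 * ((theta - 0.09) / 0.01) ** 2

theta = rng.uniform(0.0, 0.2)          # (1) starting point drawn from the prior
ll = log_likelihood(theta)             # (2) likelihood at the current point
chain = []
for _ in range(10_000):
    trial = theta + rng.normal(0.0, 0.005)    # (3) Gaussian trial step
    ll_trial = log_likelihood(trial)
    # (4) Metropolis acceptance: always accept uphill moves, sometimes downhill
    if np.log(rng.uniform()) < ll_trial - ll:
        theta, ll = trial, ll_trial
    chain.append(theta)

posterior_samples = chain[2000:]       # (5) discard burn-in
```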

The efficiency of MCMC is determined by how well the random walk (Markov chain) explores the parameter space (how quickly it can find the target region). If there are correlations between parameters in the model, these must be taken into account when constructing the trial steps; failure to do so will result in exploring areas of the space that do not contribute to the posterior distribution. A novel method for constructing trial steps was proposed by Haario et al. (2001), and O’Hare (2015) described how it can be applied to epidemiological data.

4 Worked example

To demonstrate an application of the MAP technique to a model describing spatiotemporal data, we will run an SIR model starting with a single infected animal, using the locations and movements in the datasets given in S1, and use the technique outlined above to infer the parameter values of the model.

We model the epidemic as consisting of three distinct stages: susceptible, infectious and removed. Infected animals can infect others in the same herd/farm at a rate \(\beta\) per time-step, and infectious animals are removed at a rate \(\sigma\). The method of removal is not important for this example but may be, for example, the culling of detected infected animals. We allow for heterogeneity in the size of each farm by sampling the size from a normal distribution with a mean of 60 and a standard deviation of 20, N(60, 20). When moving animals between farms we allow them to potentially infect others on both farms in that time step. For computational efficiency, we create agents for the infected animals only, updating their locations when they move between farms. The movement data describe the source and destination locations and the date of each movement; the number of animals moved is sampled from N(6, 4), and the number of infected animals among them is sampled from a hypergeometric distribution.
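
For example, the number of infected animals in a moved batch might be drawn as in this sketch (the variable names and seeding values are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

n_infected = 10                                       # infected animals on the source farm
herd_size = max(n_infected, int(rng.normal(60, 20)))  # farm size ~ N(60, 20)
n_moved = min(max(1, int(rng.normal(6, 4))), herd_size)  # batch size ~ N(6, 4), at least one

# Infected animals in the batch: a hypergeometric draw of n_moved animals from
# a herd containing n_infected infected and herd_size - n_infected healthy animals
infected_moved = rng.hypergeometric(n_infected, herd_size - n_infected, n_moved)
```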

We seed the infection with a single infected animal on a highly connected farm (farm 2 in the dataset) and solve the model using the Gillespie algorithm, moving animals between farms at each time step according to the movements in the supplementary information. We use parameter values of \(\beta =0.09, \sigma =0.007\), obtaining 65 infectious animals and 35 removed (assuming, of course, that there is some physical mechanism to detect and record infectious and removed animals).
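
A minimal within-herd Gillespie step for this SIR process might look as follows (a sketch assuming density-dependent transmission; the full model also handles the between-farm movements described above):

```python
import numpy as np

rng = np.random.default_rng(3)

def gillespie_sir(S, I, R, beta, sigma, t_end):
    """Simulate the SIR process within one herd up to time t_end."""
    t = 0.0
    while I > 0:
        rate_infection = beta * S * I      # rate of new infections
        rate_removal = sigma * I           # rate of removals
        total = rate_infection + rate_removal
        t += rng.exponential(1.0 / total)  # time to the next event
        if t > t_end:
            break
        if rng.uniform() < rate_infection / total:
            S, I = S - 1, I + 1            # infection event
        else:
            I, R = I - 1, R + 1            # removal event
    return S, I, R

print(gillespie_sir(S=59, I=1, R=0, beta=0.09, sigma=0.007, t_end=10.0))
```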

We write the likelihood for this model as

$$\begin{aligned} \mathcal {L} = \frac{n!}{\prod _i x_i!}\prod _i p_i^{x_i} \end{aligned}$$
(3)

where n is the size of the infected population (the number of infectious plus removed animals), and \(x_i\) and \(p_i\) are, respectively, the number of animals in class i (infectious or removed) and the probability of observing that number. Using uniform priors \(\beta \in [0.001, 0.1]\), \(\sigma \in [0.0001, 0.001]\), running 25 separate Markov chains, each starting at a random point in parameter space and each 1000 steps long, and calculating the mean numbers of infectious and removed animals over 50 simulations at each step, we estimate \(\beta =0.0910\) with a \(95\%\) credible interval of (0.0768, 0.101) and \(\sigma =0.0051\) with a credible interval of (0.0, 0.012), thus approximately recovering the parameters \(\varvec{\theta }=(\beta =0.09, \sigma =0.007)\); the kernel density estimates are shown in Fig. 7.
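
Equation (3) is a multinomial probability; in code, its logarithm can be evaluated directly, for instance (the counts and class probabilities here are illustrative):

```python
from scipy.stats import multinomial

# x: observed counts in the (infectious, removed) classes; p: the corresponding
# class probabilities predicted by the model, e.g. from repeated simulations
x = [65, 35]
p = [0.65, 0.35]
log_L = multinomial.logpmf(x, n=sum(x), p=p)
```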

Fig. 7

Posterior distribution of the parameters \(\beta ,\; \sigma\) in a SIR model using the fictional cattle trading network of Florin. A single simulation was run using the Gillespie algorithm, moving animals around the network according to the movements in Fig. 2. We seeded the outbreak with a single infected animal on farm 2 using \(\beta =0.09, \sigma =0.007\)

In recovering the posterior values for the parameters in our model, we record the time series of the number of infectious individuals and the number of infected farms over time as a measure of the likely number of cases we can expect from a similar outbreak (Fig. 8).

Fig. 8

Time series plot of the number of infected animals (left) and infected farms (right). The black line denotes the mean values and the grey shaded areas show the maximum and minimum values. We can see that in some cases the disease dies out, as the minima in both panels are zero, but overall the epidemic increases in size (we have chosen parameter values so that \(R_0 > 1\))

5 Conclusion

The vast amount of animal location, movement and test data that is collected during modern disease outbreaks is a valuable resource for mathematical epidemiologists. Analysing these data is not without difficulty, owing to the size and nature of the collected data, but modern inference techniques and advances in pattern extraction from spatiotemporal datasets have aided the control of the spread of diseases. Ignoring temporal effects leads both to an overestimation of the predicted outbreak size and to poorly designed control measures; incorporating the dynamic nature of the network of animal movements can reveal important time windows that can be targeted when designing interventions.

In this paper we have outlined two broad classes of techniques for extracting epidemiological information from mathematical models using these spatiotemporal datasets, giving a step-by-step approach to introduce the concepts and terminology involved. A realistic model using fictitious data demonstrated how the transmission parameters could be recovered (the code is available as one of the examples in the Broadwick framework). Approachable references to more advanced texts are given for the interested reader.