# Underestimating the effects of spatial heterogeneity due to individual movement and spatial scale: infectious disease as an example

DOI: 10.1007/s10980-012-9830-4

- Cite this article as:
- Cross, P.C., Caillaud, D. & Heisey, D.M. Landscape Ecol (2013) 28: 247. doi:10.1007/s10980-012-9830-4

## Abstract

Many ecological and epidemiological studies occur in systems with mobile individuals and heterogeneous landscapes. Using a simulation model, we show that the accuracy of inferring an underlying biological process from observational data depends on movement and spatial scale of the analysis. As an example, we focused on estimating the relationship between host density and pathogen transmission. Observational data can result in highly biased inference about the underlying process when individuals move among sampling areas. Even without sampling error, the effect of host density on disease transmission is underestimated by approximately 50 % when one in ten hosts move among sampling areas per lifetime. Aggregating data across larger regions causes minimal bias when host movement is low, and results in less biased inference when movement rates are high. However, increasing data aggregation reduces the observed spatial variation, which would lead to the misperception that a spatially targeted control effort may not be very effective. In addition, averaging over the local heterogeneity will result in underestimating the importance of spatial covariates. Minimizing the bias due to movement is not just about choosing the best spatial scale for analysis, but also about reducing the error associated with using the sampling location as a proxy for an individual’s spatial history. This error associated with the exposure covariate can be reduced by choosing sampling regions with less movement, including longitudinal information of individuals’ movements, or reducing the window of exposure by using repeated sampling or younger individuals.

### Keywords

Source-sink metapopulation, Epidemiological model, Observational bias, Disease transmission, Host density, Modifiable areal unit problem

## Introduction

Understanding the factors that determine habitat quality and create ‘hotspots’ is critical to effective conservation efforts as well as the control of invasive species and infectious diseases. The link between the observed pattern and underlying process creating that pattern, however, may be difficult to identify in spatially heterogeneous systems, particularly when studied individuals move among habitats (Van Horn 1983; Ray and Hastings 1996). In the ecological literature, dynamic process models are often used to understand how spatial regions interact with one another through time (e.g. population size in different patches as a function of birth, death and movement processes). On the other hand, the spatial statistics approach consists of fitting statistical models to observed data, or data simulated from a phenomenological statistical distribution. In most of these statistical analyses the underlying ecological process is not explicitly modeled (e.g. number of cases is a Poisson random variable, Fotheringham and Wong 1991; Bernardinelli et al. 1995). Here we tie these two approaches together using a process-based spatially heterogeneous metapopulation model with movement. We investigate how host movement and the spatial scale of analysis can bias conclusions about the importance of spatial heterogeneity by comparing the inferences drawn from simulated data to the true simulated process.

The functional relationship between host density and parasite transmission is fundamental to understanding infectious disease dynamics and implementing effective control strategies (Anderson and May 1991; McCallum et al. 2001). Density-dependent disease transmission models have been extensively used in theoretical and applied epidemiology, and assume that the probability of a susceptible host becoming infected is an increasing function of the density of infectious hosts. These models predict that epidemics will not occur as long as the host density is less than some threshold (Kermack and McKendrick 1927; Getz and Pickering 1983). In human epidemiology, this forms the basis for mitigation strategies based on social distancing (e.g. school closures) to prevent or control disease outbreaks (Glass and Barnes 2007; Halloran et al. 2008; Cauchemez et al. 2008). This framework extends from human systems to wildlife systems where the distribution and abundance of wildlife host populations can be affected by manipulating hunting pressure, artificial food sources, habitat heterogeneity, and predator abundance. These interventions should, in turn, affect disease transmission (Miller et al. 2003; Rudolph et al. 2006; Conner et al. 2007; Cross et al. 2007); however, empirical data supporting the epidemiological importance of host density are mixed (Bouma et al. 1995; Rogers et al. 1999; Delahay et al. 2000).

An additional issue that limits our ability to assess the effects of host density is that parasite transmission among individuals or populations cannot be observed directly. Instead, infection events must be inferred retrospectively. In natural populations, hosts are typically tested infrequently and may die, recover or disperse from the area between becoming infected and being tested, which may result in biased estimation of the parasite transmission rate. Similar problems arise in human systems in areas without access to health care providers or when some infections are sub-clinical. For some simple cases, seroprevalence and age data can be used to estimate the underlying infection process as a function of calendar time and age (Grenfell and Anderson 1985; Heisey et al. 2006, 2010). But such analyses of interval-censored data do not accommodate the movement and exposure histories of the subjects, and thus are mis-specified with respect to an individual’s exposure history.

The difficulties of detecting density-dependence due to movement and spatial scale have been noted in the host-parasitoid and population dynamics literature (Heads and Lawton 1983; Pulliam 1988; Hastings 1993; Ray and Hastings 1996; Veldtman and McGeoch 2004). A common refrain in ecological and epidemiological studies on the issue is that they must be conducted at the appropriate scale. We argue, however, that observational studies can minimize, but not eliminate, bias induced by movement, and that reducing the bias may obscure local heterogeneities. We show that even when data are collected at the appropriate scale they still may lead to highly biased inferences. Further, little general guidance is available on the degree of bias associated with either movement or spatial scale. In this study, we explore the above issues by simulating the disease transmission process and host movement, and subsequently aggregating the simulated data at different spatial scales.

## Methods

Our underlying process model is a traditional susceptible-infectious-recovered (SIR) epidemiological model (Kermack and McKendrick 1927; Anderson and May 1991), which we implement on a spatial lattice of resources, hereafter referred to as patches. Host individuals are assumed to move from each patch to its four neighbors that share an edge (i.e. rook connectivity). We assume a constant metapopulation size, whereby hosts die and reproduce at the same rate \( \delta \). The recovery rate is denoted as \( \gamma \). Pathogen transmission occurs only within patches and is density-dependent, governed by \( \beta \,S_{i} \,I_{i} \), where \( \beta \) is the transmission coefficient and \( S_{i} \) and \( I_{i} \) are the numbers of susceptible and infectious individuals in patch \( i \). We do not model frequency-dependent transmission, \( \beta \,S_{i} \,I_{i} /N_{i} \), or intermediate transmission functions (e.g. \( \beta \,S_{i} \,I_{i} /N_{i}^{\theta } \)) because one of our goals is to show that even when the underlying process is density dependent the observed pattern may show no measurable effect of host density on pathogen transmission, which is what is expected from a frequency-dependent transmission process.

The movement rate of hosts out of patch *i* at time *t*, \( \mu_{i} \left( t \right) \), is linearly related to the ratio of patch population size to resources (\( \mu_{i} \left( t \right) = \omega \,N_{i} \left( t \right)/\rho_{i} \), where \( \omega \) is the slope parameter). Positive density dependence in dispersal rates due to limiting resources has been found in mammals, birds and insects, and modeling analyses indicate that it is an evolutionarily stable strategy under most resource-limited scenarios (Lloyd-Smith 2010). To assess whether our results were sensitive to the particular functional form of movement we also conducted simulations where movement rate was a sigmoidal function of \( N_{i} \left( t \right)/\rho_{i} \). Using the notation {*n*_{i}} to indicate the set of four patches that are edge neighbors of patch *i*, the full model is written as:
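One system of equations consistent with the definitions above can be sketched as follows. It rests on three assumptions that the surrounding text does not confirm: births enter the susceptible class of the parent's patch, emigrants from a patch are divided equally among its four edge neighbors, and the movement rate does not depend on disease status.

\[
\begin{aligned}
\frac{dS_{i}}{dt} &= \delta N_{i} - \beta S_{i} I_{i} - \delta S_{i} - \mu_{i} S_{i} + \frac{1}{4}\sum_{j \in \{n_{i}\}} \mu_{j} S_{j}, \\
\frac{dI_{i}}{dt} &= \beta S_{i} I_{i} - (\gamma + \delta) I_{i} - \mu_{i} I_{i} + \frac{1}{4}\sum_{j \in \{n_{i}\}} \mu_{j} I_{j}, \\
\frac{dR_{i}}{dt} &= \gamma I_{i} - \delta R_{i} - \mu_{i} R_{i} + \frac{1}{4}\sum_{j \in \{n_{i}\}} \mu_{j} R_{j},
\end{aligned}
\]

with \( N_{i} = S_{i} + I_{i} + R_{i} \) and \( \mu_{i} = \omega \,N_{i} /\rho_{i} \). Summing the three equations gives \( dN_{i}/dt = -\mu_{i} N_{i} + \frac{1}{4}\sum_{j \in \{n_{i}\}} \mu_{j} N_{j} \), so under these assumptions births and deaths cancel within each patch and the total metapopulation size stays constant on the torus.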

For simplicity, the patches were distributed on a 12 × 12 torus to avoid boundary issues. One realization of the modeled trajectories is shown in Supplementary Appendix 1.

The epidemiological model is completely deterministic conditional on the resource distribution. To create spatial autocorrelation in patch resource patterns we allocated resources hierarchically, whereby patches exist within regions of varying quality. Patches are allocated to larger regions consisting of four neighboring patches, and the mean of the patch resources for a given region *h*, \( \theta_{h} \), is spatially random and chosen from a uniform distribution with a minimum of 0.01 and a maximum of 100. The resource \( \rho_{i} \) for each patch *i* in region *h* was then randomly allocated as \( \rho_{i} = \hbox{max} \left( {0.01,\,x_{i} } \right) \), where *x*_{i} is drawn from a Gaussian distribution with mean \( \theta_{h} \) and standard deviation \( \theta_{h} /5 \). These parameters are chosen to create variation in habitat quality, which leads to variation in host density among patches (Fig. 1a, b). A standard deviation within a region that is 1/5th of the mean ensures that habitat quality is more similar within a region than among regions (Fig. 1a).
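The hierarchical allocation described above can be sketched in a few lines of Python. This is an illustration only: the function name `allocate_resources`, the fixed seed, and the choice of 2 × 2 blocks as the regions of four neighboring patches are assumptions, not details confirmed by the text.

```python
import random

GRID = 12   # 12 x 12 lattice of patches, as stated in the text
BLOCK = 2   # assumption: each region is a 2 x 2 block of four patches


def allocate_resources(rng):
    """Return a GRID x GRID list of patch resources rho[row][col]."""
    rho = [[0.0] * GRID for _ in range(GRID)]
    for r0 in range(0, GRID, BLOCK):
        for c0 in range(0, GRID, BLOCK):
            # Regional mean resource, uniform on [0.01, 100].
            theta = rng.uniform(0.01, 100.0)
            for r in range(r0, r0 + BLOCK):
                for c in range(c0, c0 + BLOCK):
                    # Patch resource: Gaussian around the regional mean with
                    # standard deviation theta / 5, truncated below at 0.01.
                    x = rng.gauss(theta, theta / 5.0)
                    rho[r][c] = max(0.01, x)
    return rho


# Hypothetical seed, used only to make the example reproducible.
rho = allocate_resources(random.Random(1))
```

Passing a seeded `random.Random` instance keeps each spatial allocation reproducible, which is convenient when the same parameter set is rerun on several resource landscapes.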

We simulated across a range of host and pathogen parameters in order to cover a range of possible field systems from human measles (acute disease in a long lived host) to tuberculosis in possums (chronic disease in short lived host). In particular, we altered host turnover rates (\( \delta \)), disease recovery rates (\( \gamma \)), and the slope parameter in the movement function (\( \omega \)) across ten levels from 0.01 to 1.0 in steps of 0.11. For each parameter set of different \( \gamma \), \( \delta \) and \( \omega \) values, we ran the model using ten different spatial allocations of resources. Results were then averaged across these ten simulations to highlight the overall patterns. The results of a single simulation per parameter set, however, provide a better assessment of the underlying variability in the expected results. We present the averaged results in the main text, but provide the results from a single simulation per parameter set in Supplementary Appendix 2.
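As a small sketch of the parameter sweep, the ten levels and their combinations can be enumerated as below. Treating the design as a full factorial over the three parameters is an assumption, and the dictionary keys are hypothetical names.

```python
from itertools import product

# Ten levels from 0.01 to 1.0 in steps of 0.11 (rounded to avoid float drift).
levels = [round(0.01 + 0.11 * k, 2) for k in range(10)]

# Assumption: every combination of host turnover (delta), recovery (gamma)
# and movement slope (omega) is simulated.
param_sets = [
    {"delta": d, "gamma": g, "omega": w}
    for d, g, w in product(levels, repeat=3)
]
```

Under that assumption the sweep contains 1,000 parameter sets, each of which would be run on ten spatial allocations of resources.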

We set the total metapopulation size to 7,200 and initialized each simulation with an equal number of individuals, 50, in every patch. Metapopulation size and total resources have little influence on our results other than to contribute towards the overall movement rate. We simulated this system without disease until all the *N*_{i}(*t*)’s equilibrated. We then calculated *β* so that the mean within-patch basic reproductive number, \( \overline{R}_{0i} = \sum\nolimits_{i} {[\beta N_{i} /(\gamma + \delta )]} /144 \), equaled 1.3. Then, one infectious individual was added to a random patch and we continued the simulation until it equilibrated again. We explored other \( \overline{R}_{0i} \) values (2 and 4) and our overall conclusions remained consistent (data not shown). Rather than assume a specific timescale for our rates, we conduct most of our analyses on relative ratios of different parameters. All simulations were conducted in R version 2.14.2 (R Development Core Team 2011). The SIR model was solved numerically using the lsodar differential equation solver in the deSolve package version 1.10-3 (Soetaert et al. 2010) and simulations were assumed to be at equilibrium when the sum of the absolute values of all rates of change was less than 10^{−6}.
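The calibration of *β* follows directly from the definition of the mean within-patch basic reproductive number: solving the mean of βN_i/(γ + δ) for β gives β equal to the target value times (γ + δ) divided by the mean patch size. A minimal sketch, with hypothetical function names and illustrative rate values:

```python
def calibrate_beta(patch_sizes, gamma, delta, target_r0=1.3):
    """Solve mean_i[beta * N_i / (gamma + delta)] = target_r0 for beta."""
    mean_n = sum(patch_sizes) / len(patch_sizes)
    return target_r0 * (gamma + delta) / mean_n


def mean_within_patch_r0(beta, patch_sizes, gamma, delta):
    """Average of the patch-level R0 values, beta * N_i / (gamma + delta)."""
    r0_values = [beta * n / (gamma + delta) for n in patch_sizes]
    return sum(r0_values) / len(r0_values)


# Illustrative equilibrium patch sizes: 144 patches summing to 7,200 hosts.
# Real values would come from the disease-free simulation.
patch_sizes = [20.0, 80.0] * 72
gamma, delta = 0.12, 0.01  # hypothetical recovery and turnover rates

beta = calibrate_beta(patch_sizes, gamma, delta)
check = mean_within_patch_r0(beta, patch_sizes, gamma, delta)
```

Because the total metapopulation size is fixed at 7,200 across 144 patches, the mean patch size is always 50, so under this definition *β* depends only on γ + δ and the target value.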

In this model, the true force of infection at time *t*, \( \lambda_{i} \left( t \right) \), in patch *i* is \( \beta \,I_{i} \left( t \right) \), which we denote as \( \lambda_{i}^{*} \) at equilibrium. This is the underlying transmission process which we would like to estimate from field data. Transmission, however, is typically not directly observable. For the simple case of no disease-induced mortality or re-infection one could attempt to estimate λ_{i} using seroprevalence, \( \left( {\left( {I_{i} + R_{i} } \right)/N_{i} } \right) \), data that can be collected in the field. In this case, assuming between-patch disease transmission has a negligible effect on the within-patch seroprevalence, the probability of a sero-positive individual of age *t* in patch *i*, \( \upsilon_{i} \left( t \right) \), equals \( 1 - \exp \left( { - \lambda_{i}^{*} t} \right) \) at equilibrium. Age is not explicitly included in our model, but death rate is constant, so the underlying age distribution has an exponential distribution with density \( f\left( t \right) = \delta \exp \left( { - \delta t} \right) \). Therefore, the mean seroprevalence for a patch is \( E\left( {\upsilon_{i} } \right) \, = \int {f\left( t \right)\upsilon_{i} \left( t \right)dt} \). Substituting the above equations for *f*(*t*) and *υ*_{i}(*t*) and integrating over *t* results in the following relationship:

\( E(\upsilon_{i} ) = \left[ {1 - \frac{\delta }{{\delta + \lambda_{i}^{*} }}} \right] = \frac{{\lambda_{i}^{*} }}{{\delta + \lambda_{i}^{*} }}, \) which can be rearranged to yield \( \lambda_{i}^{*} = \frac{{\delta E(\upsilon_{i} )}}{{1 - E(\upsilon_{i} )}} \).
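The closed-form relationship and its inverse can be checked numerically. The sketch below approximates the age-averaged seroprevalence integral with a midpoint rule and compares it with the closed form; the function names, the truncation of the integral at 40/δ, and the rate values are all illustrative assumptions.

```python
import math


def expected_seroprevalence(lam, delta):
    """Closed form: E(v) = lambda / (delta + lambda)."""
    return lam / (delta + lam)


def force_of_infection(seroprev, delta):
    """Inverse of the closed form: lambda = delta * E(v) / (1 - E(v))."""
    if not 0.0 <= seroprev < 1.0:
        raise ValueError("seroprevalence must lie in [0, 1)")
    return delta * seroprev / (1.0 - seroprev)


def seroprevalence_by_quadrature(lam, delta, steps=200_000):
    """Midpoint-rule approximation of the integral of f(t) * v(t) over age t."""
    t_max = 40.0 / delta  # the exponential age density is negligible beyond this
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        age_density = delta * math.exp(-delta * t)
        prob_positive = 1.0 - math.exp(-lam * t)
        total += age_density * prob_positive * h
    return total


lam_true, delta = 0.2, 0.1  # hypothetical rates in the same time units
prev = expected_seroprevalence(lam_true, delta)
lam_back = force_of_infection(prev, delta)
prev_numeric = seroprevalence_by_quadrature(lam_true, delta)
```

The inverse is undefined at a seroprevalence of exactly 1, which is why the sketch rejects that value: a fully seropositive sample carries no upper bound on the force of infection.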

Thus, the observed patch-specific force of infection, \( \lambda_{i}^{*} \), can be estimated from observed seroprevalence data (Grenfell and Anderson 1985; Heisey et al. 2006, 2010). We compare the true force of infection, *λ*_{i}, to the observed force of infection, \( \lambda_{i}^{*} \), for a range of different parameter sets. To assess the scale problem we compare model results collected at the patch scale to increasing levels of data aggregation (square zones of 4, 9, and 16 patches; Fig. 1). To assess the zoning problem we shifted the boundaries of which four patches were averaged together to be offset from the process generating the regional structure in resources, which would tend to aggregate patches that had very different population sizes (Fig. 1).
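The aggregation and boundary-shifting steps can be sketched as follows. The text does not say how zone-level seroprevalence was computed; pooling counts (summed seropositives over summed hosts) is an assumption here, as are the function names and the `offset` parameter used to shift zone boundaries on the torus.

```python
def aggregate(grid, k, offset=0):
    """Sum a square torus grid into k x k zones, shifting boundaries by offset."""
    n = len(grid)
    if n % k != 0:
        raise ValueError("zone width must divide the grid width")
    zones = []
    for r0 in range(0, n, k):
        zone_row = []
        for c0 in range(0, n, k):
            total = sum(
                grid[(r0 + offset + dr) % n][(c0 + offset + dc) % n]
                for dr in range(k)
                for dc in range(k)
            )
            zone_row.append(total)
        zones.append(zone_row)
    return zones


def zone_seroprevalence(positives, sizes, k, offset=0):
    """Pooled seroprevalence per zone: summed (I + R) over summed N."""
    pos = aggregate(positives, k, offset)
    tot = aggregate(sizes, k, offset)
    return [[p / t for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(pos, tot)]


# Illustrative 12 x 12 inputs: 50 hosts per patch, 10 of them seropositive.
sizes = [[50.0] * 12 for _ in range(12)]
positives = [[10.0] * 12 for _ in range(12)]
zones_of_4 = zone_seroprevalence(positives, sizes, k=2)
zones_of_16_shifted = zone_seroprevalence(positives, sizes, k=4, offset=1)
```

Zone widths of 2, 3 and 4 give the zones of 4, 9 and 16 patches, and an `offset` of 1 with a width of 2 moves the zone boundaries off the 2 × 2 blocks assumed to generate the regional resource structure. The zone-level force of infection then follows by applying the seroprevalence relationship to each pooled value.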

We quantify the bias in observational data by investigating the relationship between observed \( \lambda_{i}^{*} \) and *I*_{i} or *N*_{i} relative to the modeled relationship \( (\lambda_{i} = \beta I_{i} ) \). Specifically, for each simulation we relate both the true (*λ*_{i}) and observed (\( \lambda_{i}^{*} \)) force of infection to the total number of infectious individuals, or population size, per patch using simple linear regressions. We quantify bias as the percentage difference in the slopes of the true and observed relationships. We do not include sampling error; thus, our results are a best-case scenario for observational datasets. Researchers have looked for infection patterns with respect to both the population density as well as the density of infectious individuals (e.g. Knell et al. 1998; Rachowicz and Briggs 2007). Therefore, we present results using both the population density and the number of infectious individuals as explanatory variables. All statistical analyses were conducted in R version 2.14.2 (R Development Core Team 2011).
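A minimal sketch of the bias calculation follows. The text does not state which slope is the denominator, so expressing the difference relative to the true slope is an assumption. The data are invented for illustration: the "observed" values are the true values shrunk halfway toward their mean, mimicking the homogenizing effect of movement.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def percent_bias(true_slope, observed_slope):
    """Percentage difference of the observed slope relative to the true slope."""
    return 100.0 * (observed_slope - true_slope) / true_slope


# Hypothetical patch data: true force of infection is beta * I_i.
beta = 0.02
infectious = [2.0, 4.0, 6.0, 8.0, 10.0]
true_foi = [beta * i for i in infectious]

# Invented "observed" values: true values pulled halfway toward their mean.
mean_foi = sum(true_foi) / len(true_foi)
observed_foi = [mean_foi + 0.5 * (f - mean_foi) for f in true_foi]

true_slope = ols_slope(infectious, true_foi)
observed_slope = ols_slope(infectious, observed_foi)
bias = percent_bias(true_slope, observed_slope)
```

In this toy case the observed slope is half the true slope, so the computed bias is about −50 %, even though every patch is measured without sampling error.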

## Results

Patch-level movement rates, *μ*_{i}(*t*), and population sizes, *N*_{i}, in this model are emergent properties of the spatial distribution of resources as well as the movement function \( \omega \,N_{i} /\rho_{i} \). Areas of low habitat quality tend to have lower host population sizes and thus lower seroprevalence (Fig. 1). Movement rates tended to be right-skewed in our simulations, whereby a few low-quality patches that were neighbors of high-quality patches had high movement rates because they receive many immigrants but do not have the resources to support them. As a result, we use the weighted mean movement rate at equilibrium \( \mu^{*} = \sum {\frac{{\mu_{i} N_{i} }}{{\sum {N_{i} } }}/144} \), which is the expected movement rate of a randomly chosen individual, to summarize the amount of movement in a given simulation.

The observed force of infection, \( \lambda_{i}^{*} \), underestimates *λ*_{i} for patches with higher prevalence and overestimates *λ*_{i} in lower prevalence patches (Fig. 2c–f). The magnitude of this bias increases with host movement (Figs. 2, 3). For the parameter region that we explored, bias was, on average, less than 20 % when the expected host movement rate, *μ*^{*}, was less than one in 100 individuals moving among patches during their lifetime (Fig. 3b). However, when, on average, one in ten individuals moved during their lifetime, researchers may underestimate the effect of host density by 50–80 % (Fig. 3a, b). In the worst case, the slope between the number infected and the force of infection was 0.005 even though the true slope, *β*, was 0.02, which is a 4-fold difference or 75 % negative bias. For most field examples, this would be interpreted as no effect of the density of infected individuals on the force of infection.

Interestingly, aggregating the data across patches slightly reduced the amount of bias at high movement rates. At low movement rates, however, aggregated data were more biased, particularly with respect to the regressions using total host density (Fig. 3a). The correlation between *I*_{i} and \( \lambda_{i}^{*} \) was generally high, except when movement rates were high or when the data were aggregated (Fig. 3c). The infectious period of the pathogen varies across two orders of magnitude in these simulations, but explains little variation.

The results in Fig. 3 are based upon averaging ten simulations with the same parameters but different spatial arrangements of resources. Real host-parasite systems, however, are even more variable because they are not averages of ten realizations. Results of a single simulation per parameter set are shown in Supplementary Appendix 2, which still represent conservative estimates of the underlying variability because they do not include system stochasticity and sampling error. Thus, in some cases, the best that a researcher may be able to do when analyzing aggregated data is to achieve an *R*^{2} of 0.4 due to the location of high and low quality patches (Supplementary Appendix 2). In this system, the only process driving transmission is the number of infectious individuals in the patch; therefore, the *R*^{2} should be 1 when there is no sampling error. Our results were very similar regardless of how movement rate was related to the ratio of population size to resources (data not shown).

*R*_{0} and we set *β* such that the mean within-patch *R*_{0} was 1.3. Host movement increases the average patch seroprevalence in the metapopulation by allowing infectious individuals to access a population size larger than their local patch and by maintaining transmission in pathogen sinks via immigration (Fig. 5).

## Discussion

Our results show that observed spatio-temporal patterns may be only weakly related to the underlying local process when individuals frequently move among heterogeneous environments. We investigated how the observed relationship between host density and disease transmission weakens with increasing host movement even though the simulated relationship remained the same. Thus, the importance of host density is likely to be underestimated, sometimes dramatically so, even in the best-case scenario without environmental stochasticity or sampling error. In addition, host movement and aggregating data across patches reduce the observed spatial variation in seroprevalence and the force of infection (Fig. 4). As a result observational studies will often underestimate the potential efficiency of control efforts targeting habitat hot-spots. Similar biases are likely to arise more generally in ecological studies of habitat quality or population regulation, whereby the signal of habitat quality may be blurred by the dispersal of individuals (Pulliam 1988; Ray and Hastings 1996). We expect our conclusions also apply to time series analyses (e.g. Ionides et al. 2006) because single populations are usually not isolated from immigration and emigration. Thus, the temporal peaks and troughs observed in disease, or population, time series would tend to be more extreme if the movement of individuals is restricted.

In this example, mobile individuals create an ‘error in covariate’ problem, whereby an individual’s previous exposure to infection is assumed to be determined by the site where they were sampled rather than their full movement history (e.g. Bernardinelli et al. 1995; Heisey et al. 2010). Increasing movement reduces the utility of using the sampling location as a proxy for previous exposure. As a result, researchers will tend to underestimate the underlying spatial variation in infection risk. In our own empirical work on chronic wasting disease in white-tailed deer, we found that the spatial variance in the infection hazard was nearly two times higher in females compared to males (Heisey et al. 2010). This is probably due to the higher movement rates of males creating the perception of a less heterogeneous infection hazard. Our results suggest that the underlying spatial infection hazard for male and female deer may be identical, but the greater movement of males creates the appearance of a more spatially uniform infection process for male compared to female deer.

One of the primary reasons for aggregating data across areas is that smaller regions tend to have less data, resulting in increased sampling variance. The choice of spatial scale then becomes a trade-off between increased sampling variation and averaging over real local heterogeneity. We assumed there was no sampling error as a best-case scenario, which results in the observed spatial variation always being less than the true spatial variation (Fig. 4a). In practical applications, sampling error and process variance would need to be appropriately addressed, potentially through the use of hierarchical spatial models (e.g. Gelman and Hill 2007). Spatial conditional autoregressive models can allow for the analysis of relatively small spatial units and the amount of spatial smoothing in the estimates of spatial effects is determined by the data (Besag et al. 1991). However, the best outcome for any statistical method is for the statistical estimates to match the observed force of infection that we simulated (i.e. the sampling error and bias approaches zero). Thus, even in this best-case scenario the inference can be highly biased.

Recent approximate Bayesian computation (ABC) and ‘plug-and-play’ methods have been used to estimate dynamical model parameters that are too complicated to be analyzed in traditional closed-form statistical analyses (e.g. Ionides et al. 2006; Toni et al. 2009; He et al. 2010). Using these approaches one could dynamically simulate the movement and transmission processes and estimate those parameters from observed data. Our simulations suggest, however, that there is the potential for strong confounding between the estimated movement and transmission rates. In particular, disease prevalence may be relatively uniform across the landscape because movement rates are high or because transmission is relatively constant. Further, a given site may have a high seroprevalence because it has high local transmission or because of high movement rates with nearby hot-spots. Therefore, the bias induced by movement in cross-sectional datasets probably cannot be resolved without longitudinal movement data. Only a fraction of individuals can be followed through time compared to those that can be included in a cross-sectional sample, so approaches combining longitudinal and cross-sectional data may be the most useful. For example, techniques for dealing with incompletely observed data (e.g. multiple imputation, Schafer 1999) may be applied to the spatial history of cross-sectionally sampled individuals, while the estimated spatial variation could be primarily informed by individuals whose longitudinal exposure history is known. A related problem arises when estimating a time-varying infection hazard using cross-sectional data (Heisey et al. 2010). In this case, the timing of the infection is only known to be within an interval and, as a result, individuals with shorter sampling intervals are more informative about the time-varying hazard. Therefore, in both the spatial and temporal case, younger individuals may be more useful than older individuals in assessing the spatial variation (and the associated spatial covariates) because the window of time for movement among sampling patches is smaller.

Empirical support for a positive effect of host density on disease transmission is mixed (Knell et al. 1998; Woodroffe et al. 2009; Smith et al. 2009; Ferrari et al. 2011). Given our results this is not surprising. We show that with increasing movement the relationship between infection rate and population size or the number of infected individuals becomes weaker, giving the appearance of frequency-dependent transmission. In particular, Fenton et al. (2002), Smith et al. (2009) and Cross et al. (2010) found that when force of infection was estimated as \( \beta \,I^{\theta } \), *θ* was less than one, which is similar to our result that, in general, infection will not increase as dramatically as one might expect for a given increase in the density of infectious individuals. Using a cellular automata model, Turner et al. (2003) showed how increasing the spatial scale of analysis results in the misperception that host density is relatively unimportant. We expanded upon their results by further showing the importance of host movement and the resulting variability in seroprevalence and force of infection.

We used seroprevalence data combined with host mortality rate, both of which can be estimated from field data, to back-calculate the unobserved latent process of transmission (specifically the force of infection). Focusing statistical analyses on prevalence, perhaps by using genetic assays for the pathogen, would reduce the duration of time that individuals have to move among patches and still be test-positive relative to a serological assay. This would allow researchers to more directly link infections to the location of origin. However, we believe this solution will often create more problems than it solves. Often there are relatively few infectious individuals at any given time, making prevalence difficult to estimate at a fine spatial and temporal scale in systems where individuals are not self-reporting to hospitals. In addition, to estimate transmission rates from prevalence data one would need to also estimate the rate at which infected individuals become test-negative. This confounding between transmission and recovery rate is similar to the confounding between transmission and disease-induced mortality (Heisey et al. 2006). In many host-parasite systems, the recovery rate is poorly known, thus complicating analyses based on prevalence data. Vector-borne pathogens, for either plants or animals, are likely to have similar patterns to those we show with a mobile host population. More research is needed to understand when the effects of host and vector movement are likely to be synergistic or could be modeled well by focusing on the movement of just the host or the vector.

Our results are based on a comparison across spatial patches analyzed at equilibrium rather than approach-to-equilibrium analyses. We experimented with analyses conducted using data from time points when the pathogen was still invading the metapopulation, but these analyses were generally not informative when we used only cross-sectional data on seroprevalence. Estimating the rate of seroprevalence increase (Cross et al. 2010), or assessing whether initial seroprevalence also correlates with the rate of increase (Heisey et al. 2010), may have been more informative. However, it will often be unclear whether sites are uninfected because the pathogen has not yet been introduced or because it could not invade and persist in that location.

An obvious conclusion from this study is that experimental studies are a better approach than observational studies to developing an understanding of the effects of host density on pathogen transmission. For human or endangered host species systems these treatments may be ethically or logistically impossible, but even when experimental manipulations are possible we believe they will have their own set of provisos and caveats. For example, in many cases it may be unclear how contact and transmission rates would scale up from relatively small enclosures to broad scale differences in density. We urge a multi-pronged approach that includes continued statistical development, simulations, observational and experimental studies. Observational studies may be further improved by using global positioning systems to estimate host exposure history in combination with repeated testing (Vazquez-Prokopec et al. 2009). Finally, we encourage statisticians to consider using simulations of the underlying process to generate data for the evaluation of statistical approaches rather than phenomenological statistical distributions.

## Acknowledgments

We thank M. Ebinger for help with the figures. PCC’s work was supported by the U.S. Geological Survey and the NSF/NIH Ecology of Infectious Disease program DEB-1067129, and some ideas stem from working groups sponsored by the NIH/DHS funded RAPIDD program. DC’s work was supported by NSF Grant DEB-0749097 to L.A. Meyers. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.