Abstract
Background
There are an increasing number of geo-coded information streams available which could improve public health surveillance accuracy and efficiency when properly integrated. Specifically, for zoonotic diseases, knowledge of spatial and temporal patterns of animal host distribution can be used to raise awareness of human risk and enhance early prediction accuracy of human incidence.
Methods
To this end, we develop a spatiotemporal joint modeling framework to integrate human case data and animal host data to offer a modeling alternative for combining multiple surveillance data streams in a novel way. A case study is provided of spatiotemporal modeling of human tularemia incidence and rodent population data from Finnish health care districts during years 1995–2012.
Results
Spatial and temporal information of rodent abundance was shown to be useful in predicting human cases and in improving tularemia risk estimates in 40 and 75% of health care districts, respectively. The human relative risk estimates’ standard deviation with rodent’s information incorporated are smaller than those from the model that has only human incidence.
Conclusions
These results support the integration of rodent population variables to reduce the uncertainty of tularemia risk estimates. However, more information on several covariates such as environmental, behavioral, and socio-economic factors can be investigated further to deeper understand the zoonotic relationship.
Similar content being viewed by others
Background
Disease risk mapping is important for the understanding of the spatial epidemiology of infectious diseases. In most cases and even for multi-host diseases such as zoonoses, risk estimation has been conducted in a univariate fashion based on human case data alone. Modeling of multivariate health data, informed by multiple streams of geo-coded information, allows observation of concurrent patterns among data streams and conditioning on one another. As a result, multivariate methods can deliver greater statistical power and lead to more precise risk estimation and enhanced event detection. Specifically, for zoonotic diseases, knowledge of spatial and temporal patterns of the animal host could inform incidence in humans.
Integration of data and analyses, whether of population or health related variables, has been suggested to improve zoonoses surveillance accuracy and efficiency [1, 2]. Integration appears more feasible for endemic zoonoses, and for those with domesticated animals as source, given the likely greater availability of animal health data. For zoonoses with a non-domesticated animal source (e.g. sylvatic yellow fever, tularemia), availability of animal health data is likely to be a limiting factor towards integration and alternative animal data sources must be sought.
Tularemia is an infectious disease caused by an intracellular bacterium, Francisella tularensis. The disease is endemic in North America and parts of Europe, with recurrent outbreaks in Sweden and Finland [3, 4]. Francisella tularensis has a wide range of hosts with transmission most commonly via arthropod vectors [5]. Rodents could play a role in the zoonotic transmission of the disease after findings of a relationship between vole population cycles and human tularemia incidence in Finland [6] and Sweden [7]. Specifically in Finland, rodent population dynamics displayed a spatiotemporal relationship with human tularemia cases, such that human case numbers peaked one year after peak rodent densities [6]. Similar findings, from studies of tularemia outbreaks, indicate that high rodent densities might relate to occurrences in humans [8,9,10,11].
This work explores the application of spatiotemporal joint models to concurrent animal and human geo-referenced data sources in an effort to explain possible patterns between the distribution and/or the abundance of the animal host and human disease.. The proposed methodology is evaluated on its performance in predicting human disease risk and improving risk estimation in a case study of tularemia human incidence and rodent population data in Finland. Not only our method contains methodological novelty with potential applications in spatial epidemiology, it also helps to reveal a disease pattern in the case study which was not considered in previous studies.
Methods
Data sources
A complete description of the rodent population and human tularemia incidence data is available from earlier reports [6]. Briefly, data on rodent population levels, predominantly bank voles (Myodes glareolus) and field voles (Microtus agrestis), were collected across Finland by the Natural Resources Institute Finland and categorized into three population levels: decline, increase, and peak [12]. Human tularemia cases were reported as laboratory-confirmed to the National Infectious Disease Register, kept by the National Institute for Health and Welfare of Finland. Both human cases and rodent data were aggregated into 20 Finnish healthcare districts over the period 1995–2012 [6]. An indicator to quantify the influence of rodent population levels on human incidence is developed for each health district. Plots of human cases and binary rodent status for the 20 Finnish health districts in the period 1995–2012 show a one-year lagged increase in the number of human tularemia cases after rodent population peaks for certain districts and years (Fig. 1). A similar pattern was found between human cases and the categorical rodent population status (Fig. 2). These support the choice of spatiotemporal models which will be developed in the next section.
Statistical methodology
We propose a Bayesian framework to jointly analyze rodent population status and human case incidence. We assume that the human cases are associated with rodents’ status through a latent structure. Let human cases (counts), hit, at health district i and time t follow a Poisson distribution with mean = eiθit where ei is the expected number of human cases in the ith health district (presumably constant across the years) and θit is the relative risk at the ith health district and year t There are a number of ways to calculate the expected rate. In this paper, the expected rates, ei, are calculated as the average case count at area i over the time period as \( {e}_i=\frac{\sum_t{h}_{it}}{T} \) where T is the length of study period (T = 8 years). Human population data was obtained from the Finnish Population Register Centre [13]. However, the population variation between regions was found to be limited and combined with the low overall rate of the disease, it was decided that a time averaged rate would be appropriate in this case.
A simple approach to jointly model human incidence and rodent population data is to consider a 2-level rodent population indicator as in [6]. As a binary variable, we denote rit = 0 if the number of rodents declined and rit = 1 if the number of rodents is at peak or increased. The 2-class rodent status is then assumed to follow a Bernoulli distribution with parameter pit being the probability of rodent for the i th health district and year t. To specify the parameters in the joint likelihoods, θitfor humans and pit for rodents, linear predictors are decomposed additively into spatial, and space-time interaction random effects as follows
αr, αh are the overall mean levels for rodent and human respectively, and assumed to have zero-mean Gaussian prior distributions. The latent variables \( {u}_i^h,{u}_i^r,{v}_i^h,{v}_i^r \) are included to model non-temporal background variation with spatial and non-spatial prior distributions. The spatial structure for \( {u}_i^h,{u}_i^r \) follow an intrinsic conditional autoregressive (ICAR) [14] model and the non-spatial distribution for \( {v}_i^h,{v}_i^r \) is assumed to be zero-mean Gaussian prior distribution. A Gaussian distribution with zero mean is assumed for \( {\lambda}_t^h,{\lambda}_t^r \) at t = 1 and an autoregressive prior distribution is assumed for \( {\lambda}_t^h,{\lambda}_t^r \) at t > 1 which allows for a type of nonparametric temporal effect. \( {\delta}_{it}^h,{\delta}_{it}^r \) represent the temporal trend for each health district at year t. We also assume that the space-time random effects of human and rodent are proportional with one-year lag. This is supported by the finding suggested in [6] that 1-year temporal lag effect can be beneficial to predict human tularemia outbreaks. To model unobserved ecological effects associated with vole cycles at the five boreal zones in Finland (Southern Finland, Southwestern and Inland Finland, Eastern Finland, Northern Finland, and Lapland) [13], two additional parameters \( {u}_k^{sr},{v}_k^{sr} \) are included as the spatial and non-spatial contextual variables for regional state level k. \( {\delta}_{it}^r,{\beta}_i^p,{\beta}_i^n \) are assumed to follow a zero-mean Gaussian prior distribution and the uniform distribution on (0,10) is used to model all standard deviation parameters [15].
Although there is some evidence that a 2-level rodent status could have a lagged predictive ability on human tularemia [6], we also want to extend our consideration to include the original three rodent levels in the joint modeling and assume a categorical likelihood for rodent population status. A specification for multiple categories of rodent status can be defined as
where j = 1 indicates the declining rodent population level, j = 2 is the increasing level, and j = 3 is the rodent’s population at peak, and \( {\delta}_{j, it}^r\sim N\left(0,{\tau}_{\delta^r}^{-1}\right) \). The other prior distributions for the random effects terms are assumed the same as the 2-level model.
We further assume that the space-time random effects of human and rodent are proportional with one-year lag. That is \( {\delta}_{it}^h\alpha\;{\delta}_{r_{it-1}, it-1}^r \), i.e. \( {\delta}_{it}^h={\beta}_{i,{r}_{it-1}}{\delta}_{r_{it-1}, it-1}^r \) where \( {\beta}_{i,{r}_{it-1}} \) is the proportional parameter. This specification is developed to examine the lagged effect of rodent status, \( {\delta}_{r_{it-1}, it-1}^r \), at level j = rit-1 in health district i at time t-1on the number of human cases through the space-time interaction term,\( {\delta}_{it}^h \).
Model evaluation
To evaluate the models, we use two goodness of fit measures: the Deviance Information Criterion (DIC) [16, 17] where the effective number of parameters is estimated in terms of deviance’s variance, and the Watanabe-Akaike information criterion (WAIC) [18,19,20]. Both measures are computed under the likelihood of human data to compare the models’ fit with and without the contextual variables. We also compare the posterior standard deviations of θit from the two models (binary and categorical) with the rodent data, with those from a model based only human data, to examine the benefits of incorporating animal data in surveillance. Results are obtained from 10,000 posterior samples using WinBUGS software after a burn-in period of 10,000 draws. To assess the mixing of posterior samplers, we adopt Gelman’s \( \widehat{R} \) statistics proposed in [18, 21] for multiple chain convergence and converged chains should have the value of \( \widehat{R} \) approximately 1.
Results
The histograms of \( \widehat{R} \) estimates of θit from posterior sampler under the models are displayed in Fig. 3. The \( \widehat{R} \) estimates under all the models are approximately or less than 1.005 which indicates the chains converge to a posterior distribution. Table 1 displays the DIC and the measures of model performance. We compared the models (binary vs. polytomous) on their DIC and WAIC values under the human likelihood. The binary model without the contextual effect shows the smallest DIC and WAIC values and hence it can be considered as having the best fit to the data. To provide evidence of the benefits from incorporating rodent information into the model, the posterior estimates of standard deviation (SD) of θit from both binary and polytomous models are compared and presented in Table 2 for each area over the time period. The SD estimates of θit with rodent’s information incorporated are smaller than those from the model that has only human incidence. This result supports the integration of rodent population variables to reduce the uncertainty of tularemia risk estimates.
To help assess the predictive performance of rodent abundance data on the occurrence of human cases, we develop a metric on the 0 to 1 scale, namely the degree of positive indicator (DP) for each health district as \( {DP}_i=\frac{\exp \left({\beta}_i^p\right)}{\exp \left({\beta}_i^p\right)+1}. \) The DP indicator is derived from the binary model without the contextual factors as this was the model with the lowest DIC and WAIC. High values of DP (close to 1) would indicate health districts with a high number of human tularemia cases in the current year, given increasing or at peak rodent populations in the past year. DP can be interpreted in a similar fashion to the sensitivity of a diagnostic test. High values of DP would suggest good predictive value of the rodent data on the occurrence of human tularemia cases. Year-specific modeling was not considered due to insufficient data. Thus positive dependence values represent an averaged effect of rodent populations on human incidence across all the years (Table 3). There are 8 (40%) health districts, mostly on the west and south of the country, with mean DP values larger than 0.8 and the lower 95% credible interval above 0.5 (Fig. 4).
Discussion
Other modifications are possible. For example, in joint modeling human and rodent data we could consider human cases being dependent on rodent population through pit as a categorical covariate. For spatial unit, smaller or administrative areas different from health districts may have led to more discriminatory findings (or not given increased noise from smaller units). Similarly, more informative data on rodent population levels, e.g. rodent densities as in other studies [11] might have resulted in more descriptive models. Rodent populations fluctuate with a highly varying amplitude, which means that abundances may vary substantially from one peak to the next, even within the same regions [12]. If tularemia transmission to humans is a phase and density dependent process, as it most likely is [6], variation in vole abundance during successive peaks may reduce the predictive value of models employed here. Subsequent studies must explore the incorporation of data on the precise location of the rodent trapping sites through the use of some form of interpolation [22]. In addition to host population changes, environmental factors also seem to impact on the occurrence of tularemia outbreaks [23, 24]. Incorporation of evidence on mosquito distribution (not purposely captured at this moment in Finland), rainfall and water bodies into our models would be straightforward.
Conclusions
Space-time proximity and contact patterns between humans and animals play a central role in infection risk. In this research, we attempted to assess the surveillance relevance of regularly collected rodent population data to i) improve tularemia risk estimates and ii) inform early prediction of human tularemia cases in Finland. To that effect, we developed binary and polytomous architectures to jointly model human incidence and rodent status with one-year lag. Our results returned a heterogeneous picture but for many health districts rodent population status was relevant to the occurrence of human tularemia cases. We have shown that the incorporation of rodent population data led to an improvement in the accuracy of human risk estimates in 15 (75%) health districts, compared to models only considering human tularemia cases. Furthermore, our purposely built indicator (DP) showed that in 8 (40%) of the health districts, increasing and at-peak rodent populations were robust predictors of human tularemia cases in the following year. However, for few districts (e.g. Länsi-Pohja and Kainuu) where the model based only on human data leads to more precise estimates of incidence, these areas also have the low values of positive indicator (DP) with the corresponding credible intervals crossing 0.5. This suggests that the distribution of zoonotic pathogens in animal and human populations spatially varies as we assumed according to local biotic and abiotic determinants. To conduct further investigation, we need more information on several covariates such as environmental, behavioral, and socio-economic factors. However, the platform proposed in this research can facilitate in identification of geographical areas that are potentially suitable for transmission.
Our objective was to develop different models to combine multiple data sources already available, on animals and humans, to better inform the occurrence of zoonoses. The present work shows the utilization of animal population data (in the absence of animal health-related data on rodents) to inform human risk. Although different model parameterizations and, in particular, evidence on other putative predictors, as reported elsewhere [25], could contribute further evidence to better inform human risk, our proposed methodology demonstrates its ability to quantify the association between the rodent status and human incidence. This is potentially useful in prediction of human outbreak when put in the health-policy perspective.
Abbreviations
- CrI:
-
Credible interval
- DIC:
-
Deviance Information Criterion
- DP:
-
Degree of positive indicator
- ICAR:
-
Intrinsic conditional autoregressive model
- SD:
-
Standard deviation
- WAIC:
-
Watanabe-Akaike information criterion
References
Vrbova L, et al. Utility of algorithms for the analysis of integrated Salmonel la surveillance data. Epidemiol Infect. 2016;144(10):2165–75.
Wendt A, Kreienbrock L, Campe A. Joint use of disparate data for the surveillance of Zoonoses: a feasibility study for a one health approach in Germany. Zoonoses Public Health. 2016;63(7):503–14.
Rossow H, et al. Risk factors for pneumonic and ulceroglandular tularaemia in Finland: a population-based case-control study. Epidemiol Infect. 2014;142(10):2207–16.
Eliasson H, et al. The 2000 tularemia outbreak: a case-control study of risk factors in disease-endemic and emergent areas, Sweden. Emerg Infect Dis. 2002;8(9):956–60.
Dennis DT, et al. Tularemia as a biological weapon: medical and public health management. Jama. 2001;285(21):2763–73.
Rossow H, et al. Incidence and seroprevalence of tularaemia in Finland, 1995 to 2013: regional epidemics with cyclic pattern. Euro Surveill. 2015;20(33). https://doi.org/10.2807/1560-7917.ES2015.20.33.21209
Tärnvik A, Sandström G, Sjöstedt A. Epidemiological analysis of tularemia in Sweden 1931–1993. FEMS Immunol Med Microbiol. 1996;13(3):201–4.
Reintjes R, et al. Tularemia outbreak investigation in Kosovo: case control and environmental studies. Emerg Infect Dis. 2002;8(1):69–73.
Allue M, et al. Tularaemia outbreak in Castilla y León, Spain, 2007: an update. Euro Surveill. 2008;13(32)
Grunow, R., et al., Surveillance of tularaemia in Kosovo*, 2001 to 2010. 2012.
Luque-Larena JJ, et al. Tularemia outbreaks and common vole (Microtus arvalis) irruptive population dynamics in northwestern Spain, 1997–2014. Vector Borne Zoonotic Dis. 2015;15(9):568–70.
Korpela K, et al. Nonlinear effects of climate on boreal rodent dynamics: mild winters do not negate high-amplitude cycles. Glob Chang Biol. 2013;19(3):697–710.
Sane J, et al. Regional differences in long-term cycles and seasonality of Puumala virus infections, Finland, 1995–2014. Epidemiol Infect. 2016;144(13):2883–2888.
Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B Methodol. 1974:192–236.
Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 2006;1(3):515–34.
Spiegelhalter DJ, et al. Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol. 2002;64(4):583–639.
Celeux G, et al. Deviance information criteria for missing data models. Bayesian Anal. 2006;1(4):651–73.
Gelman A, et al. Bayesian data analysis, vol. 2. USA: Chapman & Hall/CRC Boca Raton, FL; 2014.
Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Stat Comput. 2014;24(6):997–1016.
Watanabe S. A widely applicable Bayesian information criterion. J Mach Learn Res. 2013;14(Mar):867–97.
Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. J Comput Graph Stat. 1998;7(4):434–55.
Diggle PJ, Menezes R, Su Tl. Geostatistical inference under preferential sampling. J R Stat Soc: Ser C: Appl Stat. 2010;59(2):191–232.
Faith S, et al. Growth conditions and environmental factors impact aerosolization but not virulence of Francisella tularensis infection in mice. Front Cell Infect Microbiol. 2012;2:126.
Leblebicioglu H, et al. Outbreak of tularemia: a case–control study and environmental investigation in Turkey. Int J Infect Dis. 2008;12(3):265–9.
Ariza-Miguel J, et al. Molecular investigation of tularemia outbreaks, Spain, 1997–2008. Emerg Infect Dis. 2014;20(5):754.
Acknowledgements
We would like to thank the reviewers for comments that greatly improved the manuscript.
Funding
This research is partially supported by the Faculty of Tropical Medicine, Mahidol University. The funding body had no role in the design or analysis of the study, interpretation of results, or writing of the manuscript.
Availability of data and materials
The data from the National Institute for Health and Welfare of Finland is available on public domain and upon request. The data from the Natural Resources Institute is available of from the representing authors upon request.
Author information
Authors and Affiliations
Contributions
Authors CR, AL and VJDRV designed the study with critical review from HR. CR performed the statistical analyses in consultation with AL, VJDRV, HR, JS, OH and HH. CR drafted the paper with input from AL and VJDRV. HR, JS, OH and HH were responsible for critical revision and improvements of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Surveillance data from the National Infectious Disease Register is aggregate, de-identifed data for public use and the secondary analyses are not subjected to ethical approval process, according to the Finnish Infectious Disease Act (Section 42).
Consent for publication
Not Applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Rotejanaprasert, C., Lawson, A., Rossow, H. et al. Towards integrated surveillance of zoonoses: spatiotemporal joint modeling of rodent population data and human tularemia cases in Finland. BMC Med Res Methodol 18, 72 (2018). https://doi.org/10.1186/s12874-018-0532-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874-018-0532-8