Abstract
Targeted intervention and resource allocation are essential for effective control of infectious diseases, particularly those like malaria that tend to occur in remote areas. Disease prediction models can support targeted intervention, particularly if they have fine spatial resolution. However, choosing an appropriate resolution is difficult, since the choice of spatial scale can have a significant impact on the accuracy of predictive models. In this paper, we introduce a new approach to spatial clustering for disease prediction that we call complexity-based spatial hierarchical clustering. The technique seeks spatially compact clusters whose time series can be well characterized by models of low complexity. We evaluate our approach with 2 years of malaria case data from Tak Province in northern Thailand. We show that using reduction in the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) as clustering criteria leads to rapid improvement in predictability, and to significantly better predictability than clustering based only on minimizing spatial intra-cluster distance. This holds across the entire range of cluster sizes and over a variety of predictive models and prediction horizons.
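As a rough illustration of the kind of quantity such a criterion relies on, the sketch below computes AIC and BIC for a simple AR(1) model fitted by least squares to a cluster's aggregated case series. It is only a sketch under stated assumptions: the AR(1) model, the Gaussian error assumption, the function names `information_criteria` and `aggregate`, and the example data are all hypothetical and are not taken from the paper, which may use different models and a different merge rule.

```python
import math


def aggregate(series_a, series_b):
    """Sum two equally long case-count series (hypothetical cluster merge)."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return [a + b for a, b in zip(series_a, series_b)]


def information_criteria(series, n_params=3):
    """Return (AIC, BIC) for a least-squares AR(1) fit y[t] = a + b*y[t-1].

    Assumes Gaussian errors, so both criteria are computed from the residual
    sum of squares, up to an additive constant that depends only on n.
    n_params counts the intercept, the slope, and the error variance.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    x = series[:-1]          # lagged values
    y = series[1:]           # current values
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rss = max(rss, 1e-12)    # guard against log(0) for a perfect fit
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n)
    return aic, bic


if __name__ == "__main__":
    # Made-up weekly case counts for two neighbouring areas.
    area_a = [4, 6, 5, 9, 12, 10, 14, 13, 17, 16]
    area_b = [2, 3, 3, 5, 6, 6, 8, 7, 9, 10]
    print("area A:", information_criteria(area_a))
    print("area B:", information_criteria(area_b))
    print("merged:", information_criteria(aggregate(area_a, area_b)))
```

A clustering procedure could compare such values before and after a candidate merge, but how the reduction is defined and combined with spatial compactness is not specified in this section, so that step is left out here.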
Notes
This paper is an extended version of a previous short workshop paper [35] which presented preliminary results.
References
Khamsiriwatchara A, Sudathip P, Sawang S, Vijakadge S, Potithavoranan T, Sangvichean A, Satimai W, Delacollette C, Singhasivanon P, Lawpoolsri S, Kaewkungwal J (2012) Artemisinin resistance containment project in Thailand (I): implementation of electronic-based malaria information system for early case detection and individual case management in provinces along the Thai-Cambodian border. Malar J 11(1):247
Graham A, Atkinson P, Danson F (2004) Spatial analysis for epidemiology. Acta Trop 91:219–225
Meliker JR, Sloan CD (2011) Spatio-temporal epidemiology: principles and opportunities. Spat Spatio-temporal Epidemiol 2(1):1–9
Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471
Hansen MH, Yu B (2001) Model selection and the principle of minimum description length. J Am Stat Assoc 96(454):746–774
Dagliati A, Marinoni A, Cerra C, Decata P, Chiovato L, Gamba P, Bellazzi R (2016) Integration of administrative, clinical, and environmental data to support the management of type 2 diabetes mellitus: from satellites to clinical care. J Diabetes Sci Technol 10(1):19–26
Waller LA, Gotway CA (2004) Applied spatial statistics for public health data. John Wiley & Sons
Gelman A, Price PN (1999) All maps of parameter estimates are misleading. Stat Med 18(23):3221–3234
Openshaw S, Taylor PJ (1981) The modifiable areal unit problem. In: Wrigley N, Bennett R (eds) Quantitative geography: a British view. Routledge and Kegan Paul, London, pp 60–69
Fotheringham AS, Wong DW (1991) The modifiable areal unit problem in multivariate statistical analysis. Environ Plan A 23(7):1025–1044
Glaz J, Naus J, Wallenstein S (2001) Scan statistics. Springer, New York, NY
Alemu K, Worku A, Berhane Y, Kumie A (2014) Spatiotemporal clusters of malaria cases at village level, Northwest Ethiopia. Malar J 13(1):223
Kulldorff M (1997) A spatial scan statistic. Commun Stat-Theory Methods 26(6):1481–1496
Mosha JF, Sturrock HJ, Greenwood B, Sutherland CJ, Gadalla NB, Atwal S, Hemelaar S, Brown JM, Drakeley C, Kibiki G, Bousema T (2014) Hot spot or not: a comparison of spatial statistical methods to predict prospective malaria infections. Malar J 13(1):53
Bousema T, Stevenson J, Baidjoe A, Stresman G, Griffin JT, Kleinschmidt I, Remarque EJ, Vulule J, Bayoh N, Laserson K, Desai M (2013) The impact of hotspot-targeted interventions on malaria transmission: study protocol for a cluster-randomized controlled trial. Trials 14(1):36
Mogeni P, Omedo I, Nyundo C, Kamau A, Noor A, Bejon P (2017) Effect of transmission intensity on hotspots and micro-epidemiology of malaria in sub-Saharan Africa. BMC Med 15(1):121
Zinszer K, Verma AD, Charland K, Brewer TF, Brownstein JS, Sun Z, Buckeridge DL (2012) A scoping review of malaria forecasting: past work and future directions. BMJ Open 2(6):e001992
Giardina F, Franke J, Vounatsou P (2015) Geostatistical modelling of the malaria risk in Mozambique: effect of the spatial resolution when using remotely-sensed imagery. Geospat Health 10
Teklehaimanot HD, Lipsitch M, Teklehaimanot A, Schwartz J (2004) Weather-based prediction of Plasmodium falciparum malaria in epidemic-prone regions of Ethiopia I. Patterns of lagged weather effects reflect biological mechanisms. Malar J 3:41
Montero P, Vilar JA (2014) TSclust: an R package for time series clustering. J Stat Softw 62(1)
Pedrycz W (2007) Granular computing—the emerging paradigm. J Uncertain Syst 1(1):38–61
Pedrycz W (2013) Granular computing: analysis and design of intelligent systems. CRC Press
Maciel L, Ballini R, Gomide F (2016) Evolving granular analytics for interval time series forecasting. Granular Computing 1(4):213–224
Kulldorff M. SaTScan user guide for version 9.0. Retrieved 18 June 2018 from http://www.satscan.org
Lempel A, Ziv J (1976) On the complexity of finite sequences. IEEE Trans Inf Theory 22(1):75–81
Pincus S (1995) Approximate entropy (ApEn) as a complexity measure. Chaos 5(1):110–117
Rasheed BQ, Qian B (2004) Hurst exponent and financial market predictability. In: IASTED Conference on Financial Engineering and Applications (FEA 2004), pp 203–209
Nobre FF, Monteiro ABS, Telles PR, Williamson GD (2001) Dynamic linear model and SARIMA: a comparison of their forecasting performance in epidemiology. Stat Med 20(20):3051–3069
Pascual M, Cazelles B, Bouma MJ, Chaves LF, Koelle K (2008) Shifting patterns: malaria dynamics and rainfall variability in an African highland. Proc R Soc Lond B Biol Sci 275(1631):123–132
Burnham KP, Anderson DR (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res 33(2):261–304
Khandakar Y, Hyndman RJ (2008) Automatic time series forecasting: the forecast package for R. J Stat Softw 27(3)
Haddawy P, Hasan AHMI, Kasantikul R, Lawpoolsri S, Sa-angchai P, Kaewkungwal J, Singhasivanon P (2018) Spatiotemporal Bayesian networks for malaria prediction. Artif Intell Med 84:127–138
Hasan AHMI, Haddawy P, Lawpoolsri S (2017) A comparative analysis of Bayesian network approaches to malaria outbreak prediction. In: Proc. 13th Int’l Conf. on Computing and Information Technology (IC2IT 2017), Bangkok
Makridakis S (1993) Accuracy measures: theoretical and practical concerns. Int J Forecast 9:527–529
Haddawy P, Su Yin M, Wisanrakkit T, Limsupavanich R, Promrat P, Lawpoolsri S (2017) AIC-driven spatial hierarchical clustering: case study for malaria prediction in Northern Thailand. In: Multi-disciplinary Trends in Artificial Intelligence, Proc. MIWAI 2017, Brunei
Acknowledgements
We thank Oliver Grübner for helpful comments on an earlier draft. This paper is based upon work supported by the U.S. Army ITC-PAC under Contract No. FA5209-15-P-0183. This work was also partially supported through a fellowship from the Hanse-Wissenschaftskolleg Institute for Advanced Study, Delmenhorst, Germany, to Haddawy and a Santander BISIP scholarship to Su Yin.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
About this article
Cite this article
Haddawy, P., Yin, M.S., Wisanrakkit, T. et al. Complexity-Based Spatial Hierarchical Clustering for Malaria Prediction. J Healthc Inform Res 2, 423–447 (2018). https://doi.org/10.1007/s41666-018-0031-z
DOI: https://doi.org/10.1007/s41666-018-0031-z