Complexity-Based Spatial Hierarchical Clustering for Malaria Prediction

  • Research Article
  • Journal of Healthcare Informatics Research

Abstract

Targeted intervention and resource allocation are essential to the effective control of infectious diseases, particularly those, like malaria, that tend to occur in remote areas. Disease prediction models can support targeted intervention, especially if they have fine spatial resolution. However, choosing an appropriate resolution is difficult, since the choice of spatial scale can significantly affect the accuracy of predictive models. In this paper, we introduce a new approach to spatial clustering for disease prediction, which we call complexity-based spatial hierarchical clustering. The technique seeks spatially compact clusters whose time series can be well characterized by models of low complexity. We evaluate our approach with 2 years of malaria case data from Tak Province in northern Thailand. We show that using the reduction in the Akaike information criterion (AIC) and Bayesian information criterion (BIC) as clustering criteria leads to rapid improvement in predictability. It also yields significantly better predictability than clustering based only on minimizing spatial intra-cluster distance, across the entire range of cluster sizes and over a variety of predictive models and prediction horizons.
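The idea of merging spatially close clusters only when the merge lowers an information criterion can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, the aggregation rule, or the exact merge score, so everything here is an assumption made for illustration. The sketch assumes a least-squares AR(1) model with Gaussian errors (so AIC = n ln(RSS/n) + 2k), represents a cluster by the mean of its members' series, uses a centroid distance threshold `max_dist` for spatial compactness, and scores a merge by how far the merged AIC falls below the better of the two parts. The names `ar1_aic`, `mean_series`, and `greedy_aic_clustering` are hypothetical, and the data are synthetic.

```python
import math
from itertools import combinations


def ar1_aic(series):
    """AIC of a least-squares AR(1) fit y[t] = a + b*y[t-1], Gaussian errors."""
    x, y = series[:-1], series[1:]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    k = 3  # intercept, slope, noise variance
    return n * math.log(rss / n) + 2 * k


def mean_series(series_list):
    """Pointwise mean of several equally long series (assumed aggregation)."""
    return [sum(vals) / len(vals) for vals in zip(*series_list)]


def greedy_aic_clustering(units, max_dist):
    """units: name -> ((x, y), series). Returns sorted lists of member names."""
    clusters = [[name] for name in units]

    def centroid(c):
        return (sum(units[m][0][0] for m in c) / len(c),
                sum(units[m][0][1] for m in c) / len(c))

    def aic(c):
        return ar1_aic(mean_series([units[m][1] for m in c]))

    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(clusters, 2):
            if math.dist(centroid(a), centroid(b)) > max_dist:
                continue  # spatial compactness constraint (assumed form)
            # Assumed merge score: AIC drop relative to the better part.
            gain = min(aic(a), aic(b)) - aic(a + b)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no admissible merge lowers the AIC
            return sorted(sorted(c) for c in clusters)
        a, b = best_pair
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]


# Synthetic example: 104 weekly values (2 years) for three nearby villages.
weeks = range(104)
signal = [10 + 5 * math.sin(0.3 * t) for t in weeks]
noise = [3 * (-1) ** t for t in weeks]
units = {
    "A": ((0.0, 0.0), [s + e for s, e in zip(signal, noise)]),
    "B": ((1.0, 0.0), [s - e for s, e in zip(signal, noise)]),
    "C": ((0.0, 1.0), [s + 2 * e for s, e in zip(signal, noise)]),
}
result = greedy_aic_clustering(units, max_dist=1.5)
print(result)
```

In this toy data, villages A and B carry opposite noise, so their combined series is much smoother and easier to model than either alone, and they are merged. Village C is just as close spatially, but adding it makes the combined series harder to model, so it stays separate. A purely distance-based criterion could not tell these two cases apart.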

Notes

  1. This paper is an extended version of a previous short workshop paper [35] which presented preliminary results.

References

  1. Khamsiriwatchara A, Sudathip P, Sawang S, Vijakadge S, Potithavoranan T, Sangvichean A, Satimai W, Delacollette C, Singhasivanon P, Lawpoolsri S, Kaewkungwal J (2012) Artemisinin resistance containment project in Thailand. (I): implementation of electronic-based malaria information system for early case detection and individual case management in provinces along the Thai-Cambodian border. Malar J 11(1):247

  2. Graham A, Atkinson P, Danson F (2004) Spatial analysis for epidemiology. Acta Trop 91:219–225

  3. Meliker JR, Sloan CD (2011) Spatio-temporal epidemiology: principles and opportunities. Spat Spatio-temporal Epidemiol 2(1):1–9

  4. Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  5. Hansen MH, Yu B (2001) Model selection and the principle of minimum description length. J Am Stat Assoc 96(454):746–774

  6. Dagliati A, Marinoni A, Cerra C, Decata P, Chiovato L, Gamba P, Bellazzi R (2016) Integration of administrative, clinical, and environmental data to support the management of type 2 diabetes mellitus: from satellites to clinical care. J Diabetes Sci Technol 10(1):19–26

  7. Waller LA, Gotway CA (2004) Applied spatial statistics for public health data. John Wiley & Sons

  8. Gelman A, Price PN (1999) All maps of parameter estimates are misleading. Stat Med 18(23):3221–3234

  9. Openshaw S, Taylor PJ (1981) The modifiable areal unit problem. In: Wrigley N, Bennett R (eds) Quantitative geography: a British view. Routledge and Kegan Paul, London, pp 60–69

  10. Fotheringham AS, Wong DW (1991) The modifiable areal unit problem in multivariate statistical analysis. Environ Plan A 23(7):1025–1044

  11. Glaz J, Naus J, Wallenstein S (2001) Scan statistics. Springer, New York, NY

  12. Alemu K, Worku A, Berhane Y, Kumie A (2014) Spatiotemporal clusters of malaria cases at village level, Northwest Ethiopia. Malar J 13(1):223

  13. Kulldorff M (1997) A spatial scan statistic. Commun Stat-Theory Methods 26(6):1481–1496

  14. Mosha JF, Sturrock HJ, Greenwood B, Sutherland CJ, Gadalla NB, Atwal S, Hemelaar S, Brown JM, Drakeley C, Kibiki G, Bousema T (2014) Hot spot or not: a comparison of spatial statistical methods to predict prospective malaria infections. Malar J 13(1):53

  15. Bousema T, Stevenson J, Baidjoe A, Stresman G, Griffin JT, Kleinschmidt I, Remarque EJ, Vulule J, Bayoh N, Laserson K, Desai M (2013) The impact of hotspot-targeted interventions on malaria transmission: study protocol for a cluster-randomized controlled trial. Trials 14(1):36

  16. Mogeni P, Omedo I, Nyundo C, Kamau A, Noor A, Bejon P (2017) Effect of transmission intensity on hotspots and micro-epidemiology of malaria in sub-Saharan Africa. BMC Med 15(1):121

  17. Zinszer K, Verma AD, Charland K, Brewer TF, Brownstein JS, Sun Z, Buckeridge DL (2012) A scoping review of malaria forecasting: past work and future directions. BMJ Open 2(6):e001992

  18. Giardina F, Franke J, Vounatsou P (2015) Geostatistical modelling of the malaria risk in Mozambique: effect of the spatial resolution when using remotely-sensed imagery. Geospat Health 10

  19. Teklehaimanot HD, Lipsitch M, Teklehaimanot A, Schwartz J (2004) Weather-based prediction of Plasmodium falciparum malaria in epidemic-prone regions of Ethiopia I. Patterns of lagged weather effects reflect biological mechanisms. Malar J 3:41

  20. Montero P, Vilar JA (2014) TSclust: an R package for time series clustering. J Stat Softw 62(1)

  21. Pedrycz W (2007) Granular computing—the emerging paradigm. J Uncertain Syst 1(1):38–61

  22. Pedrycz W (2013) Granular computing: analysis and design of intelligent systems. CRC Press

  23. Maciel L, Ballini R, Gomide F (2016) Evolving granular analytics for interval time series forecasting. Granul Comput 1(4):213–224

  24. Kulldorff M. SaTScan user guide for version 9.0. Retrieved 18 June 2018 from http://www.satscan.org

  25. Lempel A, Ziv J (1976) On the complexity of finite sequences. IEEE Trans Inf Theory 22(1):75–81

  26. Pincus S (1995) Approximate entropy (ApEn) as a complexity measure. Chaos 5(1):110–117

  27. Rasheed BQ, Qian B (2004) Hurst exponent and financial market predictability. In: IASTED Conference on Financial Engineering and Applications (FEA 2004), pp 203–209

  28. Nobre FF, Monteiro ABS, Telles PR, Williamson GD (2001) Dynamic linear model and SARIMA: a comparison of their forecasting performance in epidemiology. Stat Med 20(20):3051–3069

  29. Pascual M, Cazelles B, Bouma MJ, Chaves LF, Koelle K (2008) Shifting patterns: malaria dynamics and rainfall variability in an African highland. Proc R Soc Lond B Biol Sci 275(1631):123–132

  30. Burnham KP, Anderson DR (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res 33(2):261–304

  31. Khandakar Y, Hyndman RJ (2008) Automatic time series forecasting: the forecast package for R. J Stat Softw 27(3)

  32. Haddawy P, Hasan AHMI, Kasantikul R, Lawpoolsri S, Sa-angchai P, Kaewkungwal J, Singhasivanon P (2018) Spatiotemporal Bayesian networks for malaria prediction. Artif Intell Med 84:127–138

  33. Hasan AHMI, Haddawy P, Lawpoolsri S (2017) A comparative analysis of Bayesian network approaches to malaria outbreak prediction. In: Proc. 13th Int’l Conf. on Computing and Information Technology (IC2IT 2017), Bangkok

  34. Makridakis S (1993) Accuracy measures: theoretical and practical concerns. Int J Forecast 9:527–529

  35. Haddawy P, Su Yin M, Wisanrakkit T, Limsupavanich R, Promrat P, Lawpoolsri S (2017) AIC-driven spatial hierarchical clustering: case study for malaria prediction in Northern Thailand. In: Multi-disciplinary Trends in Artificial Intelligence, Proc. MIWAI 2017, Brunei

Acknowledgements

We thank Oliver Grübner for helpful comments on an earlier draft. This paper is based upon work supported by the U.S. Army ITC-PAC under Contract No. FA5209-15-P-0183. This work was also partially supported through a fellowship from the Hanse-Wissenschaftskolleg Institute for Advanced Study, Delmenhorst, Germany, to Haddawy and a Santander BISIP scholarship to Su Yin.

Author information

Corresponding author

Correspondence to Peter Haddawy.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Table 4. Cumulative moving average of SMAPE for 1- and 4-week ARIMAX predictions using AIC alone, AIC with physical distance, and physical distance alone. Repeated entries indicate that there were no clusters of that size, so the cumulative average value remains unchanged.
Table 5. Cumulative moving average of SMAPE for 1- and 4-week linear regression predictions using AIC alone, AIC with physical distance, and physical distance alone. Repeated entries indicate that there were no clusters of that size, so the cumulative average value remains unchanged.
Table 6. Cumulative moving average of SMAPE for 1- and 4-week ARIMAX predictions using BIC alone, BIC with physical distance, and physical distance alone. Repeated entries indicate that there were no clusters of that size, so the cumulative average value remains unchanged.
Table 7. Cumulative moving average of SMAPE for 1- and 4-week linear regression predictions using BIC alone, BIC with physical distance, and physical distance alone. Repeated entries indicate that there were no clusters of that size, so the cumulative average value remains unchanged.
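
The quantity reported in these tables can be sketched as follows. The table bodies and the exact formulas are not available on this page, so this is only one plausible reading: SMAPE is taken here in the common form 100/n · Σ |F − A| / ((|A| + |F|)/2), with a term counted as zero when both values are zero, and the cumulative moving average is taken as the running mean of per-size SMAPE values in order of increasing cluster size, carried forward unchanged for sizes with no clusters. The function names and the input numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        # Convention assumed here: a 0/0 term contributes zero error.
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * sum(terms) / len(terms)


def cumulative_moving_average(smape_by_size, max_size):
    """Running mean of SMAPE over cluster sizes 1..max_size.

    Sizes with no clusters are absent from smape_by_size; the running
    mean is then repeated unchanged, matching the table captions.
    """
    total, count, out = 0.0, 0, []
    for size in range(1, max_size + 1):
        if size in smape_by_size:
            total += smape_by_size[size]
            count += 1
        out.append(total / count if count else None)
    return out


# Made-up numbers, not values from the paper.
error = smape([100, 200], [110, 180])
cma = cumulative_moving_average({1: 20.0, 2: 10.0, 4: 30.0}, max_size=4)
print(round(error, 3), cma)
```

With no clusters of size 3 in the made-up input, the third entry of `cma` repeats the second, which is the behavior the captions describe.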

Cite this article

Haddawy, P., Yin, M.S., Wisanrakkit, T. et al. Complexity-Based Spatial Hierarchical Clustering for Malaria Prediction. J Healthc Inform Res 2, 423–447 (2018). https://doi.org/10.1007/s41666-018-0031-z

  • DOI: https://doi.org/10.1007/s41666-018-0031-z
