Environmental and Ecological Statistics, Volume 25, Issue 2, pp 257–275

Multiple source spatial cluster detection via multi-criteria analysis

  • Alexandre C. L. Almeida
  • Luiz H. Duczmal
  • André L. F. Cançado (corresponding author)
  • Fabio R. da Silva
Article

Abstract

Using multiple data sources, rather than a single source, is essential for obtaining reliable information about emerging health threats. Spatial scan statistics have been adapted to analyze multivariate data sources, but only ad hoc procedures have been devised for selecting the most likely cluster and computing its significance. In this work, information from multiple disease surveillance data sources is incorporated to achieve more coherent spatial cluster detection using tools from multi-criteria analysis. The best cluster solutions are found by maximizing two objective functions simultaneously, based on the concept of dominance. The statistical significance of the solutions is evaluated with an approach based on the attainment function. The multi-criteria approach has several advantages: the evaluation function for each data source is represented clearly and is not artificially, and possibly confusingly, mixed with the evaluations of the other data sources; the statistical significance of each candidate cluster can be assigned rigorously; and the best cluster solutions can be analyzed and selected as given naturally by the non-dominated set. The methodology is illustrated with real datasets.
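To make the ideas of dominance and attainment concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes each candidate cluster has already been scored by two objective values to be maximized (for example, one scan statistic value per data source), and that null replications have each produced their own non-dominated set. The function names, the toy scores, and the (hits + 1) / (n + 1) Monte Carlo convention are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A candidate cluster is summarized by two objective values (hypothetical
# scores, one per data source); both are to be maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def non_dominated(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (the Pareto set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def attains(front: Sequence[Point], z: Point) -> bool:
    """True if some point of the front is at least as good as z in both objectives."""
    return any(q[0] >= z[0] and q[1] >= z[1] for q in front)


def attainment_p_value(z: Point, null_fronts: Sequence[Sequence[Point]]) -> float:
    """Share of null replications whose front attains z.

    Uses the (hits + 1) / (n + 1) Monte Carlo convention, which is an
    assumption of this sketch rather than something stated in the abstract.
    """
    hits = sum(attains(front, z) for front in null_fronts)
    return (hits + 1) / (len(null_fronts) + 1)


# Toy scores, invented for illustration only.
candidates = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.5, 1.5), (2.0, 1.0)]
pareto_set = non_dominated(candidates)

# Toy non-dominated sets from three hypothetical null replications.
null_fronts = [[(1.0, 0.5)], [(2.5, 2.5)], [(0.5, 0.5), (3.0, 0.2)]]
p_value = attainment_p_value((2.0, 2.0), null_fronts)
```

In this toy run, `(1.5, 1.5)` and `(2.0, 1.0)` are dropped because `(2.0, 2.0)` and `(3.0, 1.0)` dominate them, and only one of the three null fronts attains `(2.0, 2.0)`. Keeping the two scores as separate coordinates, rather than merging them into a single number, is what lets each data source's evaluation stay visible on its own.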

Keywords

Attainment surface; Multi-criteria; Multiple data sources; Spatial scan statistic

Notes

Acknowledgements

The authors are deeply indebted to CNPq (Project 311710/2016-6), CAPES and FAPEMIG (PPM-00596-17), Brazil, for financial support.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Alexandre C. L. Almeida (1)
  • Luiz H. Duczmal (2)
  • André L. F. Cançado (3, corresponding author)
  • Fabio R. da Silva (4)

  1. Department of Physics and Mathematics, Universidade Federal de São João del Rei, São João del Rei, Brazil
  2. Department of Statistics, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  3. Department of Statistics, Universidade de Brasília, Brasília, Brazil
  4. Department of Computer Science, Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, Brazil