Flexible scan statistic test to detect disease clusters in hierarchical trees

  • Original Paper
  • Computational Statistics

Abstract

This paper presents a flexible scan test statistic to detect disease clusters in data sets represented as a hierarchical tree. The algorithm searches through the branches of the tree and is able to aggregate leaves located in different branches. The test statistic combines two terms: the log-likelihood of the data and the amount of information needed to encode each potential cluster. The second term, based on the Minimum Description Length (MDL) principle, penalizes the search algorithm, discouraging the detection of oddly shaped clusters. Our MDL method reaches an automatic compromise between bias and variance. We present simulation results showing its power performance compared with the usual scan statistic and the high accuracy of the MDL method in identifying clusters that are scattered across the tree. The MDL method is illustrated with a large database on the relationship between occupation and death from silicosis.
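The idea of trading off fit against coding cost can be illustrated with a minimal sketch. It is not the authors' statistic: the abstract does not give the form of the MDL term, so the penalty below (a cluster written as a union of `k_nodes` tree nodes, each costing `log(n_nodes)` nats to name) is purely an assumption, as are all function names and the toy numbers. The log-likelihood term uses the standard Poisson log-likelihood ratio of the spatial scan statistic.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a candidate cluster.

    c: observed cases in the cluster, e: expected cases under the null,
    total: total cases (assumed equal to the total expected count).
    Only excess-risk clusters (c > e) score above zero.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # The outside term vanishes when every case falls in the cluster.
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

def description_length(k_nodes, n_nodes):
    """Hypothetical coding cost (in nats) of naming k_nodes out of n_nodes."""
    return k_nodes * math.log(n_nodes)

def penalized_score(c, e, total, k_nodes, n_nodes):
    """Log-likelihood ratio minus the assumed description-length penalty."""
    return poisson_llr(c, e, total) - description_length(k_nodes, n_nodes)

# Toy comparison on a tree with 50 nodes and 100 cases in total:
# a single branch versus four scattered leaves with slightly more cases.
branch = dict(c=30, e=15, total=100, k_nodes=1, n_nodes=50)
scattered = dict(c=34, e=15, total=100, k_nodes=4, n_nodes=50)
best = max([branch, scattered], key=lambda z: penalized_score(**z))
```

In this toy setting the scattered candidate has the larger raw log-likelihood ratio, but it needs four nodes to describe, so the penalized score prefers the single branch. A scattered cluster would be selected only when its gain in likelihood outweighs its extra coding cost.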

Author information

Corresponding author

Correspondence to Marcos O. Prates.

About this article

Cite this article

Prates, M.O., Assunção, R.M. & Costa, M.A. Flexible scan statistic test to detect disease clusters in hierarchical trees. Comput Stat 27, 715–737 (2012). https://doi.org/10.1007/s00180-011-0286-9

  • DOI: https://doi.org/10.1007/s00180-011-0286-9
