Abstract
This paper presents a flexible scan test statistic to detect disease clusters in data sets represented as a hierarchical tree. The algorithm searches through the branches of the tree and can aggregate leaves located in different branches. The test statistic combines two terms: the log-likelihood of the data and the amount of information needed to computationally encode each potential cluster. The second term is based on the Minimum Description Length (MDL) principle; it penalizes the search and so prevents the detection of oddly shaped clusters. Our MDL method strikes an automatic balance between bias and variance. We present simulation results that compare its power with that of the usual scan statistic and that show the high accuracy of the MDL method in identifying clusters scattered across the tree. The MDL method is illustrated with a large database on the relationship between occupation and death from silicosis.
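To make the two-term statistic concrete, here is a minimal Python sketch under stated assumptions. The tree, the case counts, the expected counts and all function names are hypothetical. The likelihood term is the standard Poisson log-likelihood ratio used by the usual scan statistic. The penalty `k * ln(M)` (the cost of naming `k` chosen branches out of `M` nodes) is a simple stand-in, not the MDL code length derived in the paper. A candidate cluster is the union of up to `max_branches` disjoint subtrees, so leaves from different branches can be aggregated.

```python
import math
from itertools import combinations

# Hypothetical tree: internal node -> children; any name without an entry is a leaf.
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3"],
}
# Hypothetical observed cases and null expected counts per leaf (both sum to 80).
CASES = {"a1": 30, "a2": 8, "b1": 9, "b2": 25, "b3": 8}
EXPECTED = {"a1": 16, "a2": 16, "b1": 16, "b2": 16, "b3": 16}


def leaves_under(node):
    """Return the set of leaves in the subtree rooted at `node`."""
    if node not in TREE:
        return {node}
    out = set()
    for child in TREE[node]:
        out |= leaves_under(child)
    return out


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for an excess of cases inside a cluster."""
    if c <= e:
        return 0.0  # only elevated-risk clusters are of interest
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def best_cluster(max_branches=2):
    """Maximize LLR minus a stand-in description-length penalty (in nats)."""
    nodes = sorted(n for n in set(TREE) | set(CASES) if n != "root")
    total = sum(CASES.values())
    per_branch_cost = math.log(len(nodes))  # hypothetical cost of naming one branch
    best = (float("-inf"), None)
    for k in range(1, max_branches + 1):
        for combo in combinations(nodes, k):
            sets = [leaves_under(n) for n in combo]
            union = set().union(*sets)
            if sum(len(s) for s in sets) != len(union):
                continue  # skip overlapping branches so each leaf is counted once
            c = sum(CASES[leaf] for leaf in union)
            e = sum(EXPECTED[leaf] for leaf in union)
            score = poisson_llr(c, e, total) - k * per_branch_cost
            if score > best[0]:
                best = (score, combo)
    return best


if __name__ == "__main__":
    print(best_cluster())
```

With these made-up counts, the two elevated leaves `a1` and `b2` sit in different branches, and the search merges them into one cluster because the gain in log-likelihood outweighs the cost of naming a second branch. Restricting the search to `max_branches=1` returns `a1` alone. Assessing significance (for example with Monte Carlo replications under the null) is omitted from this sketch.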
Cite this article
Prates, M.O., Assunção, R.M. & Costa, M.A. Flexible scan statistic test to detect disease clusters in hierarchical trees. Comput Stat 27, 715–737 (2012). https://doi.org/10.1007/s00180-011-0286-9