Abstract
This paper develops and applies new techniques for the simultaneous detection of boundaries and clusters within a probabilistic framework. The new statistic “little b” (written b_ij) evaluates boundaries between adjacent areas with different values, as well as links between adjacent areas with similar values. Clusters of high values (hotspots) and low values (coldspots) are then constructed by joining areas abutting locations that are significantly high (e.g., an unusually high disease rate) and that are connected through a “link” such that the values in the adjoining areas are not significantly different. Two techniques are proposed and evaluated for accomplishing cluster construction: “big B” and the “ladder” approach. We compare the statistical power and empirical Type I and Type II error of these approaches to those of wombling and the local Moran test. Significance may be evaluated using distribution theory based on the product of two continuous (i.e., non-discrete) variables. We also provide a “distribution free” algorithm based on resampling of the observed values. The methods are applied to simulated data for which the locations of boundaries and clusters are known, and compared and contrasted with clusters found using the local Moran statistic and with polygon Womble boundaries. The little b approach to boundary detection is comparable to polygon wombling in terms of Type I error, Type II error and empirical statistical power. For cluster detection, both the big B and ladder approaches have lower Type I and Type II error and are more powerful than the local Moran statistic. The new methods are not constrained to find clusters of a pre-specified shape, such as circles, ellipses and donuts, and yield a more accurate description of geographic variation than alternative cluster tests that presuppose a specific cluster shape.
We recommend these techniques over existing cluster and boundary detection methods that do not provide such a comprehensive description of spatial pattern.
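The resampling idea summarized above can be illustrated with a minimal sketch. The abstract does not state the formula for b_ij, so the code assumes, purely for illustration, that the statistic for a pair of adjacent areas is the product of their standardized values; this choice is suggested only by the abstract's mention of the product of two continuous variables. The function names (`standardize`, `edge_products`, `resampling_p_values`, `grow_cluster`) are hypothetical, and the joining step is a plain breadth-first search rather than the big B or ladder procedure.

```python
import random
import statistics
from collections import deque


def standardize(values):
    """Convert raw area values to z-scores (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def edge_products(z, edges):
    """ASSUMED pairwise statistic: product of the z-scores of two adjacent areas.

    Strongly negative products suggest a boundary (one high, one low value);
    strongly positive products suggest a link (two similar extreme values).
    """
    return {(i, j): z[i] * z[j] for i, j in edges}


def resampling_p_values(values, edges, n_perm=999, seed=0):
    """Shuffle the observed values across areas and return two dicts of
    one-sided p-values per edge: (boundary_p, link_p)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    z = standardize(values)
    observed = edge_products(z, edges)
    # Counts start at 1 so that the observed map is part of the reference set.
    low = {e: 1 for e in edges}
    high = {e: 1 for e in edges}
    shuffled = list(z)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        simulated = edge_products(shuffled, edges)
        for e in edges:
            if simulated[e] <= observed[e]:
                low[e] += 1
            if simulated[e] >= observed[e]:
                high[e] += 1
    boundary_p = {e: low[e] / (n_perm + 1) for e in edges}
    link_p = {e: high[e] / (n_perm + 1) for e in edges}
    return boundary_p, link_p


def grow_cluster(seed_area, edges, boundary_p, alpha=0.05):
    """Join areas reachable from seed_area without crossing an edge whose
    boundary p-value is at or below alpha (a simplified joining rule)."""
    neighbours = {}
    for i, j in edges:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)
    cluster, queue = {seed_area}, deque([seed_area])
    while queue:
        area = queue.popleft()
        for other in neighbours.get(area, []):
            edge = (area, other) if (area, other) in boundary_p else (other, area)
            if other not in cluster and boundary_p[edge] > alpha:
                cluster.add(other)
                queue.append(other)
    return cluster


# Toy example: six areas along a line, high values on the left, low on the right.
values = [10, 11, 10, 1, 2, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
boundary_p, link_p = resampling_p_values(values, edges)
```

In this sketch the seed area passed to `grow_cluster` is taken as given; in the approach described above it would be a location already judged significantly high (or low). With only six areas the resampling distribution is very coarse, so the toy data show the mechanics rather than a realistic test.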
Cite this article
Jacquez, G.M., Kaufmann, A. & Goovaerts, P. Boundaries, links and clusters: a new paradigm in spatial analysis?. Environ Ecol Stat 15, 403–419 (2008). https://doi.org/10.1007/s10651-007-0066-4
DOI: https://doi.org/10.1007/s10651-007-0066-4