In this paper, we propose a novel approach to detecting spatial clusters based on the linkage information of a map dataset. The spatial scan statistic has been widely used to detect hotspot clusters (or coldspot clusters) in various fields, such as astronomy, biosurveillance, natural disasters, and forestry. It is based on the idea of finding the connected subset of regions that maximizes the likelihood over the whole study area. In principle, to detect a hotspot cluster, that is, a connected set of high-risk regions with maximum likelihood, we need only search all patterns of connected regional subsets. However, unless the study area contains extremely few regions, the total number of connected regional patterns is usually enormous, and we cannot examine all of them. Consequently, it has not been possible to know whether a hotspot detected under the restricted search rules of previous studies truly has the maximum likelihood within a given study area. A zero-suppressed binary decision diagram (ZDD), one approach to frequent item set mining, enables us to enumerate all potential cluster regions at a realistic computational cost. In this study, we propose a hotspot detection method using ZDD-based enumeration and apply it to data on sudden infant death syndrome in North Carolina. This new method enables us to detect the hotspot cluster that truly has the maximum likelihood. To evaluate the proposed method, we compare its properties with those of existing methods, such as the flexible scan and the echelon scan, and discuss their suitability for different hotspot detection purposes.
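To make the search problem concrete, the following is a minimal sketch of exhaustive hotspot detection on a toy map. It is not the ZDD-based enumeration proposed in the paper: it simply tries every subset of regions by brute force, which is feasible only for a handful of regions. The scoring function is assumed to be the Poisson log-likelihood ratio commonly used with the spatial scan statistic (the abstract does not state the model), and the region names, adjacency, and counts are hypothetical.

```python
from itertools import combinations
from math import log


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio of a zone with c observed and e expected
    cases (assumed model). Zones with c <= e are not hotspots and score 0."""
    if c <= e:
        return 0.0
    llr = c * log(c / e)
    if total > c:  # skip the 0 * log(0) term when the zone holds all cases
        llr += (total - c) * log((total - c) / (total - e))
    return llr


def best_hotspot(adj, cases, expected):
    """Score every connected proper subset of regions and keep the best one."""
    total = sum(cases.values())
    best_score, best_zone = 0.0, frozenset()
    regions = sorted(adj)
    for k in range(1, len(regions)):
        for subset in combinations(regions, k):
            if not is_connected(subset, adj):
                continue
            c = sum(cases[r] for r in subset)
            e = sum(expected[r] for r in subset)
            score = poisson_llr(c, e, total)
            if score > best_score:
                best_score, best_zone = score, frozenset(subset)
    return best_score, best_zone


# Hypothetical toy map: five regions arranged in a chain A-B-C-D-E.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cases = {"A": 2, "B": 9, "C": 8, "D": 1, "E": 0}
# Expected counts under the null hypothesis, scaled to sum to the total cases.
expected = {"A": 4.0, "B": 4.0, "C": 4.0, "D": 4.0, "E": 4.0}

score, zone = best_hotspot(adj, cases, expected)
print(sorted(zone), round(score, 3))  # selects regions B and C
```

The outer loops visit all 2^n subsets, so the cost doubles with every added region. This is the blow-up that motivates a compressed enumeration of connected subsets.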
An induced connected component is a connected subgraph in which any two of its vertices are joined by an edge whenever that edge exists in the original graph.
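As an illustration of this definition, the sketch below builds the subgraph induced by a vertex subset (keeping every original edge whose two endpoints both lie in the subset) and then checks whether it is connected. The function names and the example graph are hypothetical and chosen only for this illustration.

```python
def induced_edges(vertices, edges):
    """Keep every original edge whose two endpoints are both in `vertices`."""
    vs = set(vertices)
    return {(u, v) for (u, v) in edges if u in vs and v in vs}


def is_induced_connected(vertices, edges):
    """Return True if the subgraph induced by `vertices` is connected."""
    vs = set(vertices)
    if not vs:
        return False
    neighbours = {v: set() for v in vs}
    for u, v in induced_edges(vs, edges):
        neighbours[u].add(v)
        neighbours[v].add(u)
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in neighbours[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vs


# Hypothetical graph: a 4-cycle 1-2-3-4-1 with the chord 1-3.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]

print(induced_edges({1, 2, 3}, edges))      # all three edges among 1, 2, 3 are kept
print(is_induced_connected({2, 4}, edges))  # 2 and 4 share no edge, so not connected
```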
We conducted this experiment on a machine with an Intel Xeon E5-2630 CPU (2.30 GHz) and 128 GB of memory, running Linux (CentOS 6.6). We implemented the algorithm in C++ and compiled it using gcc with the -O3 optimization option.
This work was partly supported by JSPS KAKENHI Grant Numbers JP16K16019, JP18K04610, JP18H04091 and JP15H05711.
Ishioka, F., Kawahara, J., Mizuta, M. et al. Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting. Jpn J Stat Data Sci 2, 241–262 (2019). https://doi.org/10.1007/s42081-018-0030-6
- Spatial cluster detection
- Spatial scan statistic
- Echelon analysis
- Zero-suppressed binary decision diagram