
Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting

  • Original Paper
  • Computational statistics and machine learning
Japanese Journal of Statistics and Data Science

Abstract

In this paper, we propose a novel approach to detecting spatial clusters based on the linkage information of a map dataset. The spatial scan statistic has been widely used to detect hotspot (or coldspot) clusters in various fields, such as astronomy, biosurveillance, natural disasters, and forestry. It is based on the idea of finding the connected regional subset that maximizes the likelihood over the whole study area. In principle, detecting a hotspot cluster, i.e., an aggregate of high-risk regions that attains the maximum likelihood, only requires searching all patterns of connected regional subsets for such a cluster. However, unless the study area contains very few regions, the total number of connected regional patterns is usually enormous, so we cannot examine all of them. Consequently, it has not been possible to know whether a hotspot detected under certain restrictive rules, such as those used in previous studies, truly has the maximum likelihood within a given study area. A zero-suppressed binary decision diagram (ZDD), a data structure that has also been applied to frequent itemset mining, enables us to enumerate all of the potential cluster regions at a realistic computational cost. In this study, we propose a hotspot detection method using ZDD-based enumeration and apply it to data on sudden infant death syndrome (SIDS) in North Carolina. This new method detects the true hotspot cluster, that is, the one with the truly maximum likelihood. To evaluate the proposed method, we compare its properties with those of existing methods, such as the flexible scan and the echelon scan, and discuss their suitability for different hotspot-detection purposes.
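The abstract describes choosing, among connected regional subsets, the one that maximizes the likelihood. The abstract does not state which probability model is used; as an assumption, the sketch below scores candidate zones with the Poisson log-likelihood ratio of Kulldorff (1997), a common choice for count data such as SIDS cases. The function names and the toy data are hypothetical. Candidate zones are simply passed in as a list: generating that list (all connected regional subsets) is the enumeration step the paper performs with ZDDs, which is not shown here, and Monte Carlo significance testing is also omitted.

```python
import math
from typing import Dict, FrozenSet, Iterable


def poisson_llr(cases_in: float, expected_in: float, total_cases: float) -> float:
    """Poisson log-likelihood ratio (Kulldorff-style) for one candidate zone.

    Returns 0.0 unless the zone has more cases than expected, so only
    hotspot (elevated-risk) zones receive a positive score.
    """
    if cases_in <= expected_in:
        return 0.0
    c, e, C = cases_in, expected_in, total_cases
    llr = c * math.log(c / e)
    if C > c:  # the outside term vanishes when every case lies in the zone
        llr += (C - c) * math.log((C - c) / (C - e))
    return llr


def zone_llr(zone: FrozenSet[str], cases: Dict[str, int], pops: Dict[str, int]) -> float:
    """Score a zone; expected cases are proportional to the zone's population share."""
    total_cases = sum(cases.values())
    total_pop = sum(pops.values())
    c = sum(cases[r] for r in zone)
    e = total_cases * sum(pops[r] for r in zone) / total_pop
    return poisson_llr(c, e, total_cases)


def best_zone(candidates: Iterable[FrozenSet[str]],
              cases: Dict[str, int], pops: Dict[str, int]) -> FrozenSet[str]:
    """Return the candidate zone with the largest log-likelihood ratio."""
    return max(candidates, key=lambda z: zone_llr(z, cases, pops))


# Hypothetical toy map: three regions with equal population.
cases = {"A": 10, "B": 2, "C": 2}
pops = {"A": 100, "B": 100, "C": 100}
zones = [frozenset(z) for z in ({"A"}, {"B"}, {"A", "B"}, {"B", "C"})]
print(best_zone(zones, cases, pops))                      # frozenset({'A'})
print(round(zone_llr(frozenset({"A"}), cases, pops), 4))  # 4.2322
```

If the candidate list contains every connected regional subset, the maximizer returned here is the exact maximum-likelihood zone under this model; restricting the list (for example, to circular or size-limited windows) is what makes a detected hotspot only approximately optimal.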

Notes

  1. An induced connected component is a connected subgraph that contains every edge of the original graph joining any two of its vertices, i.e., a connected subgraph induced by a subset of the vertices.

  2. We conducted this experiment on a machine with an Intel Xeon E5-2630 (2.30 GHz) CPU and 128 GB of memory, running Linux (CentOS 6.6). We implemented the algorithm in C++ and compiled it using gcc with the -O3 optimization option.
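Note 1 defines the candidate clusters as induced connected components of the region adjacency graph. As an illustration only, the sketch below tests that property with a breadth-first search restricted to edges inside the subset, and enumerates every such vertex subset by brute force on a hypothetical four-region ring map. It checks all 2^n - 1 non-empty subsets, which is exactly the exponential cost the abstract describes; the paper's ZDD-based enumeration is a different technique and is not reproduced here. All names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[str, Set[str]]  # adjacency list: region -> neighbouring regions


def induces_connected_subgraph(graph: Graph, subset: FrozenSet[str]) -> bool:
    """Breadth-first search that only follows edges with both ends in `subset`."""
    if not subset:
        return False
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w in subset and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(subset)


def connected_subsets(graph: Graph) -> Iterator[FrozenSet[str]]:
    """Yield every non-empty vertex subset whose induced subgraph is connected.

    All 2**n - 1 subsets are tested, so this is only feasible for tiny maps.
    """
    vertices = sorted(graph)
    for k in range(1, len(vertices) + 1):
        for combo in combinations(vertices, k):
            subset = frozenset(combo)
            if induces_connected_subgraph(graph, subset):
                yield subset


# Hypothetical map: four regions arranged in a ring (a-b, b-c, c-d, d-a).
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(induces_connected_subgraph(ring, frozenset({"a", "c"})))  # False
print(sum(1 for _ in connected_subsets(ring)))                  # 13
```

On the ring, the two diagonal pairs {a, c} and {b, d} are the only non-empty subsets that fail the test, leaving 13 of the 15. Each additional region doubles the number of subsets to check, which is why brute force breaks down long before a map the size of North Carolina's counties.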

References

  • Anselin, L. (1995). Local indicators of spatial association-LISA. Geographical Analysis, 27(2), 93–115.

  • Berke, O. (2004). Exploratory disease mapping: Kriging the spatial risk function from regional count data. International Journal of Health Geographics, 3(1), 18.

  • Besag, J. E., & Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A, 154(1), 143–155.

  • Cressie, N. (1992). Smoothing regional maps using empirical Bayes predictors. Geographical Analysis, 24(1), 75–95.

  • Cressie, N., & Chan, N. H. (1989). Spatial modeling of regional variables. Journal of the American Statistical Association, 84, 393–401.

  • Cuzick, J., & Edwards, R. (1990). Spatial clustering for inhomogeneous populations. Journal of the Royal Statistical Society, Series B, 52(1), 73–104.

  • Duczmal, L., & Assunção, R. (2004). A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Computational Statistics and Data Analysis, 45(2), 269–286.

  • Dwass, M. (1957). Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics, 28(1), 181–187.

  • Huang, L., Kulldorff, M., & Gregorio, D. (2007). A spatial scan statistic for survival data. Biometrics, 63(1), 109–118.

  • Huang, L., Tiwari, R. C., Zuo, Z., Kulldorff, M., & Feuer, E. J. (2009). Weighted normal spatial scan statistic for heterogeneous population data. Journal of the American Statistical Association, 104, 886–898.

  • Inoue, T., Takano, K., Watanabe, T., Kawahara, J., Yoshinaka, R., Kishimoto, A., et al. (2014). Distribution loss minimization with guaranteed error bound. IEEE Transactions on Smart Grid, 5(1), 102–111.

  • Ishioka, F., & Kurihara, K. (2012). Detection of spatial clusters using echelon scan. In Proceedings of the 20th International Conference on Computational Statistics (COMPSTAT 2012) (pp. 341–352). Heidelberg: Physica-Verlag.

  • Ishioka, F., Kurihara, K., Suito, H., Horikawa, Y., & Ono, Y. (2007). Detection of hotspots for 3-dimensional spatial data and its application to environmental pollution data. Journal of Environmental Science for Sustainable Society, 1, 15–24.

  • Jung, I., Kulldorff, M., & Klassen, A. C. (2007). A spatial scan statistic for ordinal data. Statistics in Medicine, 26(7), 1594–1607.

  • Jung, I., Kulldorff, M., & Richard, O. J. (2010). A spatial scan statistic for multinomial data. Statistics in Medicine, 29(18), 1910–1918.

  • Kawahara, J., Inoue, T., Iwashita, H., & Minato, S. (2017a). Frontier-based search for enumerating all constrained subgraphs with compressed representation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100-A(9), 1773–1784.

  • Kawahara, J., Saitoh, T., Suzuki, H., & Yoshinaka, R. (2017b). Solving the longest oneway-ticket problem and enumerating letter graphs by augmenting the two representative approaches with ZDDs. In S. Phon-Amnuaisuk, T.-W. Au, & S. Omar (Eds.), Computational intelligence in information systems: Proceedings of the computational intelligence in information systems conference (CIIS 2016) (pp. 294–305). Cham: Springer.

  • Kawahara, J., Horiyama, T., Hotta, K., & Minato, S. (2017c). Generating all patterns of graph partitions within a disparity bound. In Proceedings of the 11th International Conference and Workshops on Algorithms and Computation (WALCOM 2017) (pp. 119–131).

  • Knuth, D. E. (2011). The Art of Computer Programming, Volume 4A, Combinatorial Algorithms, Part 1 (1st ed.). Addison-Wesley Professional.

  • Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6), 1481–1496.

  • Kulldorff, M., & Harvard Medical School, Boston and Information Management Services Inc. (2018). SaTScan™ v9.6: Software for the Spatial and Space-Time Scan Statistics. http://www.satscan.org/. Accessed 1 July 2018.

  • Kulldorff, M., Huang, L., & Konty, K. (2009). A scan statistic for continuous data based on the normal probability model. International Journal of Health Geographics, 8, 58.

  • Kulldorff, M., & Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Statistics in Medicine, 14(8), 799–810.

  • Kurihara, K. (2004). Classification of geospatial lattice data and their graphical representation. In D. Banks et al. (Eds.), Classification, clustering, and data mining applications (pp. 251–258). New York: Springer.

  • Lawson, A. B., & Clark, A. (2002). Spatial mixture relative risk models applied to disease mapping. Statistics in Medicine, 21(3), 359–370.

  • Minato, S. (1993). Zero-suppressed BDDs for set manipulation in combinatorial problems. In Proceedings of the 30th ACM/IEEE Design Automation Conference (pp. 272–277).

  • Moran, P. A. P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, 10(2), 243–251.

  • Myers, W. L., Patil, G. P., & Joly, K. (1997). Echelon approach to areas of concern in synoptic regional monitoring. Environmental and Ecological Statistics, 4(2), 131–152.

  • Openshaw, S., Charlton, M., Wymer, C., & Craft, A. (1987). A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems, 1(4), 335–358.

  • Patil, G. P., & Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11(2), 183–197.

  • Sekine, K., Imai, H., & Tani, S. (1995). Computing the Tutte polynomial of a graph of moderate size. In Proceedings of the 6th International Symposium on Algorithms and Computation (ISAAC 1995) (pp. 224–233).

  • Takahashi, K., Yokoyama, T., & Tango, T. (2010). FleXScan v3.1.2: Software for the Flexible Scan Statistic. National Institute of Public Health Japan. https://sites.google.com/site/flexscansoftware/. Accessed 1 July 2018.

  • Takizawa, A., Takechi, Y., Ohta, A., Katoh, N., Inoue, T., Horiyama, T., Kawahara, J., & Minato, S. (2013). Enumeration of region partitioning for evacuation planning based on ZDD. In Proceedings of the 11th International Symposium on Operations Research and its Applications in Engineering, Technology and Management (ISORA 2013) (pp. 1–8).

  • Tango, T. (1995). A class of tests for detecting "general" and "focused" clustering of rare diseases. Statistics in Medicine, 14(21–22), 2323–2334.

  • Tango, T. (2000). A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine, 19(2), 191–204.

  • Tango, T. (2008). A spatial scan statistic with a restricted likelihood ratio. Japanese Journal of Biometrics, 29(2), 75–95.

  • Tango, T., & Takahashi, K. (2005). A flexible spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4, 11.

  • Tango, T., & Takahashi, K. (2012). A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters. Statistics in Medicine, 31(30), 4207–4218.

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers JP16K16019, JP18K04610, JP18H04091 and JP15H05711.

Author information

Corresponding author

Correspondence to Fumio Ishioka.

About this article

Cite this article

Ishioka, F., Kawahara, J., Mizuta, M. et al. Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting. Jpn J Stat Data Sci 2, 241–262 (2019). https://doi.org/10.1007/s42081-018-0030-6

  • DOI: https://doi.org/10.1007/s42081-018-0030-6
