Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting

Abstract

In this paper, we propose a novel approach to detecting spatial clusters based on the linkage information of a map dataset. The spatial scan statistic has been widely used to detect hotspot clusters (or coldspot clusters) in various fields, such as astronomy, biosurveillance, natural disasters, and forestry. It is based on the idea of finding the connected subset of regions that maximizes the likelihood over the whole study area. In principle, to detect a hotspot cluster, that is, the aggregation of high-risk regions with maximum likelihood, we need only search over all patterns of connected regional subsets. However, unless the study area contains extremely few regions, the total number of connected regional patterns is usually enormous, and they cannot all be examined. Consequently, it has not been possible to know whether a hotspot detected under particular search rules, such as those used in previous studies, truly has the maximum likelihood within a given study area. A zero-suppressed binary decision diagram (ZDD), one approach to frequent item set mining, enables us to extract all potential cluster regions at a realistic computational cost. In this study, we propose a hotspot detection method using ZDD-based enumeration and apply it to sudden infant death syndrome data in North Carolina. This new method enables us to detect the true hotspot cluster, the one with the maximum likelihood. To evaluate the proposed method, we compare its properties with those of existing methods, such as flexible scan and echelon scan, and discuss their suitability for different hotspot detection purposes.
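
The exhaustive search described above can be illustrated on a toy map. The following is a minimal sketch, not the ZDD-based enumeration proposed in the paper: it checks every subset of regions by brute force, which is feasible only for a handful of regions. It assumes a Poisson model with the commonly used Kulldorff-style log-likelihood ratio, and the function names, the five-region path graph, and the case counts are all hypothetical.

```python
from itertools import combinations
from math import log


def is_connected(zone, adj):
    """Return True if the regions in `zone` induce a connected subgraph of `adj`."""
    zone = set(zone)
    start = next(iter(zone))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in zone and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == zone


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a zone with c observed and e expected cases.

    Assumes the expected counts sum to `total`. Returns 0 unless c > e,
    so only high-risk (hotspot) zones receive a positive score.
    """
    if c <= e:
        return 0.0
    inside = c * log(c / e)
    outside = (total - c) * log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def exhaustive_hotspot(adj, cases, expected):
    """Scan every connected proper subset of regions and return the best zone."""
    total = sum(cases.values())
    regions = sorted(adj)
    best_zone, best_llr = None, 0.0
    for k in range(1, len(regions)):
        for zone in combinations(regions, k):
            if not is_connected(zone, adj):
                continue  # skip subsets that are not connected on the map
            c = sum(cases[r] for r in zone)
            e = sum(expected[r] for r in zone)
            llr = poisson_llr(c, e, total)
            if llr > best_llr:
                best_zone, best_llr = set(zone), llr
    return best_zone, best_llr


# Hypothetical toy data: five regions on a path A-B-C-D-E.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cases = {"A": 1, "B": 8, "C": 9, "D": 1, "E": 1}
expected = {r: 4.0 for r in adj}  # equal populations, 20 cases in total

zone, llr = exhaustive_hotspot(adj, cases, expected)
print(sorted(zone), round(llr, 3))
```

The loop visits all 2^n subsets, so its cost doubles with each added region; this is the blow-up that motivates a compressed enumeration of connected subsets.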

Notes

  1. An induced connected component is a connected subgraph that contains every edge of the original graph whose two endpoints both belong to the subgraph.

  2. We conducted this experiment on a machine with an Intel Xeon E5-2630 (2.30 GHz) CPU and 128 GB of memory (Linux CentOS 6.6). We implemented the algorithm in C++ and compiled it using gcc with the -O3 optimization option.

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers JP16K16019, JP18K04610, JP18H04091 and JP15H05711.

Author information

Corresponding author

Correspondence to Fumio Ishioka.

About this article

Cite this article

Ishioka, F., Kawahara, J., Mizuta, M. et al. Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting. Jpn J Stat Data Sci 2, 241–262 (2019). https://doi.org/10.1007/s42081-018-0030-6

Keywords

  • Spatial cluster detection
  • Spatial scan statistic
  • Echelon analysis
  • Zero-suppressed binary decision diagram