Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting

Ishioka, Fumio; Kawahara, Jun; Mizuta, Masahiro; Minato, Shin-ichi; Kurihara, Koji

doi:10.1007/s42081-018-0030-6

Original Paper
Computational statistics and machine learning
Published: 23 January 2019

Volume 2, pages 241–262, (2019)
Japanese Journal of Statistics and Data Science

Abstract

In this paper, we propose a novel approach to detecting spatial clusters based on the linkage information of a map dataset. The spatial scan statistic has been widely used to detect a hotspot cluster (or a coldspot cluster) in various fields, such as astronomy, biosurveillance, natural disasters, and forestry. It is based on the idea of finding the connected regional subset that maximizes the likelihood over the whole study area. In principle, detecting a hotspot cluster, that is, the aggregation of high-risk regions with maximum likelihood, only requires searching all patterns of connected regional subsets. However, unless the study area contains extremely few regions, the total number of connected regional patterns is usually enormous, and we cannot examine all of them. As a result, it has not been possible to know whether a hotspot detected under particular search rules, such as those used in previous studies, truly has the maximum likelihood within a given study area. A zero-suppressed binary decision diagram (ZDD), one approach to frequent item set mining, enables us to extract all of the potential cluster regions at a realistic computational load. In this study, we propose a hotspot detection method using ZDD-based enumeration and apply it to data on sudden infant death syndrome in North Carolina. This new method enables us to detect the true hotspot cluster, the one that attains the maximum likelihood. To evaluate the proposed method, we compare its properties with those of existing methods such as the flexible scan and the echelon scan, and discuss their suitability for different hotspot detection purposes.
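The idea of searching every connected regional subset for the maximum likelihood can be illustrated on a toy map. The sketch below is not the authors' ZDD-based method: it uses plain brute-force enumeration, which is only feasible for a handful of regions, and it assumes the standard Poisson log likelihood ratio of Kulldorff (1997), since the abstract does not state which probability model is used. The region names, counts, and function names (`is_connected`, `poisson_llr`, `exhaustive_hotspot`) are hypothetical.

```python
from itertools import combinations
from math import log


def is_connected(subset, adjacency):
    """Return True if `subset` induces a connected subgraph of the map graph."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour in subset and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == subset


def poisson_llr(c, e, total):
    """Poisson log likelihood ratio for a high-risk zone (assumed model).

    c: observed cases in the zone, e: expected cases in the zone,
    total: total cases in the study area (expected counts sum to total).
    """
    if c <= e:
        return 0.0  # not a hotspot candidate
    inside = c * log(c / e)
    # x * log(x) tends to 0 as x -> 0, so the outside term vanishes if c == total
    outside = (total - c) * log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def exhaustive_hotspot(cases, expected, adjacency):
    """Scan every connected proper subset of regions and keep the best one."""
    regions = sorted(cases)
    total = sum(cases.values())
    best_zone, best_llr = None, 0.0
    for size in range(1, len(regions)):  # the whole study area is excluded
        for zone in combinations(regions, size):
            if not is_connected(zone, adjacency):
                continue
            c = sum(cases[r] for r in zone)
            e = sum(expected[r] for r in zone)
            llr = poisson_llr(c, e, total)
            if llr > best_llr:
                best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Hypothetical toy map: five regions in a row, A-B-C-D-E, equal populations.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cases = {"A": 1, "B": 2, "C": 9, "D": 8, "E": 0}
expected = {r: 4.0 for r in cases}  # sums to the 20 observed cases

zone, llr = exhaustive_hotspot(cases, expected, adjacency)
print(zone, round(llr, 3))
```

The loop over `combinations` visits all 2^n subsets, which is why this approach breaks down beyond a few dozen regions and why a compressed enumeration such as the ZDD described above is needed for realistic maps.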

Notes

An induced connected component is a connected subgraph in which any two of its vertices are joined by an edge whenever that edge exists in the original graph.
We conducted this experiment on a machine with an Intel Xeon E5-2630 (2.30 GHz) CPU and 128 GB of memory (Linux CentOS 6.6). We implemented the algorithm in C++ and compiled it using gcc with the -O3 optimization option.
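The first note can also be written formally. As a sketch, assume the map is modeled as an undirected graph G = (V, E) whose vertices are regions and whose edges are adjacencies; the symbols G, V, E, and S are notation introduced here, not taken from the paper. The subgraph induced by a vertex set S is then:

```latex
G[S] = \bigl(S,\; \{\, \{u, v\} \in E : u \in S,\ v \in S \,\}\bigr), \qquad S \subseteq V .
```

Under this reading, a candidate cluster corresponds to a vertex set S for which G[S] is connected.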

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers JP16K16019, JP18K04610, JP18H04091 and JP15H05711.

Author information

Authors and Affiliations

Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
Fumio Ishioka & Koji Kurihara
Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
Jun Kawahara
Laboratory of Advanced Data Science, Information Initiative Center, Hokkaido University, Hokkaido, Japan
Masahiro Mizuta
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Shin-ichi Minato

Corresponding author

Correspondence to Fumio Ishioka.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ishioka, F., Kawahara, J., Mizuta, M. et al. Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting. Jpn J Stat Data Sci 2, 241–262 (2019). https://doi.org/10.1007/s42081-018-0030-6

Received: 30 July 2018
Accepted: 16 December 2018
Published: 23 January 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s42081-018-0030-6
