Rethinking the null hypothesis in significant colocation pattern mining of spatial flows

Zhou, Mengjie; Yang, Mengjie; Ai, Tinghua; Cai, Jiannan; Chen, Zhe

doi:10.1007/s10109-024-00439-y


Original Article
Published: 03 May 2024

Journal of Geographical Systems (2024)

Mengjie Zhou¹˒² (ORCID: orcid.org/0000-0001-6054-5086),
Mengjie Yang¹,
Tinghua Ai³,
Jiannan Cai⁴ &
Zhe Chen¹


Abstract

Spatial flows represent spatial interactions or movements. Mining colocation patterns of different types of flows may uncover the spatial dependences and associations among flows. Previous studies proposed a flow colocation pattern mining method and established a significance test under the null hypothesis of independence for the results. In fact, the definition of the null hypothesis is crucial in significance testing. Choosing an inappropriate null hypothesis may lead to misunderstandings about the spatial interactions between flows. In practice, the overall distribution patterns of different types of flows may be clustered. In these cases, the null hypothesis of independence will result in unconvincing results. Thus, considering the overall spatial pattern of flows, in this study, we changed the null hypothesis to random labeling to establish the statistical significance of flow colocation patterns. Furthermore, we compared and analyzed the impacts of different null hypotheses on flow colocation pattern mining through synthetic data tests with different preset patterns and situations. Additionally, we used empirical data from ride-hailing trips to show the practicality of the method.


Data and codes availability

The synthetic data and code are available on figshare at https://figshare.com/s/d881eee178d956d3a336.

References

Abel GJ, Sander N (2014) Quantifying global international migration flows. Science 343(6178):1520–1522
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference very large data bases, VLDB. Citeseer, pp 487–499
Andris C, Liu X, Ferreira J Jr (2018) Challenges for social flows. Comput Environ Urban Syst 70:197–207
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115
Anselin L (2019) A local indicator of multivariate spatial association: extending Geary’s C. Geogr Anal 51(2):133–150
Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. In: Proceedings, CSISS workshop on new tools for spatial data analysis, Santa Barbara, CA, Citeseer
Barua S, Sander J (2014) Mining statistically significant co-location and segregation patterns. IEEE Trans Knowl Data Eng 26(5):1185–1199
Berglund S, Karlström A (1999) Identifying local spatial association in flow data. J Geogr Syst 1(3):219–236
Besag J, Diggle PJ (1977) Simple Monte Carlo tests for spatial pattern. J R Stat Soc Ser C (Appl Stat) 26(3):327–333
Cai J, Kwan M-P (2022) Discovering co-location patterns in multivariate spatial flow data. Int J Geogr Inf Sci 36(4):720–748
Cai J, Liu Q, Deng M, Tang J, He Z (2018) Adaptive detection of statistically significant regional spatial co-location patterns. Comput Environ Urban Syst 68:53–63
Cai J, Deng M, Guo Y, Xie Y, Shekhar S (2021) Discovering regions of anomalous spatial co-locations. Int J Geogr Inf Sci 35(5):974–998
Ceyhan E (2009) Overall and pairwise segregation tests based on nearest neighbor contingency tables. Comput Stat Data Anal 53(8):2786–2808
Chun Y, Kim H, Kim C (2012) Modeling interregional commodity flows with incorporating network autocorrelation in spatial interaction models: an application of the US interstate commodity flows. Comput Environ Urban Syst 36(6):583–591
Cressie N (2015) Statistics for spatial data. Wiley, Hoboken
Deng M, He Z, Liu Q, Cai J, Tang J (2017) Multi-scale approach to mining significant spatial co-location patterns. Trans GIS 21(5):1023–1039
Diggle PJ (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC Press, Boca Raton
Flores M, Villarreal A, Flores S (2017) Spatial co-location patterns of aerospace industry firms in Mexico. Appl Spat Anal Policy 10(2):233–251
Gao Y, Li T, Wang S, Jeong M-H, Soltani K (2018) A multidimensional spatial scan statistics approach to movement pattern comparison. Int J Geogr Inf Sci 32(7):1304–1325
Getis A, Ord J (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189
Goreaud F, Pélissier R (2003) Avoiding misinterpretation of biotic interactions with the intertype K12-function: population independence vs. random labelling hypotheses. J Veg Sci 14(5):681–692
Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210–227
He Z, Deng M, Cai J, Xie Z, Guan Q, Yang C (2020) Mining spatiotemporal association patterns from complex geographic phenomena. Int J Geogr Inf Sci 34(6):1162–1187
Huang Y, Shekhar S, Xiong H (2004) Discovering colocation patterns from spatial data sets: a general approach. IEEE Trans Knowl Data Eng 16(12):1472–1485
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: International symposium on spatial databases. Springer, pp 47–66
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26(6):1481–1496
Lee SI (2001) Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. J Geogr Syst 3:369–385
Leslie TF, Kronenfeld BJ (2011) The colocation quotient: a new measure of spatial association between categorical subsets of points. Geogr Anal 43(3):306–326
Liu Y, Tong D, Liu X (2015) Measuring spatial autocorrelation of vectors. Geogr Anal 47(3):300–319
Miranda F, Doraiswamy H, Lage M, Zhao K, Gonçalves B, Wilson L, Hsieh M, Silva CT (2016) Urban pulse: capturing the rhythm of cities. IEEE Trans Visual Comput Gr 23(1):791–800
Moran PA (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23
Ord JK, Getis A (1995) Local spatial autocorrelation statistics: distributional issues and an application. Geogr Anal 27(4):286–306
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: International symposium on spatial and temporal databases. Springer, pp 236–256
Shu H, Pei T, Song C, Chen X, Guo S, Liu Y, Chen J, Wang X, Zhou C (2020) L-function of geographical flows. Int J Geogr Inf Sci 35:1–28
Souris M, Bichaud L (2011) Statistical methods for bivariate spatial analysis in marked points. Examples in spatial epidemiology. Spatial Spatio-temporal Epidemiol. 2(4):227–234
Tao R, Thill JC (2016) Spatial cluster detection in spatial flow data. Geogr Anal 48(4):355–372
Tao R, Thill JC (2019a) Flow cross K-function: a bivariate flow analytical method. Int J Geogr Inf Sci 33(10):2055–2071
Tao R, Thill JC (2019b) FlowAMOEBA: identifying regions of anomalous spatial interactions. Geogr Anal 51(1):111–130
Tao R, Thill JC (2020) BiFlowLISA: measuring spatial association for bivariate flow data. Comput Environ Urban Syst 83:101519
Von Landesberger T, Brodkorb F, Roskosch P, Andrienko N, Andrienko G, Kerren A (2015) MobilityGraphs: visual analysis of mass mobility dynamics via spatio-temporal graphs and clustering. IEEE Trans Visual Comput Graphics 22(1):11–20
Yu W, Ai T, He Y, Shao S (2017) Spatial co-location pattern mining of facility points-of-interest improved by network neighborhood and distance decay effects. Int J Geogr Inf Sci 31(2):280–296
Zhang H, Zhou X, Tang G, Zhang X, Qin J, Xiong L (2022) Detecting colocation flow patterns in the geographical interaction data. Geogr Anal 54(1):84–103
Zhou M, Ai T, Wu C, Gu Y, Wang N (2019) A visualization approach for discovering colocation patterns. Int J Geogr Inf Sci 33(3):567–592
Zhou M, Yang M, Chen Z (2023) Flow colocation quotient: Measuring bivariate spatial association for flow data. Comput Environ Urban Syst 99:101916


Acknowledgements

We would like to thank DiDi Chuxing company for the provision of the original dataset.

Funding

This research was funded by the National Natural Science Foundation of China (41901314), the Natural Science Foundation of Hunan Province (2023JJ40447), the RGC Postdoctoral Fellowship awarded by the Research Grants Council of Hong Kong (PDFS2223-4H01), and the Scientific Research Project of the Hunan Provincial Department of Education (23B0093).

Author information

Authors and Affiliations

School of Geographical Sciences, Hunan Normal University, Changsha, 410081, Hunan, China
Mengjie Zhou, Mengjie Yang & Zhe Chen
Hunan Key Laboratory of Geospatial Big Data Mining and Application, Changsha, China
Mengjie Zhou
School of Resource and Environment Sciences, Wuhan University, Wuhan, China
Tinghua Ai
Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Shenzhen, Hong Kong, China
Jiannan Cai


Corresponding author

Correspondence to Mengjie Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Process for generating synthetic data

We derived synthetic datasets from simulations of different preset patterns and situations to test our method. There are ten flow datasets with different preset interaction patterns (colocation, no colocation) and different situations (random, clustered, abundance of flow instances), as shown in Fig. 8. Cases 1–5 and 8–10 are flow datasets with two categories, while Cases 6 and 7 are flow datasets with five categories. The spatial flows in Cases 1–3 do not have an imposed colocation pattern, while those in Cases 4–10 do. The study area in each case is a unit square, and the interaction distance \(R\) is set to 0.1.

The process for generating the synthetic data is as follows. Random flow patterns were generated with a homogeneous spatial Poisson process (Shu et al. 2020). In practice, we first generated random points using a Poisson process and then randomly paired them into flows. Clustered flow patterns were generated with a process similar to the Matérn cluster process. We first generated flow cluster centers from a homogeneous spatial Poisson process. Then, we replaced each cluster center with a number of offspring flows, where the number of offspring flows was drawn from a Poisson process and the offspring flows were distributed within the cluster radius \(r\) of their flow cluster center.
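The two generators described above can be sketched as follows. This is a minimal illustration rather than the released code: the function names, the `(ox, oy, dx, dy)` flow representation, and the rule that an offspring flow has its origin and destination each placed uniformly within \(r\) of the parent's origin and destination are all assumptions, since the text does not spell out the flow distance used here. Offspring may fall outside the unit square; how edges were handled is not stated, so the sketch leaves them unclipped.

```python
import math
import random


def random_flows(n_flows, rng):
    """Draw 2 * n_flows uniform points in the unit square and pair them at random.

    Conditional on the number of points, a homogeneous Poisson process is
    uniform, so a fixed count of uniform points is used here.
    Each flow is a tuple (ox, oy, dx, dy).
    """
    pts = [(rng.random(), rng.random()) for _ in range(2 * n_flows)]
    rng.shuffle(pts)  # random pairing of origins and destinations
    return [pts[i] + pts[n_flows + i] for i in range(n_flows)]


def poisson(mean, rng):
    """Sample a Poisson count with Knuth's method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def point_in_disc(cx, cy, radius, rng):
    """Uniform point in the disc of the given radius around (cx, cy)."""
    rad = radius * math.sqrt(rng.random())  # sqrt gives uniform density over area
    ang = 2 * math.pi * rng.random()
    return (cx + rad * math.cos(ang), cy + rad * math.sin(ang))


def offspring_flow(parent, radius, rng):
    """Assumed rule: jitter origin and destination independently within radius."""
    ox, oy, dx, dy = parent
    return point_in_disc(ox, oy, radius, rng) + point_in_disc(dx, dy, radius, rng)


def clustered_flows(n_centers, mean_offspring, radius, rng):
    """Matern-like clustering: each center flow is replaced by its offspring."""
    flows = []
    for center in random_flows(n_centers, rng):
        for _ in range(poisson(mean_offspring, rng)):
            flows.append(offspring_flow(center, radius, rng))
    return flows


rng = random.Random(0)
random_set = random_flows(100, rng)
clustered_set = clustered_flows(10, 10, 0.05, rng)  # hypothetical parameters
```

The number of clustered flows is itself random in this sketch; to hit an exact count such as 100, one option is to generate a surplus and subsample.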

For Case 1, we first generated 100 randomly distributed red flows and then generated 100 randomly distributed blue flows. For Case 2, we first generated 100 randomly distributed red flows and then generated 100 clustered blue flows. For Case 3, we first generated 200 clustered flows and then randomly labeled them as red and blue. For Case 4, we first generated 100 randomly distributed red flows and then randomly generated a blue flow within a distance of 0.1 of each red flow. Case 5 is similar to Case 4, but the numbers of the two types of flows differ greatly. For Case 6, we first generated 150 random flows and randomly labeled them as red, purple and yellow. Then, we randomly generated a blue flow and a green flow within a distance of 0.1 of each red flow and ensured that the generated blue and green flows were within a distance of 0.1 of each other. Finally, we randomly removed five flows of each type and randomly added five flows of each type. Case 7 is similar to Case 6, except that in the first step, we generated 150 clustered flows and randomly labeled them as red, purple and yellow. For Cases 8–10, we first generated 100 clustered red flows, with the cluster radius \(r\) set to 0.2, 0.4 and 0.6, respectively. Then, we randomly generated a blue flow within a distance of 0.1 of each red flow (the interaction distance is preset to 0.1). Finally, we randomly removed ten flows of each type and randomly added ten flows of each type.
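The building blocks shared by these cases (colocated partners, random labeling, and the remove-then-add perturbation) can be sketched as below. All names are hypothetical. The sketch assumes that a flow lies "within a distance of 0.1" of another when both its origin and its destination lie within 0.1 of the other flow's origin and destination, that random labeling splits the flows into equal shares, and that the added noise flows are uniformly random; none of these details is confirmed by the text.

```python
import math
import random


def point_in_disc(cx, cy, radius, rng):
    """Uniform point in the disc of the given radius around (cx, cy)."""
    rad = radius * math.sqrt(rng.random())
    ang = 2 * math.pi * rng.random()
    return (cx + rad * math.cos(ang), cy + rad * math.sin(ang))


def colocated_partner(flow, dist, rng):
    """Partner flow whose origin and destination are each within dist (assumed rule)."""
    ox, oy, dx, dy = flow
    return point_in_disc(ox, oy, dist, rng) + point_in_disc(dx, dy, dist, rng)


def random_labels(n, labels, rng):
    """Assign labels in (near-)equal shares and shuffle them (assumed split)."""
    out = [labels[i % len(labels)] for i in range(n)]
    rng.shuffle(out)
    return out


def perturb(flows, k, rng):
    """Remove k flows at random, then add k uniformly random flows."""
    kept = rng.sample(flows, len(flows) - k)
    added = [tuple(rng.random() for _ in range(4)) for _ in range(k)]
    return kept + added


rng = random.Random(42)

# Case 4 style: random red flows, one blue partner per red flow (R = 0.1).
red = [tuple(rng.random() for _ in range(4)) for _ in range(100)]
blue = [colocated_partner(f, 0.1, rng) for f in red]

# Noise step as in Cases 8-10: drop ten flows of each type, add ten new ones.
red_noisy = perturb(red, 10, rng)
blue_noisy = perturb(blue, 10, rng)

# Case 3 style: labels assigned at random to an existing set of 200 flows.
labels = random_labels(200, ["red", "blue"], rng)
```

For Cases 6 and 7, the extra constraint that the blue and green partners lie within 0.1 of each other could be met by rejection sampling on top of `colocated_partner`.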

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhou, M., Yang, M., Ai, T. et al. Rethinking the null hypothesis in significant colocation pattern mining of spatial flows. J Geogr Syst (2024). https://doi.org/10.1007/s10109-024-00439-y


Received: 28 March 2023
Accepted: 18 March 2024
Published: 03 May 2024
DOI: https://doi.org/10.1007/s10109-024-00439-y

Keywords

JEL Classification

C21
