Abstract
Spatial flows represent spatial interactions or movements. Mining colocation patterns among different types of flows can uncover the spatial dependences and associations among them. Previous studies proposed a flow colocation pattern mining method and tested the significance of its results under the null hypothesis of independence. However, the choice of null hypothesis is crucial in significance testing, and an inappropriate one may lead to misunderstandings about the spatial interactions between flows. In practice, the overall distribution patterns of different types of flows may be clustered, in which case the null hypothesis of independence yields unconvincing results. In this study, we therefore take the overall spatial pattern of flows into account and adopt random labeling as the null hypothesis for establishing the statistical significance of flow colocation patterns. We further compare the impacts of the different null hypotheses on flow colocation pattern mining through synthetic data tests with different preset patterns and situations, and we use empirical data from ride-hailing trips to show the practicality of the method.
Data and code availability
The synthetic data and code are available on figshare at https://figshare.com/s/d881eee178d956d3a336.
References
Abel GJ, Sander N (2014) Quantifying global international migration flows. Science 343(6178):1520–1522
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference very large data bases, VLDB. Citeseer, pp 487–499
Andris C, Liu X, Ferreira J Jr (2018) Challenges for social flows. Comput Environ Urban Syst 70:197–207
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115
Anselin L (2019) A local indicator of multivariate spatial association: extending Geary’s C. Geogr Anal 51(2):133–150
Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. In: Proceedings, CSISS workshop on new tools for spatial data analysis, Santa Barbara, CA, Citeseer
Barua S, Sander J (2014) Mining statistically significant co-location and segregation patterns. IEEE Trans Knowl Data Eng 26(5):1185–1199
Berglund S, Karlström A (1999) Identifying local spatial association in flow data. J Geogr Syst 1(3):219–236
Besag J, Diggle PJ (1977) Simple Monte Carlo tests for spatial pattern. J R Stat Soc Ser C (Appl Stat) 26(3):327–333
Cai J, Kwan M-P (2022) Discovering co-location patterns in multivariate spatial flow data. Int J Geogr Inf Sci 36(4):720–748
Cai J, Liu Q, Deng M, Tang J, He Z (2018) Adaptive detection of statistically significant regional spatial co-location patterns. Comput Environ Urban Syst 68:53–63
Cai J, Deng M, Guo Y, Xie Y, Shekhar S (2021) Discovering regions of anomalous spatial co-locations. Int J Geogr Inf Sci 35(5):974–998
Ceyhan E (2009) Overall and pairwise segregation tests based on nearest neighbor contingency tables. Comput Stat Data Anal 53(8):2786–2808
Chun Y, Kim H, Kim C (2012) Modeling interregional commodity flows with incorporating network autocorrelation in spatial interaction models: an application of the US interstate commodity flows. Comput Environ Urban Syst 36(6):583–591
Cressie N (2015) Statistics for spatial data. Wiley, Hoboken
Deng M, He Z, Liu Q, Cai J, Tang J (2017) Multi-scale approach to mining significant spatial co-location patterns. Trans GIS 21(5):1023–1039
Diggle PJ (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC Press, Boca Raton
Flores M, Villarreal A, Flores S (2017) Spatial co-location patterns of aerospace industry firms in Mexico. Appl Spat Anal Policy 10(2):233–251
Gao Y, Li T, Wang S, Jeong M-H, Soltani K (2018) A multidimensional spatial scan statistics approach to movement pattern comparison. Int J Geogr Inf Sci 32(7):1304–1325
Getis A, Ord J (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189–206
Goreaud F, Pélissier R (2003) Avoiding misinterpretation of biotic interactions with the intertype K12-function: population independence vs. random labelling hypotheses. J Veg Sci 14(5):681–692
Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210–227
He Z, Deng M, Cai J, Xie Z, Guan Q, Yang C (2020) Mining spatiotemporal association patterns from complex geographic phenomena. Int J Geogr Inf Sci 34(6):1162–1187
Huang Y, Shekhar S, Xiong H (2004) Discovering colocation patterns from spatial data sets: a general approach. IEEE Trans Knowl Data Eng 16(12):1472–1485
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: International symposium on spatial databases. Springer, pp 47–66
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26(6):1481–1496
Lee SI (2001) Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. J Geogr Syst 3:369–385
Leslie TF, Kronenfeld BJ (2011) The colocation quotient: a new measure of spatial association between categorical subsets of points. Geogr Anal 43(3):306–326
Liu Y, Tong D, Liu X (2015) Measuring spatial autocorrelation of vectors. Geogr Anal 47(3):300–319
Miranda F, Doraiswamy H, Lage M, Zhao K, Gonçalves B, Wilson L, Hsieh M, Silva CT (2016) Urban pulse: capturing the rhythm of cities. IEEE Trans Visual Comput Graphics 23(1):791–800
Moran PA (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23
Ord JK, Getis A (1995) Local spatial autocorrelation statistics: distributional issues and an application. Geogr Anal 27(4):286–306
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: International symposium on spatial and temporal databases. Springer, pp 236–256
Shu H, Pei T, Song C, Chen X, Guo S, Liu Y, Chen J, Wang X, Zhou C (2020) L-function of geographical flows. Int J Geogr Inf Sci 35:1–28
Souris M, Bichaud L (2011) Statistical methods for bivariate spatial analysis in marked points. Examples in spatial epidemiology. Spatial Spatio-temporal Epidemiol 2(4):227–234
Tao R, Thill JC (2016) Spatial cluster detection in spatial flow data. Geogr Anal 48(4):355–372
Tao R, Thill JC (2019a) Flow cross K-function: a bivariate flow analytical method. Int J Geogr Inf Sci 33(10):2055–2071
Tao R, Thill JC (2019b) FlowAMOEBA: identifying regions of anomalous spatial interactions. Geogr Anal 51(1):111–130
Tao R, Thill JC (2020) BiFlowLISA: measuring spatial association for bivariate flow data. Comput Environ Urban Syst 83:101519
Von Landesberger T, Brodkorb F, Roskosch P, Andrienko N, Andrienko G, Kerren A (2015) MobilityGraphs: visual analysis of mass mobility dynamics via spatio-temporal graphs and clustering. IEEE Trans Visual Comput Graphics 22(1):11–20
Yu W, Ai T, He Y, Shao S (2017) Spatial co-location pattern mining of facility points-of-interest improved by network neighborhood and distance decay effects. Int J Geogr Inf Sci 31(2):280–296
Zhang H, Zhou X, Tang G, Zhang X, Qin J, Xiong L (2022) Detecting colocation flow patterns in the geographical interaction data. Geogr Anal 54(1):84–103
Zhou M, Ai T, Wu C, Gu Y, Wang N (2019) A visualization approach for discovering colocation patterns. Int J Geogr Inf Sci 33(3):567–592
Zhou M, Yang M, Chen Z (2023) Flow colocation quotient: measuring bivariate spatial association for flow data. Comput Environ Urban Syst 99:101916
Acknowledgements
We would like to thank DiDi Chuxing for providing the original dataset.
Funding
This research was funded by the National Natural Science Foundation of China (41901314), the Natural Science Foundation of Hunan Province (2023JJ40447), an RGC Postdoctoral Fellowship awarded by the Research Grants Council of Hong Kong (PDFS2223-4H01), and a scientific research project of the Hunan Provincial Department of Education (23B0093).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix. Process for generating synthetic data
We derived synthetic datasets from simulations of different preset patterns and situations to test our method. There are ten flow datasets with different preset interaction patterns (colocation, no colocation) and different situations (random, clustered, abundance of flow instances), as shown in Fig. 8. Cases 1–5 and 8–10 are flow datasets with two categories, while Cases 6 and 7 are flow datasets with five categories. The spatial flows in Cases 1–3 do not have an imposed colocation pattern, whereas those in Cases 4–10 do. The study area in each case is a unit square, and the interaction distance \(R\) is set to 0.1.
The process for generating the synthetic data is as follows. Random flow patterns were generated with a homogeneous spatial Poisson process (Shu et al. 2020): in practice, we first generated random points using a Poisson process and then randomly paired them to form flows. Clustered flow patterns were generated with a process similar to the Matérn cluster process. We first generated flow cluster centers from a homogeneous spatial Poisson process. We then replaced each cluster center with a number of offspring flows, which were generated from a Poisson process and distributed within the cluster radius \(r\) of the flow cluster center.
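The two generators described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code. The function names (`random_flows`, `point_near`, `clustered_flows`) are hypothetical. The number of flows is fixed in advance, which corresponds to a Poisson process conditioned on its count. Offspring are assigned to cluster centers uniformly at random. An offspring flow is treated as lying within \(r\) of its center when both its origin and its destination lie within \(r\) of the center's origin and destination, and points that fall outside the unit square are redrawn. The number of cluster centers and the radius in the usage lines are placeholder values.

```python
import math
import random


def random_flows(n, rng):
    """Return n random flows (ox, oy, dx, dy) in the unit square.

    2n uniform points are drawn, shuffled, and paired as origin/destination.
    """
    points = [(rng.random(), rng.random()) for _ in range(2 * n)]
    rng.shuffle(points)  # random pairing of the points
    return [points[2 * i] + points[2 * i + 1] for i in range(n)]


def point_near(cx, cy, radius, rng):
    """Draw a point uniformly within `radius` of (cx, cy), inside the unit square."""
    while True:
        rho = radius * math.sqrt(rng.random())  # sqrt keeps the density uniform over the disc
        theta = 2.0 * math.pi * rng.random()
        x, y = cx + rho * math.cos(theta), cy + rho * math.sin(theta)
        if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:  # assumption: redraw points outside the study area
            return x, y


def clustered_flows(n, n_centers, radius, rng):
    """Return n offspring flows scattered around randomly placed cluster-center flows."""
    centers = random_flows(n_centers, rng)  # cluster centers are themselves random flows
    flows = []
    for _ in range(n):
        ox, oy, dx, dy = rng.choice(centers)  # pick a parent flow at random
        flows.append(point_near(ox, oy, radius, rng) + point_near(dx, dy, radius, rng))
    return flows


rng = random.Random(2024)
red = random_flows(100, rng)  # random pattern
blue = clustered_flows(100, n_centers=5, radius=0.2, rng=rng)  # clustered pattern, placeholder parameters
```

Fixing the total count and assigning offspring to parents at random is a convenience here: it yields exactly the requested number of flows per category, as in the case descriptions, without having to tune Poisson intensities.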
For Case 1, we first generated 100 randomly distributed red flows and then generated 100 randomly distributed blue flows. For Case 2, we first generated 100 randomly distributed red flows and then generated 100 clustered blue flows. For Case 3, we first generated 200 clustered flows and then randomly labeled them as red and blue. For Case 4, we first generated 100 randomly distributed red flows and then randomly generated a blue flow within a distance of 0.1 of each red flow. Case 5 is similar to Case 4, but the numbers of the two types of flows differ greatly. For Case 6, we first generated 150 random flows and randomly labeled them as red, purple and yellow. Then, we randomly generated a blue flow and a green flow within a distance of 0.1 of each red flow and ensured that the generated blue and green flows were within a distance of 0.1 of each other. Finally, we randomly removed five flows of each type and randomly added five flows of each type. Case 7 is similar to Case 6, except that in the first step we generated 150 clustered flows and randomly labeled them as red, purple and yellow. For Cases 8–10, we first generated 100 clustered red flows, with the cluster radius \(r\) set to 0.2, 0.4 and 0.6, respectively. Then, we randomly generated a blue flow within a distance of 0.1 of each red flow (the interaction distance is preset to 0.1). Finally, we randomly removed ten flows of each type and randomly added ten flows of each type.
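As a rough sketch of two of these constructions, the Python snippet below mimics the imposed colocation of Case 4 and the random labeling step of Case 3. It is illustrative only and rests on assumptions not stated in the text: the flow distance is taken as the larger of the origin-to-origin and destination-to-destination distances, the red/blue split in the labeling step is assumed to be equal, all function names are hypothetical, and uniformly random flows stand in for the clustered flows of Case 3 to keep the example short.

```python
import math
import random

R = 0.1  # interaction distance used in the appendix


def flow_distance(f, g):
    """Assumed flow distance: the larger of the origin gap and the destination gap."""
    return max(math.dist(f[:2], g[:2]), math.dist(f[2:], g[2:]))


def point_near(cx, cy, radius, rng):
    """Draw a point uniformly within `radius` of (cx, cy), inside the unit square."""
    while True:
        rho = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        x, y = cx + rho * math.cos(theta), cy + rho * math.sin(theta)
        if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:  # assumption: redraw points outside the study area
            return x, y


def colocated_flow(flow, radius, rng):
    """Generate one flow whose origin and destination are each within `radius` of `flow`'s."""
    ox, oy, dx, dy = flow
    return point_near(ox, oy, radius, rng) + point_near(dx, dy, radius, rng)


def random_labeling(flows, labels, rng):
    """Attach a random permutation of `labels` to `flows`, leaving locations unchanged."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return list(zip(flows, shuffled))


rng = random.Random(7)

# Case 4 style: one blue flow within R of every red flow.
red = [(rng.random(), rng.random(), rng.random(), rng.random()) for _ in range(100)]
blue = [colocated_flow(f, R, rng) for f in red]

# Case 3 style labeling step (uniform flows used as a stand-in for clustered ones).
pooled = [(rng.random(), rng.random(), rng.random(), rng.random()) for _ in range(200)]
labeled = random_labeling(pooled, ["red"] * 100 + ["blue"] * 100, rng)
```

Under the assumed distance, every blue flow lies within \(R\) of its red partner by construction, whereas the labeled set carries no imposed interaction between categories because labels are assigned independently of location.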
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, M., Yang, M., Ai, T. et al. Rethinking the null hypothesis in significant colocation pattern mining of spatial flows. J Geogr Syst (2024). https://doi.org/10.1007/s10109-024-00439-y
DOI: https://doi.org/10.1007/s10109-024-00439-y