Rethinking the null hypothesis in significant colocation pattern mining of spatial flows

  • Original Article
Journal of Geographical Systems

Abstract

Spatial flows represent spatial interactions or movements. Mining colocation patterns of different types of flows may uncover the spatial dependences and associations among flows. Previous studies proposed a flow colocation pattern mining method and established a significance test under the null hypothesis of independence for the results. In fact, the definition of the null hypothesis is crucial in significance testing. Choosing an inappropriate null hypothesis may lead to misunderstandings about the spatial interactions between flows. In practice, the overall distribution patterns of different types of flows may be clustered. In these cases, the null hypothesis of independence will result in unconvincing results. Thus, considering the overall spatial pattern of flows, in this study, we changed the null hypothesis to random labeling to establish the statistical significance of flow colocation patterns. Furthermore, we compared and analyzed the impacts of different null hypotheses on flow colocation pattern mining through synthetic data tests with different preset patterns and situations. Additionally, we used empirical data from ride-hailing trips to show the practicality of the method.

Data and codes availability

The synthetic data and code are available on figshare at https://figshare.com/s/d881eee178d956d3a336.

References

  • Abel GJ, Sander N (2014) Quantifying global international migration flows. Science 343(6178):1520–1522

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference very large data bases, VLDB. Citeseer, pp 487–499

  • Andris C, Liu X, Ferreira J Jr (2018) Challenges for social flows. Comput Environ Urban Syst 70:197–207

  • Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115

  • Anselin L (2019) A local indicator of multivariate spatial association: extending Geary’s C. Geogr Anal 51(2):133–150

  • Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. In: Proceedings, CSISS workshop on new tools for spatial data analysis, Santa Barbara, CA, Citeseer

  • Barua S, Sander J (2014) Mining statistically significant co-location and segregation patterns. IEEE Trans Knowl Data Eng 26(5):1185–1199

  • Berglund S, Karlström A (1999) Identifying local spatial association in flow data. J Geogr Syst 1(3):219–236

  • Besag J, Diggle PJ (1977) Simple Monte Carlo tests for spatial pattern. J R Stat Soc Ser C (Appl Stat) 26(3):327–333

  • Cai J, Kwan M-P (2022) Discovering co-location patterns in multivariate spatial flow data. Int J Geogr Inf Sci 36(4):720–748

  • Cai J, Liu Q, Deng M, Tang J, He Z (2018) Adaptive detection of statistically significant regional spatial co-location patterns. Comput Environ Urban Syst 68:53–63

  • Cai J, Deng M, Guo Y, Xie Y, Shekhar S (2021) Discovering regions of anomalous spatial co-locations. Int J Geogr Inf Sci 35(5):974–998

  • Ceyhan E (2009) Overall and pairwise segregation tests based on nearest neighbor contingency tables. Comput Stat Data Anal 53(8):2786–2808

  • Chun Y, Kim H, Kim C (2012) Modeling interregional commodity flows with incorporating network autocorrelation in spatial interaction models: an application of the US interstate commodity flows. Comput Environ Urban Syst 36(6):583–591

  • Cressie N (2015) Statistics for spatial data. Wiley, Hoboken

  • Deng M, He Z, Liu Q, Cai J, Tang J (2017) Multi-scale approach to mining significant spatial co-location patterns. Trans GIS 21(5):1023–1039

  • Diggle PJ (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC Press, Boca Raton

  • Flores M, Villarreal A, Flores S (2017) Spatial co-location patterns of aerospace industry firms in Mexico. Appl Spat Anal Policy 10(2):233–251

  • Gao Y, Li T, Wang S, Jeong M-H, Soltani K (2018) A multidimensional spatial scan statistics approach to movement pattern comparison. Int J Geogr Inf Sci 32(7):1304–1325

  • Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189–206

  • Goreaud F, Pélissier R (2003) Avoiding misinterpretation of biotic interactions with the intertype K12-function: population independence vs. random labelling hypotheses. J Veg Sci 14(5):681–692

  • Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210–227

  • He Z, Deng M, Cai J, Xie Z, Guan Q, Yang C (2020) Mining spatiotemporal association patterns from complex geographic phenomena. Int J Geogr Inf Sci 34(6):1162–1187

  • Huang Y, Shekhar S, Xiong H (2004) Discovering colocation patterns from spatial data sets: a general approach. IEEE Trans Knowl Data Eng 16(12):1472–1485

  • Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: International symposium on spatial databases. Springer, pp 47–66

  • Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26(6):1481–1496

  • Lee SI (2001) Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. J Geogr Syst 3:369–385

  • Leslie TF, Kronenfeld BJ (2011) The colocation quotient: a new measure of spatial association between categorical subsets of points. Geogr Anal 43(3):306–326

  • Liu Y, Tong D, Liu X (2015) Measuring spatial autocorrelation of vectors. Geogr Anal 47(3):300–319

  • Miranda F, Doraiswamy H, Lage M, Zhao K, Gonçalves B, Wilson L, Hsieh M, Silva CT (2016) Urban pulse: capturing the rhythm of cities. IEEE Trans Visual Comput Graphics 23(1):791–800

  • Moran PA (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23

  • Ord JK, Getis A (1995) Local spatial autocorrelation statistics: distributional issues and an application. Geogr Anal 27(4):286–306

  • Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: International symposium on spatial and temporal databases. Springer, pp 236–256

  • Shu H, Pei T, Song C, Chen X, Guo S, Liu Y, Chen J, Wang X, Zhou C (2020) L-function of geographical flows. Int J Geogr Inf Sci 35:1–28

  • Souris M, Bichaud L (2011) Statistical methods for bivariate spatial analysis in marked points. Examples in spatial epidemiology. Spatial Spatio-temporal Epidemiol 2(4):227–234

  • Tao R, Thill JC (2016) Spatial cluster detection in spatial flow data. Geogr Anal 48(4):355–372

  • Tao R, Thill JC (2019a) Flow cross K-function: a bivariate flow analytical method. Int J Geogr Inf Sci 33(10):2055–2071

  • Tao R, Thill JC (2019b) FlowAMOEBA: identifying regions of anomalous spatial interactions. Geogr Anal 51(1):111–130

  • Tao R, Thill JC (2020) BiFlowLISA: measuring spatial association for bivariate flow data. Comput Environ Urban Syst 83:101519

  • Von Landesberger T, Brodkorb F, Roskosch P, Andrienko N, Andrienko G, Kerren A (2015) MobilityGraphs: visual analysis of mass mobility dynamics via spatio-temporal graphs and clustering. IEEE Trans Visual Comput Graphics 22(1):11–20

  • Yu W, Ai T, He Y, Shao S (2017) Spatial co-location pattern mining of facility points-of-interest improved by network neighborhood and distance decay effects. Int J Geogr Inf Sci 31(2):280–296

  • Zhang H, Zhou X, Tang G, Zhang X, Qin J, Xiong L (2022) Detecting colocation flow patterns in the geographical interaction data. Geogr Anal 54(1):84–103

  • Zhou M, Ai T, Wu C, Gu Y, Wang N (2019) A visualization approach for discovering colocation patterns. Int J Geogr Inf Sci 33(3):567–592

  • Zhou M, Yang M, Chen Z (2023) Flow colocation quotient: Measuring bivariate spatial association for flow data. Comput Environ Urban Syst 99:101916

Acknowledgements

We would like to thank DiDi Chuxing for providing the original dataset.

Funding

This research was funded by the National Natural Science Foundation of China (41901314), the Natural Science Foundation of Hunan Province (2023JJ40447), an RGC Postdoctoral Fellowship awarded by the Research Grants Council of Hong Kong (PDFS2223-4H01), and a scientific research project of the Hunan Provincial Department of Education (23B0093).

Author information

Corresponding author

Correspondence to Mengjie Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Process for generating synthetic data

We derived synthetic datasets from simulations of different preset patterns and situations to test our method. There are ten flow datasets with different preset interaction patterns (colocation, no colocation) and different situations (random, clustered, abundance of flow instances), as shown in Fig. 8. Cases 1–5 and 8–10 are flow datasets with two categories, and Cases 6 and 7 are flow datasets with five categories. The spatial flows in Cases 1–3 do not have an imposed colocation pattern, whereas those in Cases 4–10 do. The study area in each case is a unit square, and the interaction distance \(R\) is set to 0.1.

Fig. 8 Synthetic datasets

The process used to generate the synthetic data is described as follows. Random flow patterns were generated with a homogeneous spatial Poisson process (Shu et al. 2020). In practice, we first generated random points using a Poisson process and then randomly paired them into flows. Clustered flow patterns were generated with a process similar to Matérn’s cluster process. We first generated flow cluster centers from a homogeneous spatial Poisson process. Then, we replaced each cluster center with a number of offspring flows, where the offspring flows were generated from a Poisson process and distributed within the cluster radius \(r\) of their flow cluster center.
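The two generators described above can be sketched in Python with NumPy. This is a minimal illustration and not the authors' released code: the function names (`random_flows`, `uniform_disc`, `clustered_flows`) are hypothetical, the number of cluster centers is fixed rather than Poisson distributed, offspring origins and destinations are each assumed to fall uniformly within radius `r` of the parent flow's origin and destination, and clipping to the unit square is an assumed way of keeping flows inside the study area.

```python
import numpy as np


def random_flows(n, rng):
    """Return n flows as rows (ox, oy, dx, dy), endpoints uniform in the unit square.

    Conditional on the number of points, a homogeneous Poisson process is
    uniform, so 2n uniform points are drawn and then paired at random.
    """
    pts = rng.random((2 * n, 2))
    rng.shuffle(pts)  # random pairing of points into origins and destinations
    return np.hstack([pts[:n], pts[n:]])


def uniform_disc(m, radius, rng):
    """Return m offsets drawn uniformly from a disc of the given radius."""
    rho = radius * np.sqrt(rng.random(m))
    theta = 2.0 * np.pi * rng.random(m)
    return np.column_stack([rho * np.cos(theta), rho * np.sin(theta)])


def clustered_flows(n_centers, mean_offspring, r, rng):
    """Matern-like clustered flows (assumed interpretation, see lead-in)."""
    centers = random_flows(n_centers, rng)  # parent (cluster center) flows
    flows = []
    for c in centers:
        k = rng.poisson(mean_offspring)  # Poisson number of offspring flows
        origins = c[:2] + uniform_disc(k, r, rng)
        dests = c[2:] + uniform_disc(k, r, rng)
        flows.append(np.hstack([origins, dests]))
    # Assumption: keep every endpoint inside the unit-square study area.
    return np.clip(np.vstack(flows), 0.0, 1.0)


rng = np.random.default_rng(0)
red = random_flows(100, rng)                # e.g. the red flows of Case 1
blue = clustered_flows(10, 10, 0.05, rng)   # e.g. clustered flows as in Case 2
```

Because the offspring count per cluster is Poisson distributed, `clustered_flows` returns approximately, not exactly, `n_centers * mean_offspring` flows; a fixed total such as 100 would need an extra subsampling step.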

For Case 1, we first generated 100 randomly distributed red flows and then generated 100 randomly distributed blue flows. For Case 2, we first generated 100 randomly distributed red flows and then generated 100 clustered blue flows. For Case 3, we first generated 200 clustered flows and then randomly labeled them as red and blue. For Case 4, we first generated 100 randomly distributed red flows and then randomly generated a blue flow within a distance of 0.1 of each red flow. Case 5 is similar to Case 4, but the numbers of the two types of flows differ greatly. For Case 6, we first generated 150 random flows and randomly labeled them as red, purple and yellow. Then, we randomly generated a blue and a green flow within a distance of 0.1 of each red flow and ensured that the generated blue and green flows were within a distance of 0.1 of each other. Finally, we randomly removed five flows of each type and randomly added five flows of each type. Case 7 is similar to Case 6, except that in the first step, we generated 150 clustered flows and randomly labeled them as red, purple and yellow. For Cases 8–10, we first generated 100 clustered red flows, with the cluster radius \(r\) set to 0.2, 0.4 and 0.6, respectively. Then, we randomly generated a blue flow within a distance of 0.1 of each red flow (the interaction distance is preset to 0.1). Finally, we randomly removed ten flows of each type and randomly added ten flows of each type.
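The labeling step of Case 3 and the partner-generation step of Case 4 might look like the following Python sketch. It rests on assumptions that the appendix does not state: the distance between two flows is taken here as the larger of the origin-to-origin and destination-to-destination Euclidean distances, labels are drawn independently with equal probability, and the helper names (`random_labels`, `flow_distance`, `colocated_partner`) are hypothetical.

```python
import numpy as np


def random_labels(n, categories, rng):
    """Case 3 style: label n existing flows uniformly at random (assumed equal odds)."""
    return rng.choice(categories, size=n)


def flow_distance(a, b):
    """Assumed flow distance: max of origin-origin and destination-destination distances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d_origin = np.linalg.norm(a[..., :2] - b[..., :2], axis=-1)
    d_dest = np.linalg.norm(a[..., 2:] - b[..., 2:], axis=-1)
    return np.maximum(d_origin, d_dest)


def colocated_partner(flows, R, rng):
    """Case 4 style: one partner flow within distance R of each input flow."""
    n = len(flows)

    def disc(m):
        # Offsets drawn uniformly from a disc of radius R.
        rho = R * np.sqrt(rng.random(m))
        theta = 2.0 * np.pi * rng.random(m)
        return np.column_stack([rho * np.cos(theta), rho * np.sin(theta)])

    partner = flows + np.hstack([disc(n), disc(n)])
    # Clipping to the unit square never moves an endpoint farther from the
    # original one, so the distance bound R still holds afterwards.
    return np.clip(partner, 0.0, 1.0)


rng = np.random.default_rng(0)
red = rng.random((100, 4))                 # 100 random red flows (ox, oy, dx, dy)
blue = colocated_partner(red, 0.1, rng)    # one blue flow near each red flow
labels = random_labels(200, ["red", "blue"], rng)
```

With a different flow distance (for example, the sum of the two endpoint distances), the per-endpoint offset radius would need to be reduced so that the combined distance stays within 0.1.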

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, M., Yang, M., Ai, T. et al. Rethinking the null hypothesis in significant colocation pattern mining of spatial flows. J Geogr Syst (2024). https://doi.org/10.1007/s10109-024-00439-y

  • DOI: https://doi.org/10.1007/s10109-024-00439-y
