Data Mining and Knowledge Discovery, Volume 24, Issue 1, pp 159–194

Mining spatial colocation patterns: a different framework

  • Jin Soung Yoo
  • Mark Bow
Article

Abstract

Recently, there has been considerable interest in mining spatial colocation patterns from large spatial datasets. Spatial colocation patterns represent subsets of spatial events whose instances are often located in close geographic proximity. Most studies of spatial colocation mining require the specification of two parameter constraints to find interesting colocation patterns: a minimum prevalence threshold for colocations, and a distance threshold that defines the spatial neighborhood. However, it is difficult for users to choose appropriate threshold values without prior knowledge of their task-specific spatial data. In this paper, we propose a different framework for spatial colocation pattern mining. To remove the first constraint, we propose the problem of finding the N-most prevalent colocated event sets, where N is the desired number of colocated event sets with the highest interest measure values for each pattern size. We developed two alternative algorithms for mining the N-most patterns. They reduce candidate events effectively and use a filter-and-refine strategy to find colocation instances efficiently in a spatial dataset. We prove that the algorithms are correct and complete in finding the N-most prevalent colocation patterns. For the second constraint, the distance threshold for determining spatial neighborhoods, we present various methods to estimate appropriate distance bounds from user input data. The result can help a user set a distance for a conceptualization of spatial neighborhood. Our experimental results with real and synthetic datasets show that our algorithmic design is computationally effective in finding the N-most prevalent colocation patterns. The discovered patterns differed depending on the distance threshold, which shows that it is important to select appropriate neighbor distances.
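
To make the problem statement concrete, the following is a minimal brute-force sketch of selecting the N-most prevalent colocated event sets for each pattern size under a given neighbor distance. It is not either of the two algorithms proposed in the paper: it performs no candidate reduction and no filter-and-refine step. The abstract does not name the interest measure, so the participation index used here (the smallest fraction, over the event types in a set, of instances that take part in at least one colocation instance) is an assumption borrowed from the wider colocation literature. The function names, the `(event_type, (x, y))` input layout, and the sample points are all hypothetical.

```python
from itertools import combinations, product
from math import dist


def neighbor_pairs(points, d):
    """Index pairs of points whose Euclidean distance is at most d."""
    return {
        frozenset((i, j))
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i][1], points[j][1]) <= d
    }


def participation_index(points, near, event_set):
    """Assumed interest measure: min over event types of the fraction of
    that type's instances appearing in some colocation instance."""
    by_type = [
        [i for i, (ev, _) in enumerate(points) if ev == wanted]
        for wanted in event_set
    ]
    seen = [set() for _ in event_set]
    # A colocation instance has one point per event type, all pairwise neighbors.
    for combo in product(*by_type):
        if all(frozenset(p) in near for p in combinations(combo, 2)):
            for pos, idx in enumerate(combo):
                seen[pos].add(idx)
    return min(len(s) / len(members) for s, members in zip(seen, by_type))


def n_most_prevalent(points, d, n, max_size):
    """Top-n event sets per pattern size, ranked by the assumed measure."""
    near = neighbor_pairs(points, d)
    events = sorted({ev for ev, _ in points})
    result = {}
    for k in range(2, max_size + 1):
        scored = [
            (participation_index(points, near, c), c)
            for c in combinations(events, k)
        ]
        scored = [s for s in scored if s[0] > 0]  # drop sets with no instance
        scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken alphabetically
        result[k] = scored[:n]
    return result


# Hypothetical sample data: (event_type, (x, y))
sample = [
    ("A", (0, 0)), ("B", (0, 1)), ("C", (1, 0)),
    ("A", (10, 10)), ("B", (10, 11)),
    ("C", (50, 50)),
]

for d in (1.2, 1.5):
    print(d, n_most_prevalent(sample, d, n=2, max_size=3))
```

Running the sketch with the two distances shows the effect the abstract describes: at the smaller distance the points labelled B and C near the origin are no longer neighbors, so the size-3 pattern disappears while the top size-2 patterns stay the same.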

Keywords

Spatial data mining; Spatial association patterns; Colocation mining

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Department of Computer Science, Indiana University–Purdue University, Fort Wayne, USA