Domain-driven co-location mining

Extraction, visualization and integration in a GIS

GeoInformatica

Abstract

Co-location mining is a classical problem in spatial pattern mining. Given a set of Boolean spatial features, the goal is to find subsets of features that are frequently located together. It has wide applications in environmental management, public safety, transportation and tourism. In recent years, many algorithms have been proposed to extract frequent co-locations. However, most solutions perform "data-centered knowledge discovery" rather than "expert-centered knowledge discovery", and successfully providing useful and interpretable patterns to experts remains an open problem. In this setting, we propose a domain-driven co-location mining approach that combines constraint-based mining and cartographic visualization. Experts can push new domain constraints into the mining algorithm, yielding more relevant patterns and more efficient extraction. They can then visualize the solutions using a new, concise and intuitive cartographic visualization of co-locations. Using this original visualization approach, they can identify interesting new patterns and use uninteresting ones to define new constraints and refine their analysis. These proposals have been integrated into a prototype based on the PostGIS geographic information system. Experiments were conducted on a real geological dataset studying soil erosion, and the results were validated by a domain expert.
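To make the idea of pushing domain constraints into co-location mining more concrete, here is a minimal Python sketch built on stated assumptions. It is not the authors' algorithm. It relies on the participation index, one prevalence measure commonly used in the co-location literature, and an Apriori-style level-wise search. A user-supplied constraint prunes candidates before any spatial join is computed, so excluded patterns are never evaluated. Everything here is hypothetical and chosen only for illustration: the instance format `(instance_id, feature, x, y)`, the Euclidean neighborhood threshold `r`, the function names, and the example constraint.

```python
from itertools import combinations
from math import dist
from typing import Callable

# Hypothetical instance format for this sketch: (instance_id, feature, x, y).
Instance = tuple[str, str, float, float]
Pattern = tuple[str, ...]


def row_instances(pattern: Pattern, by_feature: dict[str, list[Instance]],
                  r: float) -> list[tuple[Instance, ...]]:
    """Tuples with one instance per feature of `pattern`, pairwise within distance r."""
    rows: list[tuple[Instance, ...]] = [()]
    for feature in pattern:
        # Extend each partial row with every instance of the next feature
        # that is a neighbor of all instances already in the row (a clique).
        rows = [
            row + (inst,)
            for row in rows
            for inst in by_feature[feature]
            if all(dist(inst[2:], other[2:]) <= r for other in row)
        ]
    return rows


def participation_index(pattern: Pattern, by_feature: dict[str, list[Instance]],
                        r: float) -> float:
    """Minimum, over the pattern's features, of the fraction of that feature's
    instances that take part in at least one row instance."""
    rows = row_instances(pattern, by_feature, r)
    return min(
        len({row[i][0] for row in rows}) / len(by_feature[f])
        for i, f in enumerate(pattern)
    )


def mine_colocations(instances: list[Instance], r: float, min_prev: float,
                     constraint: Callable[[Pattern], bool] = lambda p: True,
                     ) -> dict[Pattern, float]:
    """Level-wise search for prevalent co-locations of size >= 2.
    `constraint` must be anti-monotone so that pruning with it is safe."""
    by_feature: dict[str, list[Instance]] = {}
    for inst in instances:
        by_feature.setdefault(inst[1], []).append(inst)

    result: dict[Pattern, float] = {}
    level = [(f,) for f in sorted(by_feature) if constraint((f,))]
    while level:
        prevalent = []
        for p in level:
            pi = participation_index(p, by_feature, r)
            if pi >= min_prev:
                prevalent.append(p)
                if len(p) > 1:
                    result[p] = pi
        known = set(prevalent)
        candidates = []
        # Apriori-style join: merge two sorted patterns sharing all but the last feature.
        for a, b in combinations(prevalent, 2):
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune before any spatial computation: every sub-pattern must be
            # prevalent, and the expert's domain constraint must hold.
            if constraint(cand) and all(
                sub in known for sub in combinations(cand, len(cand) - 1)
            ):
                candidates.append(cand)
        level = sorted(candidates)
    return result


if __name__ == "__main__":
    toy = [("A1", "A", 0, 0), ("B1", "B", 1, 0), ("C1", "C", 0, 1),
           ("A2", "A", 10, 10), ("B2", "B", 11, 10), ("C2", "C", 20, 20)]

    def no_b_with_c(p: Pattern) -> bool:
        # Hypothetical domain constraint: features B and C must never be paired.
        return not {"B", "C"} <= set(p)

    print(mine_colocations(toy, r=1.5, min_prev=0.5, constraint=no_b_with_c))
    # prints {('A', 'B'): 1.0, ('A', 'C'): 0.5}
```

Pushing a constraint into the search like this is only safe when the constraint is anti-monotone, meaning that if a pattern violates it, every superset does too. The hypothetical "never pair B with C" rule satisfies this. Without it, the same toy data would also return `('B', 'C')` and `('A', 'B', 'C')`. In this sketch, excluding them also saves the cost of enumerating their row instances.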

Figs. 1–20

Acknowledgments

This work was funded by French contract ANR-2010-COSI-012-01 FOSTER.

Author information

Correspondence to Frédéric Flouvat.

About this article

Cite this article

Flouvat, F., Van Soc, JF.N., Desmier, E. et al. Domain-driven co-location mining. Geoinformatica 19, 147–183 (2015). https://doi.org/10.1007/s10707-014-0209-3

  • DOI: https://doi.org/10.1007/s10707-014-0209-3
