GeoInformatica, Volume 15, Issue 1, pp 1–28

A framework for regional association rule mining and scoping in spatial datasets

  • Wei Ding
  • Christoph F. Eick
  • Xiaojing Yuan
  • Jing Wang
  • Jean-Philippe Nicot

Abstract

Regional association rule mining and scoping is motivated by two facts: global statistics seldom provide useful insight, and most relationships in spatial datasets are geographically regional rather than global. Furthermore, traditional association rule mining frequently fails to discover regional patterns because their global confidence and/or support is insufficient. In this paper, we systematically study this problem and address the unique challenges of regional association mining and scoping: (1) region discovery: how to identify interesting regions from which novel and useful regional association rules can be extracted; (2) regional association rule scoping: how to determine the scope of regional association rules. We investigate the duality between regional association rules and the regions where the associations are valid: interesting regions are identified to seek novel regional patterns, and the scope of a regional pattern is the set of regions in which the pattern is valid. In particular, we present a reward-based region discovery framework that employs divisive grid-based supervised clustering for region discovery. We evaluate our approach in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply. Our experimental results confirm research results in the study of arsenic contamination, and our work leads to novel findings to be further explored by domain scientists.
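
The claim that regional patterns can be missed because of insufficient global support and confidence can be illustrated with a small sketch. This is not the authors' framework: the records, the region labels `R1` and `R2`, the items `x` and `y`, and the helper `support_confidence` are all hypothetical and chosen only to show the effect of measuring one rule globally and inside a single region.

```python
from typing import Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    """One observation: a (hypothetical) region label and the items seen there."""
    region: str
    items: frozenset


def support_confidence(
    records: Iterable[Record], antecedent: frozenset, consequent: frozenset
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    recs = list(records)
    n_total = len(recs)
    # Records that contain every item of the antecedent.
    n_antecedent = sum(1 for r in recs if antecedent <= r.items)
    # Records that contain the antecedent and the consequent together.
    n_both = sum(1 for r in recs if (antecedent | consequent) <= r.items)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


X = frozenset({"x"})
Y = frozenset({"y"})

# Toy data (assumed, not from the paper): the rule x -> y holds mostly in R1.
records = (
    [Record("R1", frozenset({"x", "y"}))] * 8   # R1: x and y co-occur
    + [Record("R1", frozenset({"x"}))] * 2      # R1: x without y
    + [Record("R2", frozenset({"x"}))] * 30     # R2: x without y
    + [Record("R2", frozenset())] * 60          # R2: neither item
)

global_sc = support_confidence(records, X, Y)
regional_sc = support_confidence((r for r in records if r.region == "R1"), X, Y)

print(f"global:  support={global_sc[0]:.2f} confidence={global_sc[1]:.2f}")
print(f"R1 only: support={regional_sc[0]:.2f} confidence={regional_sc[1]:.2f}")
```

With this toy data the rule has support 0.08 and confidence 0.20 over all records, but support 0.80 and confidence 0.80 inside `R1`, so typical global thresholds would discard a rule that is strong regionally. How regions such as `R1` are found in the first place is the region discovery problem the abstract describes; the sketch simply assumes the regions are given.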

Keywords

Association rule mining and scoping; Region discovery; Clustering; Spatial data mining

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Wei Ding (1, corresponding author)
  • Christoph F. Eick (2)
  • Xiaojing Yuan (3)
  • Jing Wang (2)
  • Jean-Philippe Nicot (4)

  1. Department of Computer Science, University of Massachusetts-Boston, Boston, USA
  2. Department of Computer Science, University of Houston, Houston, USA
  3. Engineering Technology Department, University of Houston, Houston, USA
  4. Bureau of Economic Geology, John A. & Katherine G. Jackson School of Geosciences, The University of Texas at Austin, Austin, USA
