Volume 15, Issue 3, pp 399–416

Controlling patterns of geospatial phenomena

  • Tomasz F. Stepinski
  • Wei Ding
  • Christoph F. Eick


Modeling spatially distributed phenomena in terms of their controlling factors is a recurring problem in geoscience. Most efforts concentrate on predicting the value of a response variable in terms of controlling variables, either through a physical model or through a regression model. However, many geospatial systems comprise complex, nonlinear, and spatially non-uniform relationships, making it difficult even to formulate a viable model. This paper focuses on spatial partitioning of the controlling variables that are attributed to a particular range of a response variable. Thus, the presented method surveys spatially distributed relationships between predictors and response. The method is based on the association-analysis technique of identifying emerging patterns, which is extended so that it can be applied more effectively to geospatial data sets. The outcome of the method is a list of spatial footprints, each characterized by a unique "controlling pattern": a list of specific values of predictors that locally correlate with a specified value of the response variable. Mapping the controlling footprints reveals a geographic regionalization of the relationship between predictors and response. The data mining underpinnings of the method are given, and its application to a real-world problem is demonstrated using an expository example that determines the variety of environmental associations of high vegetation density across the continental United States.
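The general emerging-pattern idea that the method builds on can be illustrated with a small sketch. This is not the authors' extended algorithm: the toy grid `cells`, the encoding of discretized predictors as `(predictor, bin)` items, the helper names `support`, `emerging_patterns` and `footprint`, and the thresholds `max_len`, `min_support` and `min_growth` are all hypothetical. Cells whose response falls in the target range (for example, high vegetation density) form the positive set. An itemset is kept when its support among those cells is at least `min_growth` times its support among the remaining cells, and the set of cells matching a kept itemset is treated here as its footprint.

```python
from itertools import combinations

# Hypothetical toy grid: each cell id maps to its discretized predictor values,
# encoded as (predictor, bin) items. Names and bins are illustrative only.
cells = {
    0: {("precip", "high"), ("temp", "mild"), ("soil", "loam")},
    1: {("precip", "high"), ("temp", "mild"), ("soil", "clay")},
    2: {("precip", "high"), ("temp", "hot"), ("soil", "loam")},
    3: {("precip", "low"), ("temp", "hot"), ("soil", "sand")},
    4: {("precip", "low"), ("temp", "mild"), ("soil", "loam")},
    5: {("precip", "high"), ("temp", "hot"), ("soil", "sand")},
}
high_response = {0, 1, 2}  # cells whose response is in the target range (assumed)

pos = [cells[c] for c in sorted(cells) if c in high_response]
neg = [cells[c] for c in sorted(cells) if c not in high_response]


def support(itemset, transactions):
    """Fraction of transactions (cells) containing every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def emerging_patterns(pos, neg, max_len=2, min_support=0.2, min_growth=2.0):
    """Brute-force search for itemsets whose support grows from neg to pos.

    Growth rate = supp(pos) / supp(neg), taken as infinite when supp(neg) == 0.
    Returns (itemset, positive support, growth rate) triples.
    """
    items = sorted(set().union(*pos))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            sp = support(x, pos)
            if sp < min_support:
                continue  # too rare among target cells to be a useful pattern
            sn = support(x, neg)
            growth = float("inf") if sn == 0 else sp / sn
            if growth >= min_growth:
                found.append((x, sp, growth))
    return found


def footprint(pattern, cells):
    """Ids of all cells whose predictor items contain the pattern."""
    return sorted(cid for cid, items in cells.items() if pattern <= items)


for pattern, sp, growth in emerging_patterns(pos, neg):
    print(sorted(pattern), round(sp, 2), growth, footprint(pattern, cells))
```

In this toy grid the pair high precipitation and mild temperature occurs only in target cells, so its growth rate is infinite and its footprint is cells 0 and 1. Brute-force enumeration is practical only for a handful of discretized predictors; a real data set would call for a dedicated frequent-itemset or emerging-pattern miner, and the sketch ignores the geospatial extensions described above.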


Keywords: Predictors–response relationship · Association analysis · Mapping predicting relationship · Vegetation density · Data mining



The work is supported in part by the National Science Foundation under Grant IIS-0812271. A portion of this research was conducted at the Lunar and Planetary Institute, which is operated by the USRA under contract CAN-NCC5-679 with NASA. This is LPI Contribution No. 1532.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Tomasz F. Stepinski, Lunar and Planetary Institute, Houston, USA
  • Wei Ding, Department of Computer Science, University of Massachusetts Boston, Boston, USA
  • Christoph F. Eick, Department of Computer Science, University of Houston, Houston, USA
