Global and Local Spatial Autocorrelation in Predictive Clustering Trees

  • Daniela Stojanova
  • Michelangelo Ceci
  • Annalisa Appice
  • Donato Malerba
  • Sašo Džeroski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)

Abstract

Spatial autocorrelation is the correlation among data values that is due strictly to the relative proximity of the objects the data refer to. This statistical property violates the assumption that observations are independent, a precondition of most data mining and statistical models. Ignoring spatial autocorrelation when analyzing data with spatial dependencies can obscure important insights. In this paper, we propose a data mining method that explicitly considers autocorrelation when building the models. The method is based on the concept of predictive clustering trees (PCTs). The proposed approach captures both global and local effects and handles positive spatial autocorrelation. The discovered models adapt to local properties of the data while providing spatially smoothed predictions. Results show the effectiveness of the proposed solution.
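
To make the notion of spatial autocorrelation concrete, the following minimal sketch computes global Moran's I, one widely used global measure. The abstract does not say which statistic the method relies on, so the choice of Moran's I, the binary distance-band weights, and the names `morans_i`, `points`, `values` and `bandwidth` are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def morans_i(points, values, bandwidth):
    """Global Moran's I with binary distance-band weights.

    Hypothetical helper for illustration only: two distinct objects are
    treated as neighbours (weight 1) when their Euclidean distance is at
    most `bandwidth`, and as non-neighbours (weight 0) otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0       # sum over neighbour pairs of dev[i] * dev[j]
    weight_sum = 0.0  # sum of all weights w_ij
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= bandwidth:
                cross += dev[i] * dev[j]
                weight_sum += 1.0

    variance_term = sum(d * d for d in dev)
    if weight_sum == 0 or variance_term == 0:
        raise ValueError("Moran's I is undefined: no neighbours or constant values")
    return (n / weight_sum) * cross / variance_term


# Six locations on a line, one unit apart (made-up data).
line = [(float(i), 0.0) for i in range(6)]
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # nearby values are similar
alternating = [1.0, 6.0, 1.0, 6.0, 1.0, 6.0]  # nearby values differ

i_smooth = morans_i(line, smooth, bandwidth=1.0)
i_alternating = morans_i(line, alternating, bandwidth=1.0)
```

With these made-up inputs, `i_smooth` is positive (neighbouring values are similar, that is, positive spatial autocorrelation) and `i_alternating` is negative (neighbouring values are dissimilar).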

Keywords

Variance Reduction; Geographically Weighted Regression; Predictive Function; Descriptive Space; Relative Root Mean Square Error
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Daniela Stojanova (1)
  • Michelangelo Ceci (2)
  • Annalisa Appice (2)
  • Donato Malerba (2)
  • Sašo Džeroski (1)

  1. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
  2. Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
