Abstract
We propose a general mechanism for representing spatial transactions in a way that allows existing data mining methods to be applied. The proposal lets the analyst exploit the layered structure of geographical information systems to define the layers of interest and the relevant spatial relations among them. Given a reference object, its neighborhood can be described by considering the attributes of the object itself together with the objects related to it by the chosen relations. The resulting spatial transactions can either be treated as “traditional” transactions, by considering only the qualitative spatial relations, or have their spatial extension exploited during the data mining process. We explore both cases. First, we tackle the problem of classifying a spatial dataset, taking the spatial component of the data into account when computing the statistical measure (i.e., the entropy) needed to learn the model. Then, we consider the task of extracting spatial association rules, focusing on the qualitative representation of the spatial relations. The feasibility of the process has been tested by implementing the proposed method on top of a GIS tool and analyzing real-world data.
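The two uses of a spatial transaction described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact definitions, so everything here is an assumption made for illustration, including the rectangle geometry, the `intersects` predicate, the item naming scheme, the function names `spatial_transaction` and `weighted_entropy`, and the choice of area as the weight that brings the spatial extension into the entropy.

```python
import math
from collections import defaultdict

# Assumption: geometries are axis-aligned rectangles (xmin, ymin, xmax, ymax).
# A real GIS layer would hold arbitrary polygons, lines, and points.

def intersects(a, b):
    """Qualitative relation: True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def area(r):
    """Spatial extension of a rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])

def spatial_transaction(ref_attrs, ref_geom, layers, relations):
    """Describe the neighborhood of a reference object as a set of items.

    The items are the object's own attributes plus one item per
    (relation, layer) pair for which some object in that layer is related
    to the reference object. The item naming is hypothetical.
    """
    items = set(ref_attrs)
    for layer_name, geoms in layers.items():
        for rel_name, predicate in relations.items():
            if any(predicate(ref_geom, g) for g in geoms):
                items.add(f"{rel_name}_{layer_name}")
    return items

def weighted_entropy(labelled_geoms):
    """Entropy of class labels, weighting each object by its area.

    Assumption: area stands in for the 'spatial component' of the data;
    a plain count-based entropy would weight every object equally.
    """
    totals = defaultdict(float)
    for label, geom in labelled_geoms:
        totals[label] += area(geom)
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in totals.values() if w > 0)

# Usage: one reference parcel, two layers, one qualitative relation.
layers = {"road": [(0, 0, 10, 1)], "river": [(20, 20, 30, 21)]}
relations = {"intersects": intersects}
transaction = spatial_transaction({"landuse=residential"}, (2, 0, 4, 3),
                                  layers, relations)

# Two objects of different classes; class A covers three times the area of B.
h = weighted_entropy([("A", (0, 0, 3, 1)), ("B", (0, 0, 1, 1))])
```

In this sketch the transaction can be passed as an ordinary itemset to a standard association rule miner, while the area-weighted entropy gives a lower value than the count-based one would when a single class dominates the covered area.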
Rinzivillo, S., Turini, F. Knowledge discovery from spatial transactions. J Intell Inf Syst 28, 1–22 (2007). https://doi.org/10.1007/s10844-006-0001-4
DOI: https://doi.org/10.1007/s10844-006-0001-4