Abstract
In spatial data mining, the spatial dimension adds substantial complexity to the data mining task. First, spatial objects are characterized by a geometrical representation and a position relative to a reference system, which implicitly define both spatial relationships and spatial properties. Second, spatial phenomena are characterized by autocorrelation, i.e., observations of spatially distributed random variables are not location-independent. Third, spatial objects can be considered at different levels of abstraction (or granularity). The recently proposed SPADA algorithm deals with all these sources of complexity, but it addresses only the task of spatial association rule discovery. In this paper, the problem of mining spatial classifiers is addressed by building an associative classification framework on SPADA. We consider two alternative solutions for associative classification: a propositional method and a structural method. In the former, SPADA is used to obtain a propositional representation of the training data, even in spatial domains that are inherently non-propositional, thus allowing the application of traditional data mining algorithms. In the latter, the Bayesian framework is extended following a multi-relational data mining approach in order to cope with spatial classification tasks. Both methods are evaluated and compared on two real-world spatial datasets, and the results provide several empirical insights into them.
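The propositional route can be illustrated with a minimal sketch. It is not SPADA and not the authors' implementation: the toy facts, the hand-written `patterns` list (standing in for the output of a pattern miner), and the helper names `propositionalize`, `train_naive_bayes`, and `predict` are all assumptions made for illustration. Each pattern becomes one boolean feature ("does the pattern cover this object?"), and a plain naive Bayes classifier is then trained on the resulting table.

```python
from collections import Counter
from math import log

# Hypothetical toy data: each reference object is described by the set of
# (already abstracted) spatial facts that hold for it, plus a class label.
training = [
    ({"crosses(road)", "adjacent(industrial)"}, "high"),
    ({"crosses(road)", "adjacent(park)"}, "low"),
    ({"adjacent(industrial)"}, "high"),
    ({"adjacent(park)"}, "low"),
]

# Hypothetical patterns, standing in for what a pattern miner would return.
patterns = [
    frozenset({"crosses(road)"}),
    frozenset({"adjacent(industrial)"}),
    frozenset({"crosses(road)", "adjacent(industrial)"}),
]

def propositionalize(facts, patterns):
    """One boolean feature per pattern: True if the pattern covers the object."""
    return tuple(p <= facts for p in patterns)

def train_naive_bayes(rows):
    """Count classes and, per class, how often each boolean feature is True."""
    class_counts = Counter(label for _, label in rows)
    true_counts = {c: [0] * len(rows[0][0]) for c in class_counts}
    for features, label in rows:
        for j, value in enumerate(features):
            true_counts[label][j] += value  # True counts as 1, False as 0
    return class_counts, true_counts

def predict(features, model):
    """Return the class with the highest log-posterior under naive Bayes."""
    class_counts, true_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for j, value in enumerate(features):
            p_true = (true_counts[c][j] + 1) / (n_c + 2)  # Laplace smoothing
            score += log(p_true if value else 1 - p_true)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

rows = [(propositionalize(facts, patterns), label) for facts, label in training]
model = train_naive_bayes(rows)
query = propositionalize({"crosses(road)", "adjacent(industrial)"}, patterns)
prediction = predict(query, model)
```

The structural method described in the abstract differs in that it does not flatten the data into such a table first; the sketch covers only the propositional case.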
Cite this article
Ceci, M., Appice, A. Spatial associative classification: propositional vs structural approach. J Intell Inf Syst 27, 191–213 (2006). https://doi.org/10.1007/s10844-006-9950-x
DOI: https://doi.org/10.1007/s10844-006-9950-x