Abstract
We describe MRDTL-2, an efficient implementation of the multi-relational decision tree learning (MRDTL) algorithm [23], which in turn was based on a proposal by Knobbe et al. [19]. We present simple techniques for speeding up the calculation of sufficient statistics for decision trees and related hypothesis classes from multi-relational data. Because missing values are common in real-world applications of data mining, our implementation also includes simple techniques for handling them. We report results of experiments with several real-world data sets from the KDD Cup 2001 data mining competition and the PKDD 2001 discovery challenge. These results indicate that MRDTL is competitive with state-of-the-art algorithms for learning classifiers from relational databases.
References
Blockeel, H.: Top-down induction of first order logical decision trees. Department of Computer Science, Katholieke Universiteit Leuven (1998)
Blockeel, H., De Raedt, L.: Relational Knowledge Discovery in Databases. In: Proceedings of the Sixth International Workshop on Inductive Logic Programming. LNCS (LNAI), vol. 1312, pp. 199–212. Springer, Heidelberg (1996)
Caragea, D., Silvescu, A., Honavar, V.: Decision Tree Induction from Distributed, Heterogeneous, Autonomous Data Sources. In: Proceedings of the Conference on Intelligent Systems Design and Applications (ISDA 2003) (2003) (in press)
Caragea, D., Silvescu, A., Honavar, V.: Invited Chapter. Toward a Theoretical Framework for Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources Technical Report. In: Emerging Neural Architectures Based on Neuroscience, Springer, Berlin (2001)
Cheng, J., Krogel, M., Sese, J., Hatzis, C., Morishita, S., Hayashi, H., Page, D.: KDD Cup 2001 Report. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Explorations 3(2) (2002)
Costa, S., et al.: Query Transformations for Improving the Efficiency of ILP Systems. Journal of Machine Learning Research (2002)
Coursac, I., Duteil, N., Lucas, N.: pKDD 2001 Discovery Challenge - Medical Domain. In: PKDD Discovery Challenge 2001, vol. 3(2) (2002)
Dehaspe, L., De Raedt, L.: Mining Association Rules in Multiple Relations. In: Džeroski, S., Lavrač, N. (eds.) ILP 1997. LNCS, vol. 1297, pp. 125–132. Springer, Heidelberg (1997)
Džeroski, S., Lavrač, N.: Relational Data Mining. Springer, Heidelberg (2001)
Fayyad, U.M., Irani, K.B.: On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8 (1992)
Friedman, N., Getoor, L., Koller, D., Pfeffer, A.: Learning probabilistic relational models. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence (1999)
Getoor, L.: Multi-relational data mining using probabilistic relational models: research summary. In: Proceedings of the First Workshop in Multi-relational Data Mining (2001)
Ito, M., Ohwada, H.: Efficient Database Access for Implementing a Scalable ILP engine. In: Work-In-Progress Report of the Eleventh International Conference on Inductive Logic Programming (2001)
Jaeger, M.: Relational Bayesian networks. In: Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI-1997) (1997)
Jensen, D., Neville, J.: Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners. In: Matwin, S., Sammut, C. (eds.) ILP 2002. LNCS (LNAI), vol. 2583, pp. 101–116. Springer, Heidelberg (2003)
Karalic, A., Bratko, I.: First order regression. Machine Learning 26 (1997)
The KDD Cup 2001 dataset, http://www.cs.wisc.edu/~dpage/kddcup2001/
Kersting, K., De Raedt, L.: Bayesian Logic Programs. In: Proceedings of the Work-in-Progress Track at the 10th International Conference on Inductive Logic Programming (2000)
Knobbe, J., Blockeel, H., Siebes, A., Van der Wallen, D.: Multi-relational Data Mining. In: Proceedings of Benelearn (1999)
Knobbe, J., Blockeel, H., Siebes, A., Van der Wallen, D.: Multi-relational decision tree induction. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 378–383. Springer, Heidelberg (1999)
Koller, D.: Probabilistic Relational Models. In: Džeroski, S., Flach, P.A. (eds.) ILP 1999. LNCS (LNAI), vol. 1634, p. 3. Springer, Heidelberg (1999)
Krogel, M., Wrobel, S.: Transformation-Based Learning Using Multirelational Aggregation. In: Rouveirol, C., Sebag, M. (eds.) ILP 2001. LNCS (LNAI), vol. 2157, p. 142. Springer, Heidelberg (2001)
Leiva, H.A.: A multi-relational decision tree learning algorithm. M.S. thesis. Department of Computer Science. Iowa State University (2002)
Morik, K., Brockhausen, P.: A multistrategy approach to relational discovery in databases. Machine Learning 27(3), 287–312 (1997)
The mutagenesis dataset, http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/mutagenesis.html
Pfeffer, A.: A Bayesian Language for Cumulative Learning. In: Proceedings of AAAI 2000 Workshop on Learning Statistical Models from Relational Data, AAAI Press, Menlo Park (2000)
The PKDD 2001 Discovery Challenge dataset, http://www.uncc.edu/knowledgediscovery
Reinoso-Castillo, J.: Ontology-driven information extraction and integration from Heterogeneous Distributed Autonomous Data Sources. M.S. Thesis. Department of Computer Science. Iowa State University (2002)
Quinlan, R.: Improved Use of Continuous Attributes in C4.5. Journal of Artificial Intelligence Research 4 (1996)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Srinivasan, A., King, R.D., Muggleton, S.: The role of background knowledge: using a problem from chemistry to examine the performance of an ILP program. Technical Report PRG-TR-08-99, Oxford University Computing Laboratory, Oxford (1999)
Wang, X., Schroeder, D., Dobbs, D., Honavar, V.: Data-Driven Discovery of Rules for Protein Function Classification Based on Sequence Motifs. Information Sciences (2003) (in press)
Zhang, J., Honavar, V.: Learning Decision Tree Classifiers from Attribute-Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning. Washington, DC (2003) (in press)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Atramentov, A., Leiva, H., Honavar, V. (2003). A Multi-relational Decision Tree Learning Algorithm – Implementation and Experiments. In: Horváth, T., Yamamoto, A. (eds) Inductive Logic Programming. ILP 2003. Lecture Notes in Computer Science, vol 2835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39917-9_5
DOI: https://doi.org/10.1007/978-3-540-39917-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20144-1
Online ISBN: 978-3-540-39917-9
eBook Packages: Springer Book Archive