Propositionalization Approaches to Relational Data Mining

Kramer, Stefan; Lavrač, Nada; Flach, Peter

doi:10.1007/978-3-662-04599-2_11

Abstract

This chapter surveys methods that transform a relational representation of a learning problem into a propositional (feature-based, attribute-value) representation. This kind of representation change is known as propositionalization. With such an approach, feature construction can be decoupled from model construction. It has been shown that in many relational data mining applications this can be done without loss of predictive performance. After reviewing both general-purpose and domain-dependent propositionalization approaches from the literature, the chapter describes an extension to the Linus propositionalization method that overcomes the system’s earlier inability to deal with non-determinate local variables.
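The basic idea of propositionalization can be illustrated with a small sketch. This is not the chapter's algorithm: the Car/Train data, the feature names and the helper existential_feature are hypothetical, loosely inspired by the East-West trains problem (Michie et al., 1994). Each relational example (a train with a variable number of cars, i.e. a non-determinate relation) is turned into one row of boolean features of the form "there exists a car such that ...", roughly in the spirit of existentially quantified first-order features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy domain, loosely modeled on the East-West trains problem.
@dataclass
class Car:
    length: str  # "short" or "long"
    roof: str    # "none" for an open car, anything else for a closed one

@dataclass
class Train:
    name: str
    direction: str   # class label: "east" or "west"
    cars: List[Car]  # one-to-many relation: a train has a variable number of cars

CarTest = Callable[[Car], bool]
Feature = Callable[[Train], bool]

def existential_feature(*tests: CarTest) -> Feature:
    """Boolean feature: 'there exists a car C in the train such that all tests hold for C'.

    All tests refer to the same local variable C (one car), which is what makes
    a conjunctive first-order feature different from a conjunction of separate features.
    """
    return lambda train: any(all(test(car) for test in tests) for car in train.cars)

# Hypothetical, hand-written feature set; a propositionalization system
# would typically generate such features automatically.
FEATURES: Dict[str, Feature] = {
    "has_short_car": existential_feature(lambda c: c.length == "short"),
    "has_closed_car": existential_feature(lambda c: c.roof != "none"),
    "has_short_closed_car": existential_feature(
        lambda c: c.length == "short", lambda c: c.roof != "none"
    ),
}

def propositionalize(trains: List[Train]) -> List[Dict[str, object]]:
    """Map each relational example to one attribute-value row (features plus class)."""
    rows = []
    for train in trains:
        row: Dict[str, object] = {name: f(train) for name, f in FEATURES.items()}
        row["class"] = train.direction
        rows.append(row)
    return rows

trains = [
    Train("t1", "east", [Car("short", "flat"), Car("long", "none")]),
    Train("t2", "west", [Car("short", "none"), Car("long", "flat")]),
]

for row in propositionalize(trains):
    print(row)
```

Here has_short_closed_car is true for t1 but false for t2, even though t2 has both a short car and a closed car: because both conditions bind the same local variable, they must hold for the same car. Once the table is built, any attribute-value learner can be applied to it, which is the decoupling of feature construction from model construction described above.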

References

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast discovery of association rules. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. MIT Press, Cambridge, MA, 1996.
E. Alphonse and C. Rouveirol. Lazy propositionalisation for relational learning. In Proceedings of the Fourteenth European Conference on Artificial Intelligence, pages 256–260. IOS Press, Amsterdam, 2000.
I. Bratko, I. Mozetič, and N. Lavrač. KARDIO: A Study in Deep and Qualitative Knowledge for Expert Systems. MIT Press, Cambridge, MA, 1989.
C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
B. Cestnik, I. Kononenko, and I. Bratko. ASSISTANT 86: A knowledge elicitation tool for sophisticated users. In Proceedings of the Second European Working Session on Learning, pages 31–44. Sigma Press, Wilmslow, UK, 1987.
Y. Chevaleyre and J-D. Zucker. Noise-tolerant rule induction from multi-instance data. In Proceedings of the ICML-2000 Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries, pages 1–11. Stanford University, Stanford, CA, 2000.
P. Clark and R. Boswell. Rule induction with CN2: Some recent improvements. In Proceedings of the Fifth European Working Session on Learning, pages 151–163. Springer, Berlin, 1991.
P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261–283, 1989.
W.W. Cohen. PAC-learning nondeterminate clauses. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 676–681. AAAI Press, Menlo Park, CA, 1994.
W.W. Cohen. Learning trees and rules with set-valued features. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 709–716. AAAI Press, Menlo Park, CA, 1996.
D.J. Cook and L.B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231–255, 1994.
L. Dehaspe and H. Toivonen. Discovery of frequent Datalog patterns. Data Mining and Knowledge Discovery, 3(1):7–36, 1999.
L. De Raedt. Logical settings for concept learning. Artificial Intelligence, 95:187–201, 1997.
L. De Raedt. Attribute-value learning versus inductive logic programming: The missing links (extended abstract). In Proceedings of the Eighth International Conference on Inductive Logic Programming, pages 1–8. Springer, Berlin, 1998.
T.G. Dietterich, R.H. Lathrop, and T. Lozano-Pérez. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1–2):31–71, 1997.
S. Džeroski, H. Blockeel, B. Kompare, S. Kramer, B. Pfahringer, and W. Van Laer. Experiments in predicting biodegradability. In Proceedings of the Ninth International Workshop on Inductive Logic Programming, pages 80–91. Springer, Berlin, 1999.
D. Fensel, M. Zickwolff, and M. Wiese. Are substitutions the better examples? Learning complete sets of clauses with Frog. In Proceedings of the Fifth International Workshop on Inductive Logic Programming, pages 453–474. Department of Computer Science, Katholieke Universiteit Leuven, 1995.
P. Flach. Knowledge representation for inductive learning. In Proceedings of the European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty, pages 160–167. Springer, Berlin, 1999.
P. Flach, C. Giraud-Carrier, and J.W. Lloyd. Strongly typed inductive concept learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming, pages 185–194. Springer, Berlin, 1998.
P. Flach and N. Lachiche. 1BC: A first-order Bayesian classifier. In Proceedings of the Ninth International Workshop on Inductive Logic Programming, pages 92–103. Springer, Berlin, 1999.
P. Flach and N. Lachiche. Confirmation-guided discovery of first-order rules with Tertius. Machine Learning, 42(1–2):61–95, 2001.
P. Geibel and F. Wysotzki. Relational learning with decision trees. In Proceedings of the Twelfth European Conference on Artificial Intelligence, pages 428–432. IOS Press, Amsterdam, 1996.
G. Klopman. Artificial intelligence approach to structure-activity studies: computer automated structure evaluation of biological activity of organic molecules. Journal of the American Chemical Society, 106:7315–7321, 1984.
G. Klopman. MultiCASE: A hierarchical computer automated structure evaluation program. Quantitative Structure Activity Relationships, 11:176–184, 1992.
W. Klösgen. EXPLORA: A multipattern and multistrategy discovery assistant. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 249–271. AAAI Press, Menlo Park, CA, 1996.
R. Kohavi, D. Sommerfield, and J. Dougherty. Data mining using MLC++: A machine learning library in C++. In Proceedings of the Eighth IEEE International Conference on Tools for Artificial Intelligence, pages 234–245. IEEE Computer Society Press, Los Alamitos, CA, 1996. http://www.sgi.com/Technology/mlc.
D. Koller and M. Sahami. Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 284–292. Morgan Kaufmann, San Francisco, CA, 1996.
S. Kramer. Structural regression trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 812–819. AAAI Press, Menlo Park, CA, 1996.
S. Kramer and E. Frank. Bottom-Up propositionalization. In Proceedings of the ILP-2000 Work-in-Progress Track, pages 156–162. Imperial College, London, 2000.
S. Kramer, B. Pfahringer, and C. Helma. Stochastic propositionalization of non-determinate background knowledge. In Proceedings of the Eighth International Conference on Inductive Logic Programming, pages 80–94. Springer, Berlin, 1998.
N. Lavrač and S. Džeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, Chichester, 1994. Freely available at http://www-ai.ijs.si/SasoDzeroski/ILPBook/.
N. Lavrač, S. Džeroski, and M. Grobelnik. Learning nonrecursive definitions of relations with LINUS. In Proceedings of the Fifth European Working Session on Learning, pages 265–281. Springer-Verlag, Berlin, 1991.
N. Lavrač, D. Gamberger, and P. Turney. A relevancy filter for constructive induction. IEEE Intelligent Systems, 13:50–56, 1998.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1:241–258, 1997.
D. Michie, S. Muggleton, D. Page, and A. Srinivasan. To the international computing community: A new East-West challenge. Technical report, Oxford University Computing Laboratory, Oxford, UK, 1994.
F. Mizoguchi, H. Ohwada, M. Daidoji, and S. Shirato. Learning rules that classify ocular fundus images for glaucoma diagnosis. In Proceedings of the Sixth International Workshop on Inductive Logic Programming, pages 146–162. Springer-Verlag, Berlin, 1996.
I. Mozetič. NEWGEM: Program for learning from examples, technical documentation and user’s guide. Reports of Intelligent Systems Group UIUCDCS-F-85-949, Department of Computer Science, University of Illinois, Urbana-Champaign, IL, 1985.
S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
S. Muggleton and C. Feng. Efficient induction of logic programs. In S. Muggleton, editor, Inductive Logic Programming, pages 281–298. Academic Press, London, 1992.
S. Muggleton, R.D. King, and M.J.E. Sternberg. Protein secondary structure prediction using logic. In Proceedings of the Second International Workshop on Inductive Logic Programming, pages 228–259. TM-1182, ICOT, Tokyo, 1992.
S. Muggleton, A. Srinivasan, R. King, and M. Sternberg. Biochemical knowledge discovery using Inductive Logic Programming. In Proceedings of the First Conference on Discovery Science, pages 326–341. Springer, Berlin, 1998.
A.L. Oliveira and A. Sangiovanni-Vincentelli. Constructive induction using a non-greedy strategy for feature selection. In Proceedings of the Ninth International Workshop on Machine Learning, pages 354–360. Morgan Kaufmann, San Francisco, CA, 1992.
G. Pagallo and D. Haussler. Boolean feature discovery in empirical learning. Machine Learning, 5:71–99, 1990.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
B.L. Richards and R.J. Mooney. Learning relations by pathfinding. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 50–55. AAAI Press, Menlo Park, CA, 1992.
M. Sebag and C. Rouveirol. Tractable induction and classification in first order logic via stochastic matching. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 888–893. Morgan Kaufmann, San Francisco, CA, 1997.
A. Srinivasan and R. King. Feature construction with inductive logic programming: a study of quantitative predictions of biological activity aided by structural attributes. Data Mining and Knowledge Discovery, 3(1):37–57, 1999.
A. Srinivasan, R. King, and D.W. Bristol. An assessment of submissions made to the Predictive Toxicology Evaluation Challenge. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 270–275. Morgan Kaufmann, San Francisco, CA, 1999.
A. Srinivasan, S. Muggleton, R.D. King, and M. Sternberg. Theories for mutagenicity: a study of first-order and feature-based induction. Artificial Intelligence, 85(1–2):277–299, 1996.
I. Stahl. Predicate invention in inductive logic programming. In L. De Raedt, editor, Advances in Inductive Logic Programming, pages 34–47. IOS Press, Amsterdam, 1996.
P. Turney. Low size-complexity inductive logic programming: The East-West challenge considered as a problem in cost-sensitive classification. In L. De Raedt, editor, Advances in Inductive Logic Programming, pages 308–321. IOS Press, Amsterdam, 1996.
V. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, Berlin, 1982.
V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, 1995.
J. Wnek and R.S. Michalski. Hypothesis-driven constructive induction in AQ17: A method and experiments. In Proceedings of IJCAI-91 Workshop on Evaluating and Changing Representations in Machine Learning, pages 13–22. Sydney, Australia, 1991.
S. Wrobel. An algorithm for multi-relational discovery of subgroups. In Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, pages 78–87. Springer, Berlin, 1997.
J-D. Zucker and J-G. Ganascia. Representation changes for efficient learning in structural domains. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 543–551. Morgan Kaufmann, San Francisco, CA, 1996.
J-D. Zucker and J-G. Ganascia. Learning structurally indeterminate clauses. In Proceedings of the Eighth International Conference on Inductive Logic Programming, pages 235–244. Springer, Berlin, 1998.

Author information

Authors and Affiliations

Stefan Kramer: Machine Learning and Natural Language Processing Lab, Institute for Computer Science, Albert-Ludwigs University Freiburg, Am Flughafen 17, D-79110, Freiburg i. Br., Germany
Nada Lavrač: Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Peter Flach: Department of Computer Science, University of Bristol, The Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK

Editor information

Editors and Affiliations

Sašo Džeroski & Nada Lavrač: Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia

About this chapter

Cite this chapter

Kramer, S., Lavrač, N., Flach, P. (2001). Propositionalization Approaches to Relational Data Mining. In: Džeroski, S., Lavrač, N. (eds) Relational Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04599-2_11

DOI: https://doi.org/10.1007/978-3-662-04599-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07604-6
Online ISBN: 978-3-662-04599-2
eBook Packages: Springer Book Archive
