Abstract
We describe an algorithm for constructing a set of tree-like conjunctive relational features by combining smaller conjunctive blocks. Unlike traditional level-wise approaches, which preserve the monotonicity of frequency, our block-wise approach preserves the monotonicity of feature reducibility and redundancy, two properties that are important in propositionalization employed in the context of classification learning. With pruning based on these properties, our block-wise approach scales efficiently to features including tens of first-order atoms, far beyond the reach of state-of-the-art propositionalization or inductive logic programming systems.
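To make the idea of redundancy-based pruning more concrete, the following is a minimal sketch and not the algorithm of the article. It assumes a coverage-based notion of redundancy: a feature f is treated as redundant given a feature g if g covers every positive example that f covers and covers no negative example that f does not cover. The names `Coverage`, `is_redundant`, and `prune_redundant` are hypothetical and introduced only for illustration; the article's own definitions and its block-wise construction may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed representation: a feature is summarized by the sets of
# positive and negative example ids it covers.
Coverage = Tuple[FrozenSet[int], FrozenSet[int]]


def is_redundant(f: Coverage, g: Coverage) -> bool:
    """Return True if f is redundant given g (assumed definition):
    g covers all positives covered by f, and every negative covered
    by g is also covered by f."""
    f_pos, f_neg = f
    g_pos, g_neg = g
    return f_pos <= g_pos and g_neg <= f_neg


def prune_redundant(features: Dict[str, Coverage]) -> List[str]:
    """Keep only features that are not redundant given another kept one.
    Among features with identical coverage, the first one seen is kept."""
    kept: List[str] = []
    for name, cov in features.items():
        # Skip the candidate if an already kept feature makes it redundant.
        if any(is_redundant(cov, features[k]) for k in kept):
            continue
        # Drop kept features that the new candidate makes redundant.
        kept = [k for k in kept if not is_redundant(features[k], cov)]
        kept.append(name)
    return kept


# Hypothetical toy data: example ids 1-4 are positive, 10-11 negative.
example_features: Dict[str, Coverage] = {
    "a": (frozenset({1, 2}), frozenset({10})),
    "b": (frozenset({1}), frozenset({10, 11})),
    "c": (frozenset({1, 2, 3}), frozenset({10})),
    "d": (frozenset({4}), frozenset()),
}
surviving = prune_redundant(example_features)
```

In this toy setting, a feature that covers fewer positives and more negatives than another cannot help a classifier that already has the better feature, which is the intuition behind discarding it early rather than extending it with further blocks.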
Additional information
Editors: Hendrik Blockeel, Karsten Borgwardt, Luc De Raedt, Pedro Domingos, Kristian Kersting, Xifeng Yan.
Cite this article
Kuželka, O., Železný, F. Block-wise construction of tree-like relational features with monotone reducibility and redundancy. Mach Learn 83, 163–192 (2011). https://doi.org/10.1007/s10994-010-5208-5
DOI: https://doi.org/10.1007/s10994-010-5208-5