Abstract
The similarity measures used so far in first-order instance-based learning (IBL) have been limited to the function-free case. In this paper we show that considerable power can be gained by allowing lists and other terms in the input representation and by designing similarity measures that work directly on these structures. We present an improved similarity measure for the first-order instance-based learner RIBL that employs the concept of edit distances to efficiently compute distances between lists and terms, discuss its computational and formal properties, and empirically demonstrate its additional power on a problem from the domain of biochemistry. The paper also includes a thorough reconstruction of RIBL's overall algorithm.
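To make the notion of an edit distance between lists concrete, the following is a minimal sketch of the classical dynamic-programming scheme for sequence edit distance, applied to arbitrary Python lists. It is not RIBL's actual measure: the function names `list_edit_distance` and `normalized_distance`, the unit costs, and the normalization by the longer list's length are all assumptions made for illustration only.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def unit_substitution(x: T, y: T) -> float:
    """Assumed cost model: replacing an element is free if equal, else costs 1."""
    return 0.0 if x == y else 1.0


def list_edit_distance(
    a: Sequence[T],
    b: Sequence[T],
    sub_cost: Callable[[T, T], float] = unit_substitution,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]  # turning a[:i] into the empty list: i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x by y
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[T], b: Sequence[T]) -> float:
    """Hypothetical normalization to [0, 1] under unit costs."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else list_edit_distance(a, b) / longest


if __name__ == "__main__":
    # Illustrative input only: two short symbol lists differing by one deletion.
    print(normalized_distance(["a", "c", "g", "u"], ["a", "g", "u"]))
```

The table is filled in time proportional to the product of the two list lengths, and only two rows are kept in memory. Passing a different `sub_cost` is one way element-level similarity could be plugged in, though how RIBL weights such operations is not stated in the abstract.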
Cite this article
Horváth, T., Wrobel, S. & Bohnebeck, U. Relational Instance-Based Learning with Lists and Terms. Machine Learning 43, 53–80 (2001). https://doi.org/10.1023/A:1007668716498
DOI: https://doi.org/10.1023/A:1007668716498