Learning with Kernels and Logical Representations

  • Paolo Frasconi
  • Andrea Passerini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4911)


In this chapter, we describe a view of statistical learning in the inductive logic programming setting based on kernel methods. The relational representation of data and background knowledge is used to form a kernel function, enabling us to subsequently apply a number of kernel-based statistical learning algorithms. Different representational frameworks and associated algorithms are explored in this chapter. In kernels on Prolog proof trees, the representation of an example is obtained by recording the execution trace of a program expressing background knowledge. In declarative kernels, features are directly associated with mereotopological relations. Finally, in kFOIL, features correspond to the truth values of clauses dynamically generated by a greedy search algorithm guided by the empirical risk.
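The kFOIL construction sketched above can be illustrated in a few lines. In the following sketch (not the authors' implementation; the predicates and example encodings are invented, propositional stand-ins for learned definite clauses), each example is mapped to the vector of truth values of a clause set, so the kernel between two examples reduces to the number of clauses covering both:

```python
def phi(example, clauses):
    # Feature map: the truth value of each clause on the example.
    return [1 if clause(example) else 0 for clause in clauses]

def clause_kernel(x, y, clauses):
    # k(x, y) = <phi(x), phi(y)> = number of clauses covering both examples.
    return sum(a * b for a, b in zip(phi(x, clauses), phi(y, clauses)))

# Hypothetical propositional stand-ins for clauses that, in kFOIL,
# would be definite clauses produced by a FOIL-style greedy search
# guided by the empirical risk of the resulting kernel machine.
clauses = [
    lambda e: e["atoms"] > 10,
    lambda e: e["has_ring"],
    lambda e: e["charge"] < 0,
]

x = {"atoms": 12, "has_ring": True, "charge": -0.3}
y = {"atoms": 8, "has_ring": True, "charge": -0.1}
print(clause_kernel(x, y, clauses))  # clauses 2 and 3 cover both -> 2
```

Because the resulting Gram matrix is an inner product of binary feature vectors, it is positive semidefinite and can be plugged directly into any kernel machine; the greedy clause search then scores candidate refinements by the empirical risk of that machine.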


Keywords: Support Vector Regression, Noun Phrase, Logical Representation, Convolution Kernel, Inductive Logic Programming





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Paolo Frasconi (1)
  • Andrea Passerini (1)
  1. Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Italy
