Advertisement

Machine Learning

, Volume 78, Issue 3, pp 305–342 | Cite as

Fast learning of relational kernels

  • Niels Landwehr
  • Andrea Passerini
  • Luc De Raedt
  • Paolo Frasconi
Article

Abstract

We develop a general theoretical framework for statistical logical learning with kernels based on dynamic propositionalization, where structure learning corresponds to inferring a suitable kernel on logical objects, and parameter learning corresponds to function learning in the resulting reproducing kernel Hilbert space. In particular, we study the case where structure learning is performed by a simple FOIL-like algorithm, and propose alternative scoring functions for guiding the search process. We present an empirical evaluation on several data sets in the single-task as well as in the multi-task setting.

Statistical relational learning Inductive logic programming Kernel methods Dynamic propositionalization Kernel learning Multi-task learning 

References

  1. Argyriou, A., Hauser, R., Micchelli, C. A., & Pontil, M. (2006). A DC-programming algorithm for kernel selection. In Proceedings of the 23rd international conference of machine learning (ICML-2006) (pp. 41–48), Pittsburgh, PA, USA. Google Scholar
  2. Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. In Advances in neural information processing systems 19 (pp. 41–48). Cambridge: MIT Press. Google Scholar
  3. Bartlett, P. L., Jordan, M. I., & McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101, 138–156. MATHCrossRefMathSciNetGoogle Scholar
  4. Ben-David, S., Eiron, N., & Simon, H. U. (2002). Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3, 441–461. CrossRefMathSciNetGoogle Scholar
  5. Bengio, Y., Delalleau, O., & Roux, N. L. (2005). The curse of highly variable functions for local kernel machines. In Advances in neural information processing systems 18. Cambridge: MIT Press. Google Scholar
  6. Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA. Google Scholar
  7. Blockeel, H., Dzeroski, S., Kompare, B., Kramer, S., Pfahringer, B., & Laer, W. (2004). Experiments in predicting biodegradability. Applied Artificial Intelligence, 18(2), 157–181. CrossRefGoogle Scholar
  8. Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 1579–1619. MathSciNetGoogle Scholar
  9. Caponnetto, A., Micchelli, C., Pontil, M., & Ying, Y. (2008). Universal kernels for multi-task learning. Journal of Machine Learning Research. Google Scholar
  10. Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75. CrossRefGoogle Scholar
  11. Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple parameters for support vector machines. Machine Learning, 46(1–3), 131–159. MATHCrossRefGoogle Scholar
  12. Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20, 1–25. Google Scholar
  13. Cristianini, N., Shawe-Taylor, J., Elisseef, A., & Kandola, J. (2001). On kernel-target alignment. In Advances in neural information processing systems 14. Cambridge: MIT Press. Google Scholar
  14. Cucker, F., & Smale, S. (2002). On the mathematical foundations of learning. Bulletin (New Series) of the American Mathematical Society, 39(1), 1–49. MATHCrossRefMathSciNetGoogle Scholar
  15. Cumby, C. M., & Roth, D. (2003). On kernel methods for relational learning. In Proceedings of the twentieth international conference on machine learning (pp. 107–114), Washington, DC, USA. Google Scholar
  16. Datta, P., & Kibler, D. F. (1993). Concept sharing: a means to improve multi-concept learning. In Proceedings of the 10th international conference on machine learning, Amherst, MA, USA. Google Scholar
  17. Davis, J., Burnside, E., de Castro Dutra, I., Page, D., & Costa, V. S. (2005). An integrated approach to learning Bayesian networks of rules. In Lecture notes in computer science : Vol. 3720. Machine learning, 16th European conference (pp. 84–95), Porto, Portugal. Berlin: Springer. Google Scholar
  18. De Raedt, L. (2008). Logical and relational learning. Berlin: Springer. MATHCrossRefGoogle Scholar
  19. De Raedt, L., & Ramon, J. (2004). Condensed representations for inductive logic programming. In Proceedings of the 9th international conference on the principles of knowledge representation and reasoning. Google Scholar
  20. De Raedt, L., Lavrac, N., & Dzeroski, S. (1993). Multiple predicate learning. In Proceedings of the 13th international joint conference on artificial intelligence (pp. 1037–1043), Chambery, France. Google Scholar
  21. De Raedt, L., Frasconi, P., Kersting, K., & Muggleton, S. (2008). Lecture notes in computer science : Vol. 4911. Probabilistic inductive logic programming—theory and applications. Berlin: Springer. MATHGoogle Scholar
  22. Dehaspe, L., Toivonen, H., & King, R. (1998). Finding frequent substructures in chemical compounds. In Proceedings of the 4th international conference on knowledge discovery and data mining. Google Scholar
  23. Deshpande, A., Milch, B., Zettlemoyer, L., & Kaelbling, L. (2007). Learning probabilistic relational dynamics for multiple tasks. In Proceedings of the 23rd conference on uncertainty in artificial intelligence (UAI-07) (pp. 83–92). Google Scholar
  24. Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637. MathSciNetGoogle Scholar
  25. Fang, H., Tong, W., Shi, L., Blair, R., Perkins, R., Branham, W., Hass, B., Xie, Q., Dial, S., Moland, C., & Sheehan, D. (2001). Structure-activity relationships for a large diverse set of natural, synthetic, and environmental estrogens. Chemical Research in Toxicology, 14(3), 280–294. CrossRefGoogle Scholar
  26. Frasconi, P., Passerini, A., Muggleton, S., & Lodhi, H. (2005). Declarative kernels. In Kramer, S., & Pfahringer, B. (Eds.), Proceedings of the 15th international conference on inductive logic programming, late-breaking papers (pp. 17–19). Google Scholar
  27. Frasconi, P., Jaeger, M., & Passerini, A. (2008). Feature discovery with type extension trees. In Lecture notes in computer science : Vol. 5194. ILP 2008: Proceedings of the 18th international conference. Berlin: Springer. Google Scholar
  28. Freund, Y., & Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296. MATHCrossRefGoogle Scholar
  29. Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5(1), 49–58. CrossRefGoogle Scholar
  30. Gärtner, T., Lloyd, J., & Flach, P. (2004). Kernels and distances for structured data. Machine Learning, 57(3), 205–232. MATHCrossRefGoogle Scholar
  31. Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge: MIT Press. MATHGoogle Scholar
  32. Höffgen, K.-U., Simon, H.-U., & van Horn, K. S. (1995). Robust trainability of single neurons. Journal of Computer and System Sciences, 50(1), 114–125. MATHCrossRefMathSciNetGoogle Scholar
  33. Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. In Proceedings of the 21st international conference on machine learning, Banff, Alberta, Canada. Google Scholar
  34. Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning (pp. 169–184). Cambridge: MIT Press. Google Scholar
  35. Karalič, A., & Bratko, I. (1997). First order regression. Machine Learning, 26(2–3), 147–176. MATHCrossRefGoogle Scholar
  36. Kersting, K., & De Raedt, L. (2007). Bayesian logic programming: theory and tools. In Getoor, L. & Taskar, B. (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press. Google Scholar
  37. Khan, K., Muggleton, S., & Parson, R. (1998). Repeat learning using predicate invention. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 165–174), Madison, Wisconsin, USA. Berlin: Springer. Google Scholar
  38. Kimeldorf, G. S., & Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics, 41, 495–502. MATHCrossRefMathSciNetGoogle Scholar
  39. King, R., Srinivasan, A., & Sternberg, M. (1995). Relating chemical activity to structure: an examination of ILP successes. New Generation Computing, 13(2, 4), 411–433. CrossRefGoogle Scholar
  40. Kirsten, M., Wrobel, S., & Horváth, T. (2001). Distance based approaches to relational learning and clustering. In Relational data mining (pp. 213–230). Berlin: Springer. Google Scholar
  41. Kok, S., & Domingos, P. (2005). Learning the structure of Markov logic networks. In Proceedings of the 22nd international conference on machine learning (pp. 441–448), Bonn, Germany. New York: ACM. CrossRefGoogle Scholar
  42. Kramer, S., & De Raedt, L. (2001). Feature construction with version spaces for biochemical applications. In Proceedings of the 18th international conference on machine learning (pp. 258–265). San Mateo: Morgan Kaufmann. Google Scholar
  43. Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27–72. Google Scholar
  44. Landwehr, N., Kersting, K., & De Raedt, L. (2005). nFOIL: integrating naive Bayes and FOIL. In Proceedings of the 20th national conference on artificial intelligence (pp. 795–800), Pittsburgh, Pennsylvania, USA. Google Scholar
  45. Landwehr, N., Passerini, A., De Raedt, L., & Frasconi, P. (2006). kFOIL: Learning simple relational kernels. In Proceedings of the 21st national conference on artificial intelligence, July 16–20, 2006, Boston, Massachusetts, USA. Google Scholar
  46. Lavrac, N., & Dzeroski, S. (1994). ILP: techniques and application. Chichester: Ellis Horwood. Google Scholar
  47. Leslie, C. S., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: a string kernel for SVM protein classification. In Pacific symposium on biocomputing (pp. 566–575), Lihue, Hawaii, USA. Google Scholar
  48. Lloyd, J. W. (1987). Foundations of logic programming (2nd extended ed.). Berlin: Springer. MATHGoogle Scholar
  49. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. The Journal of Machine Learning Research, 2, 419–444. MATHCrossRefGoogle Scholar
  50. Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. Journal of Machine Learning Research, 6, 1099–1125. MathSciNetGoogle Scholar
  51. Micchelli, C., Xu, Y., & Zhang, H. (2006). Universal kernels. The Journal of Machine Learning Research, 7, 2651–2667. MathSciNetGoogle Scholar
  52. Muggleton, S. (2000). Learning stochastic logic programs. In Getoor, L. & Jensen, D. (Eds.), Proceedings of the AAAI2000 workshop on learning statistical models from relational data. Google Scholar
  53. Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: theory and methods. Journal of Logic Programming, 19/20, 629–679. CrossRefMathSciNetGoogle Scholar
  54. Muggleton, S., Amini, A., & Sternberg, M. (2005). Support vector inductive logic programming. In Lecture notes in computer science : Vol. 3735. Discovery science, 8th international conference, Proceedings (pp. 163–175), Singapore. Berlin: Springer. Google Scholar
  55. Obozinski, G., Taskar, B., & Jordan, M. (June, 2006). Multi-task feature selection. Technical report, Dept. of Statistics, UC Berkeley. Google Scholar
  56. Ong, C. S., Smola, A. J., & Williamson, R. C. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071. MathSciNetGoogle Scholar
  57. Passerini, A., Frasconi, P., & De Raedt, L. (2006). Kernels on prolog proof trees: statistical learning in the ILP setting. In Probabilistic, logical and relational learning—towards a synthesis. Google Scholar
  58. Platt, J. (1999). Sequential minimal optimization: a fast algorithm for training support vector machines. In Advances in kernel methods: support vector learning (pp. 185–208). Google Scholar
  59. Poggio, T., & Smale, S. (2003). The mathematics of learning: dealing with data. Notices of the American Mathematical Society, 50(5), 537–544. MATHMathSciNetGoogle Scholar
  60. Popescul, A., Ungar, L., Lawrence, S., & Pennock, D. (2003). Statistical relational learning for document mining. In Proceedings of the 3rd IEEE international conference on data mining (pp. 275–282). Google Scholar
  61. Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceeding of the 15th international conference on machine learning, Madison, Wisconsin, USA. Google Scholar
  62. Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266. Google Scholar
  63. Rakotomamonjy, A. (2004). Optimizing area under ROC curve with SVMs. In Workshop on ROC analysis in AI at the 15th European conference on artificial intelligence, Pisa, Italy. Google Scholar
  64. Ramon, J. (2002). Clustering and instance based learning in first order logic. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium. Google Scholar
  65. Ramon, J., & Bruynooghe, M. (1998). A framework for defining distances between first-order logic objects. In Lecture notes in computer science : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings (pp. 271–280), Madison, Wisconsin, USA. Berlin: Springer. Google Scholar
  66. Reid, M. D. (2004). Improving rule evaluation using multitask learning. In Lecture notes in artificial intelligence : Vol. 3194. Inductive logic programming, 14th international conference, Proceedings, Porto, Portugal. Berlin: Springer. Google Scholar
  67. Rückert, U., & Kramer, S. (2007). Margin-based first-order rule learning. Machine Learning, 70(2–3), 189–206. Google Scholar
  68. Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., & Tsuda, K. (2009). gBoost: a mathematical programming approach to graph classification and regression. Machine Learning, 75(1), 69–89. CrossRefGoogle Scholar
  69. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th international conference on machine learning (pp. 807–814), Corvallis, Oregon, USA. Google Scholar
  70. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press. Google Scholar
  71. Slattery, S., & Craven, M. (1998). Combining statistical and relational methods for learning in hypertext domains. In Lecture notes in computer science. : Vol. 1446. Inductive logic programming, 8th international workshop, Proceedings, Madison, Wisconsin, USA. Berlin: Springer. Google Scholar
  72. Srinivasan, A., Muggleton, S., King, R., & Sternberg, M. (1996). Theories for mutagenicity: a study of first-order and feature-based induction. Artificial Intelligence, 85, 277–299. CrossRefGoogle Scholar
  73. Srinivasan, A., King, R. D., & Bristol, D. (1999). An assessment of ILP-assisted models for toxicology and the PTE-3 experiment. In Lecture notes in computer science : Vol. 1634. Inductive logic programming, 9th international workshop, ILP-99, Proceedings, Bled, Slovenia, June 24–27, 1999. Berlin: Springer. Google Scholar
  74. Steck, H. (2007). Hinge rank loss and the area under the ROC curve. In Proceedings of the 18th European conference on machine learning, Warsaw, Poland. Google Scholar
  75. Swamidass, S. J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., & Baldi, P. (2005). Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(1), 359–368. CrossRefGoogle Scholar
  76. Wachman, G., & Khardon, R. (2007). Learning from interpretations: a rooted kernel for ordered hypergraphs. In Proceedings of the 24th international conference on machine learning, Corvallis, Oregon, USA. Google Scholar
  77. Weston, J., Schölkopf, B., Eskin, E., Leslie, C., & Noble, W. (2003). Dealing with large diagonals in kernel matrices. Annals of the Institute of Statistical Mathematics, 55(2), 391–408. MATHMathSciNetGoogle Scholar
  78. Wheeler, D. L. L., Barrett, T., Benson, D. A. A., Bryant, S. H. H., Canese, K., Chetvernin, V., Church, D. M. M., Dicuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L. Y. Y., Helmberg, W., Kapustin, Y., Khovayko, O., Landsman, D., Lipman, D. J. J., Madden, T. L. L., Maglott, D. R. R., Miller, V., Ostell, J., Pruitt, K. D. D., Schuler, G. D. D., Shumway, M., Sequeira, E., Sherry, S. T. T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov, R. L. L., Tatusova, T. A. A., Wagner, L., & Yaschenko, E. (2008). Database resources of the national center for biotechnology information. Nucleic Acids Research. Google Scholar

Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Niels Landwehr
    • 1
  • Andrea Passerini
    • 2
  • Luc De Raedt
    • 3
  • Paolo Frasconi
    • 4
  1. 1.Department for Computer ScienceUniversity of PotsdamPotsdamGermany
  2. 2.Dipartimento di Ingegneria e Scienza dell’InformazioneUniversità degli Studi di TrentoPovoItaly
  3. 3.Departement ComputerwetenschappenKatholieke Universiteit LeuvenHeverleeBelgium
  4. 4.Machine Learning and Neural Networks Group, Dipartimento di Sistemi e InformaticaUniversità degli Studi di FirenzeFirenzeItaly

Personalised recommendations