Fisher Kernels for Relational Data

  • Uwe Dick
  • Kristian Kersting
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Combining statistical and relational learning is currently receiving a lot of attention. The majority of statistical relational learning approaches focus on density estimation. For classification, however, it is well known that the performance of such generative models is often lower than that of discriminative classifiers. One approach to improving the performance of generative models is to combine them with discriminative algorithms. Fisher kernels were developed to combine generative models with kernel methods, and have shown promising results for combinations of support vector machines with (logical) hidden Markov models and Bayesian networks. So far, however, Fisher kernels have not been considered for relational data, i.e., data consisting of a collection of objects and relations among these objects. In this paper, we develop Fisher kernels for relational data and empirically show that they can significantly improve over the results achieved without Fisher kernels.
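As a rough illustration of the general idea behind Fisher kernels (not the relational construction developed in the paper), the sketch below uses a toy generative model of independent Bernoulli features. Each example is mapped to its Fisher score, the gradient of the log-likelihood with respect to the model parameters, and two examples are compared through the inner product of their scores weighted by the inverse Fisher information. The function names `fisher_score` and `fisher_kernel` and the toy model itself are assumptions made for this example only.

```python
def fisher_score(x, theta):
    """Gradient of log P(x | theta) for independent Bernoulli features.

    x is a list of 0/1 values; theta is a list of probabilities strictly
    between 0 and 1. (Hypothetical toy model, chosen for illustration only.)
    """
    return [xi / t - (1 - xi) / (1 - t) for xi, t in zip(x, theta)]


def fisher_kernel(x1, x2, theta):
    """Fisher kernel U(x1)^T I^{-1} U(x2) for the toy Bernoulli model.

    For independent Bernoulli features the Fisher information matrix is
    diagonal with entries 1 / (theta_i * (1 - theta_i)), so its inverse
    has entries theta_i * (1 - theta_i).
    """
    u1 = fisher_score(x1, theta)
    u2 = fisher_score(x2, theta)
    return sum(a * t * (1 - t) * b for a, b, t in zip(u1, u2, theta))


if __name__ == "__main__":
    theta = [0.5, 0.5]
    print(fisher_kernel([1, 0], [1, 0], theta))
    print(fisher_kernel([1, 0], [0, 1], theta))
```

A kernel matrix computed this way could then be handed to any kernel method, such as a support vector machine that accepts precomputed kernels. In practice the Fisher information matrix is often replaced by the identity matrix when it is too expensive to compute.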


Keywords (machine-generated, not supplied by the authors): Support Vector Machine; Bayesian Network; Relational Data; Neural Information Processing System; Inductive Logic Programming



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Uwe Dick (1)
  • Kristian Kersting (1)
  1. Institute for Computer Science, Machine Learning Lab, University of Freiburg, Freiburg, Germany
