Machine Learning, Volume 57, Issue 3, pp 205–232

Kernels and Distances for Structured Data

  • Thomas Gärtner
  • John W. Lloyd
  • Peter A. Flach

Abstract

This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world data sets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene data set by more than 10%.
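The abstract does not spell out how a kernel is turned into a distance. A minimal sketch follows, assuming the standard kernel-induced distance d(x, y) = sqrt(k(x, x) - 2 k(x, y) + k(y, y)), which is a pseudo-metric whenever k is positive definite. The set kernel below (a matching kernel summed over all element pairs) is only an illustrative stand-in for a kernel on structured data, not the paper's construction, and all function names are hypothetical.

```python
import math


def matching_kernel(a, b):
    """Matching (delta) kernel on atomic values: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def set_kernel(xs, ys, base=matching_kernel):
    """Illustrative set kernel: sum of a base kernel over all element pairs."""
    return sum(base(x, y) for x in xs for y in ys)


def kernel_distance(k, x, y):
    """Distance induced by a positive definite kernel k (assumed formula)."""
    squared = k(x, x) - 2.0 * k(x, y) + k(y, y)
    # Clamp at zero to guard against floating-point rounding.
    return math.sqrt(max(squared, 0.0))


def nearest_neighbour(k, train, query):
    """1-nearest neighbour: label of the training example closest to query."""
    _, label = min(train, key=lambda ex: kernel_distance(k, ex[0], query))
    return label


# Toy data (made up for illustration): each example is a set of symbols.
train = [
    (frozenset({"a", "b", "c"}), "pos"),
    (frozenset({"x", "y"}), "neg"),
]
query = frozenset({"a", "b", "z"})
prediction = nearest_neighbour(set_kernel, train, query)
```

Because only kernel evaluations are needed, the same `kernel_distance` wrapper would work with any positive definite kernel on structured objects; the classifier never has to see an explicit feature vector.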

Keywords: kernel methods, structured data, inductive logic programming, higher-order logic, instance-based learning

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Thomas Gärtner (1, 2)
  • John W. Lloyd (3)
  • Peter A. Flach (4)
  1. Fraunhofer Institut Autonome Intelligente Systeme, Germany; Department of Computer Science, University of Bristol, United Kingdom
  2. Department of Computer Science III, University of Bonn, Germany
  3. Research School of Information Sciences and Engineering, The Australian National University, Australia
  4. Machine Learning, Department of Computer Science, University of Bristol, United Kingdom
