Joint Kernel Maps

  • Jason Weston
  • Bernhard Schölkopf
  • Olivier Bousquet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3512)


We develop a methodology for solving high-dimensional dependency estimation problems between pairs of data types that remains viable when the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces using joint kernels. Such kernels can encode known correlations between input and output, and some of them remain linear in the outputs, which yields simple (closed-form) pre-images. We provide examples of such kernels and empirical results.
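To make the closed-form pre-image idea concrete, here is a minimal sketch in Python. It assumes one simple joint kernel, the product of a Gaussian input kernel and a linear output kernel, and it fits the expansion coefficients with plain kernel ridge regression as a stand-in; this is not the training algorithm of the paper. All function names and parameters (`rbf_kernel`, `joint_kernel`, `fit_dual_coefficients`, `joint_score`, `closed_form_preimage`, `gamma`, `lam`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian input kernel between the rows of A (n, d) and B (m, d)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def joint_kernel(x, y, x2, y2, gamma=1.0):
    """Assumed joint kernel: input kernel times a linear output kernel."""
    k_in = rbf_kernel(x[None, :], x2[None, :], gamma)[0, 0]
    return k_in * float(np.dot(y, y2))

def fit_dual_coefficients(X, Y, lam=1e-3, gamma=1.0):
    """Stand-in training step (kernel ridge regression): solve (K + lam I) C = Y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def joint_score(X_train, coef, x, y, gamma=1.0):
    """Compatibility score of the pair (x, y) as an expansion in the joint kernel."""
    return sum(joint_kernel(xj, cj, x, y, gamma) for xj, cj in zip(X_train, coef))

def closed_form_preimage(X_train, coef, X_new, gamma=1.0):
    """Because the score is linear in y, its direction of steepest increase is
    available in closed form: a kernel-weighted sum of the coefficient rows."""
    return rbf_kernel(X_new, X_train, gamma) @ coef
```

In this sketch `joint_score(X_train, coef, x, y)` equals the inner product of `closed_form_preimage(X_train, coef, x[None, :])[0]` with `y`, so the output that maximizes the score under a norm constraint can be read off directly, with no search over a space that may have thousands of dimensions.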


Support Vector Machine; Support Vector Regression; Machine Translation; Ridge Regression; Reproducing Kernel Hilbert Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jason Weston (1)
  • Bernhard Schölkopf (2)
  • Olivier Bousquet (2)
  1. NEC Laboratories America, Princeton, USA
  2. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
