Kernels, Pre-images and Optimization

  • John C. Snyder
  • Sebastian Mika
  • Kieron Burke
  • Klaus-Robert Müller


In the last decade, kernel-based learning has become a state-of-the-art technology in machine learning. We briefly review kernel principal component analysis (kPCA) and the pre-image problem that arises in it. Subsequently, we discuss a novel direction in which kernel-based models are used for property optimization. For this purpose, a stable estimate of the model's gradient is essential and non-trivial to obtain. The appropriate use of pre-image projections is key to successful gradient-based optimization, as we demonstrate on toy and real-world problems from quantum chemistry and physics.
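The interplay described above can be illustrated with a minimal sketch (not the authors' exact procedure): a kernel ridge model learns a scalar property on data lying on a low-dimensional manifold, naive gradient ascent on the model drifts off that manifold, and a kPCA pre-image projection (here scikit-learn's learned inverse map, `fit_inverse_transform=True`) pulls each iterate back to the region where the model is reliable. The toy data, kernel width, and step size are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.RandomState(0)

# Toy data: points on a unit circle (a 1-D manifold in 2-D input space)
# carrying a scalar "property" f(x) = x_0 to be maximized along the manifold.
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)]
y = X[:, 0]

GAMMA = 2.0  # Gaussian kernel width (assumed, not tuned)

# Kernel ridge regression model of the property.
krr = KernelRidge(kernel="rbf", gamma=GAMMA, alpha=1e-6).fit(X, y)

# Kernel PCA of the inputs; fit_inverse_transform=True additionally learns
# an approximate pre-image map from feature space back to input space.
kpca = KernelPCA(n_components=8, kernel="rbf", gamma=GAMMA,
                 fit_inverse_transform=True, alpha=1e-4).fit(X)

def model_grad(z):
    """Analytic gradient of the KRR prediction at z for the RBF kernel:
    d/dz sum_i a_i exp(-g||z-x_i||^2) = -2g * sum_i a_i k_i (z - x_i)."""
    diff = z - krr.X_fit_
    k = np.exp(-GAMMA * (diff ** 2).sum(axis=1))
    return -2.0 * GAMMA * (krr.dual_coef_ * k) @ diff

def project(z):
    """Pre-image projection: map z into the kPCA subspace and back,
    pulling the iterate toward the data manifold."""
    return kpca.inverse_transform(kpca.transform(z[None, :]))[0]

# Gradient ascent on the learned property, stabilized by projection.
z = np.array([0.0, 1.0])  # start at the "top" of the circle
for _ in range(200):
    z = project(z + 0.1 * model_grad(z))

print(z)  # should end up near (1, 0), where the property is maximal
```

Without the `project` step, the iterates leave the circle and the KRR gradient becomes meaningless there; the projection restricts the search to the manifold the model was trained on, which is the role pre-images play in the optimization problems discussed in this chapter.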


Keywords: Feature Space · Tangent Space · Input Space · Kernel Matrix · Local Linear Embedding



KRM thanks Vladimir N. Vapnik for continuous mentorship and collaboration since their first discussion in April 1995. This wonderful and serendipitous moment has profoundly changed the scientific agenda of KRM. From then on, KRM’s IDA group—then at GMD FIRST in Berlin—and later the offspring of this group have contributed actively to the exciting research on kernel methods. KRM acknowledges funding by the DFG, the BMBF, the EU and other sources that have helped in this endeavour. This work is supported by the World Class University Program through the National Research Foundation of Korea, funded by the Ministry of Education, Science, and Technology (grant R31-10008). JS and KB thank the NSF (Grant No. CHE-1240252) for funding.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Chemistry and Department of Physics, University of California, Irvine, USA
  2. idalab GmbH, Berlin, Germany
  3. Machine Learning Group, Technical University of Berlin, Berlin, Germany
  4. Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
