Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter addresses kernel methods, a class of techniques that play a major role in machine learning and nonparametric statistics. Among others, these methods include support vector machines (SVMs) and least squares SVMs, kernel principal component analysis, kernel Fisher discriminant analysis, and Gaussian processes. The use of kernel methods is systematic and properly motivated by statistical principles. In practical applications, kernel methods lead to flexible predictive models that often outperform competing approaches in terms of generalization performance. The core idea consists of mapping data into a high-dimensional space by means of a feature map. Since the feature map is normally chosen to be nonlinear, a linear model in the feature space corresponds to a nonlinear rule in the original domain. This fact suits many real-world data analysis problems, which often require nonlinear models to describe their structure.
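As a concrete illustration of the feature-map idea (a minimal NumPy sketch, not code from the chapter; the degree-2 polynomial kernel and the test points are chosen purely for illustration), the snippet below verifies that the kernel value (x^T z + 1)^2 equals the inner product of explicitly mapped 6-dimensional feature vectors, so a linear model in that feature space acts as a quadratic rule in the original 2-D domain:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D inputs:
    phi(x) = (1, sqrt(2)*x1, sqrt(2)*x2, x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     x1 ** 2,
                     x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """Inhomogeneous polynomial kernel of degree 2: k(x, z) = (x.z + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

# Illustrative points (assumed for this sketch only).
x = np.array([0.3, -1.2])
z = np.array([2.0, 0.7])

# The kernel computes the feature-space inner product without ever
# forming phi explicitly (the "kernel trick").
print(np.dot(phi(x), phi(z)))   # explicit feature-space inner product
print(poly_kernel(x, z))        # same value obtained from the kernel
assert np.isclose(np.dot(phi(x), phi(z)), poly_kernel(x, z))
```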

In Sect. 32.1 we present historical notes and summarize the main ingredients of kernel methods. In Sect. 32.2 we outline the core ideas of statistical learning and show how regularization can be employed to devise practical learning algorithms. In Sect. 32.3 we show a selection of techniques that are representative of a large class of kernel methods; these techniques – termed primal–dual methods – use Lagrange duality as the main mathematical tool. Section 32.4 discusses Gaussian processes, a class of kernel methods that uses a Bayesian approach to perform inference and learning. Section 32.5 recalls different approaches for tuning parameters. In Sect. 32.6 we review the mathematical properties of different yet equivalent notions of kernels and recall a number of specialized kernels for learning problems involving structured data. We conclude the chapter by presenting applications in Sect. 32.7.
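To make the regularization viewpoint concrete, the following is a minimal sketch (not code from the chapter; the toy data, kernel width, and regularization parameter are illustrative assumptions) of kernel ridge regression, i.e., regularized least squares with a Gaussian RBF kernel solved in dual form, where the fitted function is a kernel expansion over the training points as guaranteed by the representer theorem:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

# Toy 1-D regression data with a nonlinear target (assumed for this sketch).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

lam = 0.1                                               # regularization parameter
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)    # dual coefficients

# Predictions are kernel expansions f(x) = sum_i alpha_i k(x_i, x).
X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
y_pred = rbf_kernel(X_test, X) @ alpha
print(y_pred)
```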

Abbreviations

ERM: empirical risk minimization
GACV: generalized approximate cross-validation
GP: Gaussian process
HS: Hilbert space
i.i.d.: independent, identically distributed
KKT: Karush–Kuhn–Tucker
LASSO: least absolute shrinkage and selection operator
LOO: leave-one-out
LS: least squares
MAP: maximum a posteriori
MEG: magnetoencephalography
MKL: multiple kernel learning
ML: maximum likelihood
MLP: multilayer perceptron
PCA: principal component analysis
QP: quadratic programming
r.k.: reproducing kernel
RBF: radial basis function
RKHS: reproducing kernel Hilbert space
SMO: sequential minimal optimization
SRM: structural risk minimization
SVC: support vector classification
SVD: singular value decomposition
SVM: support vector machine
VC: Vapnik–Chervonenkis

Author information

Correspondence to Marco Signoretto.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Signoretto, M., Suykens, J.A.K. (2015). Kernel Methods. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_32

  • DOI: https://doi.org/10.1007/978-3-662-43505-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43504-5

  • Online ISBN: 978-3-662-43505-2
