
A Generalized Representer Theorem

  • Conference paper
Computational Learning Theory (COLT 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2111)


Abstract

Wahba’s classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite-dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.
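As an illustration (not taken from the paper itself), kernel ridge regression is perhaps the simplest case covered by the representer theorem: minimizing squared loss plus a quadratic RKHS regularizer admits a minimizer of the form f(x) = Σᵢ αᵢ k(xᵢ, x), so the coefficients can be found by solving a finite m × m linear system over the training examples, regardless of the feature space's dimensionality. A minimal NumPy sketch, with the Gaussian kernel, the regularization weight `lam`, and the bandwidth `gamma` all chosen arbitrarily for demonstration:

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # k(x, x') = exp(-gamma * ||x - x'||^2)
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, lam=0.1, gamma=1.0):
    # By the representer theorem, the minimizer of
    #   (1/m) * sum_i (f(x_i) - y_i)^2  +  lam * ||f||_H^2
    # lies in span{k(x_i, .)}, so it suffices to solve the
    # m x m linear system (K + lam * m * I) alpha = y.
    m = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * m * np.eye(m), y)
    return alpha

def predict(X_train, alpha, X_new, gamma=1.0):
    # f(x) = sum_i alpha_i * k(x_i, x): a finite expansion in the
    # training points, even though the feature space may be
    # infinite-dimensional.
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Toy usage: fit a noisy sine curve from 40 samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
alpha = fit_kernel_ridge(X, y, lam=0.01, gamma=0.5)
y_hat = predict(X, alpha, X)
```

The point of the sketch is the shape of the computation: the optimization over a potentially infinite-dimensional function space reduces to one linear solve of size m, exactly as the theorem guarantees.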




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schölkopf, B., Herbrich, R., Smola, A.J. (2001). A Generalized Representer Theorem. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science (LNAI), vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_27


  • DOI: https://doi.org/10.1007/3-540-44581-1_27


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42343-0

  • Online ISBN: 978-3-540-44581-4

  • eBook Packages: Springer Book Archive
