Abstract
It is known that the covering numbers of a function class on a double sample (of length 2m) can be used to bound the generalization performance of a classifier via a margin-based analysis. In this paper we show that an analogous argument can be made in terms of the observed covering numbers on a single m-sample (the actual observed data points). The significance of this is that for certain interesting classes of functions, such as support vector machines, new techniques allow one to find good estimates of such covering numbers in terms of the rate of decay of the eigenvalues of a Gram matrix. These covering numbers can be much smaller than a priori bounds indicate when the particular data received is “easy”. The work can be considered an extension of previous results that bounded generalization performance in terms of the VC-dimension of the class of hypotheses restricted to the sample, with the considerable advantage that the covering numbers can be readily computed, and they are often small.
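The quantity the abstract refers to is data-dependent: one forms the Gram (kernel) matrix on the observed m-sample and examines how quickly its eigenvalues decay. As a minimal illustrative sketch (not the paper's construction), assuming a Gaussian RBF kernel and synthetic data, the eigenvalue spectrum can be computed as follows; fast decay corresponds to the "easy" data regime in which the observed covering numbers are small.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    # Gram matrix entries K_ij = exp(-gamma * ||x_i - x_j||^2),
    # computed from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # a hypothetical m-sample: m = 50 points in R^3
K = rbf_gram(X)

# Eigenvalues of the (symmetric) Gram matrix, sorted in decreasing order.
eigs = np.sort(np.linalg.eigvalsh(K))[::-1]

# The speed at which this sequence decays is the observable quantity
# used to estimate covering numbers on the actual sample.
print(eigs[:5])
```

Because the spectrum is computed from the observed sample alone, this check can be carried out before any generalization bound is invoked, which is the practical advantage the abstract emphasizes.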
This work was supported by the Australian Research Council and the European Commission under the Working Group Nr. 27150 (NeuroCOLT2).
© 1999 Springer-Verlag Berlin Heidelberg
Shawe-Taylor, J., Williamson, R.C. (1999). Generalization Performance of Classifiers in Terms of Observed Covering Numbers. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science(), vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_22
Print ISBN: 978-3-540-65701-9
Online ISBN: 978-3-540-49097-5