The Structure of Version Space

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 194)

Abstract

We investigate the generalisation performance of consistent classifiers, i.e. classifiers contained in the so-called version space, from both a theoretical and an experimental angle. In contrast to classical VC analysis, where no single classifier within version space is singled out on the grounds of a generalisation error bound, the data-dependent structural risk minimisation framework suggests that there exists one particular classifier that is to be preferred because it minimises the generalisation error bound. This is usually taken as a theoretical justification for learning algorithms such as the well-known support vector machine. A reinterpretation of a recent PAC-Bayesian result, however, reveals that for a suitably chosen hypothesis space there exists a large fraction of classifiers with small generalisation error, although we cannot identify them for a specific learning task. For the particular case of linear classifiers we show that classifiers found by the classical perceptron algorithm have guarantees bounded by the size of version space. These results are complemented by an empirical study of kernel classifiers on the task of handwritten digit recognition, which demonstrates that even classifiers with a small margin may exhibit excellent generalisation. In order to perform this analysis we introduce the kernel Gibbs sampler, an algorithm which can be used to sample consistent kernel classifiers.
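As a rough illustration of the notion of consistency used above, the following sketch runs the classical perceptron algorithm in its kernelised (dual) form on a hypothetical toy data set until it makes no training mistakes, i.e. until it returns one element of version space. This is not code from the chapter and not the kernel Gibbs sampler mentioned in the abstract; the RBF kernel, the Gaussian-blob data, and all function and parameter names (rbf_kernel, kernel_perceptron, gamma, max_epochs) are assumptions chosen for illustration only.

```python
# Minimal sketch (assumptions, not the chapter's code): a kernelised perceptron
# that, on separable data, halts at a classifier consistent with the training
# sample, i.e. a single member of version space.
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X1 and X2 (illustrative choice)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_perceptron(X, y, kernel=rbf_kernel, max_epochs=100):
    """Return dual coefficients alpha of a classifier consistent with (X, y),
    assuming the data are separable in the kernel-induced feature space."""
    n = len(y)
    K = kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # Mistake check: sign of the current kernel expansion at example i.
            if y[i] * ((alpha * y) @ K[:, i]) <= 0:
                alpha[i] += 1.0          # classical perceptron update in dual form
                mistakes += 1
        if mistakes == 0:                # no training errors left:
            break                        # the classifier lies in version space
    return alpha

# Toy usage on two well-separated Gaussian blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(+2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha = kernel_perceptron(X, y)
pred = np.sign(rbf_kernel(X, X) @ (alpha * y))
print("training errors:", int((pred != y).sum()))
```

The chapter's point is that many such consistent classifiers exist, and this sketch merely produces one of them; sampling from the whole of version space, as the kernel Gibbs sampler does, is a different and harder task.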

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Herbrich, R., Graepel, T., Williamson, R.C. (2006). The Structure of Version Space. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Machine Learning. Studies in Fuzziness and Soft Computing, vol 194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33486-6_9

  • DOI: https://doi.org/10.1007/3-540-33486-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30609-2

  • Online ISBN: 978-3-540-33486-6

  • eBook Packages: Engineering, Engineering (R0)
