Machine Learning, Volume 55, Issue 1, pp 71–97

Leave One Out Error, Stability, and Generalization of Voting Combinations of Classifiers

  • Theodoros Evgeniou
  • Massimiliano Pontil
  • André Elisseeff


We study the leave-one-out and generalization errors of voting combinations of learning machines. A special case considered is a variant of bagging. We analyze in detail combinations of kernel machines, such as support vector machines, and present theoretical estimates of their leave-one-out error. We also derive novel bounds on the stability of combinations of any classifiers. These bounds can be used to formally show that, for example, bagging increases the stability of unstable learning machines. We report experiments supporting the theoretical findings.
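As a rough illustration of the quantity the abstract refers to, the following sketch computes the leave-one-out error of a majority-vote combination of classifiers, each trained on a random subsample of the data. It is not the authors' method: the nearest-centroid base learner, subsampling without replacement, and all names (`train_centroid`, `bagged_vote`, `leave_one_out_error`) and parameters are assumptions chosen only to keep the example short.

```python
import numpy as np


def train_centroid(X, y):
    """Hypothetical toy base learner: one mean vector per class (labels -1, +1)."""
    return X[y == -1].mean(axis=0), X[y == 1].mean(axis=0)


def predict_centroid(model, X):
    """Predict +1 where a point is at least as close to the positive mean, else -1."""
    neg, pos = model
    d_neg = np.linalg.norm(X - neg, axis=1)
    d_pos = np.linalg.norm(X - pos, axis=1)
    return np.where(d_pos <= d_neg, 1, -1)


def bagged_vote(X_train, y_train, X_test, n_machines, subsample, rng):
    """Majority vote of base learners, each trained on a random subsample.

    Assumption: subsamples are drawn without replacement; a draw that
    misses one of the two classes is rejected and redrawn.
    """
    votes = np.zeros(len(X_test))
    for _ in range(n_machines):
        while True:
            idx = rng.choice(len(X_train), size=subsample, replace=False)
            if len(np.unique(y_train[idx])) == 2:
                break
        model = train_centroid(X_train[idx], y_train[idx])
        votes += predict_centroid(model, X_test)
    # With an odd number of machines the vote sum is never zero.
    return np.where(votes >= 0, 1, -1)


def leave_one_out_error(X, y, n_machines=11, subsample=20, seed=0):
    """Fraction of points misclassified when each is held out in turn."""
    rng = np.random.default_rng(seed)
    mistakes = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        pred = bagged_vote(X[keep], y[keep], X[i:i + 1],
                           n_machines, subsample, rng)
        mistakes += int(pred[0] != y[i])
    return mistakes / len(X)
```

The ensemble is retrained from scratch for every held-out point, so the cost grows linearly with the number of training points. That cost is one reason theoretical estimates of the leave-one-out error are of practical interest.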

cross-validation, bagging, combinations of machines, stability



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Theodoros Evgeniou (1)
  • Massimiliano Pontil (2)
  • André Elisseeff (3)
  1. Technology Management, INSEAD, Boulevard de Constance, Fontainebleau, France
  2. DII, University of Siena, Siena, Italy
  3. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
