Abstract
We study the leave-one-out and generalization errors of voting combinations of learning machines. A special case considered is a variant of bagging. We analyze in detail combinations of kernel machines, such as support vector machines, and present theoretical estimates of their leave-one-out error. We also derive novel bounds on the stability of combinations of any classifiers. These bounds can be used to formally show that, for example, bagging increases the stability of unstable learning machines. We report experiments supporting the theoretical findings.
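The abstract's central quantities can be made concrete with a small sketch (all names and the toy base learner here are my own illustration, not the paper's method): estimating the leave-one-out error of a single classifier and of its bagged voting combination on 1-D toy data.

```python
import random

def train_centroid(data):
    """Base learner: classify a 1-D point by its nearest class mean."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == -1]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    def predict(x):
        return 1 if abs(x - mu_pos) <= abs(x - mu_neg) else -1
    return predict

def train_bagged(data, T=25, seed=0):
    """Bagging: majority vote over T base learners, each trained
    on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    learners = []
    for _ in range(T):
        boot = [rng.choice(data) for _ in data]
        if len({y for _, y in boot}) < 2:   # degenerate resample: fall back
            boot = data
        learners.append(train_centroid(boot))
    def predict(x):
        return 1 if sum(f(x) for f in learners) >= 0 else -1
    return predict

def loo_error(data, trainer):
    """Leave-one-out error: retrain on all points but i, test on point i."""
    errors = 0
    for i, (x, y) in enumerate(data):
        f = trainer(data[:i] + data[i + 1:])
        errors += (f(x) != y)
    return errors / len(data)
```

For example, `loo_error(data, train_centroid)` versus `loo_error(data, lambda d: train_bagged(d))` compares the single machine with its voting combination; the paper's theoretical estimates and stability bounds concern exactly this kind of leave-one-out quantity, without requiring the explicit retraining loop above.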
Cite this article
Evgeniou, T., Pontil, M. & Elisseeff, A. Leave One Out Error, Stability, and Generalization of Voting Combinations of Classifiers. Machine Learning 55, 71–97 (2004). https://doi.org/10.1023/B:MACH.0000019805.88351.60