Soft Margins for AdaBoost

Rätsch, G.; Onoda, T.; Müller, K.-R.

doi:10.1023/A:1007618119488

Published: March 2001

Machine Learning, Volume 42, pages 287–320 (2001)

Abstract

Recently, ensemble methods like ADABOOST have been applied successfully to many problems, while seemingly defying the problem of overfitting.

ADABOOST rarely overfits in the low noise regime; however, we show that it clearly does so for higher noise levels. Central to the understanding of this fact is the margin distribution. ADABOOST can be viewed as a constrained gradient descent in an error function with respect to the margin. We find that ADABOOST asymptotically achieves a hard margin distribution, i.e., the algorithm concentrates its resources on a few hard-to-learn patterns that are, interestingly, very similar to Support Vectors. A hard margin is clearly a sub-optimal strategy in the noisy case, and regularization, in our case a “mistrust” in the data, must be introduced in the algorithm to alleviate the distortions that single difficult patterns (e.g., outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original ADABOOST algorithm to achieve a soft margin. In particular, we suggest (1) regularized ADABOOST_REG, where the gradient descent is done directly with respect to the soft margin, and (2) regularized linear and quadratic programming (LP/QP-) ADABOOST, where the soft margin is attained by introducing slack variables.
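To make the slack-variable idea concrete, one common way to write a soft-margin linear program over a fixed set of base hypotheses is sketched below. The notation is an assumption made for illustration and may differ from the full article: labels y_i and hypothesis outputs h_t(x_i) take values in {-1, +1}, α_t are the combination weights, ρ is the margin, ξ_i are the slack variables, and C is a regularization constant.

```latex
\begin{aligned}
\max_{\rho,\,\boldsymbol{\alpha},\,\boldsymbol{\xi}} \quad
  & \rho - C \sum_{i=1}^{N} \xi_i \\
\text{s.t.} \quad
  & y_i \sum_{t=1}^{T} \alpha_t h_t(x_i) \ge \rho - \xi_i,
    \quad i = 1, \dots, N, \\
  & \xi_i \ge 0, \quad \alpha_t \ge 0, \quad \sum_{t=1}^{T} \alpha_t = 1 .
\end{aligned}
```

Forcing every ξ_i to zero gives back a hard-margin problem, whereas a finite C allows a few difficult patterns to fall below ρ at a linear cost.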

Extensive simulations demonstrate that the proposed regularized ADABOOST-type algorithms are useful and yield competitive results for noisy data.
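The hard-margin behaviour described in the abstract can be inspected on a toy problem. The following is a minimal sketch, not the authors' experimental setup: it assumes plain ADABOOST with exhaustively searched decision stumps, a hypothetical one-dimensional data set with a single flipped label, and the usual normalized margin y_i Σ_t α_t h_t(x_i) / Σ_t α_t. All function names are illustrative.

```python
import math

def stump(x, feature, threshold, sign):
    """Decision stump: returns `sign` if x[feature] > threshold, else -sign."""
    return sign if x[feature] > threshold else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for f in range(len(X[0])):
        vals = sorted(set(x[f] for x in X))
        # Candidate thresholds: one below the minimum, then midpoints.
        cands = [vals[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(vals, vals[1:])]
        for t in cands:
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump(x, f, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    return best

def adaboost(X, y, rounds):
    """Plain AdaBoost; returns the ensemble and the final pattern weights."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, s = best_stump(X, y, w)
        if err >= 0.5:          # no stump better than chance: stop
            break
        err = max(err, 1e-12)   # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, t, s))
        # Exponential reweighting: misclassified patterns gain weight.
        w = [wi * math.exp(-alpha * yi * stump(x, f, t, s))
             for x, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble, w

def margins(X, y, ensemble):
    """Normalized margin of each training pattern, in [-1, 1]."""
    total = sum(a for a, _, _, _ in ensemble)
    return [yi * sum(a * stump(x, f, t, s) for a, f, t, s in ensemble) / total
            for x, yi in zip(X, y)]

# Hypothetical toy data: a step at x = 3.5 with one flipped label at x = 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = [-1, 1, -1, -1, 1, 1, 1, 1]

ensemble, weights = adaboost(X, y, rounds=20)
for xi, yi, m, wi in zip(X, y, margins(X, y, ensemble), weights):
    print(xi, yi, round(m, 3), round(wi, 3))
```

Calling `adaboost(X, y, rounds=1)` shows that the flipped point at x = 1 already carries half of the total weight after a single round, which is the kind of concentration on hard patterns that a soft margin is meant to temper.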

Author information

Authors and Affiliations

GMD FIRST, Kekuléstr. 7, 12489, Berlin, Germany
G. Rätsch
CRIEPI, Komae-shi, 2-11-1, Iwado Kita, Tokyo, Japan
T. Onoda
GMD FIRST, Kekuléstr. 7, 12489, Berlin, Germany
K.-R. Müller
University of Potsdam, Neues Palais 10, 14469, Potsdam, Germany
K.-R. Müller

About this article

Cite this article

Rätsch, G., Onoda, T. & Müller, KR. Soft Margins for AdaBoost. Machine Learning 42, 287–320 (2001). https://doi.org/10.1023/A:1007618119488

Issue Date: March 2001
DOI: https://doi.org/10.1023/A:1007618119488
