Machine Learning, Volume 78, Issue 3, pp 287–304

Random classification noise defeats all convex potential boosters

  • Philip M. Long
  • Rocco A. Servedio


Abstract

A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This class includes AdaBoost, LogitBoost, and other widely used and well-studied boosters. In this paper we show that for a broad class of convex potential functions, any such boosting algorithm is highly susceptible to random classification noise. We do this by showing that for any such booster and any nonzero random classification noise rate η, there is a simple data set of examples which is efficiently learnable by such a booster if there is no noise, but which cannot be learned to accuracy better than 1/2 if there is random classification noise at rate η. This holds even if the booster regularizes using early stopping or a bound on the L1 norm of the voting weights. This negative result is in contrast with known branching-program-based boosters, which do not fall into the convex potential function framework and which can provably learn to high accuracy in the presence of random classification noise.
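The coordinate-wise gradient-descent view described above can be made concrete with a small sketch. The toy below is illustrative only (it is not the paper's construction): decision stumps on a 1-D feature play the role of the weak hypotheses, the convex potential is AdaBoost's exponential loss Φ(z) = e^(−z), and each round greedily steps along the weak hypothesis with the steepest descent direction. All names (`make_stumps`, `convex_potential_boost`) are ours.

```python
import numpy as np

def make_stumps(X):
    """All decision stumps (threshold classifiers, both orientations) on a 1-D feature."""
    hs = []
    for t in np.unique(X):
        hs.append(lambda x, t=t: np.where(x >= t, 1.0, -1.0))
        hs.append(lambda x, t=t: np.where(x >= t, -1.0, 1.0))
    return hs

def convex_potential_boost(X, y, rounds=30, step=0.5):
    """Coordinate-wise gradient descent on Phi(F) = sum_i exp(-y_i * F(x_i))."""
    hs = make_stumps(X)
    F = np.zeros(len(X))      # current voting-function values F(x_i)
    picked = []
    for _ in range(rounds):
        w = np.exp(-y * F)    # example weights: derivative of the potential at the margins
        # directional derivative of Phi along each candidate weak hypothesis
        corr = [np.dot(w * y, h(X)) for h in hs]
        j = int(np.argmax(corr))            # steepest-descent coordinate
        F += step * hs[j](X)
        picked.append((step, hs[j]))
    def predict(x):
        return np.sign(sum(a * h(x) for a, h in picked))
    return predict

# Noiseless toy data: the label is the sign of the feature, so one stump suffices
X = np.linspace(-1, 1, 20)
y = np.sign(X)
clf = convex_potential_boost(X, y)
print((clf(X) == y).mean())   # 1.0 on the clean data
```

On clean data the booster drives the potential down and fits perfectly, matching the paper's positive premise; the paper's negative result is that flipping each label independently with any probability η > 0 admits simple data sets on which every booster of this form fails, which this toy does not attempt to reproduce.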

Keywords: Boosting · Learning theory · Noise-tolerant learning · Misclassification noise · Convex loss · Potential boosting



Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Google, Mountain View, USA
  2. Computer Science Department, Columbia University, New York, USA
