Machine Learning, Volume 37, Issue 3, pp 297–336

Improved Boosting Algorithms Using Confidence-rated Predictions

  • Robert E. Schapire
  • Yoram Singer

Abstract

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler than, but as effective as, the techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.

Keywords: boosting algorithms, multiclass classification, output coding, decision trees
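The boosting scheme summarized in the abstract can be made concrete. The sketch below is a minimal illustration, not the authors' implementation: the threshold-stump weak learner, the smoothing constant `eps`, and all function names are assumptions added here. What it does follow is the paper's recipe for confidence-rated boosting: weak hypotheses output real numbers, each example is reweighted by exp(-y_i h_t(x_i)), and each round picks the hypothesis minimizing the normalizer Z_t. For a hypothesis that partitions the domain into blocks, the best confidence on block b is (1/2) ln(W+_b / W-_b), which gives Z_t = 2 sum_b sqrt(W+_b * W-_b).

```python
import numpy as np

def train_confidence_rated_adaboost(X, y, n_rounds=50, eps=1e-8):
    """Confidence-rated boosting with threshold stumps (illustrative sketch).

    X: (n, d) array of real features; y: (n,) array of labels in {-1, +1}.
    Each stump splits on one feature at one threshold and predicts the
    real-valued confidence 0.5 * ln(W+_b / W-_b) on each side, where
    W+_b / W-_b are the current weights of positive/negative examples
    on that side (smoothed by eps to avoid division by zero).
    """
    n, d = X.shape
    D = np.full(n, 1.0 / n)            # distribution over training examples
    stumps = []                        # (feature, threshold, c_left, c_right)
    for _ in range(n_rounds):
        best, best_z = None, np.inf
        # Exhaustive stump search; quadratic in n, for exposition only.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                left = X[:, j] <= thr
                wp_l = D[left & (y > 0)].sum(); wn_l = D[left & (y < 0)].sum()
                wp_r = D[~left & (y > 0)].sum(); wn_r = D[~left & (y < 0)].sum()
                # Normalizer Z = 2 * sum_b sqrt(W+_b * W-_b); smaller is better.
                z = 2.0 * (np.sqrt(wp_l * wn_l) + np.sqrt(wp_r * wn_r))
                if z < best_z:
                    c_l = 0.5 * np.log((wp_l + eps) / (wn_l + eps))
                    c_r = 0.5 * np.log((wp_r + eps) / (wn_r + eps))
                    best_z, best = z, (j, thr, c_l, c_r)
        j, thr, c_l, c_r = best
        h = np.where(X[:, j] <= thr, c_l, c_r)   # confidence-rated predictions
        D *= np.exp(-y * h)                      # exponential reweighting
        D /= D.sum()                             # renormalize the distribution
        stumps.append(best)
    return stumps

def predict(stumps, X):
    """Sign of the sum of the confidence-rated predictions."""
    f = np.zeros(len(X))
    for j, thr, c_l, c_r in stumps:
        f += np.where(X[:, j] <= thr, c_l, c_r)
    return np.sign(f)
```

The training error of the combined classifier is at most the product of the per-round normalizers, which is why each round greedily minimizes Z_t rather than the raw weighted error.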
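Of the two multi-label boosting methods the abstract mentions, one (AdaBoost.MH in the paper) reduces the multi-label problem to a single binary problem over example-label pairs, whose Hamming loss is then bounded by the product of the normalizers Z_t. The sketch below shows only the reduction step, under an assumed data layout and naming; the binary booster sketched above can then be run on the expanded sample.

```python
import numpy as np

def expand_multilabel(X, Y, n_classes):
    """AdaBoost.MH-style reduction (illustrative sketch).

    X: list of examples; Y: list of label sets, each a subset of
    range(n_classes).  Every example x becomes n_classes instances
    (x, l) with binary target +1 if l is one of x's labels, else -1.
    """
    pairs, signs = [], []
    for x, labels in zip(X, Y):
        for ell in range(n_classes):
            pairs.append((x, ell))                    # instance is an (example, class) pair
            signs.append(1 if ell in labels else -1)  # +1 iff the class is a true label
    return pairs, np.array(signs)

# Example: two documents, three classes; doc1 carries labels 0 and 2.
pairs, signs = expand_multilabel(["doc1", "doc2"], [{0, 2}, {1}], n_classes=3)
# pairs -> [('doc1', 0), ('doc1', 1), ('doc1', 2), ('doc2', 0), ('doc2', 1), ('doc2', 2)]
# signs -> [ 1, -1,  1, -1,  1, -1]
```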

References

  1. Bartlett, P.L. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525–536.
  2. Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1/2), 105–139.
  3. Baum, E.B., & Haussler, D. (1989). What size net gives valid generalization? Neural Computation, 1(1), 151–160.
  4. Blum, A. (1997). Empirical support for winnow and weighted-majority based algorithms: Results on a calendar scheduling domain. Machine Learning, 26, 5–23.
  5. Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26(3), 801–849.
  6. Csiszár, I., & Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1, 205–237.
  7. Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, to appear.
  8. Dietterich, T.G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.
  9. Drucker, H., & Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8. MIT Press.
  10. Fletcher, R. (1987). Practical Methods of Optimization (second edition). John Wiley.
  11. Freund, Y., Iyer, R., Schapire, R.E., & Singer, Y. (1998). An efficient boosting algorithm for combining preferences. Machine Learning: Proceedings of the Fifteenth International Conference.
  12. Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference (pp. 148–156).
  13. Freund, Y., & Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
  14. Freund, Y., Schapire, R.E., Singer, Y., & Warmuth, M.K. (1997). Using and combining predictors that specialize. Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (pp. 334–343).
  15. Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting. Technical report.
  16. Haussler, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1), 78–150.
  17. Haussler, D., & Long, P.M. (1995). A generalization of Sauer's lemma. Journal of Combinatorial Theory, Series A, 71(2), 219–240.
  18. Kearns, M., & Mansour, Y. (1996). On the boosting ability of top-down decision tree learning algorithms. Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing.
  19. Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. Proceedings of the Fourteenth National Conference on Artificial Intelligence (pp. 546–551).
  20. Margineantu, D.D., & Dietterich, T.G. (1997). Pruning adaptive boosting. Machine Learning: Proceedings of the Fourteenth International Conference (pp. 211–218).
  21. Merz, C.J., & Murphy, P.M. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  22. Quinlan, J.R. (1996). Bagging, boosting, and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730).
  23. Schapire, R.E. (1997). Using output codes to boost multiclass learning problems. Machine Learning: Proceedings of the Fourteenth International Conference (pp. 313–321).
  24. Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686.
  25. Schapire, R.E., & Singer, Y. BoosTexter: A boosting-based system for text categorization. Machine Learning, to appear.
  26. Schwenk, H., & Bengio, Y. (1998). Training methods for adaptive boosting of neural networks. In Advances in Neural Information Processing Systems 10. MIT Press.

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Robert E. Schapire (1)
  • Yoram Singer (1)

  1. Shannon Laboratory, AT&T Labs, Florham Park, NJ, USA
