Improved Boosting Algorithms Using Confidence-rated Predictions
 Robert E. Schapire,
 Yoram Singer
Abstract
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler but as effective as techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
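As a rough illustration of the confidence-rated setting the abstract describes, the following minimal sketch boosts threshold stumps whose two output blocks each carry a real-valued confidence. It is not the authors' implementation. The function names (`train_stump`, `stump_predict`, `boost`, `predict`), the choice of stumps as the weak learner, and the smoothing value `eps = 1/m` are all assumptions made for this example. Each block's confidence is set to half the log-ratio of positive to negative weight in that block, the stump is chosen to minimize the normalization factor Z = 2 Σ_j sqrt(W+_j · W-_j), and the weights are updated with the usual exponential rule, with the coefficient alpha folded into the hypothesis.

```python
import math

def train_stump(X, y, D, eps):
    """Return (feature, threshold, confidences) minimizing Z = 2 * sum_j sqrt(W+_j * W-_j)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            # W[block] = [weight of label -1, weight of label +1]
            W = [[0.0, 0.0], [0.0, 0.0]]
            for x, label, d in zip(X, y, D):
                W[int(x[f] > thr)][int(label > 0)] += d
            z = 2 * sum(math.sqrt(neg * pos) for neg, pos in W)
            if best is None or z < best[0]:
                # Smoothed confidence per block; eps keeps the log finite (assumed value).
                conf = [0.5 * math.log((pos + eps) / (neg + eps)) for neg, pos in W]
                best = (z, f, thr, conf)
    return best[1:]

def stump_predict(stump, x):
    """Real-valued prediction: sign is the label, magnitude is the confidence."""
    f, thr, conf = stump
    return conf[int(x[f] > thr)]

def boost(X, y, rounds=10):
    """Run confidence-rated boosting for a fixed number of rounds (labels in {-1, +1})."""
    m = len(X)
    D = [1.0 / m] * m
    eps = 1.0 / m  # assumption: smoothing on the order of 1/m
    stumps = []
    for _ in range(rounds):
        stump = train_stump(X, y, D, eps)
        stumps.append(stump)
        # Exponential reweighting with alpha folded into the hypothesis, then normalize.
        D = [d * math.exp(-label * stump_predict(stump, x))
             for x, label, d in zip(X, y, D)]
        total = sum(D)
        D = [d / total for d in D]
    return stumps

def predict(stumps, x):
    """Final classifier: sign of the summed confidence-rated predictions."""
    return 1 if sum(stump_predict(s, x) for s in stumps) > 0 else -1
```

Calling `boost(X, y)` with `X` as a list of feature lists and `y` as a list of labels in {-1, +1} returns the list of stumps, which `predict` combines by summing their confidences. Exhaustive threshold search is used only to keep the sketch short.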
References
 Bartlett, P.L. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525–536.
 Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1/2), 105–139.
 Baum, E.B., & Haussler, D. (1989). What size net gives valid generalization? Neural Computation, 1(1), 151–160.
 Blum, A. (1997). Empirical support for winnow and weighted-majority based algorithms: results on a calendar scheduling domain. Machine Learning, 26, 5–23.
 Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26(3), 801–849.
 Csiszár, I., & Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1, 205–237.
 Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, to appear.
 Dietterich, T.G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.
 Drucker, H., & Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems, 8, MIT Press.
 Fletcher, R. (1987). Practical Methods of Optimization (second edition), John Wiley.
 Freund, Y., Iyer, R., Schapire, R.E., & Singer, Y. (1998). An efficient boosting algorithm for combining preferences. Machine Learning: Proceedings of the Fifteenth International Conference.
 Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference (pp. 148–156).
 Freund, Y., & Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
 Freund, Y., Schapire, R.E., Singer, Y., & Warmuth, M.K. (1997). Using and combining predictors that specialize. Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (pp. 334–343).
 Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting. Technical Report.
 Haussler, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1), 78–150.
 Haussler, D., & Long, P.M. (1995). A generalization of Sauer's lemma. Journal of Combinatorial Theory, Series A, 71(2), 219–240.
 Kearns, M., & Mansour, Y. (1996). On the boosting ability of top-down decision tree learning algorithms. Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing.
 Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. Proceedings of the Fourteenth National Conference on Artificial Intelligence (pp. 546–551).
 Margineantu, D.D., & Dietterich, T.G. (1997). Pruning adaptive boosting. Machine Learning: Proceedings of the Fourteenth International Conference (pp. 211–218).
 Merz, C.J., & Murphy, P.M. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
 Quinlan, J.R. (1996). Bagging, boosting, and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730).
 Schapire, R.E. (1997). Using output codes to boost multiclass learning problems. Machine Learning: Proceedings of the Fourteenth International Conference (pp. 313–321).
 Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686.
 Schapire, R.E., & Singer, Y. BoosTexter: A boosting-based system for text categorization. Machine Learning, to appear.
 Schwenk, H., & Bengio, Y. (1998). Training methods for adaptive boosting of neural networks. In Advances in Neural Information Processing Systems 10. MIT Press.
 Title
 Improved Boosting Algorithms Using Confidence-rated Predictions
 Journal

Machine Learning
Volume 37, Issue 3, pp 297–336
 Cover Date
 1999-12-01
 DOI
 10.1023/A:1007614523901
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Kluwer Academic Publishers
 Keywords

 boosting algorithms
 multiclass classification
 output coding
 decision trees
 Authors

 Robert E. Schapire ^{(1)}
 Yoram Singer ^{(1)}
 Author Affiliations

 1. Shannon Laboratory, AT&T Labs, 180 Park Avenue, Room A279, Florham Park, NJ 07932-0971, USA