Machine Learning, Volume 36, Issue 1–2, pp 105–139
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Eric Bauer, Computer Science Department, Stanford University
Ron Kohavi, Blue Martini Software
Abstract
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced the variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates are used in conjunction with no-pruning, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows.
We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but also outliers and noise.
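The two ensemble schemes compared in the abstract can be sketched compactly. This is a minimal illustrative sketch, not the paper's experimental code: the 1-nearest-neighbour base learner, the 1-D data, and all function and parameter names are assumptions made here for brevity. Bagging trains each model on a bootstrap replicate and combines predictions by uniform majority vote; AdaBoost instead reweights instances each round, which is where the renormalization relevant to the underflow problems mentioned above comes in.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Draw a bootstrap replicate: n instances sampled with replacement.
    return [rng.choice(data) for _ in data]

def one_nn(train, x):
    # Toy stand-in for the paper's inducers: 1-nearest neighbour on 1-D inputs.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagged_predict(data, x, n_models=25, seed=0):
    # Bagging: train each model on its own bootstrap replicate,
    # then combine the predictions by uniform majority vote.
    rng = random.Random(seed)
    votes = [one_nn(bootstrap(data, rng), x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

def adaboost_reweight(weights, correct):
    # One AdaBoost reweighting round: down-weight correctly classified
    # instances by beta = err / (1 - err), then renormalize.
    err = sum(w for w, c in zip(weights, correct) if not c)
    beta = err / (1.0 - err)  # assumes 0 < err < 0.5
    new = [w * beta if c else w for w, c in zip(weights, correct)]
    z = sum(new)
    # Renormalizing every round keeps weights on a usable floating-point
    # scale; letting them shrink unchecked is one source of underflow.
    return [w / z for w in new]
```

For example, starting from uniform weights over four instances where only the last is misclassified, `adaboost_reweight([0.25] * 4, [True, True, True, False])` shifts half of the total weight onto the misclassified instance, illustrating how AdaBoost emphasizes "hard" areas (and, as the abstract notes, outliers and noise).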
Title
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Journal
Machine Learning, Volume 36, Issue 1–2, pp 105–139
Cover Date
1999-07
DOI
10.1023/A:1007515423169
Print ISSN
0885-6125
Online ISSN
1573-0565
Publisher
Kluwer Academic Publishers
 Keywords

 classification
 boosting
 Bagging
 decision trees
Naive-Bayes
mean-squared error
 Authors

Eric Bauer (1)
Ron Kohavi (2)
 Author Affiliations

 1. Computer Science Department, Stanford University, Stanford, CA, 94305
2. Blue Martini Software, 2600 Campus Dr. Suite 175, San Mateo, CA, 94403