When Efficient Model Averaging Out-Performs Boosting and Bagging

  • Ian Davidson
  • Wei Fan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


The Bayes optimal classifier (BOC) is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, BOC is less well known and rarely used in data mining. This is partly because BOC is perceived as inefficient and partly because bagging and boosting consistently outperform a single model, which raises the question: “Do we even need BOC in data mining?” We show that the answer is “yes” by illustrating that several recent efficient model averaging approximations to BOC can significantly outperform bagging and boosting in realistic situations such as extensive class label noise, sample selection bias, and many-class problems. To our knowledge, the finding that model averaging techniques outperform bagging and boosting in these situations has not been published in the machine learning, data mining, or statistics communities.
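
As a rough illustration of how model averaging differs from vote-based combination, the sketch below averages per-model class-probability estimates and compares the result with a simple majority vote. It is a generic example, not the specific approximations evaluated in the paper: the function names, the uniform default weights, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def average_probabilities(model_probs, weights=None):
    """Weighted average of per-model class-probability vectors.

    model_probs: list of lists, one probability vector per model.
    weights: optional per-model weights (hypothetical stand-in for
    posterior model weights); uniform if omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict_by_averaging(model_probs, weights=None):
    """Pick the class with the highest averaged probability."""
    avg = average_probabilities(model_probs, weights)
    return max(range(len(avg)), key=avg.__getitem__)


def predict_by_voting(model_probs):
    """Pick the class that most models rank first (majority vote)."""
    votes = Counter(
        max(range(len(p)), key=p.__getitem__) for p in model_probs
    )
    return votes.most_common(1)[0][0]


# Toy data (made up): two models weakly prefer class 0,
# one model strongly prefers class 1.
example = [
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
]

print(predict_by_voting(example))     # majority vote
print(predict_by_averaging(example))  # probability averaging
```

On this toy input the two rules disagree: the vote follows the two weakly confident models, while averaging lets the one highly confident model dominate. Whether either behaviour is preferable on real data is exactly the kind of question the paper examines empirically.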


Keywords (machine-generated, not supplied by the authors): Class Label, Model Average, Model Uncertainty, Sample Selection Bias, Bayesian Model Average



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ian Davidson (1)
  • Wei Fan (2)
  1. State University of New York, Albany
  2. IBM T.J. Watson, USA
