Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Bagging

  • Wei Fan
  • Kun Zhang
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_567

Synonyms

Bootstrap aggregating

Definition

Bagging (bootstrap aggregating) uses "majority voting" to combine the outputs of different inductive models constructed from bootstrap samples of the same training set. A bootstrap sample has the same size as the training data and is drawn uniformly from the original training set with replacement; that is, after an example is selected it remains available for subsequent draws, so the same example can appear multiple times in the same bootstrap sample. When the training set is sufficiently large, a bootstrap sample contains on average 63.2% of the distinct examples in the original training set, and the rest are duplicates: for a training set of size n, the probability that a given example is never drawn is (1 − 1/n)^n ≈ e^(−1) ≈ 0.368, so about 63.2% of the examples appear at least once. To make full use of bagging, one typically needs to generate at least 50 bootstrap samples and construct 50 classifiers from these samples. During prediction, the class label receiving the most votes from the 50 base-level classifiers is the final prediction.
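The procedure is straightforward to state in code. Below is a minimal sketch, assuming scikit-learn decision trees as the base learners and integer class labels (illustrative assumptions, not prescribed by the entry): it draws 50 bootstrap samples, fits one classifier per sample, and predicts by majority vote.

    # Minimal sketch of bagging: bootstrap sampling plus majority voting.
    # Assumes scikit-learn decision trees as base learners and integer class
    # labels; any learner exposing fit/predict could be substituted.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_estimators=50, random_state=0):
        """Train one classifier per bootstrap sample of (X, y)."""
        rng = np.random.default_rng(random_state)
        n = len(X)
        models = []
        for _ in range(n_estimators):
            # Bootstrap sample: n indices drawn uniformly WITH replacement,
            # so some examples repeat and, on average, ~36.8% are left out.
            idx = rng.integers(0, n, size=n)
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Majority vote: the label predicted most often wins."""
        votes = np.stack([m.predict(X) for m in models])  # (n_estimators, n_examples)
        return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

In this sketch, len(np.unique(idx)) / n on any single bootstrap draw will hover near the 63.2% figure noted above.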



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IBM T.J. Watson Research, Hawthorne, USA
  2. Xavier University of Louisiana, New Orleans, USA

Section editors and affiliations

  • Kyuseok Shim
  1. School of Elec. Eng. and Computer Science, Seoul National Univ., Seoul, Republic of Korea