The Boosting Approach to Machine Learning: An Overview

  • Chapter
Nonlinear Estimation and Classification

Part of the book series: Lecture Notes in Statistics (LNS, volume 171)

Summary

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.
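
To make the idea concrete, the following is a minimal sketch of AdaBoost for binary labels in {-1, +1}, using exhaustive-search decision stumps as the weak learner. The function names (`stump`, `train_adaboost`, `predict`), the choice of stumps, and the toy data are illustrative assumptions, not taken from the chapter; the update rule is the standard one, with weight alpha = 0.5 * ln((1 - err) / err) per round.

```python
import math


def stump(x, j, thr, sign):
    """Hypothetical weak hypothesis: predict `sign` if x[j] <= thr, else -sign."""
    return sign if x[j] <= thr else -sign


def train_adaboost(X, y, rounds):
    """Train AdaBoost on feature lists X and labels y in {-1, +1}.

    Returns a list of (alpha, feature, threshold, sign) tuples.
    """
    n = len(X)
    weights = [1.0 / n] * n  # start from the uniform distribution
    ensemble = []
    for _ in range(rounds):
        # Weak learner: pick the stump with the lowest weighted error.
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for sign in (1, -1):
                    err = sum(w for w, x, label in zip(weights, X, y)
                              if stump(x, j, thr, sign) != label)
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump(x, j, thr, sign))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, j, thr, sign)
                for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed for illustration): no single stump separates it.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
predictions = [predict(model, x) for x in X]
```

On this toy set a single stump misclassifies at least two of the six points, while the weighted vote of three stumps fits all of them, which illustrates how combining weak hypotheses can reduce training error.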

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Schapire, R.E. (2003). The Boosting Approach to Machine Learning: An Overview. In: Denison, D.D., Hansen, M.H., Holmes, C.C., Mallick, B., Yu, B. (eds) Nonlinear Estimation and Classification. Lecture Notes in Statistics, vol 171. Springer, New York, NY. https://doi.org/10.1007/978-0-387-21579-2_9

  • DOI: https://doi.org/10.1007/978-0-387-21579-2_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-95471-4

  • Online ISBN: 978-0-387-21579-2

  • eBook Packages: Springer Book Archive
