An Introduction to Boosting and Leveraging

Advanced Lectures on Machine Learning

Part of the book series: Lecture Notes in Computer Science (LNAI), volume 2600

Abstract

We provide an introduction to theoretical and practical aspects of Boosting and Ensemble learning, intended as a useful reference for researchers in the field of Boosting as well as for those seeking to enter this fascinating area of research. We begin with a short background on the necessary learning-theoretic foundations of weak learners and their linear combinations. We then point out the useful connection between Boosting and the theory of optimization, which facilitates the understanding of Boosting and later enables us to develop new Boosting algorithms applicable to a broad spectrum of problems. To increase the relevance of the paper to practitioners, we have added remarks, pseudo code, “tricks of the trade”, and algorithmic considerations where appropriate. Finally, we illustrate the usefulness of Boosting algorithms by giving an overview of some existing applications. The main ideas are illustrated on the problem of binary classification, although several extensions are discussed.
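
The central object mentioned in the abstract, a weighted linear combination of weak learners used as a binary classifier, can be made concrete with AdaBoost (Freund and Schapire), the canonical Boosting algorithm. The sketch below is illustrative only and is not the chapter's own pseudo code; the `weak_learner` callable and its `predict` interface are assumptions made purely for the example.

```python
import numpy as np

def adaboost(X, y, weak_learner, n_rounds=50):
    """Minimal AdaBoost sketch for binary labels y in {-1, +1}.

    `weak_learner(X, y, w)` is an assumed interface: it returns a
    hypothesis `h` whose `h.predict(X)` is in {-1, +1}, trained to have
    small weighted error on the sample (X, y) under the weights w.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with uniform example weights
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = weak_learner(X, y, w)
        pred = h.predict(X)
        eps = float(np.sum(w[pred != y]))  # weighted training error of h
        if eps >= 0.5:                     # no edge over random guessing: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified examples
        w /= w.sum()
        hypotheses.append(h)
        alphas.append(alpha)

    def predict(X_new):
        """Sign of the weighted linear combination of weak hypotheses."""
        votes = sum(a * h.predict(X_new) for a, h in zip(alphas, hypotheses))
        return np.sign(votes)

    return predict
```

In practice the weak learner would typically be a decision stump or a small tree; the weight update and the choice of alpha above follow the standard AdaBoost description for binary classification.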

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Meir, R., Rätsch, G. (2003). An Introduction to Boosting and Leveraging. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science (LNAI), vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_4

  • DOI: https://doi.org/10.1007/3-540-36434-X_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00529-2

  • Online ISBN: 978-3-540-36434-4
