An Introduction to Boosting and Leveraging

Meir, Ron; Rätsch, Gunnar

doi:10.1007/3-540-36434-X_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2600)

Abstract

We provide an introduction to the theoretical and practical aspects of Boosting and Ensemble learning, intended as a reference for researchers in the field of Boosting as well as for those seeking to enter this fascinating area of research. We begin with a short background on the necessary learning-theoretic foundations of weak learners and their linear combinations. We then point out the useful connection between Boosting and the Theory of Optimization, which facilitates the understanding of Boosting and later enables us to move on to new Boosting algorithms applicable to a broad spectrum of problems. To increase the relevance of the paper to practitioners, we have added remarks, pseudocode, “tricks of the trade”, and algorithmic considerations where appropriate. Finally, we illustrate the usefulness of Boosting algorithms by giving an overview of some existing applications. The main ideas are illustrated on the problem of binary classification, although several extensions are discussed.

References

S. Abney, R. E. Schapire, and Y. Singer. Boosting applied to tagging and pp attachment. In Proc. of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
Google Scholar
H. Akaike. A new look at the statistical model identification. IEEE Trans. Automat. Control, 19(6):716–723, 1974.
Article MATH MathSciNet Google Scholar
E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
Article MathSciNet Google Scholar
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
Google Scholar
A. Antos, B. Kégl, T. Linder, and G. Lugosi. Data-dependent margin-based generalization bounds for classification. JMLR, 3:73–98, 2002.
Article Google Scholar
J. A. Aslam. Improving algorithms for boosting. In Proc. COLT, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
F. Audrino and P. Bühlmann. Volatility estimation with functional gradient descent for very high-dimensional financial time series. Journal of Computational Finance., 2002. To appear. See http://www.stat.ethz.ch/~buhlmann/bibliog.html.
J. P. Barnes. Capacity control in boosting using a p-convex hull. Master’s thesis, Australian National University, 1999. supervised by R. C. Williamson.
Google Scholar
P. Bartlett, P. Boucheron, and G. Lugosi. Model selction and error estimation. Machine Learning, 48:85–2002, 2002.
Article MATH Google Scholar
P. L. Bartlett, O. Bousquet, and S. Mendelson. Localized rademacher averages. In Procedings COLT’02, volume 2375 of LNAI, pages 44–58, Sydney, 2002. Springer.
Google Scholar
P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 2002. to appear 10/02.
Google Scholar
E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithm: Bagging, boosting and variants. Machine Learning, 36:105–142, 1999.
Article Google Scholar
H. H. Bauschke and J. M. Borwein. Legendre functions and the method of random Bregman projections. Journal of Convex Analysis, 4:27–67, 1997.
MATH MathSciNet Google Scholar
S. Ben-David, P. Long, and Y. Mansour. Agnostic boosting. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 507–516, 2001.
Google Scholar
K. P. Bennett and O. L. Mangasarian. Multicategory separation via linear programming. Optimization Methods and Software, 3:27–39, 1993.
Article Google Scholar
K. P. Bennett, A. Demiriz, and R. Maclin. Exploiting unlabeled data in ensemble methods. In Proc. ICML, 2002.
Google Scholar
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
Article Google Scholar
A. Bertoni, P. Campadelli, and M. Parodi. A boosting algorithm for regression. In W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, editors, Proceedings ICANN’97, Int. Conf. on Artificial Neural Networks, volume V of LNCS, pages 343–348, Berlin, 1997. Springer.
Google Scholar
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1995.
MATH Google Scholar
C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
Google Scholar
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Occam’s razor. Information Processing Letters, 24:377–380, 1987.
Article MATH MathSciNet Google Scholar
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM orkshop on Computational Learning Theory, pages 144–152, 1992.
Google Scholar
P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In Proc. 15th International Conf. on Machine Learning, pages 82–90. Morgan Kaufmann, San Francisco, CA, 1998.
Google Scholar
L. M. Bregman. The relaxation method for finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Math. and Math. Physics, 7:200–127, 1967.
Article Google Scholar
L. Breiman. Bagging predictors. Machine Learning, 26(2):123–140, 1996.
Google Scholar
L. Breiman. Bias, variance, and arcing classifiers. Technical Report 460, Statistics Department, University of California, July 1997.
Google Scholar
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1518, 1999. Also Technical Report 504, Statistics Department, University of California Berkeley.
Article Google Scholar
L. Breiman. Some infinity theory for predictor ensembles. Technical Report 577, Berkeley, August 2000.
Google Scholar
L. Breiman, J. Friedman, J. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
Google Scholar
N. Bshouty and D. Gavinsky. On boosting with polynomially bounded distributions. JMLR, pages 107–111, 2002. Accepted.
Google Scholar
P. Buhlmann and B. Yu. Boosting with the l2 loss: Regression and classification. J. Amer. Statist. Assoc., 2002. revised, also Technical Report 605, Stat Dept, UC Berkeley August, 2001.
Google Scholar
C. Campbell and K. P. Bennett. A linear programming approach to novelty detection. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, pages 395–401. MIT Press, 2001.
Google Scholar
J. Carmichael. Non-intrusive appliance load monitoring system. Epri journal, Electric Power Research Institute, 1990.
Google Scholar
Y. Censor and S. A. Zenios. Parallel Optimization: Theory, Algorithms and Application. Numerical Mathematics and Scientific Computation. Oxford University Press, 1997.
Google Scholar
N. Cesa-Bianchi, A. Krogh, and M. Warmuth. Bounds on approximate steepest descent for likelihood maximization in exponential families. IEEE Transaction on Information Theory, 40(4):1215–1220, July 1994.
Article MATH MathSciNet Google Scholar
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1):131–159, 2002.
Article MATH Google Scholar
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. Technical Report 479, Department of Statistics, Stanford University, 1995.
Google Scholar
W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.
Google Scholar
M. Collins, R. E. Schapire, and Y. Singer. Logistic Regression, AdaBoost and Bregman distances. Machine Learning, 48(1–3):253–285, 2002. Special Issue on New Methods for Model Selection and Model Combination.
Article MATH Google Scholar
R. Cominetti and J.-P. Dussault. A stable exponential penalty algorithm with superlinear convergence. J.O.T.A., 83(2), Nov 1994.
Google Scholar
C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.
MATH Google Scholar
T. M. Cover and P. E. Hart. Nearest neighbor pattern classifications. IEEE transaction on information theory, 13(1):21–27, 1967.
Article MATH Google Scholar
D. D. Cox and F. O’sullivan. Asymptotic analysis of penalized likelihood and related estimates. The Annals of Statistics, 18(4):1676–1695, 1990.
Article MATH MathSciNet Google Scholar
K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. In N. Cesa-Bianchi and S. Goldberg, editors, Proc. Colt, pages 35–46, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
Google Scholar
S. Della Pietra, V. Della Pietra, and J. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, April 1997.
Article Google Scholar
S. Della Pietra, V. Della Pietra, and J. Lafferty. Duality and auxiliary functions for Bregman distances. Technical Report CMU-CS-01-109, School of Computer Science, Carnegie Mellon University, 2001.
Google Scholar
A. Demiriz, K. P. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation. Journal of Machine Learning Research, 46:225–254, 2002.
Article MATH Google Scholar
M. Dettling and P. Bühlmann. How to use boosting for tumor classification with gene expression data. Preprint. See http://www.stat.ethz.ch/~dettling/boosting, 2002.
L. Devroye, L. Györ., and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. Springer, New York, 1996.
Google Scholar
T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157, 1999.
Article Google Scholar
T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via errorcorrecting output codes. Journal of Aritifical Intelligence Research, 2:263–286, 1995.
MATH Google Scholar
C. Domingo and O. Watanabe. A modification of AdaBoost. In Proc. COLT, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
H. Drucker, C. Cortes, L. D. Jackel, Y. LeCun, and V. Vapnik. Boosting and other ensemble methods. Neural Computation, 6, 1994.
Google Scholar
H. Drucker, R. E. Schapire, and P. Y. Simard. Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7:705–719, 1993.
Article Google Scholar
N. Duffy and D. P. Helmbold. A geometric approach to leveraging weak learners. In P. Fischer and H. U. Simon, editors, Computational Learning Theory: 4th European Conference (EuroCOLT’ 99), pages 18–33, March 1999. Long version to appear in TCS.
Google Scholar
N. Duffy and D. P. Helmbold. Boosting methods for regression. Technical report, Department of Computer Science, University of Santa Cruz, 2000.
Google Scholar
N. Duffy and D. P. Helmbold. Leveraging for regression. In Proc. COLT, pages 208–219, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
N. Duffy and D. P. Helmbold. Potential boosters? In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12, pages 258–264. MIT Press, 2000.
Google Scholar
G. Escudero, L. Màrquez, and G. Rigau. Boosting applied to word sense disambiguation. In LNAI 1810: Proceedings of the 12th European Conference on Machine Learning, ECML, pages 129–141, Barcelona, Spain, 2000.
Google Scholar
W. Feller. An Introduction to Probability Theory and its Applications. Wiley, Chichester, third edition, 1968.
Google Scholar
D. H. Fisher, Jr., editor. Improving regressors using boosting techniques, 1997.
Google Scholar
M. Frean and T. Downs. A simple cost function for boosting. Technical report, Dep. of Computer Science and Electrical Engineering, University of Queensland, 1998.
Google Scholar
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, September 1995.
Article MATH MathSciNet Google Scholar
Y. Freund. An adaptive version of the boost by majority algorithm. Machine Learning, 43(3):293–318, 2001.
Article MATH Google Scholar
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. In Proc. ICML, 1998.
Google Scholar
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT: European Conference on Computational Learning Theory. LNCS, 1994.
Google Scholar
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pages 148–146. Morgan Kaufmann, 1996.
Google Scholar
Y. Freund and R. E. Schapire. Game theory, on-line prediction and boosting. In Proc. COLT, pages 325–332, New York, NY, 1996. ACM Press.
Google Scholar
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
Article MATH MathSciNet Google Scholar
Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.
Article MATH MathSciNet Google Scholar
Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, September 1999. Appeared in Japanese, translation by Naoki Abe.
Google Scholar
J. Friedman. Stochastic gradient boosting. Technical report, Stanford University, March 1999.
Google Scholar
J. Friedman, T. Hastie, and R. J. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 2:337–374, 2000. with discussion pp.375-407, also Technical Report at Department of Statistics, Sequoia Hall, Stanford University.
Article MathSciNet Google Scholar
J. H. Friedman. On bias, variance, 0/1-loss, and the corse of dimensionality. In Data Mining and Knowledge Discovery, volume I, pages 55–77. Kluwer Academic Publishers, 1997.
Article Google Scholar
J. H. Friedman. Greedy function approximation. Technical report, Department of Statistics, Stanford University, February 1999.
Google Scholar
K. R. Frisch. The logarithmic potential method of convex programming. Memorandum, University Institute of Economics, Oslo, May 13 1955.
Google Scholar
T. Graepel, R. Herbrich, B. Schölkopf, A. J. Smola, P. L. Bartlett, K.-R. Müller, K. Obermayer, and R. C. Williamson. Classification on proximity data with LPmachines. In D. Willshaw and A. Murray, editors, Proceedings of ICANN’99, volume 1, pages 304–309. IEE Press, 1999.
Google Scholar
Y. Grandvalet. Bagging can stabilize without reducing variance. In ICANN’01, Lecture Notes in Computer Science. Springer, 2001.
Google Scholar
Y. Grandvalet, F. D’alché-Buc, and C. Ambroise. Boosting mixture models for semi-supervised tasks. In Proc. ICANN, Vienna, Austria, 2001.
Google Scholar
A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artifical Intelligence, 1998.
Google Scholar
V. Guruswami and A. Sahai. Multiclass learning, boosing, and error-correcting codes. In Proc. of the twelfth annual conference on Computational learning theory, pages 145–155, New York, USA, 1999. ACM Press.
Google Scholar
W. Hart. Non-intrusive appliance load monitoring. Proceedings of the IEEE, 80(12), 1992.
Google Scholar
M. Haruno, S. Shirai, and Y. Ooyama. Using decision trees to construct a practical parser. Machine Learning, 34:131–149, 1999.
Article MATH Google Scholar
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: data mining, inference and prediction. Springer series in statistics. Springer, New York, N.Y., 2001.
MATH Google Scholar
T. J. Hastie and R. J. Tibshirani. Generalized Additive Models, volume 43 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1990.
Google Scholar
D. Haussler. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Information and Computation, 100:78–150, 1992.
Article MATH MathSciNet Google Scholar
S. S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice-Hall, second edition, 1998.
Google Scholar
D. P. Helmbold, K. Kivinen, and M. K. Warmuth. Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6):1291–1304, 1999.
Article Google Scholar
R. Herbrich. Learning Linear Classifiers: Theory and Algorithms, volume 7 of Adaptive Computation and Machine Learning. MIT Press, 2002.
Google Scholar
R. Herbrich, T. Graepel, and J. Shawe-Taylor. Sparsity vs. large margins for linear classifiers. In Proc. COLT, pages 304–308, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
R. Herbrich and R. Williamson. Algorithmic luckiness. JMLR, 3:175–212, 2002.
Article MathSciNet Google Scholar
R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods and applications. SIAM Review, 3:380–429, September 1993.
Google Scholar
F. J. Huang, Z.-H. Zhou, H.-J. Zhang, and T. Chen. Pose invariant face recognition. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, pages 245–250, Grenoble, France, 2000.
Google Scholar
R. D. Iyer, D. D. Lewis, R. E. Schapire, Y. Singer, and A. Singhal. Boosting for document routing. In A. Agah, J. Callan, and E. Rundensteiner, editors, Proceedings of CIKM-00, 9th ACM International Conference on Information and Knowledge Management, pages 70–77, McLean, US, 2000. ACM Press, New York, US.
Google Scholar
W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, volume 1, pages 361–380, Berkeley, 1960. University of California Press.
Google Scholar
W. Jiang. Some theoretical aspects of boosting in the presence of noisy data. In Proceedings of the Eighteenth International Conference on Machine Learning, 2001.
Google Scholar
D. S. Johnson and F. P. Preparata. The densest hemisphere problem. Theoretical Computer Science, 6:93–107, 1978.
Article MATH MathSciNet Google Scholar
M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the em algorithm. Neural Computation, 6(2):181–214, 1994.
Article Google Scholar
M. Kearns and Y. Mansour. On the boosting ability og top-down decision tree learning algorithms. In Proc. 28th ACM Symposium on the Theory of Computing,, pages 459–468. ACM Press, 1996.
Google Scholar
M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 41(1):67–95, January 1994.
Article MATH MathSciNet Google Scholar
M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
Google Scholar
G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82–95, 1971.
Article MATH MathSciNet Google Scholar
J. Kivinen and M. Warmuth. Boosting as entropy projection. In Proc. 12th Annu. Conference on Comput. Learning Theory, pages 134–144. ACM Press, New York, NY, 1999.
Google Scholar
J. Kivinen, M. Warmuth, and P. Auer. The perceptron algorithm vs. winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant. Special issue of Artificial Intelligence, 97(1–2):325–343, 1997.
MATH MathSciNet Google Scholar
J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, 1997.
Article MATH MathSciNet Google Scholar
K. C. Kiwiel. Relaxation methods for strictly convex regularizations of piecewise linear programs. Applied Mathematics and Optimization, 38:239–259, 1998.
Article MATH MathSciNet Google Scholar
V. Koltchinksii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statis., 30(1), 2002.
Google Scholar
A. Krieger, A. Wyner, and C. Long. Boosting noisy data. In Proceedings, 18th ICML. Morgan Kaufmann, 2001.
Google Scholar
J. Lafferty. Additive models, boosting, and inference for generalized divergences. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 125–133, New York, NY, 1999. ACM Press.
Google Scholar
G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural information processings systems, volume 14, 2002. to appear. Longer version also NeuroCOLT Technical Report NC-TR-2001-098.
Google Scholar
Y. A. LeCun, L. D. Jackel, L. Bottou, A. Brunot, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. A. Müller, E. Säckinger, P. Y. Simard, and V. N. Vapnik. Comparison of learning algorithms for handwritten digit recognition. In F. Fogelman-Soulié and P. Gallinari, editors, Proceedings ICANN’95-International Conference on Artificial Neural Networks, volume II, pages 53–60, Nanterre, France, 1995. EC2.
Google Scholar
M. Leshno, V. Lin, A. Pinkus, and S. Schocken. Multilayer Feedforward Networks with a Nonpolynomial Activation Function Can Approximate any Function. Neural Networks, 6:861–867, 1993.
Article Google Scholar
N. Littlestone, P. M. Long, and M. K. Warmuth. On-line learning of linear functions. Journal of Computational Complexity, 5:1–23, 1995. Earlier version is Technical Report CRL-91-29 at UC Santa Cruz.
Article MATH MathSciNet Google Scholar
D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley Publishing Co., Reading, second edition, May 1984. Reprinted with corrections in May, 1989.
Google Scholar
Gábor Lugosi and Nicolas Vayatis. A consistent strategy for boosting algorithms. In Proceedings of the Annual Conference on Computational Learning Theory, volume 2375 of LNAI, pages 303–318, Sydney, February 2002. Springer.
Google Scholar
Z.-Q. Luo and P. Tseng. On the convergence of coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7–35, 1992.
Article MATH MathSciNet Google Scholar
S. Mallat and Z. Zhang. Matching Pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, December 1993.
Article MATH Google Scholar
O. L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Operations Research, 13:444–452, 1965.
Article MATH MathSciNet Google Scholar
O. L. Mangasarian. Arbitrary-norm separating plane. Operation Research Letters, 24(1):15–23, 1999.
Article MATH MathSciNet Google Scholar
S. Mannor and R. Meir. Geometric bounds for generlization in boosting. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 461–472, 2001.
Google Scholar
S. Mannor and R. Meir. On the existence of weak learners and applications to boosting. Machine Learning, 48(1–3):219–251, 2002.
Google Scholar
S. Mannor, R. Meir, and T. Zhang. The consistency of greedy algorithms for classification. In Procedings COLT’02, volume 2375 of LNAI, pages 319–333, Sydney, 2002. Springer.
Google Scholar
L. Mason. Margins and Combined Classifiers. PhD thesis, Australian National University, September 1999.
Google Scholar
L. Mason, P. L. Bartlett, and J. Baxter. Improved generalization through explicit optimization of margins. Technical report, Department of Systems Engineering, Australian National University, 1998.
Google Scholar
L. Mason, J. Baxter, P. L. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and C. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press, Cambridge, MA, 1999.
Google Scholar
L. Mason, J. Baxter, P. L. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 221–247. MIT Press, Cambridge, MA, 2000.
Google Scholar
J. Matoušek. Geometric Discrepancy: An Illustrated Guide. Springer Verlag, 1999.
Google Scholar
R. Meir, R. El-Yaniv, and Shai Ben-David. Localized boosting. In Proc. COLT, pages 190–199, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
R. Meir and T. Zhang. Data-dependent bounds for bayesian mixture models. unpublished manuscript, 2002.
Google Scholar
J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A 209:415–446, 1909.
Article Google Scholar
S. Merler, C. Furlanello, B. Larcher, and A. Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. In J. Kittler and F. Roli, editors, Proceedings of the 2nd Internationa Workshop on Multiple Classifier Systems MCS2001, volume 2096 of LNCS, pages 32–42. Springer, 2001.
Google Scholar
J. Moody. The effective number of parameters: An analysis of generalization and regularization in non-linear learning systems. In S. J. Hanson J. Moody and R. P. Lippman, editors, Advances in Neural information processings systems, volume 4, pages 847–854, San Mateo, CA, 1992. Morgan Kaufman.
Google Scholar
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, 2001.
Article Google Scholar
N. Murata, S. Amari, and S. Yoshizawa. Network information criterion-determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks, 5:865–872, 1994.
Article Google Scholar
S. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw-Hill, New York, NY, 1996.
Google Scholar
Richard Nock and Patrice Lefaucheur. A robust boosting algorithm. In Proc. 13th European Conference on Machine Learning, volume LNAI 2430, Helsinki, 2002. Springer Verlag.
Google Scholar
T. Onoda, G. Rätsch, and K.-R. Müller. An asymptotic analysis of AdaBoost in the binary classification case. In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proc. of the Int. Conf. on Artificial Neural Networks (ICANN’98), pages 195–200, March 1998.
Google Scholar
T. Onoda, G. Rätsch, and K.-R. Müller. A non-intrusive monitoring system for household electric appliances with inverters. In H. Bothe and R. Rojas, editors, Proc. of NC’2000, Berlin, 2000. ICSC Academic Press Canada/Switzerland.
Google Scholar
J. O’sullivan, J. Langford, R. Caruana, and A. Blum. Featureboost: A metalearning algorithm that improves model robustness. In Proceedings, 17th ICML. Morgan Kaufmann, 2000.
Google Scholar
N. Oza and S. Russell. Experimental comparisons of online and batch versions of bagging and boosting. In Proc. KDD-01, 2001.
Google Scholar
R. El-Yaniv P. Derbeko and R. Meir. Variance optimized bagging. In Proc. 13th European Conference on Machine Learning, 2002.
Google Scholar
T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990.
Article MathSciNet Google Scholar
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
Google Scholar
J. R. Quinlan. Boosting first-order learning. Lecture Notes in Computer Science, 1160:143, 1996.
Google Scholar
G. Rätsch. Ensemble learning methods for classification. Master’s thesis, Dep. of Computer Science, University of Potsdam, April 1998. In German.
Google Scholar
G. Rätsch. Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Computer Science Dept., August-Bebel-Str. 89, 14482 Potsdam, Germany, October 2001.
Google Scholar
G. Rätsch. Robustes boosting durch konvexe optimierung. In D. Wagner et al., editor, Ausgezeichnete Informatikdissertationen 2001, volume D-2 of GI-Edition-Lecture Notes in Informatics (LNI), pages 125–136. Bonner Köllen, 2002.
Google Scholar
G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1–3):193–221, 2002. Special Issue on New Methods for Model Selection and Model Combination. Also NeuroCOLT2 Technical Report NC-TR-2000-085.
Google Scholar
G. Rätsch, S. Mika, B. Schölkopf, and K.-R. Müller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE PAMI, 24(9), September 2002. In press. Earlier version is GMD TechReport No. 119, 2000.
Google Scholar
G. Rätsch, S. Mika, and M. K. Warmuth. On the convergence of leveraging. NeuroCOLT2 Technical Report 98, Royal Holloway College, London, August 2001. A short version appeared in NIPS 14, MIT Press, 2002.
Google Scholar
G. Rätsch, S. Mika, and M. K. Warmuth. On the convergence of leveraging. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural information processings systems, volume 14, 2002. In press. Longer version also NeuroCOLT Technical Report NC-TR-2001-098.
Google Scholar
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, March 2001. also NeuroCOLT Technical Report NCTR-1998-021.
Article MATH Google Scholar
G. Rätsch, B. Schölkopf, A. J. Smola, S. Mika, T. Onoda, and K.-R. Müller. Robust ensemble learning. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 207–219. MIT Press, Cambridge, MA, 2000.
Google Scholar
G. Rätsch, A. J. Smola, and S. Mika. Adapting codes and embeddings for polychotomies. In NIPS, volume 15. MIT Press, 2003. accepted.
Google Scholar
G. Rätsch, M. Warmuth, S. Mika, T. Onoda, S. Lemm, and K.-R. Müller. Barrier boosting. In Proc. COLT, pages 170–179, San Francisco, 2000. Morgan Kaufmann.
Google Scholar
G. Rätsch and M. K. Warmuth. Maximizing the margin with boosting. In Proc. COLT, volume 2375 of LNAI, pages 319–333, Sydney, 2002. Springer.
Google Scholar
G. Ridgeway, D. Madigan, and T. Richardson. Boosting methodology for regression problems. In D. Heckerman and J. Whittaker, editors, Proceedings of Artificial Intelligence and Statistics’ 99, pages 152–161, 1999.
Google Scholar
J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.
Article MATH Google Scholar
C. P. Robert. The Bayesian Choice: A Decision Theoretic Motivation. Springer Verlag, New York, 1994.
MATH Google Scholar
M. Rochery, R. Schapire, M. Rahim, N. Gupta, G. Riccardi, S. Bangalore, H. Alshawi, and S. Douglas. Combining prior knowledge and boosting for call classification in spoken language dialogue. In International Conference on Accoustics, Speech and Signal Processing, 2002.
Google Scholar
R. T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathemathics. Princeton University Press, New Jersey, 1970.
MATH Google Scholar
R. E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990.
Google Scholar
R. E. Schapire. Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the 14th International Conference, pages 313–321, 1997.
Google Scholar
R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.
Google Scholar
R. E. Schapire. The boosting approach to machine learning: An overview. In Workshop on Nonlinear Estimation and Classification. MSRI, 2002.
R. E. Schapire, Y. Freund, P. L. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, October 1998.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, December 1999. Also in Proceedings of the 14th Workshop on Computational Learning Theory 1998, pages 80–91.
R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.
R. E. Schapire, Y. Singer, and A. Singhal. Boosting and Rocchio applied to text filtering. In Proc. 21st Annual International Conference on Research and Development in Information Retrieval, 1998.
R. E. Schapire, P. Stone, D. McAllester, M. L. Littman, and J. A. Csirik. Modeling auction price uncertainty using boosting-based conditional density estimation. In Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In D. P. Helmbold and R. C. Williamson, editors, COLT/EuroCOLT, volume 2111 of LNAI, pages 416–426. Springer, 2001.
B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. TR 87, Microsoft Research, Redmond, WA, 1999.
B. Schölkopf, A. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207–1245, 2000. Also NeuroCOLT Technical Report NC-TR-1998-031.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
H. Schwenk and Y. Bengio. Boosting neural networks. Neural Computation, 12(8):1869–1887, 2000.
R. A. Servedio. PAC analogues of perceptron and winnow via boosting the margin. In Proc. COLT, pages 148–157, San Francisco, 2000. Morgan Kaufmann.
R. A. Servedio. Smooth boosting and learning with malicious noise. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 473–489, 2001.
J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inf. Theory, 44(5):1926–1940, September 1998.
J. Shawe-Taylor and N. Cristianini. Further results on the margin distribution. In Proceedings of the twelfth Conference on Computational Learning Theory, pages 278–285, 1999.
J. Shawe-Taylor and N. Cristianini. On the generalization of soft margin algorithms. Technical Report NC-TR-2000-082, NeuroCOLT2, June 2001.
J. Shawe-Taylor and G. Karakoulas. Towards a strategy for boosting regressors. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 247–258, Cambridge, MA, 2000. MIT Press.
Y. Singer. Leveraged vector machines. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12, pages 610–616. MIT Press, 2000.
D. Tax and R. Duin. Data domain description by support vectors. In M. Verleysen, editor, Proc. ESANN, pages 251–256, Brussels, 1999. D. Facto Press.
F. Thollard, M. Sebban, and P. Ezequel. Boosting density function estimators. In Proc. 13th European Conference on Machine Learning, volume 2430 of LNAI, pages 431–443, Helsinki, 2002. Springer Verlag.
A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, Washington, D.C., 1977.
K. Tsuda, M. Sugiyama, and K.-R. Müller. Subspace information criterion for non-quadratic regularizers – model selection for sparse regressors. IEEE Transactions on Neural Networks, 13(1):70–80, 2002.
L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, November 1984.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer Verlag, New York, 1996.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probab. and its Applications, 16(2):264–280, 1971.
J. von Neumann. Zur Theorie der Gesellschaftsspiele. Math. Ann., 100:295–320, 1928.
M. A. Walker, O. Rambow, and M. Rogati. Spot: A trainable sentence planner. In Proc. 2nd Annual Meeting of the North American Chapter of the Association for Computational Linguistics, 2001.
R. Zemel and T. Pitassi. A gradient-based boosting algorithm for regression problems. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, pages 696–702. MIT Press, 2001.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Technical Report RC22155, IBM Research, Yorktown Heights, NY, 2001.
T. Zhang. A general greedy approximation algorithm with applications. In Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002.
T. Zhang. On the dual formulation of regularized linear systems with convex risks. Machine Learning, 46:91–129, 2002.
T. Zhang. Sequential greedy approximation for certain convex optimization problems. Technical report, IBM T.J. Watson Research Center, 2002.
Z.-H. Zhou, Y. Jiang, Y.-B. Yang, and S.-F. Chen. Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 24(1):25–36, 2002.

Author information

Authors and Affiliations

Department of Electrical Engineering, Technion, Haifa, 32000, Israel
Ron Meir
Research School of Information Sciences & Engineering The Australian National University, Canberra, ACT 0200, Australia
Gunnar Rätsch

Editor information

Editors and Affiliations

RSISE, The Australian National University, 0200, Canberra, ACT, Australia
Shahar Mendelson
Research School for Information Sciences and Engineering, The Australian National University, 0200, Canberra, ACT, Australia
Alexander J. Smola

Cite this chapter

Meir, R., Rätsch, G. (2003). An Introduction to Boosting and Leveraging. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_4

DOI: https://doi.org/10.1007/3-540-36434-X_4
Published: 30 January 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00529-2
Online ISBN: 978-3-540-36434-4
eBook Packages: Springer Book Archive