
Machine Learning, Volume 48, Issue 1–3, pp. 219–251

On the Existence of Linear Weak Learners and Applications to Boosting

  • Shie Mannor
  • Ron Meir

Abstract

We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve a weighted empirical error on the training set which is bounded from above by 1/2 − γ, γ > 0, for any distribution on the data set. Moreover, in order that the weak learner be useful in terms of generalization, γ must be sufficiently far from zero. While the existence of weak learners is essential to the success of boosting algorithms, a proof of their existence based on a geometric point of view has been hitherto lacking. In this work we show that under certain natural conditions on the data set, a linear classifier is indeed a weak learner. Our results can be directly applied to generalization error bounds for boosting, leading to closed-form bounds. We also provide a procedure for dynamically determining the number of boosting iterations required to achieve low generalization error. The bounds established in this work are based on the theory of geometric discrepancy.
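For concreteness, the weak-learning condition referred to in the abstract can be written out as follows; the notation (m, w_i, h, H_T, γ_t) is standard for binary classification and is not quoted from the paper itself. The second display is the classical AdaBoost-style training-error bound, included only to indicate why γ must be bounded away from zero for boosting to be effective.

\[
\mathrm{err}_w(h) \;=\; \sum_{i=1}^{m} w_i \,\mathbb{1}\{h(x_i) \neq y_i\} \;\le\; \frac{1}{2} - \gamma, \qquad \gamma > 0,
\]

for every weighting \(w_1,\dots,w_m \ge 0\) with \(\sum_i w_i = 1\) of the training set \(\{(x_i,y_i)\}_{i=1}^m\), \(y_i \in \{-1,+1\}\). If every boosting round \(t\) produces a weak hypothesis with edge \(\gamma_t \ge \gamma\), the empirical error of the combined classifier \(H_T\) after \(T\) rounds obeys

\[
\widehat{\mathrm{err}}(H_T) \;\le\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \;\le\; \exp\!\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big),
\]

so larger edges γ_t translate directly into faster convergence and, via margin-based bounds, into lower generalization error.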

Keywords: boosting, weak learner, geometric discrepancy


Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Shie Mannor (1)
  • Ron Meir (1)
  1. Department of Electrical Engineering, Technion, Haifa, Israel
