Machine Learning, Volume 5, Issue 2, pp 197–227

The strength of weak learnability

  • Robert E. Schapire

Abstract

This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
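For reference, a standard PAC-style formalization of these two notions is sketched below; the symbols ε, δ, and γ are conventional notation assumed here for illustration rather than quoted from the paper.

% Strong (PAC) learnability: for every target concept c in the class, every
% distribution D over instances, and every ε, δ in (0, 1), the learner (using
% polynomially many examples and time) outputs a hypothesis h satisfying
\Pr_{x \sim D}\bigl[ h(x) \neq c(x) \bigr] \le \varepsilon
\quad \text{with probability at least } 1 - \delta .

% Weak learnability: the error need only beat random guessing by some fixed,
% possibly only inverse-polynomially small, edge γ > 0:
\Pr_{x \sim D}\bigl[ h(x) \neq c(x) \bigr] \le \frac{1}{2} - \gamma
\quad \text{with probability at least } 1 - \delta .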

A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
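The sketch below illustrates the flavor of such a conversion with a single "majority of three" stage: one weak hypothesis is trained on the original distribution, a second on a filtered distribution on which the first is right and wrong equally often, and a third on examples where the first two disagree; their majority vote is the combined hypothesis. The toy data source, threshold weak learner, and rejection-sampling filters are illustrative assumptions, not the paper's exact construction.

# A minimal sketch of a single "majority of three" boosting stage.
# The toy data source, threshold weak learner, and filters are assumptions.
import random

def weak_learn(sample):
    # Toy weak learner: pick the single-threshold rule s*(x - t) >= 0 that is
    # most accurate on the sample (candidate thresholds are the sample points).
    best_acc, best_h = -1.0, None
    for t, _ in sample:
        for s in (+1, -1):
            h = lambda x, t=t, s=s: 1 if s * (x - t) >= 0 else 0
            acc = sum(h(x) == y for x, y in sample) / len(sample)
            if acc > best_acc:
                best_acc, best_h = acc, h
    return best_h

def draw(source, n, keep=lambda x, y: True):
    # Draw n labelled examples from `source` that pass the filter `keep`.
    out = []
    while len(out) < n:
        x, y = source()
        if keep(x, y):
            out.append((x, y))
    return out

def boost_once(source, n=200):
    # h1: weak hypothesis learned on the original distribution.
    h1 = weak_learn(draw(source, n))

    # h2: learned on a filtered distribution on which h1 is right and wrong
    # equally often (approximated here by simple rejection sampling).
    counts = {"right": 0, "wrong": 0}
    def balanced(x, y):
        key = "right" if h1(x) == y else "wrong"
        other = "wrong" if key == "right" else "right"
        if counts[key] <= counts[other]:
            counts[key] += 1
            return True
        return False
    h2 = weak_learn(draw(source, n, balanced))

    # h3: learned only on examples where h1 and h2 disagree.
    h3 = weak_learn(draw(source, n, lambda x, y: h1(x) != h2(x)))

    # The combined hypothesis is the majority vote of h1, h2, h3.
    return lambda x: 1 if h1(x) + h2(x) + h3(x) >= 2 else 0

if __name__ == "__main__":
    random.seed(0)
    # Example target concept (an interval) that a single threshold can only
    # approximate, so the weak learner beats guessing but is far from perfect.
    def source():
        x = random.uniform(0.0, 1.0)
        return x, int(0.3 < x < 0.7)

    h1 = weak_learn(draw(source, 200))
    boosted = boost_once(source)
    test = draw(source, 2000)
    print("single weak hypothesis:",
          sum(h1(x) == y for x, y in test) / len(test))
    print("majority of three:     ",
          sum(boosted(x) == y for x, y in test) / len(test))

A full conversion to arbitrarily high accuracy composes such stages repeatedly; the sketch performs only one and simply prints the accuracy of a lone weak hypothesis next to that of the three-hypothesis majority.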

Keywords

Machine learning, learning from examples, learnability theory, PAC learning, polynomial-time identification

Copyright information

© Kluwer Academic Publishers 1990

Authors and Affiliations

  • Robert E. Schapire
  1. MIT Laboratory for Computer Science, Cambridge, MA
