The strength of weak learnability

Article DOI: 10.1007/BF00116037

Cite this article as: Schapire, R.E. Mach Learn (1990) 5: 197. doi:10.1007/BF00116037
Abstract

This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
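Stated a little more formally (this is a paraphrase of the standard PAC definitions, not text from the paper), the two notions differ only in the error bound the learner must meet. Here h is the learner's output hypothesis, err_D(h) is its error under the instance distribution D, and p is some fixed polynomial in the problem parameters (instance size n, target-concept size s):

```latex
% Strongly learnable: the error can be driven below any epsilon.
\[
  \Pr\left[\operatorname{err}_D(h) \le \varepsilon\right] \ge 1 - \delta
  \qquad \text{for all } \varepsilon, \delta > 0
\]

% Weakly learnable: only a slight, inverse-polynomial edge over
% random guessing is required.
\[
  \Pr\left[\operatorname{err}_D(h) \le \tfrac{1}{2} - \tfrac{1}{p(n,s)}\right] \ge 1 - \delta
  \qquad \text{for all } \delta > 0
\]
```

The content of the equivalence is that the seemingly much weaker second requirement already implies the first.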

A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
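The heart of the construction is a "majority-of-three" step: train one weak hypothesis, train a second on a filtered distribution that balances the first one's correct and incorrect examples, train a third only on examples where the first two disagree, and output their majority vote. The sketch below is a minimal, hypothetical Python rendering of one such level (the names weak_learn and draw are assumptions, and the paper's sample-size accounting and failure handling are omitted):

```python
import random

def boost_once(weak_learn, draw):
    """One level of the boosting construction, as a simplified sketch.

    weak_learn(draw) -> hypothesis h: x -> 0/1 with error < 1/2 on draw
    draw()           -> a labeled example (x, y) from distribution D
    """
    # h1: a weak hypothesis for the original distribution D.
    h1 = weak_learn(draw)

    # h2: a weak hypothesis for a filtered distribution that puts equal
    # weight on examples h1 gets right and examples h1 gets wrong,
    # forcing h2 to carry information that h1 lacks.
    def draw2():
        want_correct = random.random() < 0.5
        while True:  # rejection sampling; edge cases need care in full
            x, y = draw()
            if (h1(x) == y) == want_correct:
                return x, y

    h2 = weak_learn(draw2)

    # h3: a weak hypothesis trained only where h1 and h2 disagree.
    def draw3():
        while True:
            x, y = draw()
            if h1(x) != h2(x):
                return x, y

    h3 = weak_learn(draw3)

    # Majority vote: correct whenever at least two of the three
    # hypotheses agree with the target concept.
    return lambda x: int(h1(x) + h2(x) + h3(x) >= 2)
```

If each weak hypothesis has error a < 1/2 on its distribution, the vote's error is at most 3a² − 2a³ < a, so applying the step recursively, to a depth determined by the target error ε, yields the arbitrarily accurate strong learner of the theorem.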

Keywords: machine learning, learning from examples, learnability theory, PAC learning, polynomial-time identification

© Kluwer Academic Publishers 1990

Authors and Affiliations

R.E. Schapire, MIT Laboratory for Computer Science, Cambridge, MA, USA