## Abstract

This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (*PAC*) learning model. A concept class is *learnable* (or *strongly learnable*) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is *weakly learnable* if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
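
The two notions can be stated side by side in the standard PAC formulation. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $c$ is the unknown target concept, $D$ the distribution over instances, $h$ the hypothesis output by the learner, $\delta$ the confidence parameter, and $p$ a polynomial in the instance size $n$ and concept size $s$.

```latex
\[
  \operatorname{error}_D(h) \;=\; \Pr_{x \sim D}\bigl[\, h(x) \neq c(x) \,\bigr]
\]

% Strong learnability: any target error \varepsilon can be met.
\[
  \forall\, \varepsilon, \delta > 0:\quad
  \Pr\bigl[\, \operatorname{error}_D(h) \le \varepsilon \,\bigr] \;\ge\; 1 - \delta
\]

% Weak learnability: only a fixed inverse-polynomial edge over random guessing.
\[
  \exists\, \text{polynomial } p \;\; \forall\, \delta > 0:\quad
  \Pr\Bigl[\, \operatorname{error}_D(h) \le \tfrac{1}{2} - \tfrac{1}{p(n, s)} \,\Bigr] \;\ge\; 1 - \delta
\]
```

In both cases the learner must run in polynomial time and succeed for every distribution $D$; the only difference is the error guarantee.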

A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
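
The abstract does not spell out the construction, so the following is only a toy illustration of why combining hypotheses can amplify accuracy. It assumes three hypotheses that each err independently with probability `alpha` and are combined by majority vote. The independence assumption and the function names `majority_of_three_error` and `rounds_to_reach` are hypothetical, chosen for this sketch, and are not taken from the paper.

```python
def majority_of_three_error(alpha: float) -> float:
    """Error of a majority vote over three hypotheses, each wrong
    independently with probability alpha (an assumption of this sketch)."""
    # The vote is wrong when exactly two are wrong (3 ways) or all three are.
    return 3 * alpha**2 * (1 - alpha) + alpha**3


def rounds_to_reach(alpha: float, epsilon: float, max_rounds: int = 1000) -> int:
    """Count how many nested majority-of-three levels are needed to push
    the error from alpha down to at most epsilon."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5): better than random guessing")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rounds = 0
    while alpha > epsilon:
        if rounds >= max_rounds:
            raise RuntimeError("did not reach epsilon within max_rounds")
        alpha = majority_of_three_error(alpha)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for start in (0.45, 0.3, 0.1):
        print(start, rounds_to_reach(start, epsilon=0.01))
```

Under these assumptions any starting error strictly below 1/2 shrinks at every level, while an error of exactly 1/2 (random guessing) is a fixed point and never improves.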

## Keywords

Machine learning, learning from examples, learnability theory, PAC learning, polynomial-time identification