Toward efficient agnostic learning

Michael J. Kearns, Robert E. Schapire, Linda M. Sellie

Article. Received: 25 January 1993. Accepted: 15 October 1993. DOI: 10.1007/BF00993468

Cite this article as: Kearns, M.J., Schapire, R.E. & Sellie, L.M. Mach Learn (1994) 17: 115. doi:10.1007/BF00993468
Abstract

In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
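To make the idea of an agnostic learner based on dynamic programming more concrete, here is a minimal sketch under stated assumptions; it is not claimed to be the construction used in the paper. It assumes the hypothesis class is unions of at most k intervals on the real line, the labels are 0/1, and the loss is the number of misclassified training examples. The function name `min_disagreement_intervals` is hypothetical. Because no interval union is assumed to label the data perfectly, the procedure simply finds the smallest achievable training error over the class (empirical error minimization), which is the natural goal once target function assumptions are dropped.

```python
from collections import Counter
from typing import List, Tuple


def min_disagreement_intervals(points: List[Tuple[float, int]], k: int) -> int:
    """Fewest training mistakes achievable by a union of at most k intervals.

    Illustrative sketch (hypothetical helper, not the paper's algorithm):
    points are (x, y) pairs with y in {0, 1}; a hypothesis labels x as 1
    iff x lies inside one of its intervals.
    """
    counts = Counter(points)            # (x, y) -> multiplicity
    xs = sorted({x for x, _ in points})  # distinct x values, left to right
    INF = float("inf")
    # best[j][s]: fewest mistakes so far, having opened j intervals,
    # where s = 1 if the current point is inside an interval, else 0.
    best = [[INF, INF] for _ in range(k + 1)]
    best[0][0] = 0
    for x in xs:
        # All examples at the same x must receive the same label.
        cost_label0 = counts[(x, 1)]    # positives misclassified as 0
        cost_label1 = counts[(x, 0)]    # negatives misclassified as 1
        new = [[INF, INF] for _ in range(k + 1)]
        for j in range(k + 1):
            for s in (0, 1):
                cur = best[j][s]
                if cur == INF:
                    continue
                # Label x as 0: we are (or step) outside every interval.
                new[j][0] = min(new[j][0], cur + cost_label0)
                # Label x as 1: extend the current interval, or open a new one.
                if s == 1:
                    new[j][1] = min(new[j][1], cur + cost_label1)
                elif j < k:
                    new[j + 1][1] = min(new[j + 1][1], cur + cost_label1)
        best = new
    return min(min(row) for row in best)


if __name__ == "__main__":
    sample = [(0.1, 0), (0.2, 1), (0.3, 1), (0.4, 0), (0.5, 1), (0.6, 0)]
    for k in (0, 1, 2):
        print(k, min_disagreement_intervals(sample, k))  # 0 3, then 1 1, then 2 0
```

The table has only 2(k + 1) states per distinct point, so after sorting the sketch runs in O(n log n + nk) time for n examples, even though the number of candidate interval unions grows much faster; that gap between exhaustive search and a polynomial-time table is the kind of saving dynamic programming offers here.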

Keywords

machine learning, agnostic learning, PAC learning, computational learning theory

© Kluwer Academic Publishers 1994

Authors and Affiliations

Michael J. Kearns, Robert E. Schapire, Linda M. Sellie

1. AT&T Bell Laboratories, Murray Hill
2. Department of Computer Science, University of Chicago, Chicago