Toward efficient agnostic learning
Article
Received: 25 January 1993
Accepted: 15 October 1993
DOI: 10.1007/BF00993468
Cite this article as: Kearns, M.J., Schapire, R.E. & Sellie, L.M. Mach Learn (1994) 17: 115. doi:10.1007/BF00993468
Abstract
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.

Keywords: machine learning, agnostic learning, PAC learning, computational learning theory
Copyright information
© Kluwer Academic Publishers 1994