Abstract
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
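To make the agnostic objective concrete, the following is a minimal sketch, not the paper's dynamic programming method. It assumes a hypothetical hypothesis class of one-dimensional thresholds h(x) = 1 if x >= t, else 0, and a small made-up sample whose labels no threshold fits exactly. The names `disagreements`, `best_threshold`, and `SAMPLE` are illustrative only. The learner makes no assumption about how the labels were produced and returns the hypothesis in the class with the fewest disagreements on the sample.

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (point on the real line, label in {0, 1})


def disagreements(threshold: float, sample: List[Example]) -> int:
    """Count sample points mislabeled by h(x) = 1 if x >= threshold else 0."""
    return sum(1 for x, y in sample if (1 if x >= threshold else 0) != y)


def best_threshold(sample: List[Example]) -> Tuple[float, int]:
    """Return (threshold, errors) minimizing empirical disagreements.

    Only thresholds at the distinct sample points, plus +infinity for the
    all-zero hypothesis, need to be tried: every other threshold labels
    the sample identically to one of these candidates.
    """
    candidates = sorted({x for x, _ in sample}) + [float("inf")]
    scored = ((t, disagreements(t, sample)) for t in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical data: the labels are arbitrary, so no threshold is consistent.
SAMPLE: List[Example] = [
    (0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.5, 1), (0.6, 1),
]

if __name__ == "__main__":
    threshold, errors = best_threshold(SAMPLE)
    print(f"best threshold: {threshold}, disagreements: {errors}")
```

The brute-force search here takes quadratic time in the sample size. It is meant only to illustrate the goal of competing with the best hypothesis in a fixed class rather than recovering a true target function.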
Kearns, M.J., Schapire, R.E. & Sellie, L.M. Toward efficient agnostic learning. Mach Learn 17, 115–141 (1994). https://doi.org/10.1007/BF00993468