Machine Learning, Volume 17, Issue 2–3, pp. 115–141

Toward efficient agnostic learning

  • Michael J. Kearns
  • Robert E. Schapire
  • Linda M. Sellie


In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
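The abstract mentions an agnostic learning method based on dynamic programming but does not spell it out. As a hedged illustration only, and not the authors' algorithm, the sketch below shows how dynamic programming can perform empirical error minimization in an agnostic setting for one simple class chosen here as an assumption: unions of at most k intervals on the real line, labelling points 1 inside and 0 outside. No assumption is made about how the labels were generated; the function simply finds the smallest number of disagreements any such hypothesis can achieve on the sample. The function name `min_disagreement_k_intervals` is hypothetical.

```python
from itertools import groupby
from math import inf


def min_disagreement_k_intervals(samples, k):
    """Hypothetical helper: fewest sample points misclassified by the best
    union of at most k intervals (label 1 inside, 0 outside).

    samples: iterable of (x, y) pairs with real x and y in {0, 1}.
    """
    # Sort by x and group points that share the same x: any interval
    # hypothesis must assign them the same label.
    groups = []
    for _, grp in groupby(sorted(samples), key=lambda p: p[0]):
        labels = [y for _, y in grp]
        ones = sum(labels)
        groups.append((len(labels) - ones, ones))  # (count of 0s, count of 1s)

    # dp[j][s]: fewest errors on the groups seen so far, having opened
    # j intervals, with the most recent group labelled s.
    dp = [[inf, inf] for _ in range(k + 1)]
    dp[0][0] = 0
    for zeros, ones in groups:
        new = [[inf, inf] for _ in range(k + 1)]
        for j in range(k + 1):
            for s in (0, 1):
                if dp[j][s] == inf:
                    continue
                # Label this group 0: every 1-labelled point is an error.
                new[j][0] = min(new[j][0], dp[j][s] + ones)
                # Label this group 1: staying inside keeps j intervals,
                # entering from outside opens a new one.
                nj = j if s == 1 else j + 1
                if nj <= k:
                    new[nj][1] = min(new[nj][1], dp[j][s] + zeros)
        dp = new
    return min(min(row) for row in dp)


if __name__ == "__main__":
    data = [(1, 1), (2, 0), (3, 1), (4, 0), (5, 1)]
    for k in range(4):
        print(k, min_disagreement_k_intervals(data, k))
```

The table has O(k) states per distinct x value, so after sorting the search runs in time proportional to k times the number of points, rather than enumerating all interval unions.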


Keywords: machine learning, agnostic learning, PAC learning, computational learning theory


Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • Michael J. Kearns (1)
  • Robert E. Schapire (1)
  • Linda M. Sellie (2)
  1. AT&T Bell Laboratories, Murray Hill
  2. Department of Computer Science, University of Chicago, Chicago
