## Abstract

The basic question addressed in this paper is: how can a learning algorithm cope with incorrect training examples? Specifically, how can algorithms that produce an “approximately correct” identification with “high probability” for reliable data be adapted to handle noisy data? We show that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the rule most consistent with the sample (the rule that disagrees with the fewest examples) is sufficient. It usually requires a feasibly small number of examples, provided noise affects less than half the examples on average. In this setting we are able to estimate the rate of noise using only the knowledge that the rate is less than one half. The basic ideas extend to other types of random noise as well. We also show that the search problem associated with this strategy is intractable in general. However, for particular classes of rules the target rule may be efficiently identified if we use techniques specific to that class. For an important class of formulas – the *k*-CNF formulas studied by Valiant – we present a polynomial-time algorithm that identifies concepts in this form when the rate of classification errors is less than one half.
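The following is a minimal sketch of the "most consistent rule" strategy under independent random classification noise. Several choices here are assumptions for illustration only and are not taken from the paper: the rule class is every monotone conjunction over `n` Boolean variables, examples are drawn uniformly, and the noise rate `eta = 0.2`, sample size `m = 2000`, and the helper names (`all_conjunctions`, `noisy_sample`, `most_consistent`, and so on) are all hypothetical. The search is brute-force enumeration of the class, which is exponential in `n` in general. This matches the abstract's point that the search problem is intractable in general, and it is not the paper's polynomial-time *k*-CNF algorithm.

```python
import itertools
import random

# Hypothetical rule class: every monotone conjunction over n Boolean
# variables, stored as a frozenset of variable indices. The empty set is
# the conjunction that is always true.


def all_conjunctions(n):
    """Enumerate every monotone conjunction over variables 0..n-1 (2**n rules)."""
    for r in range(n + 1):
        for subset in itertools.combinations(range(n), r):
            yield frozenset(subset)


def evaluate(conj, x):
    """A conjunction is true iff every variable it mentions is 1 in x."""
    return all(x[i] for i in conj)


def noisy_sample(target, n, m, eta, rng):
    """Draw m uniform examples labelled by target, flipping each label
    independently with probability eta (random classification noise)."""
    sample = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        label = evaluate(target, x)
        if rng.random() < eta:  # the teacher errs on this example
            label = not label
        sample.append((x, label))
    return sample


def disagreements(conj, sample):
    """Count examples whose (possibly noisy) label the rule contradicts."""
    return sum(1 for x, label in sample if evaluate(conj, x) != label)


def most_consistent(n, sample):
    """Return a rule in the class with the fewest disagreements (brute force)."""
    return min(all_conjunctions(n), key=lambda c: disagreements(c, sample))


if __name__ == "__main__":
    rng = random.Random(0)
    target = frozenset({0, 2})  # hypothetical target rule: x0 AND x2
    sample = noisy_sample(target, n=4, m=2000, eta=0.2, rng=rng)
    print(sorted(most_consistent(4, sample)))
```

Why the rate must stay below one half: if a rule `h` differs from the target on a fraction `d` of the distribution, its expected disagreement rate with the noisy labels is `eta + (1 - 2*eta) * d`. When `eta < 1/2` this quantity is smallest at `d = 0`, so the target has the lowest expected disagreement rate and, with enough examples, the most consistent rule is close to it. At `eta = 1/2` every rule has the same expected rate and the sample carries no information.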

## References

- Angluin, D. (1987). Learning regular sets from queries and counterexamples. *Information and Computation*, 75, 87–106.
- Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1986). Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension. *Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing* (pp. 273–282). Berkeley, CA: The Association for Computing Machinery.
- Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. *Journal of the American Statistical Association*, 60, 13–30.
- Kearns, M., & Li, M. (1987). *Learning in the presence of malicious errors* (Technical Report TR-03-87). Cambridge, MA: Harvard University, Center for Research in Computing Technology.
- Kearns, M., Li, M., Pitt, L., & Valiant, L. (1987). On the learnability of Boolean formulae. *Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing* (pp. 285–295). New York: The Association for Computing Machinery.
- Laird, P. (1987). *Learning from good data and bad*. Doctoral dissertation, Department of Computer Science, Yale University, New Haven, CT.
- Quinlan, J. R. (1986). The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), *Machine learning* (Vol. 2). Los Altos, CA: Morgan Kaufmann.
- Schlimmer, J. C., & Granger, R. H. (1986). Incremental learning from noisy data. *Machine Learning*, 1, 317–354.
- Shackelford, G. G., & Volper, D. J. (1987). *Learning in the presence of noise*. Unpublished manuscript. University of California, Department of Information and Computer Science, Irvine.
- Valiant, L. G. (1984). A theory of the learnable. *Communications of the ACM*, 27, 1134–1142.
- Valiant, L. G. (1985). Learning disjunctions of conjunctions. *Proceedings of the Ninth International Joint Conference on Artificial Intelligence* (pp. 560–566). Los Angeles, CA: Morgan Kaufmann.
- Vapnik, V. N. (1982). *Estimation of dependencies based on empirical data*. New York: Springer-Verlag.
- Wilkins, D. C., & Buchanan, B. G. (1986). On debugging rule sets when reasoning under uncertainty. *Proceedings of the Fifth National Conference on Artificial Intelligence* (pp. 448–454). Philadelphia, PA: Morgan Kaufmann.