Learning in Hybrid Noise Environments Using Statistical Queries
We consider formal models of learning from noisy data. Specifically, we focus on learning in the probably approximately correct (PAC) model as defined by Valiant. Two of the most widely studied models of noise in this setting are classification noise and malicious errors. However, a more realistic model combining the two types of noise has not been formalized. We define a learning environment based on a natural combination of these two noise models. We first show that hypothesis testing is possible in this model. We next describe a simple technique for learning in this model, and then describe a more powerful technique based on statistical query learning. We show that the noise tolerance of this improved technique is roughly optimal with respect to the desired learning accuracy and that it provides a smooth tradeoff between the tolerable amounts of the two types of noise. Finally, we show that statistical query simulation yields learning algorithms for other combinations of noise models, thus demonstrating that statistical query specification truly captures the generic fault tolerance of a learning algorithm.
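The core idea behind simulating a statistical query under this hybrid noise can be illustrated with a toy example. The sketch below (not the paper's algorithm; the function names and the specific adversary are hypothetical) estimates a single query value Pr[f(x) = 1] from an oracle that first replaces a β fraction of examples adversarially and then flips the remaining labels with classification noise rate η. Inverting the known relation observed ≈ p(1 − 2η) + η removes the classification noise, leaving a residual error on the order of β, which mirrors the accuracy/noise tradeoff described above.

```python
import random

def noisy_oracle(p_true, eta, beta, rng):
    """Draw one noisy label.

    With probability beta the adversary substitutes a malicious example
    (worst case here: always label 0, biasing the estimate downward).
    Otherwise a true Bernoulli(p_true) label is drawn and flipped with
    probability eta (classification noise)."""
    if rng.random() < beta:
        return 0  # adversarial substitution
    label = 1 if rng.random() < p_true else 0
    if rng.random() < eta:
        label = 1 - label  # classification noise flip
    return label

def estimate_query(p_true, eta, beta, n, seed=0):
    """Estimate Pr[f(x) = 1] from n noisy draws, correcting for eta."""
    rng = random.Random(seed)
    observed = sum(noisy_oracle(p_true, eta, beta, rng) for _ in range(n)) / n
    # Invert classification noise: observed ~= p*(1 - 2*eta) + eta.
    # The malicious fraction is not correctable and leaves a residual
    # error of at most ~beta / (1 - 2*eta).
    return (observed - eta) / (1 - 2 * eta)

p_hat = estimate_query(p_true=0.3, eta=0.1, beta=0.02, n=200_000)
```

With η = 0.1 and β = 0.02, the corrected estimate lands within a few hundredths of the true value 0.3, consistent with the claim that tolerance to malicious errors is bounded by the accuracy required of each query.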