# Partial Occam's razor and its applications

## Abstract

We introduce the notion of a “partial Occam algorithm”. A partial Occam algorithm produces a succinct hypothesis that is partially consistent with the given examples, where the proportion of consistent examples is a bit more than half. Using this new notion, we propose one approach for obtaining a PAC learning algorithm: a partial Occam algorithm is equivalent to a weak PAC learning algorithm, so by applying boosting techniques we can obtain an ordinary PAC learning algorithm from this weak PAC learning algorithm. We demonstrate with some examples that this approach yields improvements, in particular in the hypothesis size. First, we obtain a non-proper PAC learning algorithm for *k*-DNF, which has sample complexity similar to Littlestone's Winnow but produces a hypothesis of size polynomial in *d* and log *n* for a *k*-DNF target with *n* variables and *d* terms. (Cf. the hypothesis size of Winnow is *O*(*n*^{k}).) Next, we show that 1-decision lists of length *d* with *n* variables are non-properly PAC learnable using *O*((1/ε)(log 1/δ + 16^{d} log *n*(*d* + log log *n*)^{2})) examples within time polynomial in *n*, 2^{d}, 1/ε, and log 1/δ. Again, while this sample complexity is similar to that of Winnow, we improve the hypothesis size. We also point out that our algorithms are robust against random classification noise.
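The weak-to-strong step the abstract relies on can be sketched in miniature. The following is an illustration, not the paper's algorithm: decision stumps (single literals or constants) play the role of succinct hypotheses that are consistent with a bit more than half of the weighted examples, and AdaBoost-style reweighting combines them into a strong classifier. The 1-decision-list target used as the dataset is an assumption chosen for the demo.

```python
# Toy illustration (not the paper's algorithm): boosting a "partial Occam"
# style weak learner into a strong classifier via AdaBoost-style reweighting.
import math
from itertools import product

def stump_predict(stump, x):
    i, thresh, sign = stump
    return sign if x[i] >= thresh else -sign

def best_stump(X, y, w):
    # Weak learner: the succinct hypothesis (one literal, or a constant)
    # with the smallest weighted error.
    best, best_err = None, float("inf")
    for i in range(len(X[0])):
        for thresh in (0, 1):          # boolean attributes
            for sign in (1, -1):
                err = sum(wj for xj, yj, wj in zip(X, y, w)
                          if stump_predict((i, thresh, sign), xj) != yj)
                if err < best_err:
                    best, best_err = (i, thresh, sign), err
    return best, best_err

def boost(X, y, rounds):
    m = len(X)
    w = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break                      # no weak advantage left
        err = max(err, 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight the examples this partially consistent hypothesis missed.
        w = [wj * math.exp(-alpha * yj * stump_predict(stump, xj))
             for wj, xj, yj in zip(w, X, y)]
        total = sum(w)
        w = [wj / total for wj in w]
    return ensemble

def predict(ensemble, x):
    s = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if s >= 0 else -1

# Target: the 1-decision list "if x1 then + elif x2 then - elif x3 then + else -".
X = list(product((0, 1), repeat=3))
y = [1 if x[0] else (-1 if x[1] else (1 if x[2] else -1)) for x in X]

ens = boost(X, y, rounds=500)
print(sum(predict(ens, x) == yi for x, yi in zip(X, y)), "of", len(X), "correct")
```

No single stump is consistent with all eight examples, but because a 1-decision list is linearly separable over the stumps, every reweighted distribution admits a stump with a fixed edge over one half, which is exactly the "partially consistent" guarantee boosting needs.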


## References

1. H. Almuallim and T.G. Dietterich. Learning Boolean concepts in the presence of many irrelevant features. *Artificial Intelligence*, 69(1-2):279–306, 1994.
2. J.A. Aslam and S.E. Decatur. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. In *Proceedings of the 34th IEEE Symposium on Foundations of Computer Science*, 282–291, 1993.
3. D. Angluin and P.D. Laird. Learning from noisy examples. *Machine Learning*, 2(4):343–370, 1988.
4. A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Occam's razor. *Information Processing Letters*, 24:377–380, 1987.
5. R. Board and L. Pitt. On the necessity of Occam algorithms. *Theoretical Computer Science*, 100:157–184, 1992.
6. A. Dhagat and L. Hellerstein. PAC learning with irrelevant attributes. In *Proceedings of the 35th IEEE Symposium on Foundations of Computer Science*, 64–74, 1994.
7. Y. Freund. Boosting a weak learning algorithm by majority. *Information and Computation*, 121(2):256–285, 1995.
8. J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In *Proceedings of the 35th Annual Symposium on Foundations of Computer Science*, 42–53. IEEE Computer Society Press, Los Alamitos, CA, 1994.
9. T. Hancock, T. Jiang, M. Li, and J. Tromp. Lower bounds on learning decision lists and trees. In *Proceedings of the 12th Annual Symposium on Theoretical Aspects of Computer Science*, Lecture Notes in Computer Science, 527–538, 1995.
10. D. Haussler. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. *Artificial Intelligence*, 36:177–221, 1988.
11. D. Helmbold and M.K. Warmuth. On weak learning. *Journal of Computer and System Sciences*, 50(3):551–573, 1995.
12. M. Kearns. Efficient noise-tolerant learning from statistical queries. In *Proceedings of the 25th Annual ACM Symposium on Theory of Computing*, 392–401, 1993.
13. M.J. Kearns, M. Li, and L.G. Valiant. Learning Boolean formulas. *Journal of the ACM*, 41(6):1298–1328, 1994.
14. M.J. Kearns and U.V. Vazirani. *An Introduction to Computational Learning Theory*. Cambridge University Press, 1994.
15. P.D. Laird. *Learning from Good and Bad Data*. Kluwer international series in engineering and computer science. Kluwer Academic Publishers, Boston, 1988.
16. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. *Machine Learning*, 2:285–318, 1988.
17. N. Littlestone. From on-line to batch learning. In *Proceedings of the 2nd Workshop on Computational Learning Theory*, 269–284, 1990.
18. R.L. Rivest. Learning decision lists. *Machine Learning*, 2(3):229–246, 1987.
19. R.E. Schapire. The strength of weak learnability. *Machine Learning*, 5(2):197–227, 1990.
20. L.G. Valiant. A theory of the learnable. *Communications of the ACM*, 27(11):1134–1142, 1984.
21. L.G. Valiant. Learning disjunctions of conjunctions. In *Proceedings of the 9th International Joint Conference on Artificial Intelligence*, 560–566, 1985.
22. M.K. Warmuth. Posted in the COLT mailing list.