Partial Occam's razor and its applications

  • Carlos Domingo
  • Tatsuie Tsukiji
  • Osamu Watanabe
Session 4
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1316)


We introduce the notion of a “partial Occam algorithm”. A partial Occam algorithm produces a succinct hypothesis that is partially consistent with the given examples, where the proportion of consistent examples is slightly more than half. Using this notion, we propose an approach for obtaining a PAC learning algorithm: a partial Occam algorithm is equivalent to a weak PAC learning algorithm, so boosting techniques turn this weak PAC learning algorithm into an ordinary PAC learning algorithm. We demonstrate with examples that this approach yields some improvement, in particular in the hypothesis size. First, we obtain a non-proper PAC learning algorithm for k-DNF whose sample complexity is similar to that of Littlestone's Winnow, but which produces a hypothesis of size polynomial in d and log k for a k-DNF target with n variables and d terms. (Cf. the hypothesis size of Winnow is $O(n^k)$.) Next, we show that 1-decision lists of length d with n variables are non-proper PAC learnable using $O\left(\frac{1}{\epsilon}\left(\log\frac{1}{\delta} + 16^d \log n\,(d + \log\log n)^2\right)\right)$ examples in time polynomial in n, $2^d$, $1/\epsilon$, and $\log 1/\delta$. Again, while this sample complexity is similar to that of Winnow, we improve the hypothesis size. We also point out that our algorithms are robust against random classification noise.
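The pipeline described in the abstract (a succinct hypothesis that agrees with a bit more than half of the weighted examples, followed by boosting) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The weak step searches only constants and single literals, and AdaBoost-style reweighting is swapped in as a stand-in for the boosting techniques the abstract refers to. All function names and the toy target `x[0] or x[1]` are assumptions made for illustration.

```python
import math
from itertools import product

# A hypothesis is a pair (index, value). index None means the constant `value`;
# otherwise the hypothesis predicts True exactly when x[index] == value.
def evaluate(h, x):
    index, value = h
    return value if index is None else x[index] == value

def partial_occam_step(examples, weights):
    """Return the constant or single literal with the largest weighted agreement."""
    n = len(examples[0][0])
    candidates = [(None, True), (None, False)]
    candidates += [(i, v) for i in range(n) for v in (True, False)]
    best, best_agree = None, -1.0
    for h in candidates:
        agree = sum(w for (x, y), w in zip(examples, weights)
                    if evaluate(h, x) == y)
        if agree > best_agree:
            best, best_agree = h, agree
    return best, best_agree

def boost(examples, rounds):
    """AdaBoost-style loop (an assumption, not the paper's booster)."""
    m = len(examples)
    weights = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        h, agree = partial_occam_step(examples, weights)
        if agree <= 0.5:  # no hypothesis is better than random guessing
            break
        err = max(1.0 - agree, 1e-12)  # clamp to avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        # Shrink the weight of correctly labelled examples, grow the rest.
        weights = [w * math.exp(-alpha if evaluate(h, x) == y else alpha)
                   for (x, y), w in zip(examples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the collected weak hypotheses."""
    score = sum(alpha if evaluate(h, x) else -alpha for alpha, h in ensemble)
    return score > 0

# Toy data: the full truth table on 4 variables, labelled by x[0] or x[1].
examples = [(x, x[0] or x[1]) for x in product((False, True), repeat=4)]
ensemble = boost(examples, rounds=3)
```

On this toy table the first weak hypothesis agrees with 12 of the 16 examples, and after three rounds the weighted vote agrees with all of them. Including the two constant hypotheses matters: a weighted vote over literals alone has no bias term and cannot represent a disjunction.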





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Carlos Domingo (1)
  • Tatsuie Tsukiji (2)
  • Osamu Watanabe (3)
  1. Departament de Llenguatges i Sistemes Informatics, Universitat Politécnica de Catalunya, Catalunya
  2. Graduate School of Human Informatics, Nagoya University, Nagoya
  3. Department of Computer Science, Tokyo Institute of Technology, Tokyo
