Average case analysis of a learning algorithm for μ-DNF expressions

  • Mostefa Golea
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 904)


In this paper, we present an average case model for analyzing learning algorithms. We show how the average behavior of a learning algorithm can be understood in terms of a single hypothesis that we refer to as the average hypothesis. As a case study, we apply the average case model to a simplified version of Pagallo and Haussler's algorithm for PAC learning μDNF expressions on the uniform distribution [15]. The average case analysis reveals that, as the training sample size m increases, the average hypothesis evolves from an almost random DNF expression to a well structured μDNF expression that represents exactly the target function. The learning curves exhibit a strong threshold behavior and, in some cases, have a terraced structure. That is, as m increases, the average accuracy stays relatively constant for short/long periods, interspersed with periods in which it rises quickly. This nontrivial behavior cannot be deduced from a simple PAC analysis. The average sample complexity of the algorithm is O(n²), a large improvement over the PAC analysis result of O(n⁶) reported in [15]. The results of the numerical simulations are in very good agreement with the theoretical predictions.
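The threshold-shaped learning curves described above can be observed empirically even for very simple learners. The sketch below is not the Pagallo–Haussler greedy algorithm itself; it uses the classic elimination algorithm for monotone conjunctions (a one-term μDNF, the degenerate case) on the uniform distribution, and estimates accuracy as a function of the sample size m. The target function and all parameter values are illustrative assumptions.

```python
import random

N = 10  # number of Boolean variables (assumption for this toy example)

def target(x):
    # hypothetical one-term muDNF target: x0 AND x1 AND x2
    return x[0] and x[1] and x[2]

def draw(m, rng):
    """Draw m labeled examples from the uniform distribution on {0,1}^N."""
    sample = []
    for _ in range(m):
        x = [rng.random() < 0.5 for _ in range(N)]
        sample.append((x, target(x)))
    return sample

def learn_monomial(sample):
    """Elimination algorithm for conjunctions: start with all positive
    literals and drop any literal falsified by a positive example."""
    lits = set(range(N))
    for x, y in sample:
        if y:
            lits = {i for i in lits if x[i]}
    return lits

def accuracy(lits, trials=2000, seed=0):
    """Monte Carlo estimate of the hypothesis accuracy on fresh
    uniform examples."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x = [rng.random() < 0.5 for _ in range(N)]
        h = all(x[i] for i in lits)
        correct += (h == target(x))
    return correct / trials

if __name__ == "__main__":
    rng = random.Random(1)
    for m in (0, 8, 32, 128, 512):
        lits = learn_monomial(draw(m, rng))
        print(m, sorted(lits), accuracy(lits))
```

Plotting the printed accuracies against m shows the characteristic plateau-then-jump shape: irrelevant literals survive until enough positive examples arrive, then are eliminated in quick succession.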




References

  1. Blumer A., Ehrenfeucht A., Haussler D., and Warmuth M. (1989), “Learnability and the Vapnik-Chervonenkis dimension”, Journal of the ACM, Vol. 36, pp. 929–965.
  2. Buntine W.L. (1990), A Theory of Learning Classification Rules, PhD Thesis, University of Technology, Sydney.
  3. Buntine W.L. (1989), “A Critique of the Valiant Model”, Proc. of the 11th International Joint Conference on Artificial Intelligence, pp. 837–842.
  4. Cohen D. and Tesauro G. (1990), “Can Neural Networks Do Better than the VC Bounds?”, Advances in Neural Information Processing Systems, Vol. 3, p. 911.
  5. Golea M. and Marchand M. (1993), “On Learning Perceptrons with Binary Weights”, Neural Computation, Vol. 5, pp. 767–782.
  6. Golea M. and Marchand M. (1993), “Average Case Analysis of the Clipped Hebb Rule for Nonoverlapping Perceptron Networks”, Proceedings of the 6th Annual ACM Workshop on Computational Learning Theory, pp. 151–157.
  7. Haussler D. (1990), “Probably Approximately Correct Learning”, Proceedings of the 8th National Conference on Artificial Intelligence (AAAI-90), p. 1101.
  8. Haussler D., Kearns M., and Schapire R. (1991), “Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and VC Dimension”, Proceedings of the 4th Annual ACM Workshop on Computational Learning Theory, pp. 61–74.
  9. Hirschberg D.S. and Pazzani M. (1991), “Average Case Analysis of a k-CNF Learning Algorithm”, TR-91-50, Irvine: University of California, Department of Information and Computer Science.
  10. Hogg R.V. and Tanis E.A. (1988), Probability and Statistical Inference, Third edition, Macmillan, New York.
  11. Iba W. and Langley P. (1992), “Induction of One-Level Decision Trees”, Proc. 9th International Conference on Machine Learning, pp. 233–240. Aberdeen: Morgan Kaufmann.
  12. Iba W. and Langley P. (1993), “Average-Case Analysis of a Nearest Neighbor Algorithm”, Proc. 13th International Joint Conference on Artificial Intelligence, pp. 889–894. Chambéry, France: Morgan Kaufmann.
  13. Kearns M., Li M., Pitt L., and Valiant L. (1987), “On the Learnability of Boolean Formulas”, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, p. 285. New York: ACM Press.
  14. Langley P., Iba W., and Thompson K. (1992), “An Analysis of Bayesian Classifiers”, Proc. 10th National Conference on Artificial Intelligence, pp. 223–228. San Jose, CA: AAAI Press.
  15. Pagallo G. and Haussler D. (1989), “A Greedy Method for Learning μDNF Functions Under the Uniform Distribution”, Technical Report UCSC-CRL-89-12, Santa Cruz: Dept. of Computer and Information Science, University of California at Santa Cruz.
  16. Pazzani M. and Sarrett W. (1992), “A Framework for Average Case Analysis of Conjunctive Learning Algorithms”, Machine Learning, Vol. 9, pp. 349–372.
  17. Opper M., Kinzel W., Kleinz J., and Nehl R. (1990), “On the Ability of the Optimal Perceptron to Generalize”, J. Phys. A: Math. Gen., Vol. 23, L581–L586.

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Mostefa Golea
  1. Institute for Social Information Science (ISIS), Fujitsu Laboratories Ltd., Shizuoka, Japan
