# Average case analysis of a learning algorithm for *μ*-DNF expressions

## Abstract

In this paper, we present an average case model for analyzing learning algorithms. We show how the average behavior of a learning algorithm can be understood in terms of a single hypothesis that we refer to as the *average hypothesis*. As a case study, we apply the average case model to a simplified version of Pagallo and Haussler's algorithm for PAC learning *μ*DNF expressions on the uniform distribution [15]. The average case analysis reveals that, as the training sample size *m* increases, the average hypothesis *evolves* from an almost random DNF expression to a well structured *μ*DNF expression that represents exactly the target function. The learning curves exhibit a strong *threshold* behavior and, in some cases, have a *terraced structure*. That is, as *m* increases, the average accuracy stays relatively constant for extended periods, interspersed with periods in which it rises quickly. This nontrivial behavior cannot be deduced from a simple PAC analysis. The *average sample complexity* of the algorithm is *O*(*n*^{2}), a large improvement over the PAC analysis result of *O*(*n*^{6}) reported in [15]. The results of the numerical simulations are in very good agreement with the theoretical predictions.
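The threshold behavior of learning curves described above can be illustrated in a setting far simpler than *μ*DNF learning. The following sketch is purely illustrative and is *not* the paper's algorithm: it runs the classical elimination algorithm for a single monotone conjunction under the uniform distribution and estimates average accuracy as the sample size *m* grows. All names (`eliminate`, `accuracy`, the choice of `relevant` indices) are hypothetical.

```python
import random

def sample(n):
    """Draw x uniformly from {0,1}^n."""
    return [random.randint(0, 1) for _ in range(n)]

def target(x, relevant):
    """Monotone conjunction over the 'relevant' variable indices."""
    return all(x[i] == 1 for i in relevant)

def eliminate(m, n, relevant):
    """Elimination algorithm (illustrative stand-in, not the paper's
    greedy mu-DNF learner): start with a conjunction of all n variables
    and drop any variable that is 0 in some positive training example."""
    hyp = set(range(n))
    for _ in range(m):
        x = sample(n)
        if target(x, relevant):
            hyp = {i for i in hyp if x[i] == 1}
    return hyp

def accuracy(hyp, relevant, n, trials=5000):
    """Monte Carlo estimate of agreement with the target on the
    uniform distribution."""
    agree = sum(target(x, hyp) == target(x, relevant)
                for x in (sample(n) for _ in range(trials)))
    return agree / trials

if __name__ == "__main__":
    n, relevant = 10, {0, 1, 2}
    for m in (0, 10, 50, 200):
        hyp = eliminate(m, n, relevant)
        print(m, len(hyp), accuracy(hyp, relevant, n))
```

Plotting the estimated accuracy against *m* for such a learner already shows a plateau-then-rise shape: irrelevant variables are eliminated only when positive examples arrive, so the average hypothesis stays poor until *m* is large enough to supply them, then improves quickly.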


## References

1. Blumer A., Ehrenfeucht A., Haussler D., and Warmuth M. (1989), “Learnability and the Vapnik-Chervonenkis dimension”, *Journal of the ACM*, Vol. 36, pp. 929–965.
2. Buntine W.L. (1990), *A Theory of Learning Classification Rules*. PhD Thesis, University of Technology, Sydney.
3. Buntine W.L. (1989), “A Critique of the Valiant Model”, *Proceedings of the 11th International Joint Conference on Artificial Intelligence*, pp. 837–842.
4. Cohen D. and Tesauro G. (1990), “Can Neural Networks Do Better than the VC Bounds?”, *Advances in Neural Information Processing Systems*, Vol. 3, p. 911.
5. Golea M. and Marchand M. (1993), “On Learning Perceptrons with Binary Weights”, *Neural Computation*, Vol. 5, pp. 767–782.
6. Golea M. and Marchand M. (1993), “Average Case Analysis of the Clipped Hebb Rule for Nonoverlapping Perceptron Networks”, *Proceedings of the 6th Annual ACM Workshop on Computational Learning Theory*, pp. 151–157.
7. Haussler D. (1990), “Probably Approximately Correct Learning”, *Proceedings of the 8th National Conference on Artificial Intelligence* (AAAI-90), p. 1101.
8. Haussler D., Kearns M., and Schapire R. (1991), “Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and VC Dimension”, *Proceedings of the 4th Annual ACM Workshop on Computational Learning Theory*, pp. 61–74.
9. Hirschberg D.S. and Pazzani M. (1991), “Average case analysis of a *k*-CNF learning algorithm”, TR-91-50. Irvine: University of California, Department of Information and Computer Science.
10. Hogg R.V. and Tanis E.A. (1988), *Probability and Statistical Inference*, Third edition. Macmillan, New York.
11. Iba W. and Langley P. (1992), “Induction of One-Level Decision Trees”, *Proc. 9th International Conference on Machine Learning*, pp. 233–240. Aberdeen: Morgan Kaufmann.
12. Iba W. and Langley P. (1993), “Average-Case Analysis of a Nearest Neighbor Algorithm”, *Proc. 13th International Joint Conference on Artificial Intelligence*, pp. 889–894. Chambery, France: Morgan Kaufmann.
13. Kearns M., Li M., Pitt L., and Valiant L. (1987), “On the learnability of boolean formulas”, *Proceedings of the 19th Annual ACM Symposium on Theory of Computing*, p. 285. New York: ACM Press.
14. Langley P., Iba W., and Thompson K. (1992), “An Analysis of Bayesian Classifiers”, *Proc. 10th National Conference on Artificial Intelligence*, pp. 223–228. San Jose, CA: AAAI Press.
15. Pagallo G. and Haussler D. (1989), “A greedy method for learning *μ*DNF functions under the uniform distribution”, Technical Report UCSC-CRL-89-12. Santa Cruz: Dept. of Computer and Information Science, University of California at Santa Cruz.
16. Pazzani M. and Sarrett W. (1992), “A Framework for Average Case Analysis of Conjunctive Learning Algorithms”, *Machine Learning*, Vol. 9, pp. 349–372.
17. Opper M., Kinzel W., Kleinz J., and Nehl R. (1990), “On the Ability of the Optimal Perceptron to Generalize”, *J. Phys. A: Math. Gen.*, Vol. 23, L581–L586.