
Vapnik-Chervonenkis dimension of recurrent neural networks

  • Pascal Koiran
  • Eduardo D. Sontag
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1208)

Abstract

Most of the work on the Vapnik-Chervonenkis dimension of neural networks has focused on feedforward networks. However, recurrent networks are also widely used in learning applications, in particular when time is a relevant parameter. This paper provides lower and upper bounds for the VC dimension of such networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. In contrast, for feedforward networks, VC dimension bounds can be expressed as a function of w only. An important difference between recurrent and feedforward nets is that a fixed recurrent net can receive inputs of arbitrary length; we are therefore particularly interested in the case k ≫ w. Ignoring multiplicative constants, the main results say roughly the following (an illustrative sketch of the setting follows the list):
  • For architectures with activation σ = any fixed nonlinear polynomial, the VC dimension is ≈ wk.

  • For architectures with activation σ = any fixed piecewise polynomial, the VC dimension is between wk and w²k.

  • For architectures with activation σ = H (threshold nets), the VC dimension is between w log(k/w) and min{wk log(wk), w² + w log(wk)}.

  • For the standard sigmoid σ(x) = 1/(1+e⁻ˣ), the VC dimension is between wk and w⁴k².
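
To make the two parameters concrete, here is a minimal sketch (in Python with NumPy, not taken from the paper) of the kind of recurrent architecture these bounds describe: a fixed set of weights, counted by w, is applied repeatedly to an input sequence of length k, and a binary label is read off the final state. The state dimension n = 3, the sigmoid activation, the threshold readout on the first state coordinate, and the random weights are all assumptions made purely for illustration.

```python
# Minimal sketch of a recurrent binary classifier: a fixed set of
# weights (w parameters) processes an input sequence of length k.
# All concrete choices below (state dimension, sigmoid activation,
# threshold readout, random weights) are illustrative assumptions,
# not the paper's construction.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RecurrentClassifier:
    def __init__(self, A, b, x0):
        self.A = np.asarray(A)    # n x n recurrent weight matrix
        self.b = np.asarray(b)    # n-vector of input weights
        self.x0 = np.asarray(x0)  # initial state (fixed here)
        self.w = self.A.size + self.b.size  # number of adjustable weights

    def classify(self, u):
        """Run the net on a length-k input sequence u and return a binary label."""
        x = self.x0.copy()
        for u_t in u:                          # one state update per time step
            x = sigmoid(self.A @ x + self.b * u_t)
        return int(x[0] >= 0.5)                # threshold readout of the final state

rng = np.random.default_rng(0)
n = 3  # illustrative state dimension
net = RecurrentClassifier(A=rng.normal(size=(n, n)),
                          b=rng.normal(size=n),
                          x0=np.zeros(n))

# The same fixed net (w weights) accepts inputs of any length k,
# which is why the bounds above involve both w and k.
for k in (5, 50, 500):
    u = rng.normal(size=k)
    print(f"k = {k:3d}, w = {net.w}, label = {net.classify(u)}")
```

Because the recurrence runs once per input symbol, a single choice of weights classifies sequences of every length k, so the shattered sets and hence the VC dimension depend on k as well as w, unlike the feedforward case.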

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Pascal Koiran¹
  • Eduardo D. Sontag²
  1. Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon, CNRS, Lyon Cedex 07, France
  2. Department of Mathematics, Rutgers University, New Brunswick, USA
