ICANN ’93, pp. 446–450

A ‘Self-Referential’ Weight Matrix

  • J. Schmidhuber


Weight modifications in traditional neural nets are computed by hard-wired algorithms. Without exception, all previous weight change algorithms have many specific limitations. Is it (in principle) possible to overcome the limitations of hard-wired algorithms by allowing neural nets to run and improve their own weight change algorithms? This paper constructively demonstrates that the answer (in principle) is ‘yes’. I derive an initial gradient-based sequence learning algorithm for a ‘self-referential’ recurrent network that can ‘speak’ about its own weight matrix in terms of activations. It uses some of its input and output units for observing its own errors and for explicitly analyzing and modifying its own weight matrix, including those parts of the weight matrix responsible for analyzing and modifying the weight matrix. The result is the first ‘introspective’ neural net with explicit potential control over all of its own adaptive parameters. A disadvantage of the algorithm is its high computational complexity per time step, which is independent of the sequence length and equals $O(n_{conn} \log n_{conn})$, where $n_{conn}$ is the number of connections. Another disadvantage is the high number of local minima of the unusually complex error surface. The purpose of this paper, however, is not to come up with the most efficient ‘introspective’ or ‘self-referential’ weight change algorithm, but to show that such algorithms are possible at all.
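The abstract describes the mechanism only in words. As a rough illustration, the following minimal Python sketch shows the forward dynamics of such a net under assumptions that are not taken from the paper: every connection gets a hypothetical binary address; a block of read-address output units soft-selects connections whose value is fed back as a special input at the next step; a block of write-address units plus a single delta unit changes the selected connections; and an extra input unit receives the current error. The class name `SelfReferentialNet`, the unit layout, and the similarity function `g` are illustrative assumptions, and the paper's gradient-based learning algorithm is not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def g(d2):
    """Assumed soft-match function: 1.0 for an exact address match, decaying with squared distance."""
    return math.exp(-d2)


class SelfReferentialNet:
    """Toy recurrent net that reads and writes its own weight matrix via special units.

    Hypothetical activation layout (illustrative only):
      [ext inputs | error | value | hidden | ext outputs | read addr | write addr | delta]
    Every connection w[i][j] (unit j -> unit i), including those feeding the address
    and delta units, has a fixed binary address, so the net can modify the very
    weights that compute its own modifications.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        base = n_in + 2 + n_hidden + n_out + 1    # all units except the address units
        bits = 1
        while True:                                # widen addresses until they cover every connection
            n = base + 2 * bits
            need = math.ceil(math.log2(n * n))
            if need <= bits:
                break
            bits = need
        self.n, self.bits, self.n_in, self.n_out = n, bits, n_in, n_out
        self.first_out = n_in + 2 + n_hidden       # first external output unit
        self.read = self.first_out + n_out         # first read-address unit
        self.write = self.read + bits              # first write-address unit
        self.delta = self.write + bits             # single delta unit (last unit)
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]
        # Address of connection (i, j): the bits of its flat index i * n + j.
        self.addr = [[[((i * n + j) >> b) & 1 for b in range(bits)] for j in range(n)]
                     for i in range(n)]
        self.act = [0.0] * n
        self.value = 0.0                           # weight value read at the previous step

    def _match(self, i, j, start):
        """Soft similarity between connection (i, j)'s address and the address units at `start`."""
        d2 = sum((self.act[start + b] - self.addr[i][j][b]) ** 2 for b in range(self.bits))
        return g(d2)

    def step(self, x, error):
        """One time step: clamp inputs, update all other units, then read and write own weights."""
        assert len(x) == self.n_in
        n, a = self.n, self.act
        a[:self.n_in] = list(x)
        a[self.n_in] = error                       # the net observes its own error
        a[self.n_in + 1] = self.value              # ... and the weight value it asked for
        net = [sum(self.w[i][j] * a[j] for j in range(n)) for i in range(n)]
        for i in range(self.n_in + 2, n):          # synchronous logistic update of non-input units
            a[i] = sigmoid(net[i])
        # "Analyze": soft-read the weights whose addresses match the read-address units.
        self.value = sum(self._match(i, j, self.read) * self.w[i][j]
                         for i in range(n) for j in range(n))
        # "Modify": shift the weights addressed by the write units by the delta output.
        change = a[self.delta] - 0.5               # assumed centring so changes can be negative
        for i in range(n):
            for j in range(n):
                self.w[i][j] += change * self._match(i, j, self.write)
        return a[self.first_out:self.first_out + self.n_out]
```

For example, `SelfReferentialNet(n_in=2, n_hidden=3, n_out=1)` settles on 29 units and 10-bit addresses (841 connections), and repeated calls to `step([1.0, 0.0], error=0.0)` let the net rewrite its own weights, including the ones feeding its address and delta units. Because each step compares all $n_{conn}$ weights against addresses of about $\log_2 n_{conn}$ bits, one forward step of this sketch costs on the order of $n_{conn} \log n_{conn}$ operations.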


Copyright information

© Springer-Verlag London Limited 1993

Authors and Affiliations

  • J. Schmidhuber, Institut für Informatik, Technische Universität München, München 40, Germany