Abstract
This paper introduces the application of gradient descent methods to meta-learning. The concept of "meta-learning", i.e. a system that improves or discovers a learning algorithm, has been of interest in machine learning for decades because of its appealing applications. Previous meta-learning approaches have been based on evolutionary methods and have therefore been restricted to small models with few free parameters. We make meta-learning in large systems feasible by using recurrent neural networks with their attendant learning routines as meta-learning systems. Our system derived complex, well-performing learning algorithms from scratch. We also show that our approach performs non-stationary time series prediction.
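The setup the abstract describes lends itself to a short illustration. Below is a minimal sketch in PyTorch of this style of meta-learning: a recurrent network (here an LSTM, as in the paper) receives the current input together with the previous step's target, and ordinary gradient descent across many randomly drawn tasks shapes its fixed weights so that the hidden-state dynamics themselves implement a learning algorithm. The task family (scalar linear functions), network sizes, and optimizer are illustrative assumptions, not the paper's actual experimental settings.

```python
# Minimal sketch, assuming a toy task family of scalar linear functions;
# the paper's actual task suites, architectures, and training details differ.
import torch
import torch.nn as nn

class MetaLearner(nn.Module):
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        # Input is [x_t, y_{t-1}]: the previous target is fed back so the
        # network can infer the current task from the observed sequence.
        self.lstm = nn.LSTM(in_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, y_prev):
        h, _ = self.lstm(torch.cat([x, y_prev], dim=-1))
        return self.head(h)

def sample_task(batch=16, steps=20):
    # Hypothetical task family: y = a*x + b with task-specific (a, b).
    a = torch.randn(batch, 1, 1)
    b = torch.randn(batch, 1, 1)
    x = torch.rand(batch, steps, 1) * 2 - 1
    y = a * x + b
    # y_prev shifts the targets right by one step (zero at t=0).
    y_prev = torch.cat([torch.zeros(batch, 1, 1), y[:, :-1]], dim=1)
    return x, y_prev, y

model = MetaLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice is ours
for step in range(2000):
    x, y_prev, y = sample_task()      # a fresh task per batch element
    loss = nn.functional.mse_loss(model(x, y_prev), y)
    opt.zero_grad()
    loss.backward()                   # gradient descent through the unrolled RNN
    opt.step()
# After meta-training the weights stay fixed; "learning" a new task happens
# purely in the hidden state as (x_t, y_{t-1}) pairs stream in.
```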
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Hochreiter, S., Younger, A.S., Conwell, P.R. (2001). Learning to Learn Using Gradient Descent. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_13
DOI: https://doi.org/10.1007/3-540-44668-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2