Learning to predict by the methods of temporal differences
  • Published: August 1988

  • Richard S. Sutton

Machine Learning, volume 3, pages 9–44 (1988)

Abstract

This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods, and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
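The contrast between the two ways of assigning credit can be illustrated with a minimal sketch. It is not taken from the article text shown here and rests on several assumptions: a toy bounded random walk with five nonterminal states (outcome 1 when the walk exits on the right, 0 when it exits on the left), one tabular prediction per state, a fixed step size `alpha`, and updates applied immediately after each step. All function names are hypothetical.

```python
import random

N_STATES = 5  # assumed toy task: nonterminal states 0..4, walk ends past either edge
# Probability of exiting on the right from each state of a symmetric walk.
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def generate_walk(rng):
    """Run one walk from the middle state; return (visited states, outcome)."""
    state = N_STATES // 2
    visited = []
    while 0 <= state < N_STATES:
        visited.append(state)
        state += rng.choice((-1, 1))
    outcome = 1.0 if state >= N_STATES else 0.0
    return visited, outcome


def supervised_update(w, visited, outcome, alpha):
    """Conventional credit assignment: move every prediction toward the
    actual final outcome (error = outcome - prediction)."""
    for s in visited:
        w[s] += alpha * (outcome - w[s])


def td_update(w, visited, outcome, alpha):
    """Temporal-difference credit assignment: move each prediction toward
    the next prediction; only the last one is moved toward the outcome."""
    for t, s in enumerate(visited):
        target = w[visited[t + 1]] if t + 1 < len(visited) else outcome
        w[s] += alpha * (target - w[s])


def train(update, episodes, alpha, seed):
    """Learn one prediction per state from repeated walks."""
    rng = random.Random(seed)
    w = [0.5] * N_STATES
    for _ in range(episodes):
        visited, outcome = generate_walk(rng)
        update(w, visited, outcome, alpha)
    return w


def rms_error(w):
    """Root-mean-square distance from the true exit probabilities."""
    return (sum((a - b) ** 2 for a, b in zip(w, TRUE_VALUES)) / N_STATES) ** 0.5


if __name__ == "__main__":
    for name, update in (("supervised", supervised_update), ("TD", td_update)):
        w = train(update, episodes=5000, alpha=0.01, seed=0)
        print(name, [round(v, 3) for v in w], "RMS error:", round(rms_error(w), 4))
```

Both update rules need only the current table of predictions, but the temporal-difference rule can be applied as soon as the next prediction is available, whereas the supervised rule needs the final outcome before any prediction along the walk can be corrected. The sketch makes no claim about which rule ends up more accurate on this toy task; that depends on the step size and the number of walks.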

References

  • Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.

  • Anderson, C. W. (1986). Learning and problem solving with multilayer connectionist systems. Doctoral dissertation, Department of Computer and Information Science, University of Massachusetts, Amherst.

  • Anderson, C. W. (1987). Strategy learning with multilayer connectionist representations. Proceedings of the Fourth International Workshop on Machine Learning (pp. 103–114). Irvine, CA: Morgan Kaufmann.

  • Barto, A. G. (1985). Learning by statistical cooperation of self-interested neuron-like computing elements. Human Neurobiology, 4, 229–256.

  • Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13, 834–846.

  • Booker, L. B. (1982). Intelligent behavior as an adaptation to the task environment. Doctoral dissertation, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor.

  • Christensen, J. (1986). Learning static evaluation functions by linear regression. In T. M. Mitchell, J. G. Carbonell, & R. S. Michalski (Eds.), Machine learning: A guide to current research. Boston: Kluwer Academic.

  • Christensen, J., & Korf, R. E. (1986). A unified theory of heuristic evaluation functions and its application to learning. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 148–152). Philadelphia, PA: Morgan Kaufmann.

  • Denardo, E. V. (1982). Dynamic programming: Models and applications. Englewood Cliffs, NJ: Prentice-Hall.

  • Dietterich, T. G., & Michalski, R. S. (1986). Learning to predict sequences. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.

  • Gelperin, A., Hopfield, J. J., & Tank, D. W. (1985). The logic of Limax learning. In A. Selverston (Ed.), Model neural networks and behavior. New York: Plenum Press.

  • Hampson, S. E. (1983). A neural model of adaptive behavior. Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine.

  • Hampson, S. E., & Volper, D. J. (1987). Disjunctive models of boolean category learning. Biological Cybernetics, 56, 121–137.

  • Holland, J. H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.

  • Kehoe, E. J., Schreurs, B. G., & Graham, P. (1987). Temporal primacy over-rides prior training in serial compound conditioning of the rabbit's nictitating membrane response. Animal Learning and Behavior, 15, 455–464.

  • Kemeny, J. G., & Snell, J. L. (1976). Finite Markov chains. New York: Springer-Verlag.

  • Klopf, A. H. (1987). A neuronal model of classical conditioning (Technical Report 87-1139). Wright-Patterson Air Force Base, OH: Wright Aeronautical Laboratories.

  • Moore, J. W., Desmond, J. E., Berthier, N. E., Blazis, D. E. J., Sutton, R. S., & Barto, A. G. (1986). Simulation of the classically conditioned nictitating membrane response by a neuron-like adaptive element: Response topography, neuronal firing and interstimulus intervals. Behavioral Brain Research, 21, 143–154.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation (Technical Report No. 8506). La Jolla: University of California, San Diego, Institute for Cognitive Science. Also in D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.

  • Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210–229. Reprinted in E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought. New York: McGraw-Hill.

  • Sutton, R. S. (1984). Temporal credit assignment in reinforcement learning. Doctoral dissertation, Department of Computer and Information Science, University of Massachusetts, Amherst.

  • Sutton, R. S., & Barto, A. G. (1981a). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135–171.

  • Sutton, R. S., & Barto, A. G. (1981b). An adaptive network that constructs and uses an internal model of its environment. Cognition and Brain Theory, 4, 217–246.

  • Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378). Seattle, WA: Lawrence Erlbaum.

  • Sutton, R. S., & Pinette, B. (1985). The learning of world models by connectionist networks. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 54–64). Irvine, CA: Lawrence Erlbaum.

  • Varga, R. S. (1962). Matrix iterative analysis. Englewood Cliffs, NJ: Prentice-Hall.

  • Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. 1960 WESCON Convention Record, Part IV (pp. 96–104).

  • Widrow, B., & Stearns, S. D. (1985). Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall.

  • Williams, R. J. (1986). Reinforcement learning in connectionist networks: A mathematical analysis (Technical Report No. 8605). La Jolla: University of California, San Diego, Institute for Cognitive Science.

  • Witten, I. H. (1977). An adaptive optimal controller for discrete-time Markov environments. Information and Control, 34, 286–295.

Author information

Authors and Affiliations

  1. GTE Laboratories Incorporated, 40 Sylvan Road, Waltham, MA 02254, U.S.A.

    Richard S. Sutton

About this article

Cite this article

Sutton, R.S. Learning to predict by the methods of temporal differences. Mach Learn 3, 9–44 (1988). https://doi.org/10.1007/BF00115009

  • Received: 22 April 1987

  • Revised: 04 February 1988

  • Issue Date: August 1988

  • DOI: https://doi.org/10.1007/BF00115009

Keywords

  • Incremental learning
  • prediction
  • connectionism
  • credit assignment
  • evaluation functions