Introduction: The Challenge of Reinforcement Learning

  • Richard S. Sutton
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 173)


Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.
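The trial-and-error aspect described above can be illustrated with a minimal sketch. The example below is not from the chapter; it is a hypothetical epsilon-greedy agent on a k-armed bandit (a reinforcement learning problem with no delayed reward), where the learner is told only a scalar reward for each action it tries and must discover the best action itself. The function name `run_bandit` and all parameter values are illustrative assumptions.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a simple k-armed bandit.

    The agent is never told which action is best; it estimates each
    action's value from sampled rewards, usually picking its current
    best guess and exploring at random with probability `epsilon`.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # running estimate of each action's reward
    counts = [0] * k       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: try a random action
        else:
            a = max(range(k), key=lambda i: estimates[i])  # exploit
        # environment returns only a noisy scalar reinforcement signal
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # incremental sample-average update of the action's value
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates

est = run_bandit([0.2, 0.8, 0.5])
print(max(range(3), key=lambda i: est[i]))
```

After enough trials the agent's highest estimate settles on the action with the highest true mean reward, even though no teacher ever identified that action. Handling the delayed-reward aspect, where an action also changes the next situation, requires the full sequential methods surveyed in the chapters that follow.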







Copyright information

© Springer Science+Business Media New York 1992

Authors and Affiliations

  • Richard S. Sutton, GTE Laboratories, Waltham, USA
