
A tutorial survey of reinforcement learning

Published in: Sadhana
Abstract

This paper gives a compact, self-contained tutorial survey of reinforcement learning, a technique that is finding increasing application in the development of intelligent dynamic systems. Research on reinforcement learning over the past decade has produced a variety of useful algorithms. This paper surveys that literature and presents the algorithms in a cohesive framework.
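To make the survey's subject concrete: one of the best-known algorithms in this family is tabular Q-learning, which learns action values from sampled transitions without a model of the environment. The sketch below is an illustrative example only, not taken from the paper; the chain environment, learning rate, and exploration schedule are assumptions chosen for a minimal demonstration.

```python
import random

random.seed(0)

N_STATES = 4          # states 0..3 on a chain; state 3 is the terminal goal
ACTIONS = [-1, +1]    # action 0: move left, action 1: move right
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2

# Q-table initialised to zero
Q = {(s, a): 0.0 for s in range(N_STATES) for a in range(len(ACTIONS))}

def step(s, a):
    """Deterministic chain dynamics: reward 1 only on reaching the goal."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

for _ in range(500):                              # training episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # Q-learning update: bootstrap on the best next-state value
        best_next = max(Q[(s2, b)] for b in range(len(ACTIONS)))
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Optimal values for "move right" are gamma^(distance to goal - 1)
print(round(Q[(2, 1)], 2), round(Q[(1, 1)], 2), round(Q[(0, 1)], 2))
```

With enough episodes the learned values approach the discounted optimal returns (1.0, 0.9, 0.81 for the three non-terminal states), illustrating the kind of convergence behaviour the survey's framework is built to analyse.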




Cite this article

Sathiya Keerthi, S., Ravindran, B. A tutorial survey of reinforcement learning. Sadhana 19, 851–889 (1994). https://doi.org/10.1007/BF02743935

