
Reinforcement Learning and Adaptive Control

  • Living reference work entry
Encyclopedia of Systems and Control

Abstract

Reinforcement learning (RL) is a machine learning paradigm in which an agent learns a control policy that generates the sequence of actions needed to achieve a higher-level objective. RL promises a mechanism through which autonomous agents can learn to control themselves directly from experience, without requiring manual coding of control policies. As in other machine learning paradigms, RL research focuses heavily on end-to-end learning, which in this case means learning policies directly from experience. Recent successes have shown that agents can learn decision-making and control policies in complex simulated domains for which manually crafting control policies would have been very difficult; examples include chess, Go, and, more recently, complex simulated continuous control domains. Pressing open issues include sample complexity, robustness, and reliable simulation-to-real-world transfer.
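As a concrete illustration of the learning loop described above, the sketch below implements tabular Q-learning on a toy chain MDP. This is a minimal sketch for exposition only: the environment, reward structure, and hyperparameters are assumptions made here, not details taken from this entry.

```python
# Minimal tabular Q-learning sketch on a toy 6-state chain MDP.
# Environment, rewards, and hyperparameters are illustrative
# assumptions for exposition, not details from this entry.
import random

N_STATES = 6          # states 0..5; state 5 is the goal
ACTIONS = [-1, +1]    # step left or right along the chain
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Q[s][i] estimates the long-run return of taking ACTIONS[i] in state s.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]


def step(state, action):
    """Deterministic chain dynamics with +1 reward on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1


for episode in range(500):
    s = 0
    for _ in range(200):  # cap episode length so every run terminates
        # Epsilon-greedy action selection: mostly exploit the current
        # value estimates, occasionally explore at random.
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s2, r, done = step(s, ACTIONS[a])
        # Temporal-difference update: improve the value estimates
        # directly from experience, with no hand-coded controller.
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break

# The greedy policy w.r.t. the learned Q-table is the learned controller.
greedy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q[s][i])]
          for s in range(N_STATES)]
print("greedy action per state:", greedy)
```

The greedy policy read off the learned Q-table plays the role of the control policy discussed above; in the continuous control domains mentioned in the abstract, the table is typically replaced by a function approximator such as a neural network.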



Acknowledgements

This work was supported in part by Navy award #N00014-19-1-2373 and NSF-CPS NIFA award #2018-67007-28379.

Author information


Correspondence to Girish Chowdhary.



Copyright information

© 2020 Springer-Verlag London Ltd., part of Springer Nature

About this entry


Cite this entry

Chowdhary, G., Joshi, G., Havens, A. (2020). Reinforcement Learning and Adaptive Control. In: Baillieul, J., Samad, T. (eds) Encyclopedia of Systems and Control. Springer, London. https://doi.org/10.1007/978-1-4471-5102-9_100064-1


  • DOI: https://doi.org/10.1007/978-1-4471-5102-9_100064-1


  • Publisher Name: Springer, London

  • Online ISBN: 978-1-4471-5102-9

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
