Input-Decoupled Q-Learning for Optimal Control

Phan, Minh Q.; Azad, Seyed Mahdi B.

doi:10.1007/s40295-019-00157-4

Published: 14 May 2019

Volume 67, pages 630–656, (2020)

The Journal of the Astronautical Sciences

Abstract

A design of optimal controllers based on a reinforcement learning method called Q-Learning is presented. Central to Q-Learning is the Q-function, which is a function of the state and all input variables. This paper shows that decoupled-in-the-inputs Q-functions exist and can be used to find the optimal controller for each input individually. The method thus converts a multiple-variable optimization problem into much simpler single-variable optimization problems while achieving optimality. An explicit model of the system is not required to learn these decoupled Q-functions; instead, the method relies on the ability to probe the system and observe its state transitions. Derived within the framework of modern control theory, the method is applicable to both linear and non-linear systems.
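To make the multiple-variable optimization concrete, the following is a minimal sketch of the standard (coupled) Q-function for an undiscounted linear-quadratic problem. It is not the paper's decoupled Q-function, whose construction the abstract does not give. The matrices `A`, `B`, `Qx`, `R`, the test state `x`, and the helper names `riccati`, `q_value`, and `scalar_min` are all assumptions chosen for illustration. The sketch also uses an explicit model, which the paper's method does not require. The last step only checks a consistency property: minimizing over one input at a time, with the other input held at its optimal value, recovers the same control as the joint minimization.

```python
import numpy as np

# Hypothetical 2-state, 2-input system x[k+1] = A x[k] + B u[k] (illustrative values)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0, 0.1], [0.1, 0.0]])
Qx = np.eye(2)  # assumed state weight in the stage cost x'Qx x + u'R u
R = np.eye(2)   # assumed input weight

def riccati(A, B, Qx, R, iters=1000):
    """Iterate the discrete-time Riccati recursion to a fixed point P."""
    P = Qx.copy()
    for _ in range(iters):
        G = R + B.T @ P @ B
        P = Qx + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
    return P

P = riccati(A, B, Qx, R)

# Blocks of the quadratic form Q(x, u) = x'Hxx x + 2 x'Hxu u + u'Huu u
Hxx = Qx + A.T @ P @ A
Hxu = A.T @ P @ B
Huu = R + B.T @ P @ B

def q_value(x, u):
    """Stage cost plus optimal cost-to-go from the next state."""
    x_next = A @ x + B @ u
    return x @ Qx @ x + u @ R @ u + x_next @ P @ x_next

# Joint (multi-variable) minimization over u gives u* = -K x
K = np.linalg.solve(Huu, Hxu.T)
x = np.array([1.0, -0.5])
u_star = -K @ x

def scalar_min(i):
    """Minimize Q over the single input u_i with the other inputs fixed at u_star."""
    others = sum(Huu[i, j] * u_star[j] for j in range(len(u_star)) if j != i)
    return -(Hxu[:, i] @ x + others) / Huu[i, i]

u_single = np.array([scalar_min(0), scalar_min(1)])
print("joint minimizer:     ", u_star)
print("one input at a time: ", u_single)
```

In this sketch each scalar minimization still needs the other input's optimal value, so the inputs remain coupled. The abstract's claim is that Q-functions exist for which that coupling is absent.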

Author information

Authors and Affiliations

Thayer School of Engineering, Dartmouth College, Hanover, NH, USA
Minh Q. Phan & Seyed Mahdi B. Azad

Corresponding author

Correspondence to Minh Q. Phan.

Ethics declarations

Conflict of interests

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Phan, M.Q., Azad, S.M.B. Input-Decoupled Q-Learning for Optimal Control. J Astronaut Sci 67, 630–656 (2020). https://doi.org/10.1007/s40295-019-00157-4

Published: 14 May 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s40295-019-00157-4
