Abstract
This paper presents a design of optimal controllers based on a reinforcement learning method called Q-Learning. Central to Q-Learning is the Q-function, which is a function of the state and all input variables. The paper shows that Q-functions decoupled in the inputs exist and can be used to find the optimal controller for each input individually. The method thus converts a multiple-variable optimization problem into much simpler single-variable optimization problems while still achieving optimality. Learning these decoupled Q-functions does not require an explicit model of the system; instead, the method relies on the ability to probe the system and observe its state transitions. Derived within the framework of modern control theory, the method is applicable to both linear and non-linear systems.
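The claim that single-input optimizations can recover the jointly optimal inputs can be checked numerically in a simple setting. The sketch below is not the paper's construction. It assumes a quadratic Q-function Q(x, u) = zᵀHz with z = [x; u] and a known symmetric positive definite matrix H (in a model-free setting H would have to be learned from data, which this sketch does not show). For each input it forms a single-input Q-function by minimizing out the other inputs, then minimizes over that one scalar input. The names `H`, `n`, `m`, and `decoupled_gain` are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 2  # assumed state dimension and number of scalar inputs
M = rng.standard_normal((n + m, n + m))
H = M @ M.T + np.eye(n + m)  # symmetric positive definite, illustrative only

# Joint minimization over all inputs at once: u* = -inv(Huu) @ Hux @ x
Hux = H[n:, :n]
Huu = H[n:, n:]
K_joint = np.linalg.solve(Huu, Hux)


def decoupled_gain(H, n, i):
    """Gain row for input i from a single-input Q-function.

    Q_i(x, u_i) is taken here as the minimum of Q over all other inputs,
    which for a quadratic Q is a Schur complement of H (an assumption of
    this sketch, not a definition taken from the paper).
    """
    m = H.shape[0] - n
    keep = list(range(n)) + [n + i]                  # state and input i
    others = [n + j for j in range(m) if j != i]     # inputs minimized out
    Hkk = H[np.ix_(keep, keep)]
    Hko = H[np.ix_(keep, others)]
    Hoo = H[np.ix_(others, others)]
    Hi = Hkk - Hko @ np.linalg.solve(Hoo, Hko.T)
    # Q_i = x'Ax + 2 u_i b'x + c u_i^2, minimized by u_i = -(b'x) / c
    return Hi[n, :n] / Hi[n, n]


K_decoupled = np.vstack([decoupled_gain(H, n, i) for i in range(m)])

# Compare the two gain matrices (u = -K @ x in both cases)
print(np.allclose(K_joint, K_decoupled))
```

Each call to `decoupled_gain` solves a scalar minimization, while `K_joint` requires inverting the full input block `Huu`. In this sketch the single-input Q-functions are derived from the joint one, so it only illustrates that the optimum is preserved, not how such functions would be learned directly.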
Ethics declarations
Conflict of interests
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Phan, M.Q., Azad, S.M.B. Input-Decoupled Q-Learning for Optimal Control. J Astronaut Sci 67, 630–656 (2020). https://doi.org/10.1007/s40295-019-00157-4
DOI: https://doi.org/10.1007/s40295-019-00157-4