A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm Based on Q-Learning

  • Qinglai Wei
  • Derong Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9377)

Abstract

In this paper, a novel Q-learning-based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct an iterative control law that stabilizes the system and simultaneously minimizes the iterative Q function. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to illustrate the performance of the developed algorithm.
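To make the scheme concrete, the sketch below implements tabular Q-function policy iteration in the spirit of the abstract: a policy-evaluation step solves Q_i(x, u) = U(x, u) + Q_i(x', v_i(x')) with successor state x' = F(x, u), and a policy-improvement step takes v_{i+1}(x) = argmin_u Q_i(x, u). The plant F, utility U, grids, and tolerances are illustrative assumptions, not the system studied in the paper, and the nearest-neighbor lookup table stands in for the neural-network approximators the keywords suggest.

```python
import numpy as np

# Illustrative scalar plant x_{k+1} = F(x_k, u_k) and utility U(x, u);
# both are assumptions for this sketch, not the paper's benchmark system.
F = lambda x, u: 0.8 * np.sin(x) + u
U = lambda x, u: x ** 2 + u ** 2

xs = np.linspace(-1.0, 1.0, 41)          # discretized state grid
us = np.linspace(-1.0, 1.0, 41)          # discretized control grid
X, V = np.meshgrid(xs, us, indexing="ij")

COST = U(X, V)                            # U(x, u) tabulated on the grid
# Nearest-grid-point index of the successor state F(x, u)
# (a crude stand-in for a trained critic network).
NEXT = np.abs(np.clip(F(X, V), -1.0, 1.0)[..., None] - xs).argmin(axis=-1)

# Initial admissible policy: policy iteration requires a stabilizing
# initial control law; u = -0.8 sin(x) is deadbeat for this toy plant.
policy = np.abs((-0.8 * np.sin(xs))[:, None] - us).argmin(axis=1)

Q = np.zeros_like(COST)
for i in range(50):                       # outer policy-iteration loop
    # Policy evaluation: fixed point of Q(x, u) = U(x, u) + Q(x', v(x')).
    for _ in range(2000):
        Q_new = COST + Q[NEXT, policy[NEXT]]
        if np.max(np.abs(Q_new - Q)) < 1e-9:
            Q = Q_new
            break
        Q = Q_new
    # Policy improvement: v_{i+1}(x) = argmin_u Q_i(x, u).
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(f"stopped after {i + 1} policy iterations")
print("approximate optimal control at x = 0.5:",
      us[policy[np.abs(xs - 0.5).argmin()]])
```

Starting from an admissible (stabilizing) policy is what keeps every evaluation step well defined and, in the policy-iteration ADP setting the abstract describes, is the condition under which the iterative Q function is monotonically non-increasing.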

Keywords

Adaptive critic designs · Adaptive dynamic programming · Approximate dynamic programming · Q-learning · Policy iteration · Neural networks · Nonlinear systems · Optimal control

Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
