
Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems

Journal of Control Theory and Applications

Abstract

The approximate dynamic programming (ADP) formulation, implemented with an adaptive critic (AC)-based neural network (NN) structure, has evolved into a powerful technique for solving the Hamilton-Jacobi-Bellman (HJB) equations. As interest in ADP and AC solutions continues to grow, there is a pressing need to consider the factors that enable their implementation. A typical AC structure consists of two interacting NNs, which is computationally expensive. In this paper, a new architecture, called the ‘cost-function-based single network adaptive critic (J-SNAC)’, is presented, which eliminates one of the networks in a typical AC structure. This approach is applicable to a wide class of nonlinear systems in engineering. To demonstrate the benefits of the J-SNAC and its control synthesis procedure, two problems are solved with both the AC and the J-SNAC approaches. The results show that the J-SNAC reduces the computational cost by about 50% while attaining the same accuracy as the dual-network structure in solving for the optimal control. Furthermore, convergence of the J-SNAC iterations, each of which reduces to a least-squares problem, is discussed; for linear systems, the iterative process is shown to reduce to solving the familiar algebraic Riccati equation.
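
As an illustration of the final claim above, the following sketch runs a cost-function iteration in the J-SNAC spirit, where a single quadratic "critic" is updated by least squares, on a discrete-time linear-quadratic problem, and checks the converged cost matrix against the algebraic Riccati solution. This is a minimal sketch under assumed plant matrices, a hand-chosen quadratic basis, and a generic value-iteration-style training loop; it is not the authors' exact J-SNAC formulation.

# Illustrative sketch only: a least-squares "single network" cost-function
# iteration for a discrete-time LQR problem, checked against the algebraic
# Riccati solution. The plant matrices, basis choice, and training loop are
# assumptions for demonstration purposes.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Assumed 2-state, 1-input linear plant and quadratic cost weights.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

def phi(x):
    """Quadratic basis [x1^2, x1*x2, x2^2]; the 'critic' is w @ phi(x)."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def P_from_w(w):
    """Recover the symmetric P such that w @ phi(x) = x' P x."""
    return np.array([[w[0], w[1] / 2], [w[1] / 2, w[2]]])

w = np.zeros(3)                                    # critic weights, start from J = 0
for it in range(200):
    X = rng.uniform(-1, 1, size=(200, 2))          # sampled training states
    P = P_from_w(w)
    # Greedy control for the current cost approximation (LQR feedback form).
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    U = -(K @ X.T).T
    Xn = (A @ X.T + B @ U.T).T                     # next states under that control
    # Bellman targets: stage cost plus current estimate of the cost-to-go.
    targets = (np.einsum('ij,jk,ik->i', X, Q, X)
               + np.einsum('ij,jk,ik->i', U, R, U)
               + np.array([w @ phi(x) for x in Xn]))
    Phi = np.array([phi(x) for x in X])
    # Least-squares update of the single critic's weights.
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    if np.linalg.norm(w_new - w) < 1e-9:
        w = w_new
        break
    w = w_new

P_iter = P_from_w(w)
P_dare = solve_discrete_are(A, B, Q, R)
print("||P_iter - P_dare|| =", np.linalg.norm(P_iter - P_dare))  # ~0

Under these assumptions, the recovered cost matrix agrees with the solve_discrete_are solution to numerical precision, mirroring the linear-system special case noted in the abstract.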



Author information

Corresponding author: Jie Ding.

Additional information

This work was supported by the National Aeronautics and Space Administration (NASA) (No. ARMD NRA NNH07ZEA001N-IRAC1) and the National Science Foundation (NSF).

Jie DING is currently working toward a Ph.D. degree in the Department of Mechanical and Aerospace Engineering, Missouri University of Science and Technology.

S. N. BALAKRISHNAN received his Ph.D. degree from the University of Texas, Austin. He is currently a Curators’ Professor of Aerospace Engineering at Missouri University of Science and Technology. His research interests include neural networks, optimal control, and large-scale and impulse systems. His publications on techniques in these areas include applications to missiles, spacecraft, aircraft, robotics, temperature control, and animal population control.

About this article

Cite this article

Ding, J., Balakrishnan, S.N. Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems. J. Control Theory Appl. 9, 370–380 (2011). https://doi.org/10.1007/s11768-011-0191-3
