Abstract
The approximate dynamic programming (ADP) formulation, implemented with an adaptive critic (AC)-based neural network (NN) structure, has evolved into a powerful technique for solving the Hamilton-Jacobi-Bellman (HJB) equations. As interest in ADP and AC solutions grows, there is a pressing need to consider factors that make their implementation practical. A typical AC structure consists of two interacting NNs, which is computationally expensive. In this paper, a new architecture called the ‘cost-function-based single network adaptive critic (J-SNAC)’ is presented, which eliminates one of the networks in a typical AC structure. This approach is applicable to a wide class of nonlinear systems in engineering. To demonstrate the benefits of the J-SNAC and control synthesis with it, two problems are solved with both the AC and the J-SNAC approaches. The results show that the J-SNAC saves about 50% of the computational cost while matching the accuracy of the dual-network structure in solving for the optimal control. Furthermore, convergence of the J-SNAC iterations, which reduce to a least-squares problem, is discussed; for linear systems, the iterative process is shown to reduce to solving the familiar algebraic Riccati equation.
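The last point of the abstract can be illustrated with a minimal sketch. It is not the authors' J-SNAC algorithm; it is a generic least-squares policy iteration for a continuous-time linear-quadratic problem, in which a quadratic cost function J(x) = w · φ(x) is fitted by least squares at each step. The system matrices, the basis `phi`, the sample set, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical 2-state linear system x' = A x + B u with running cost x'Qx + u'Ru.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # open-loop stable, so K = 0 is a valid start
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])


def grad_phi(x):
    """Jacobian (3x2) of the assumed basis phi(x) = [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]])


def weights_to_P(w):
    """Symmetric P such that x'Px equals w . phi(x)."""
    return np.array([[w[0], w[1] / 2.0], [w[1] / 2.0, w[2]]])


def fit_quadratic_cost(num_iters=20, num_samples=50, seed=0):
    """Alternate a least-squares cost fit with a greedy policy update."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(num_samples, 2))  # sampled training states
    K = np.zeros((1, 2))  # initial stabilizing gain, u = -K x
    for _ in range(num_iters):
        Ac = A - B @ K             # closed-loop dynamics under the current policy
        Qc = Q + K.T @ R @ K       # running cost under the current policy
        # Policy evaluation: dJ/dt + x'Qc x = 0 at every sample, linear in w.
        Phi = np.array([grad_phi(x) @ (Ac @ x) for x in X])
        y = np.array([-(x @ Qc @ x) for x in X])
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        P = weights_to_P(w)
        # Policy improvement: u = -(1/2) R^{-1} B' grad J = -R^{-1} B' P x.
        K = np.linalg.solve(R, B.T @ P)
    return P, K


def riccati_residual(P):
    """Left-hand side of the algebraic Riccati equation; zero at the solution."""
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q


if __name__ == "__main__":
    P, K = fit_quadratic_cost()
    print("P =\n", P)
    print("max |ARE residual| =", np.abs(riccati_residual(P)).max())
```

For this assumed system the Riccati equation can be solved by hand, giving P = [[√2, √2 − 1], [√2 − 1, √2 − 1]], so the fitted P can be checked against that value as well as against the residual. The sketch starts from K = 0 only because the chosen A is already stable; an unstable system would need a different stabilizing initial gain.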
Author information
Additional information
This work was supported by the National Aeronautics and Space Administration (NASA) (No. ARMD NRA NNH07ZEA001N-IRAC1) and the National Science Foundation (NSF).
Jie DING is currently working toward the Ph.D. degree at the Department of Mechanical and Aerospace Engineering, Missouri University of Science and Technology.
S. N. BALAKRISHNAN received his Ph.D. degree from the University of Texas, Austin. He is currently a Curators’ Professor of Aerospace Engineering at Missouri University of Science and Technology. His research interests include neural networks, optimal control, and large-scale and impulse systems. His papers on techniques developed in these areas include applications to missiles, spacecraft, aircraft, robotics, temperature control, and animal population control.
About this article
Cite this article
Ding, J., Balakrishnan, S.N. Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems. J. Control Theory Appl. 9, 370–380 (2011). https://doi.org/10.1007/s11768-011-0191-3
DOI: https://doi.org/10.1007/s11768-011-0191-3