Abstract
In this chapter, a brief introduction to the background and development of adaptive dynamic programming (ADP) is provided. The review begins with the origin of ADP, and the basic structures and algorithmic developments are presented in chronological order. After that, we turn our attention to control problems based on ADP, treating the subject in two respects: feedback control based on ADP and non-linear games based on ADP. In each case we mention a few iterative algorithms from the recent literature and point out some open problems.
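To give a concrete flavor of the iterative algorithms the chapter surveys, the following is a minimal sketch, not taken from the chapter itself, of a dynamic-programming value iteration for a discrete-time linear-quadratic regulator, the simplest setting in which such iterations can be checked against the known Riccati solution. The system matrices and cost weights below are made-up examples.

```python
import numpy as np

# Illustrative example system: a discretized double integrator (assumed values).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost weight
R = np.array([[1.0]])  # control cost weight

# Value function is quadratic, V(x) = x' P x; start from the zero function.
P = np.zeros((2, 2))
for _ in range(500):
    # Greedy policy for the current value estimate: u = -K x,
    # with K = (R + B'PB)^{-1} B'PA.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # One dynamic-programming backup (Riccati difference recursion).
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

# The converged gain should stabilize the closed loop:
# spectral radius of A - B K below one.
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(rho)
```

For this stabilizable pair with positive-definite weights, the iteration converges to the fixed point of the discrete-time algebraic Riccati equation; ADP methods replace the exact backup above with learned (e.g. neural-network) approximations when the dynamics are non-linear or unknown.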
© 2013 Springer-Verlag London
Cite this chapter
Zhang, H., Liu, D., Luo, Y., Wang, D. (2013). Overview. In: Adaptive Dynamic Programming for Control. Communications and Control Engineering. Springer, London. https://doi.org/10.1007/978-1-4471-4757-2_1
Print ISBN: 978-1-4471-4756-5
Online ISBN: 978-1-4471-4757-2