Adaptive Dynamic Programming - Discrete Version

Szuster, Marcin; Hendzel, Zenon

doi:10.1007/978-3-319-68826-8_6

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 120)

Abstract

This chapter applies adaptive structures to Bellman's dynamic programming (DP) method in order to approximate the value function. The result is a family of neural dynamic programming algorithms that can be used for on-line control of dynamic objects. The chapter reviews the main features of this family and describes selected actor-critic learning methods: heuristic dynamic programming, dual-heuristic dynamic programming, and global dual-heuristic dynamic programming, all of which assume that a mathematical model is available, as well as a model-free method, the action-dependent heuristic dynamic programming algorithm.
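
To make the idea of approximating the value function with an adaptive structure concrete, here is a minimal sketch of a critic trained with a temporal-difference update. It is not the chapter's algorithm: it assumes a scalar linear plant, a fixed linear control law, a quadratic stage cost, and a one-parameter critic V(x) = w x^2 standing in for a neural network. Every name and numeric value (a, b, gain, gamma, lr) is hypothetical and chosen only for illustration.

```python
import random


def train_critic(a=0.9, b=0.5, gain=0.4, gamma=0.9, lr=0.5, steps=2000, seed=0):
    """Fit the critic weight w in V(x) = w * x**2 for a fixed policy.

    Assumed (hypothetical) setup: plant x_next = a*x + b*u, policy u = -gain*x,
    stage cost x**2 + u**2, discount factor gamma.
    """
    rng = random.Random(seed)
    w = 0.0  # critic weight, the only adaptive parameter in this sketch
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)   # sample a state
        u = -gain * x                # fixed control law (no actor adaptation here)
        cost = x * x + u * u         # stage cost
        x_next = a * x + b * u       # next state predicted with the known model
        # Temporal-difference error of the Bellman recursion
        # V(x) = cost + gamma * V(x_next)
        td_error = cost + gamma * w * x_next ** 2 - w * x ** 2
        # Gradient step on the critic weight (dV/dw = x**2)
        w += lr * td_error * x ** 2
    return w


def exact_weight(a=0.9, b=0.5, gain=0.4, gamma=0.9):
    """Closed-form solution of the same Bellman recursion, for comparison."""
    closed_loop = a - b * gain
    return (1.0 + gain ** 2) / (1.0 - gamma * closed_loop ** 2)


if __name__ == "__main__":
    print("learned weight:", train_critic())
    print("exact weight:  ", exact_weight())
```

Because the sketch uses the plant model to predict the next state, it mirrors the model-based setting described for heuristic dynamic programming; a model-free variant would instead feed measured transitions to the critic. The learned weight can be compared with the closed-form value to check that the update converges.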

Author information

Authors and Affiliations

Department of Applied Mechanics and Robotics, Faculty of Mechanical Engineering and Aeronautics, Rzeszow University of Technology, Rzeszow, Poland
Marcin Szuster & Zenon Hendzel

Corresponding author

Correspondence to Marcin Szuster.

About this chapter

Cite this chapter

Szuster, M., Hendzel, Z. (2018). Adaptive Dynamic Programming - Discrete Version. In: Intelligent Optimal Adaptive Control for Mechatronic Systems. Studies in Systems, Decision and Control, vol 120. Springer, Cham. https://doi.org/10.1007/978-3-319-68826-8_6

DOI: https://doi.org/10.1007/978-3-319-68826-8_6
Published: 28 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68824-4
Online ISBN: 978-3-319-68826-8
eBook Packages: Engineering, Engineering (R0)