Adaptive Dynamic Programming - Discrete Version

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 120)

Abstract

This chapter presents the application of adaptive structures to Bellman's dynamic programming (DP) method in order to approximate the value function. This approach gave rise to a family of neural dynamic programming algorithms that can be used for on-line control of dynamic objects. The chapter also reviews the main features of this family of algorithms and describes selected actor-critic learning methods: heuristic dynamic programming, dual-heuristic dynamic programming and global dual-heuristic dynamic programming, which assume the availability of a mathematical model, as well as a model-free method, i.e. the action-dependent heuristic dynamic programming algorithm.
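
To make the idea of approximating the value function with an adaptive structure concrete, the following is a minimal sketch of a critic update in the spirit of heuristic dynamic programming. It is not the chapter's algorithm: the scalar linear plant, the fixed feedback gain `K`, the quadratic stage cost, the single-weight critic `V(x) = w * x**2`, and every parameter value are assumptions chosen only for illustration. The chapter itself concerns neural-network actors and critics, and no actor is trained here.

```python
import random


def exact_weight(a=0.9, b=0.5, K=0.5, r=0.1, gamma=0.95):
    """Closed-form critic weight p such that V(x) = p * x**2 under u = -K * x."""
    c = a - b * K  # closed-loop coefficient: x_next = c * x
    return (1.0 + r * K * K) / (1.0 - gamma * c * c)


def train_critic(a=0.9, b=0.5, K=0.5, r=0.1, gamma=0.95,
                 alpha=0.5, steps=2000, seed=0):
    """Adapt a one-weight critic V(x) = w * x**2 from temporal-difference errors."""
    rng = random.Random(seed)
    w = 0.0  # critic weight, initialised at zero
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)     # sampled state (assumed exploration scheme)
        u = -K * x                     # fixed linear policy stands in for the actor
        cost = x * x + r * u * u       # assumed quadratic stage cost
        x_next = a * x + b * u         # assumed plant model: x_{k+1} = a x_k + b u_k
        # Temporal-difference error of the discounted Bellman equation
        td_error = cost + gamma * w * x_next ** 2 - w * x ** 2
        # Semi-gradient step: dV/dw = x**2
        w += alpha * td_error * x * x
    return w


if __name__ == "__main__":
    print("learned weight:", train_critic())
    print("exact weight:  ", exact_weight())
```

Because the critic here is linear in its single weight and the plant is deterministic, the learned weight can be compared directly with the closed-form solution of the Bellman equation for the fixed policy; with a neural critic no such closed form is generally available.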

Corresponding author

Correspondence to Marcin Szuster.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Szuster, M., Hendzel, Z. (2018). Adaptive Dynamic Programming - Discrete Version. In: Intelligent Optimal Adaptive Control for Mechatronic Systems. Studies in Systems, Decision and Control, vol 120. Springer, Cham. https://doi.org/10.1007/978-3-319-68826-8_6

  • DOI: https://doi.org/10.1007/978-3-319-68826-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68824-4

  • Online ISBN: 978-3-319-68826-8

  • eBook Packages: Engineering, Engineering (R0)
