Multiple Actor-Critic Optimal Control via ADP


Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 166)

Abstract

In industrial process control there may be multiple performance objectives, depending on salient features of the input-output data. To address this situation, this chapter proposes a multiple actor-critic structure that obtains the optimal control from input-output data for unknown nonlinear systems. A shunting inhibitory artificial neural network (SIANN) classifies the input-output data into one of several categories, and a different performance measure function may be defined for each category. Within each category, an ADP algorithm consisting of a model module, a critic network, and an action network is used to derive the optimal control. A recurrent neural network (RNN) model reconstructs the unknown system dynamics from the input-output data, and neural networks implement the critic and action networks. It is proven that the model error and the closed-loop states of the unknown system are uniformly ultimately bounded (UUB). Simulation results demonstrate the performance of the proposed optimal control scheme for an unknown nonlinear system.
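To make the structure concrete, the following is a minimal sketch (not the authors' code) of how such a multiple actor-critic scheme can be organized: a classifier routes each data point to one of several categories, each category carries its own utility function and its own critic/actor pair, and the critic and actor are updated with simple HDP-style steps. The scalar plant, the threshold classifier, the quadratic critic, and the linear actor used here are illustrative stand-ins for the SIANN classifier, the RNN model, and the neural-network critic and action networks described in the chapter.

    # Conceptual sketch (hypothetical, not the chapter's implementation):
    # route data to one of several actor-critic pairs, each with its own
    # utility function, and update each pair with HDP-style steps.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in "model module": a scalar nonlinear plant (would be an RNN
    # model identified from input-output data in the chapter).
    def plant(x, u):
        return 0.8 * np.sin(x) + u

    # Stand-in classifier: pick a category from a salient feature of the data
    # (the chapter uses an SIANN for this step).
    def classify(x):
        return 0 if abs(x) < 1.0 else 1

    # Per-category utility (performance measure) functions.
    utilities = [
        lambda x, u: x**2 + u**2,            # category 0: balanced cost
        lambda x, u: 5.0 * x**2 + 0.1 * u**2 # category 1: aggressive regulation
    ]

    gamma, lr = 0.95, 0.05
    # Critic of category i: V_i(x) ~ w[i] * x^2; actor: u_i(x) = -k[i] * x.
    w = np.ones(2)
    k = 0.1 * np.ones(2)

    x = 2.0
    for step in range(2000):
        i = classify(x)                 # route the data to category i
        u = -k[i] * x                   # action from the category-i actor
        x_next = plant(x, u)
        U = utilities[i](x, u)

        # Critic update: temporal-difference step on V_i.
        td = w[i] * x**2 - (U + gamma * w[i] * x_next**2)
        w[i] -= lr * td * x**2

        # Actor update: finite-difference step to reduce U + gamma * V_i(x_next).
        eps = 1e-3
        u_p = -(k[i] + eps) * x
        J_p = utilities[i](x, u_p) + gamma * w[i] * plant(x, u_p)**2
        J_0 = U + gamma * w[i] * x_next**2
        k[i] -= lr * (J_p - J_0) / eps

        x = x_next if abs(x_next) < 10 else rng.uniform(-2, 2)

    print("critic weights:", w, "actor gains:", k)

Replacing these stand-ins with an SIANN classifier, an RNN model trained from input-output data, and neural-network critic and action networks recovers the overall structure of the proposed scheme.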



Author information


Corresponding author

Correspondence to Ruizhuo Song.


Copyright information

© 2019 Science Press, Beijing and Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Song, R., Wei, Q., Li, Q. (2019). Multiple Actor-Critic Optimal Control via ADP. In: Adaptive Dynamic Programming: Single and Multiple Controllers. Studies in Systems, Decision and Control, vol 166. Springer, Singapore. https://doi.org/10.1007/978-981-13-1712-5_5
