Abstract
In this study, a reinforcement learning (RL) method, the twin delayed deep deterministic policy gradient (TD3) algorithm, is used to tune the parameters of proportional-integral (PI) controllers for a nonlinear three-tank hybrid (TTH) system in a decentralized multi-agent manner. The proposed multi-agent approach trains agents that are more capable than a single agent, which simplifies the tuning task. The TTH system exhibits both continuous and discrete dynamics and serves as a benchmark for process characteristics such as variable dynamics, interactions, and nonlinearity. The real-time TTH setup is excited with a pseudorandom binary signal (PRBS), and the resulting data are used to validate the system's first-principles model. Without initial or prior knowledge of the states, TD3 is relatively slow to find the optimal policy. In this study, PI controller parameters obtained from internal model control (IMC) tuning rules are used as an initial guess, so the parameters need not be explored from scratch, which significantly improves training speed and convergence accuracy. For comparison, the deep deterministic policy gradient (DDPG) algorithm is given the same initial guess as TD3, and the TD3-tuned controller is also compared with traditional PI tuning methods such as IMC, SIMC, IIMC, and AMIGO. The results show that the RL-TD3-tuned PI controller outperforms the RL-DDPG-tuned PI controller and the traditional PI tuning methods in terms of integral squared error and control effort.
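The role of the IMC initial guess and of the two performance measures can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical first-order-plus-dead-time (FOPDT) model with made-up values (`K`, `tau`, `theta`) in place of the TTH model, and uses the textbook IMC-based PI rule for that model. The control-effort measure here is the sum of squared control moves, which is also an assumption, since the abstract does not define it. No RL training is performed. The sketch only shows how a starting point for the gains and the metrics an agent might be rewarded on could be computed.

```python
import numpy as np

def imc_pi(K, tau, theta, lam):
    """Textbook IMC-based PI settings for an FOPDT model K*exp(-theta*s)/(tau*s + 1).

    lam is the desired closed-loop time constant (a design choice).
    """
    Kc = tau / (K * (lam + theta))
    Ti = tau
    return Kc, Ti

def simulate_pi(Kc, Ti, K=1.0, tau=10.0, theta=1.0,
                setpoint=1.0, dt=0.1, t_end=100.0):
    """Euler simulation of a PI loop on the hypothetical FOPDT process.

    Returns (ISE, control effort, final output). Control effort is taken
    as the sum of squared control moves (an assumed definition).
    """
    n = int(round(t_end / dt))
    delay = int(round(theta / dt))
    u_hist = np.zeros(n + delay)  # u_hist[k] is the input reaching the process at step k
    y, integral, u_prev = 0.0, 0.0, 0.0
    ise, effort = 0.0, 0.0
    for k in range(n):
        e = setpoint - y
        integral += e * dt
        u = Kc * (e + integral / Ti)      # ideal-form PI law
        u_hist[k + delay] = u             # takes effect after the dead time
        ise += e ** 2 * dt                # integral squared error
        effort += (u - u_prev) ** 2       # squared control move
        u_prev = u
        y += dt * (-y + K * u_hist[k]) / tau
    return ise, effort, y

# IMC gains as the initial guess (hypothetical model values).
Kc0, Ti0 = imc_pi(K=1.0, tau=10.0, theta=1.0, lam=2.0)
ise_imc, effort_imc, y_imc = simulate_pi(Kc0, Ti0)

# A deliberately detuned controller for comparison.
ise_slow, effort_slow, y_slow = simulate_pi(0.3 * Kc0, Ti0)

print(f"IMC guess: Kc={Kc0:.3f}, Ti={Ti0:.1f}, ISE={ise_imc:.3f}, effort={effort_imc:.3f}")
print(f"Detuned:   ISE={ise_slow:.3f}, effort={effort_slow:.3f}")
```

In an RL setting, gains such as `Kc0` and `Ti0` would seed the search, and a reward could penalize both metrics. How the paper weights them is not stated in the abstract.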
Funding
No external funding was received in association with this research.
Author information
Contributions
N.R. contributed to conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, software, and writing—original draft. T.K.R. contributed to conceptualization, formal analysis, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, and writing—review and editing. N.S. contributed to conceptualization, formal analysis, investigation, methodology, project administration, supervision, validation, visualization, and writing—review and editing.
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest in the subject matter or materials discussed in this manuscript.
Appendix
Appendix-A: Technical data of three-tank hybrid system
S. no | Component | Specifications |
---|---|---|
1 | Level transmitter (LT) | Built-in sensor: piezoelectric, µC based; Input: (0–6500) mm H2O; Output: (4–20) mA |
2 | Rotameter | Range: (44–440) L/h |
3 | Pneumatic control valve | Range: (500/1000) L/h; Characteristics: equal percentage; Valve action: air to open |
4 | Electro-pneumatic converter | Input pneumatic signal: 20 psi constant; Input current signal: (4–20) mA; Output signal: (3–15) psi |
5 | Solenoid valve | Type: magnet; Current rating: 100 mA; Line connection: 1/2" BSP (F) thread |
6 | Air regulator | Input: 10.6 kg/cm²; Output: 2.1 kg/cm²; Special feature: air regulator with filter |
About this article
Cite this article
Rajasekhar, N., Radhakrishnan, T.K. & Samsudeen, N. Decentralized multi-agent control of a three-tank hybrid system based on twin delayed deep deterministic policy gradient reinforcement learning algorithm. Int. J. Dynam. Control 12, 1098–1115 (2024). https://doi.org/10.1007/s40435-023-01227-0
DOI: https://doi.org/10.1007/s40435-023-01227-0