
Decentralized multi-agent control of a three-tank hybrid system based on twin delayed deep deterministic policy gradient reinforcement learning algorithm

Published in: International Journal of Dynamics and Control

Abstract

In this study, a reinforcement learning (RL) method, twin delayed deep deterministic policy gradient (TD3), is used to tune the parameters of proportional-integral (PI) controllers for a nonlinear three-tank hybrid (TTH) system in a decentralized multi-agent manner. The proposed multi-agent reinforcement learning approach yields agents that are more capable than a single-agent counterpart and simplifies the overall control task. The TTH system combines continuous and discrete dynamics and serves as a benchmark for process characteristics such as time-varying dynamics, loop interactions, and nonlinearity. The real-time TTH setup is excited with a pseudorandom binary sequence (PRBS), and the resulting data are used to validate the system's first-principles model. Without initial or prior knowledge of the states, TD3 is relatively slow to explore toward the optimal policy. In this study, PI controller parameters tuned with internal model control (IMC) rules are used as the initial guess, so exploration from scratch is eliminated, which significantly improves training speed and convergence accuracy. The deep deterministic policy gradient (DDPG) algorithm is given the same initial guess as TD3. Both are also compared with traditional PI tuning methods such as IMC, SIMC, IIMC, and AMIGO. The results show that the RL-TD3-tuned PI controller outperforms the RL-DDPG-tuned PI controller and the traditional PI tuning methods in terms of integral squared error and control effort.
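The abstract describes tuning PI gains with an RL agent that starts from an IMC initial guess and is rewarded for low integral squared error (ISE) and low control effort. The sketch below is a minimal illustration of that reward structure on a simplified single-tank level loop; it is not the authors' TD3 implementation and not the actual TTH model. The tank parameters, the effort weight `lam`, and the random-search loop standing in for the TD3 policy update are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the authors' code) of the reward an RL agent such as
# TD3 could optimise when tuning a PI level controller around an IMC initial guess.
import numpy as np

def simulate_pi_loop(kc, ti, setpoint=0.25, t_end=300.0, dt=0.5):
    """Simulate a simplified tank level loop under PI control.

    Assumed tank model: A * dh/dt = q_max * u - c * sqrt(h).
    Returns the integral squared error (ISE) and a control-effort measure.
    """
    A, c, q_max = 0.0154, 5e-4, 4e-4      # tank area, outlet coefficient, max inflow (assumed)
    h, integral = 0.10, 0.0               # initial level [m] and integral of error
    ise, effort, u_prev = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        e = setpoint - h
        integral += e * dt
        u = float(np.clip(kc * (e + integral / ti), 0.0, 1.0))   # valve opening, 0..1
        q_in = q_max * u
        h = max(h + dt * (q_in - c * np.sqrt(max(h, 0.0))) / A, 0.0)
        ise += e ** 2 * dt
        effort += (u - u_prev) ** 2       # penalise aggressive valve movement
        u_prev = u
    return ise, effort

def reward(action, kc_imc=8.0, ti_imc=40.0, lam=0.1):
    """Reward for an action (delta_Kc, delta_Ti) applied to the IMC initial guess."""
    kc = max(kc_imc + action[0], 0.1)
    ti = max(ti_imc + action[1], 1.0)
    ise, effort = simulate_pi_loop(kc, ti)
    return -(ise + lam * effort)

# Crude random search standing in for the TD3 policy update, only to exercise the reward:
rng = np.random.default_rng(0)
best_a, best_r = np.zeros(2), reward(np.zeros(2))
for _ in range(200):
    a = best_a + rng.normal(scale=[1.0, 5.0])
    r = reward(a)
    if r > best_r:
        best_a, best_r = a, r
print("tuned Kc:", max(8.0 + best_a[0], 0.1),
      "tuned Ti:", max(40.0 + best_a[1], 1.0),
      "reward:", best_r)
```

Starting the search at the IMC guess (zero action) mirrors the paper's idea that the agent refines an already reasonable controller instead of exploring gains from scratch.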



Funding

No external funding was received in association with this research.

Author information


Contributions

N.R. contributed to conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, software, and writing—original draft. T.K.R. contributed to conceptualization, formal analysis, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, and writing—review and editing. N.S. contributed to conceptualization, formal analysis, investigation, methodology, project administration, supervision, validation, visualization, and writing—review and editing.

Corresponding author

Correspondence to T. K. Radhakrishnan.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest in the subject matter or materials discussed in this manuscript.

Appendix A: Technical data of the three-tank hybrid system

1. Level transmitter (LT): built-in piezoelectric sensor, µC (microcontroller) based; input: 0–6500 mm H2O; output: 4–20 mA

2. Rotameter: range 44–440 L/h

3. Pneumatic control valve: range 500/1000 L/h; characteristics: equal percentage; valve action: air to open

4. Electro-pneumatic converter: input pneumatic signal 20 psi (constant); input current signal 4–20 mA; output signal 3–15 psi

5. Solenoid valve: type: magnet; current rating 100 mA; line connection ½″ BSP (F) thread

6. Air regulator: input 10.6 kg/cm²; output 2.1 kg/cm²; special feature: air regulator with filter

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rajasekhar, N., Radhakrishnan, T.K. & Samsudeen, N. Decentralized multi-agent control of a three-tank hybrid system based on twin delayed deep deterministic policy gradient reinforcement learning algorithm. Int. J. Dynam. Control 12, 1098–1115 (2024). https://doi.org/10.1007/s40435-023-01227-0
