
Reinforcement Learning for Input Constrained Sub-optimal Tracking Control in Discrete-time Two-time-scale Systems

  • Regular Papers
  • Intelligent Control and Applications

Abstract

Two-time-scale (TTS) systems were proposed to accurately describe complex systems whose variables evolve on two different time scales. The different response speeds of these variables and incomplete model information degrade the tracking performance of TTS systems. For tracking control with an unknown model, the practicability of reinforcement learning (RL) has been criticized because the method requires a stabilizing initial policy. Based on singular perturbation theory (SPT), a composite sub-optimal tracking policy that combines model information with measured data is investigated. In addition, a selection criterion for the initial stabilizing policy is presented by treating the policy as an input constraint. The proposed method, which integrates the RL technique with convex optimization, effectively improves tracking performance and practicability. Finally, a simulation experiment on an F-8 aircraft is given to demonstrate the validity of the developed method.
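As context for the policy-selection step mentioned above, the sketch below illustrates one standard way a stabilizing initial feedback gain can be obtained for a discrete-time linear system by convex optimization: a linear matrix inequality (LMI) feasibility problem solved with cvxpy. The system matrices, tolerance, and formulation here are illustrative assumptions for this sketch only, not the model, constraint set, or selection criterion used in the paper.

```python
# Minimal sketch (not the paper's algorithm): compute an initial stabilizing
# feedback gain K for a discrete-time linear system via a standard LMI
# feasibility problem. All numerical values are hypothetical.
import numpy as np
import cvxpy as cp

# Assumed discrete-time system matrices (illustrative only).
A = np.array([[1.02, 0.10],
              [0.00, 0.95]])
B = np.array([[0.0],
              [0.1]])
n, m = A.shape[0], B.shape[1]

# LMI variables: S = P^{-1} > 0 and Y = K S.
S = cp.Variable((n, n), symmetric=True)
Y = cp.Variable((m, n))

# Schur-complement form of the Lyapunov inequality (A + B K)^T P (A + B K) - P < 0.
lmi = cp.bmat([[S, (A @ S + B @ Y).T],
               [A @ S + B @ Y, S]])
eps = 1e-6
constraints = [S >> eps * np.eye(n), lmi >> eps * np.eye(2 * n)]
cp.Problem(cp.Minimize(0), constraints).solve()   # any SDP-capable solver, e.g. SCS

K = Y.value @ np.linalg.inv(S.value)              # candidate stabilizing gain
rho = max(abs(np.linalg.eigvals(A + B @ K)))      # spectral radius of A + B K
print("K =", K)
print("spectral radius =", rho)                   # rho < 1  =>  u = K x is stabilizing
```

Such a gain would only serve as a starting point; in the setting studied here it would then be refined by the data-driven RL procedure.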



Author information


Corresponding author

Correspondence to Zhenlei Wang.

Ethics declarations

The authors declare that there is no competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), the Natural Science Foundation of China (62233005, 62273149), the Fundamental Research Funds for the Central Universities, and Shanghai AI Lab.

Xuejie Que received her M.S. degree in applied mathematics from Zhengzhou University, Zhengzhou, China, in 2019. She is currently pursuing a Ph.D. degree at the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China. Her research interests include multi-time-scale systems, optimal control, and reinforcement learning.

Zhenlei Wang is currently a Professor with the School of Information Science and Engineering, East China University of Science and Technology, and with the Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education. His research interests include intelligent control, modeling and analysis of the characteristics of complex systems, intelligent optimization algorithms, and fault diagnosis.

Xin Wang is currently an Associate Professor at Shanghai Jiao Tong University, China. His research interests include multi-variable intelligent decoupling control, control and optimization of complex industrial processes, and multiple-model adaptive control.


About this article


Cite this article

Que, X., Wang, Z. & Wang, X. Reinforcement Learning for Input Constrained Sub-optimal Tracking Control in Discrete-time Two-time-scale Systems. Int. J. Control Autom. Syst. 21, 3068–3079 (2023). https://doi.org/10.1007/s12555-022-0355-6

