Controller Optimization for Multirate Systems Based on Reinforcement Learning

Abstract

The goal of this paper is to design a model-free optimal controller for multirate systems based on reinforcement learning. Sampled-data control systems are widely used in industrial production processes, and multirate sampling has attracted much attention in the study of sampled-data control theory. In this paper, we assume that the sampling periods of the state variables differ from those of the system inputs. Under this condition, an equivalent discrete-time system can be obtained using the lifting technique. We then provide an algorithm that solves the linear quadratic regulator (LQR) control problem for multirate systems with the aid of matrix substitutions. Based on reinforcement learning, we optimize the controller for multirate systems using online policy-iteration and off-policy algorithms. By applying the least squares method, we convert the off-policy algorithm into a model-free reinforcement learning algorithm that requires only the input and output data of the system. Finally, an example illustrates the applicability and efficiency of the model-free algorithm described above.
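The lifted-system construction and the policy-iteration step outlined above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the system matrices, the rate ratio `N`, the helper names `lift_multirate` and `lqr_policy_iteration`, and the use of model-based Hewer iteration (in place of the paper's off-policy, data-driven variant) are all assumptions made for the example.

```python
import numpy as np

def lift_multirate(Ad, Bd, N):
    """Lift a fast-rate discrete-time system (Ad, Bd), whose state is
    measured only every N input updates, into an equivalent slow-rate
    system whose input is the stacked block [u[k]; ...; u[k+N-1]]."""
    A_lift = np.linalg.matrix_power(Ad, N)
    # Column block j captures the effect of u[k+j] on x[k+N].
    B_lift = np.hstack([np.linalg.matrix_power(Ad, N - 1 - j) @ Bd
                        for j in range(N)])
    return A_lift, B_lift

def lqr_policy_iteration(A, B, Q, R, K0, iters=50):
    """Model-based policy iteration (Hewer's algorithm) for discrete LQR.
    K0 must be stabilizing; each evaluation solves a Lyapunov equation."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        M = A - B @ K                    # closed loop under current policy
        Qk = Q + K.T @ R @ K             # stage cost under current policy
        # Policy evaluation: solve P = M' P M + Qk by vectorization,
        # using vec(M' P M) = (M' kron M') vec(P) with column-major vec.
        vecP = np.linalg.solve(np.eye(n * n) - np.kron(M.T, M.T),
                               Qk.reshape(-1, order="F"))
        P = vecP.reshape(n, n, order="F")
        # Policy improvement.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Fast-rate double integrator with base period 0.1 s; the state is
# sampled every N = 2 input updates (illustrative numbers).
Ad = np.array([[1.0, 0.1], [0.0, 1.0]])
Bd = np.array([[0.005], [0.1]])
A, B = lift_multirate(Ad, Bd, 2)
Q, R = np.eye(2), np.eye(2)
K0 = np.linalg.solve(B, A - 0.5 * np.eye(2))  # any stabilizing initial gain
K, P = lqr_policy_iteration(A, B, Q, R, K0)
```

In the model-free variant described in the abstract, the Lyapunov solve would be replaced by a least-squares fit of the same quadratic value function to measured input and state data, so that `Ad` and `Bd` never need to be known explicitly.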

Acknowledgements

This work was supported by National Key R&D Program of China (No. 2018YFB1308404).

Author information

Corresponding authors

Correspondence to Sheng-Ri Xue or Hui-Jun Gao.

Additional information

Zhan Li received the Ph.D. degree in control science and engineering from Harbin Institute of Technology, Harbin, China, in 2015. He is currently an associate professor with the Research Institute of Intelligent Control and Systems, School of Astronautics, Harbin Institute of Technology, China.

His research interests include motion control, industrial robot control, robust control of small unmanned aerial vehicles (UAVs), and cooperative control of multivehicle systems.

Sheng-Ri Xue received the B.Sc. degree in automation engineering from Harbin Institute of Technology, China, in 2015, where he is currently pursuing the Ph.D. degree with the Research Institute of Intelligent Control and Systems.

His research interests include H-infinity control, controller optimization, reinforcement learning, and their applications to sampled-data control systems design.

Xing-Hu Yu received the M.M. degree in osteopathic medicine from Jinzhou Medical University, China, in 2016. He is currently a Ph.D. candidate in control science and engineering at Harbin Institute of Technology, China.

His research interests include intelligent control and biomedical image processing.

Hui-Jun Gao received the Ph.D. degree in control science and engineering from Harbin Institute of Technology, China, in 2005. From 2005 to 2007, he carried out his postdoctoral research with the Department of Electrical and Computer Engineering, University of Alberta, Canada. Since 2004, he has been with Harbin Institute of Technology, where he is currently a full professor, the Director of the Inter-discipline Science Research Center, and the Director of the Research Institute of Intelligent Control and Systems. He is an IEEE Industrial Electronics Society Administration Committee Member and a council member of IFAC. He is the Co-Editor-in-Chief of IEEE Transactions on Industrial Electronics, and an Associate Editor of Automatica, IEEE Transactions on Control Systems Technology, IEEE Transactions on Cybernetics, and IEEE/ASME Transactions on Mechatronics.

His research interests include intelligent and robust control, robotics, mechatronics, and their engineering applications.

About this article

Cite this article

Li, Z., Xue, SR., Yu, XH. et al. Controller Optimization for Multirate Systems Based on Reinforcement Learning. Int. J. Autom. Comput. 17, 417–427 (2020). https://doi.org/10.1007/s11633-020-1229-0

Keywords

  • Multirate system
  • reinforcement learning
  • policy iteration
  • optimal control
  • controller optimization