Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information

  • Shuping He
  • Maoguang Zhang
  • Haiyang Fang
  • Fei Liu
  • Xiaoli Luan
  • Zhengtao Ding
Extreme Learning Machine and Deep Learning Networks


This paper investigates the online adaptive optimal control problem for a class of continuous-time Markov jump linear systems (MJLSs) with completely unknown dynamics, using a parallel reinforcement learning (RL) algorithm. Before the state and input information of the subsystems is collected and learned, exploration noise is first added to the control input. A novel parallel RL algorithm then solves the corresponding N coupled algebraic Riccati equations (AREs) in parallel through online learning, so no knowledge of the MJLS dynamics is required. The convergence of the proposed algorithm is proved, and its effectiveness and applicability are illustrated by two simulation examples.
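To make the phrase "N coupled algebraic Riccati equations" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes scalar subsystems, for which the standard coupled ARE of mode i reads 2*a_i*p_i + q_i - (b_i*p_i)^2/r_i + sum_j pi_ij*p_j = 0, and it uses a model-based, Kleinman-style Lyapunov iteration that does require a_i, b_i and the transition rates pi_ij. The paper's contribution is precisely to avoid that requirement by learning from online data. All function names and numerical values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: model-based iteration for scalar coupled AREs.
# The paper's method is data-driven; this version assumes a, b and Pi are known.

def coupled_are_residuals(p, a, b, q, r, Pi):
    """Left-hand side of each scalar coupled ARE (zero at the solution)."""
    n = len(p)
    return [
        2.0 * a[i] * p[i] + q[i] - (b[i] * p[i]) ** 2 / r[i]
        + sum(Pi[i][j] * p[j] for j in range(n))
        for i in range(n)
    ]

def parallel_lyapunov_iteration(a, b, q, r, Pi, k0, iters=200):
    """Update all modes from the previous iterate, so each mode is decoupled."""
    n = len(a)
    p = [0.0] * n          # initial value estimates (assumed choice)
    k = list(k0)           # initial gains, assumed to stabilize every mode
    for _ in range(iters):
        p_new = []
        for i in range(n):
            # Closed-loop coefficient with the diagonal rate absorbed.
            a_cl = a[i] + 0.5 * Pi[i][i] - b[i] * k[i]
            if a_cl >= 0.0:
                raise ValueError("gain for mode %d is not stabilizing" % i)
            # Coupling uses the other modes' values from the previous sweep.
            coupling = sum(Pi[i][j] * p[j] for j in range(n) if j != i)
            # Scalar Lyapunov equation: 2*a_cl*p + q + r*k^2 + coupling = 0.
            p_new.append((q[i] + r[i] * k[i] ** 2 + coupling) / (-2.0 * a_cl))
        p = p_new
        # Policy improvement: k_i = b_i * p_i / r_i.
        k = [b[i] * p[i] / r[i] for i in range(n)]
    return p, k

# Hypothetical two-mode example (values are not taken from the paper).
a = [1.0, -0.5]
b = [1.0, 1.0]
q = [1.0, 1.0]
r = [1.0, 1.0]
Pi = [[-2.0, 2.0], [1.0, -1.0]]   # transition rate matrix, rows sum to zero

p, k = parallel_lyapunov_iteration(a, b, q, r, Pi, k0=[2.0, 0.0])
print("values:", p)
print("gains:", k)
print("residuals:", coupled_are_residuals(p, a, b, q, r, Pi))
```

Because every mode is updated from the previous sweep's values, the N per-mode updates are independent of one another within a sweep, which is the sense in which such a scheme can be run in parallel.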


Markov jump linear systems (MJLSs); Adaptive optimal control; Online; Reinforcement learning (RL); Coupled algebraic Riccati equations (AREs)



This work was supported in part by the National Natural Science Foundation of China under Grants 61673001 and 61722306, the Foundation for Distinguished Young Scholars of Anhui Province under Grant 1608085J05, the Key Support Program of University Outstanding Youth Talent of Anhui Province under Grant gxydZD2017001, the State Key Program of National Natural Science Foundation of China under Grant 61833007, and the 111 Project under Grant B12018.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflicts of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical Engineering and Automation, Anhui University, Hefei, China
  2. Institute of Physical Science and Information Technology, Anhui University, Hefei, China
  3. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Institute of Automation, Jiangnan University, Wuxi, China
  4. School of Electrical and Electronic Engineering, The University of Manchester, Manchester, UK
