Sliding mode-based online fault compensation control for modular reconfigurable robots through adaptive dynamic programming

In this paper, a sliding mode (SM)-based online fault compensation control scheme is investigated for modular reconfigurable robots (MRRs) with actuator failures via adaptive dynamic programming. The scheme consists of an SM-based iterative controller, an adaptive robust term, and an online fault compensator. For fault-free MRR systems, the SM surface-based Hamilton–Jacobi–Bellman equation is solved by an online policy iteration algorithm, and the adaptive robust term is added to guarantee the reachability condition of the SM surface. For faulty MRR systems, the actuator failure is compensated online, so that a fault detection and isolation mechanism is not required. The closed-loop MRR system is guaranteed to be asymptotically stable under the developed fault compensation control scheme. Simulation results verify the effectiveness of the presented fault compensation control approach.


Introduction
As modular reconfigurable robots (MRRs) offer structural flexibility, low cost, excellent adaptability, etc., they often work in perilous and complex environments, such as disaster rescue, deep space/sea exploration, smart manufacturing, and many other hazardous settings that humans cannot enter directly [1][2][3]. In recent years, research on MRR control approaches has attracted a great deal of attention, including centralized control [4,5], distributed control [6,7], and decentralized control [8,9]. These control approaches mainly tackle force/position control problems [10,11], fault tolerant control problems [12,13], and so on. Although the above methods achieve good performance, the designed controllers always contain adjustable parameters, which increases the design difficulty and structural complexity. Thus, more attention should be paid to simplifying the control structure and reducing the computational burden.
As is well known, optimal control is one of the key topics in modern control theory: it not only ensures the stability of the control system but also achieves a prescribed optimal performance. Owing to its strong self-learning and optimization ability, adaptive dynamic programming (ADP) [14] was introduced and extensively investigated for optimal control by solving the Hamilton–Jacobi–Bellman equation (HJBE) while avoiding the "curse of dimensionality". Consequently, more and more ADP-based control methods have been investigated to deal with trajectory tracking [15,16], zero-sum games [17], uncertainties [18], actuator saturation [19], etc. It is worth pointing out that reinforcement learning (RL) and ADP are essentially in the same spirit when dealing with optimal control problems, so RL is often regarded as a synonym for ADP. Up to now, many ADP and RL methods have been investigated [20][21][22][23]. Bai et al. designed an optimal control approach via the neural network (NN) technique and an RL algorithm to tackle the nonstrict-feedback control problem [24] and input saturation [25]. For systems with known dynamics, Shi et al. [26] proposed an optimal tracking control (OTC) approach to handle time delays via integral RL and a value iteration method. In [27], an approximate OTC strategy was addressed using an event-driven ADP algorithm. However, the aforementioned methods require accurate system dynamics, which are difficult to obtain in real industrial applications of MRRs. Recently, some model-free RL-based control methods have been presented; these approaches depend merely on the input and output measurement data of the controlled plant [28]. However, model-free methods require large amounts of online or offline data to train the NNs, which costs considerable computation and training time.
Furthermore, MRRs working in hazardous environments for long periods may suffer failures, which not only degrade system performance but may even damage the surrounding workspace. Actuator failure is regarded as one of the most challenging failures to handle, because unknown actuator failures can cause more serious deterioration of the control performance than other fault scenarios. Moreover, it is often impossible to repair MRRs in hazardous environments. Hence, exploring a fault tolerant control (FTC) method is imperative to guarantee that MRRs continue working reliably in the presence of actuator failures.
FTC strategies mainly include passive FTC (PFTC) and active FTC (AFTC). PFTC does not need a fault detection and identification (FDI) unit. Over the last few decades, many PFTC approaches have been presented, mainly based on quantitative feedback theory [29], linear matrix inequalities [30], and H∞ theory [31]. PFTC designs a fixed controller before a fault occurs, so it can only handle known faults [32]. Alternatively, AFTC avoids this drawback: it obtains fault information via FDI and then readjusts or reconstructs the control law. AFTC approaches can be categorized into fault accommodation [33], fault reconfiguration [34], and fault compensation [35]. Owing to its better performance, AFTC has potential in robot manipulators [36], quadrotors [37], inverted pendulums [38], and other practical applications. Moreover, some FTC schemes have been developed through RL or ADP. Zhao et al. [32] employed the information of a fault observer to construct an improved cost function and utilized an online iteration algorithm to develop a novel FTC method for nonlinear systems. Fan and Yang [39] investigated an FTC strategy to handle time-varying actuator bias faults via ADP. In [40], an ADP-based stabilizing scheme for nonlinear systems with unknown actuator saturation was developed via NN compensation. However, these works solve stabilization problems rather than trajectory tracking, which is what MRRs require.
To obtain rapid response and convergence, sliding mode-based control schemes have been presented. Owing to its low sensitivity to system uncertainties and external disturbances and its strong robustness, sliding mode control (SMC) reduces the need for an accurate model and is applicable to control system design in both normal and faulty conditions [41][42][43]. Hence, SMC methods have often been applied to systems with high nonlinearities, variable parameters, and external disturbances, such as aircraft systems [44], direct current (DC) servomotors [45], multi-machine power systems [46], and MRRs [12].
Although previous ADP-based FTC methods can guarantee the stability of the faulty system, a faster control action is often required in practice. Thus, motivated by [47], this paper develops an SM-based online fault compensation control (SMOFCC) scheme for MRRs with unknown actuator failures. For the fault-free case, the SM-based approximate optimal control (SMAOC) is derived using the SM-based iterative controller and an adaptive robust term. When actuator failures occur, the SMOFCC is obtained by adding an online fault compensator to the SMAOC. The main contributions and novelties of this work are as follows.
(1) The scheme extends the ADP-based SMC method to the FTC problem for MRRs with unknown actuator failures, and online fault compensation is achieved without FDI.
(2) The proposed SMOFCC scheme, which is composed of an SM-based iterative controller, an adaptive robust term, and a fault compensator, guarantees that the MRR system is asymptotically stable, rather than uniformly ultimately bounded (UUB) [3,32,39,40].
(3) By employing the SMC technique, the developed SMOFCC has a faster control response than schemes based on tracking error feedback only [3].
The rest of this paper is organised as follows. In the next section, we present the problem statement. In the subsequent section, the SM-based control scheme for MRRs in fault-free case is presented. Then, an online fault compensator is developed to obtain the FTC, and the stability analysis is provided. The numerical simulation demonstrates the effectiveness of the SMOFCC before the final sections. In the last section, a brief conclusion is drawn.

Problem statement
The n-DOF (degree-of-freedom) MRR system with unknown actuator failures can be described by

M(q)q̈ + C(q, q̇)q̇ + G(q) = u + f_a, (1)

where q ∈ R^n denotes the vector of joint displacements, M(q) ∈ R^{n×n} denotes the nonsingular symmetric inertia matrix, C(q, q̇)q̇ ∈ R^n denotes the Coriolis and centripetal force, G(q) ∈ R^n denotes the gravity term, u ∈ R^n denotes the joint input torque, and f_a ∈ R^n represents the unknown additive actuator failure.
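To make the structure of (1) concrete, the following sketch evaluates M(q), C(q, q̇), and G(q) for an illustrative two-link planar arm with point masses at the link ends; the link parameters are placeholders, not the MRR model identified in the paper.

```python
import numpy as np

def dynamics(q, dq, m=(1.0, 1.0), l=(0.5, 0.5)):
    """Inertia M(q), Coriolis C(q,dq), gravity G(q) of a 2-link planar arm.
    Illustrative point-mass parameters only, not the paper's MRR model."""
    m1, m2 = m; l1, l2 = l; g = 9.81
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    M = np.array([[m1*l1**2 + m2*(l1**2 + 2*l1*l2*c2 + l2**2),
                   m2*(l1*l2*c2 + l2**2)],
                  [m2*(l1*l2*c2 + l2**2), m2*l2**2]])
    C = np.array([[-m2*l1*l2*s2*dq[1], -m2*l1*l2*s2*(dq[0] + dq[1])],
                  [ m2*l1*l2*s2*dq[0], 0.0]])
    G = np.array([(m1 + m2)*g*l1*np.cos(q[0]) + m2*g*l2*np.cos(q[0] + q[1]),
                  m2*g*l2*np.cos(q[0] + q[1])])
    return M, C, G

def accel(q, dq, u, fa=np.zeros(2)):
    """Forward dynamics of (1): ddq = M^{-1}(u + f_a - C dq - G)."""
    M, C, G = dynamics(q, dq)
    return np.linalg.solve(M, u + fa - C @ dq - G)
```

Since M(q) is nonsingular, the joint acceleration is obtained by a linear solve rather than an explicit inverse.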
Define the system state as x = [x_1^T, x_2^T]^T with x_1 = q and x_2 = q̇. Then system (1) can be rewritten in the state-space form

ẋ_1 = x_2,
ẋ_2 = f(x) + g(x)(u + f_a), (2)
y = x_1,

where f(x) = −M^{-1}(x_1)(C(x_1, x_2)x_2 + G(x_1)), g(x) = M^{-1}(x_1), and x ∈ R^{2n} and y ∈ R^n are the state and output vectors, respectively.

Assumption 1
The drift term f(x) and the input gain g(x) are norm-bounded, i.e., there exist two unknown positive constants D_f and D_g such that ‖f(x)‖ ≤ D_f and ‖g(x)‖ ≤ D_g.

Assumption 2
The desired reference trajectory q_d, the velocity vector q̇_d, and the acceleration vector q̈_d are norm-bounded as [15]

‖q_d‖ ≤ q_κ, ‖q̇_d‖ ≤ q_κ, ‖q̈_d‖ ≤ q_κ,

where q_κ > 0 is a known constant.

Assumption 3
The actuator failure f_a is norm-bounded, i.e., ‖f_a‖ is bounded by a positive constant.

For the fault-free case of the MRR system (2), i.e., f_a = 0, we define the nominal system as

ẋ_1 = x_2,
ẋ_2 = f(x) + g(x)u_0, (3)
y = x_1,

where u_0 is the SMAOC law.
The tracking error is defined as

e = x_1 − q_d, (4)

where x_d = [q_d^T, q̇_d^T]^T denotes the desired reference state. The time derivative of the tracking error (4) becomes

ė = x_2 − q̇_d. (5)

To accelerate the convergence rate, we introduce the SM surface

s = ė + Λe, (6)

where Λ is a positive definite matrix. Along the nominal system (3), the time derivative of the SM surface is

ṡ = ë + Λė = f(x) + g(x)u_0 − q̈_d + Λė. (7)
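The surface computation in (4)–(6) can be sketched directly; the numerical values of Λ below are illustrative, not taken from the paper.

```python
import numpy as np

def sliding_surface(x1, x2, qd, dqd, Lam):
    """s = de + Lam @ e with e = x1 - q_d and de = x2 - dq_d  (Eqs. (4)-(6))."""
    e = x1 - qd          # position tracking error
    de = x2 - dqd        # velocity tracking error
    return de + Lam @ e, e, de

# On the surface s = 0 the error obeys de = -Lam @ e, so e decays
# exponentially at a rate set by the eigenvalues of Lam.
```

Choosing Λ with larger eigenvalues speeds up the error decay on the surface, at the price of a more aggressive reaching phase.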
To realize the approximate optimal control, the SM-based iterative controller u_s is used to make the trajectory tracking error converge to the steady state. The cost function is defined as

J(s(t)) = ∫_t^∞ Z(s(τ), u_s(τ)) dτ, (8)

where Z(s, u_s) = s^T Qs + u_s^T Ru_s, J(s(t)) ≥ 0 for arbitrary s and u_s, and J(0) = 0. Q ∈ R^{n×n} and R ∈ R^{m×m} are positive definite matrices.

Remark 1
From (6), we can observe that the SM surface consists of the position tracking error e and the velocity tracking error ė, rather than the position tracking error e only. Thus, compared with optimal control based on position tracking error feedback only, optimal control with the SM signal achieves faster convergence and smaller overshoot. Furthermore, SMC has low sensitivity and strong robustness to system uncertainties and is easily implemented in practice.

Sliding mode-based HJBE
For each admissible control strategy μ(s) ∈ Ψ(Ω) of system (3), where Ψ(Ω) is the set of admissible controls, if the cost function (8) is continuously differentiable, then the nonlinear Lyapunov equation can be derived as

0 = Z(s, μ(s)) + (∇J(s))^T ṡ, (9)

where ∇J(s) = ∂J(s)/∂s. The Hamiltonian is defined as

H(s, u_s, ∇J(s)) = Z(s, u_s) + (∇J(s))^T ṡ, (10)

and the optimal cost function can be defined as

J*(s) = min_{u_s ∈ Ψ(Ω)} ∫_t^∞ Z(s(τ), u_s(τ)) dτ. (11)

Based on the Bellman principle of optimality, J*(s) satisfies the HJBE

min_{u_s ∈ Ψ(Ω)} H(s, u_s, ∇J*(s)) = 0. (12)

Since ∂H(s, u_s*, ∇J*(s))/∂u_s* = 0, the optimal control law can be obtained as

u_s* = −(1/2)R^{-1}g(x)^T ∇J*(s). (13)

By equivalent transformation, substituting (13) into (12), (13) becomes

0 = s^T Qs + (∇J*(s))^T (f(x) − q̈_d + Λė) − (1/4)(∇J*(s))^T g(x)R^{-1}g(x)^T ∇J*(s). (14)
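For completeness, the stationarity step behind (13) can be written out explicitly. Here ṡ is taken along the nominal dynamics (3), so the Hamiltonian is quadratic in u_s and its minimizer follows from a first-order condition:

```latex
H(s, u_s, \nabla J) = s^{\top}Q s + u_s^{\top}R u_s
  + (\nabla J(s))^{\top}\bigl(f(x) + g(x)u_s - \ddot q_d + \Lambda \dot e\bigr),
\qquad
\frac{\partial H}{\partial u_s} = 2 R u_s + g(x)^{\top}\nabla J(s) = 0
\;\Longrightarrow\;
u_s^{*} = -\tfrac{1}{2}R^{-1} g(x)^{\top}\nabla J^{*}(s).
```

Because R is positive definite, the Hessian 2R of H with respect to u_s is positive definite, so this stationary point is indeed the minimizer required by (12).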

Online PI algorithm
According to [36,37,39,40], the solution of the HJBE (12) can be approximated through the online PI algorithm when the system is in the normal state. Unlike [3], the online PI algorithm here is driven by the SM feedback signal rather than the system tracking error. The online PI algorithm is presented in Algorithm 1.
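The nonlinear HJBE cannot be reproduced numerically here, but the evaluate-then-improve structure of a PI algorithm can be illustrated on the linear-quadratic special case, where policy evaluation reduces to a Lyapunov equation and policy improvement to u = −R⁻¹BᵀPs. This is a hedged analogue (Kleinman's iteration for the LQR problem), not the paper's Algorithm 1.

```python
import numpy as np

def lyap(A, Q):
    """Solve A^T P + P A + Q = 0 by vectorization (policy evaluation step)."""
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(I, A.T) + np.kron(A.T, I)
    P = np.linalg.solve(M, -Q.reshape(-1, order='F')).reshape(n, n, order='F')
    return 0.5 * (P + P.T)   # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=30):
    """Evaluate the cost of the current policy, then improve it greedily."""
    K = K0                                  # K0 must be stabilizing
    for _ in range(iters):
        Ak = A - B @ K                      # closed loop under current policy
        P = lyap(Ak, Q + K.T @ R @ K)       # policy evaluation
        K = np.linalg.solve(R, B.T @ P)     # policy improvement
    return P, K
```

For the double integrator with Q = I and R = 1, the iteration converges to the algebraic Riccati solution P = [[√3, 1], [1, √3]], mirroring how Algorithm 1 refines the critic between control updates.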

Sliding mode-based critic neural network
The cost function J(s) can be reconstructed by a single-layer NN as

J(s) = W_c^T σ_c(s) + ε_c(s), (15)

where W_c ∈ R^M and σ_c(s) denote the ideal weight vector and the activation function, respectively, M denotes the number of neurons in the hidden layer, and ε_c(s) denotes the approximation error of the critic NN (CNN). Then, from (15), we can obtain

∇J(s) = (∇σ_c(s))^T W_c + ∇ε_c(s). (16)

According to (16), the Hamiltonian (10) can be rewritten as

H(s, u_s, W_c) = Z(s, u_s) + W_c^T ∇σ_c(s)ṡ + ε_cH = 0, (17)

where ε_cH is the residual error caused by the NN approximation.
To estimate W_c, the CNN (15) is approximated as

Ĵ(s) = Ŵ_c^T σ_c(s), (18)

and from (18) we obtain

∇Ĵ(s) = (∇σ_c(s))^T Ŵ_c. (19)

Inserting (19) into (17), we have the approximate Hamiltonian

Ĥ(s, u_s, Ŵ_c) = Z(s, u_s) + Ŵ_c^T ∇σ_c(s)ṡ = e_c. (20)

Denote ξ = ∇σ_c(s)ṡ, and assume that there exists a constant ξ_M > 0 such that ‖ξ‖ ≤ ξ_M. By minimizing the objective function E_c = (1/2)e_c^T e_c with the gradient descent algorithm, Ŵ_c should be updated by

Ŵ̇_c = −β_c ξ e_c, (21)

where β_c > 0 is the learning rate.
Define the weight approximation error as

W̃_c = W_c − Ŵ_c. (22)

From (17), (20) and (22), one has

e_c = −W̃_c^T ξ + ε_cH. (23)

Then, the weight approximation error is updated by

W̃̇_c = −Ŵ̇_c = −β_c ξξ^T W̃_c + β_c ξ ε_cH. (24)

Inserting (16) into (13), the ideal SM-based iterative control strategy is expressed by

u_s* = −(1/2)R^{-1}g(x)^T ((∇σ_c(s))^T W_c + ∇ε_c(s)), (25)

and it is approximated as

û_s = −(1/2)R^{-1}g(x)^T (∇σ_c(s))^T Ŵ_c. (26)
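One gradient step of the critic update (21) can be sketched as follows; the quadratic feature map σ_c and the learning rate are illustrative assumptions, not the basis functions used in the paper's simulation.

```python
import numpy as np

def critic_step(Wc_hat, s, ds, Q, R, us, grad_sigma, beta_c=0.5):
    """One gradient-descent step on E_c = e_c^2 / 2 for the critic weights (21).

    e_c is the approximate Hamiltonian residual (20); xi = grad_sigma(s) @ ds
    is the regressor. grad_sigma is an assumed feature gradient."""
    xi = grad_sigma(s) @ ds
    e_c = s @ Q @ s + us @ R @ us + Wc_hat @ xi
    return Wc_hat - beta_c * xi * e_c, e_c

# Illustrative quadratic feature map sigma_c(s) = [s1^2, s1*s2, s2^2]
def grad_sigma(s):
    return np.array([[2*s[0], 0.0],
                     [s[1],   s[0]],
                     [0.0,    2*s[1]]])
```

For a fixed data point the residual contracts by the factor (1 − β_c‖ξ‖²) per step, which is why (21) drives the approximate Hamiltonian toward zero when β_c is small enough.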

Theorem 1 Consider the nominal MRR system (3). If the weight vector of the SM-based CNN is tuned by (21), then the weight approximation error is guaranteed to be UUB.
Proof Choose a Lyapunov function candidate as

L_1 = (1/(2β_c)) W̃_c^T W̃_c. (27)

Taking the time derivative of (27) along (24), we have

L̇_1 = (1/β_c) W̃_c^T W̃̇_c = −W̃_c^T ξξ^T W̃_c + W̃_c^T ξ ε_cH ≤ −‖ξ^T W̃_c‖² + ‖ξ^T W̃_c‖ |ε_cH|. (28)

Therefore, we can obtain L̇_1 ≤ 0 as long as W̃_c lies outside the compact set Ω_c = {W̃_c : ‖ξ^T W̃_c‖ < |ε_cH|}. Thus, the CNN weight estimation error W̃_c is UUB. This completes the proof.

Sliding mode-based approximate optimal control
In light of the previous analysis, we can design the SMAOC law as

u_0 = û_s − k_1 s − (k_2 + φ̂) sgn(s), (29)

where sgn(s) = [sgn(s_1), sgn(s_2), …, sgn(s_n)]^T ∈ R^n, k_1 and k_2 are positive definite constant matrices, and φ̂ is an adaptive robust term tuned by

φ̂̇ = β_φ ‖s‖, (30)

where β_φ > 0. According to SMC theory and the proposed controller (29), the reachability condition s^T ṡ ≤ 0 ensures that the MRR system states reach and stay on the SM surface.
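The SMAOC law can be sketched numerically as below; the gain values and the approximate optimal term û_s are placeholders, and the signum-based robust terms follow the reconstructed form of (29)–(30), which is an assumption about the paper's exact expression.

```python
import numpy as np

def smaoc(us_hat, s, phi_hat, k1, k2):
    """SMAOC law (29): approximate optimal term plus switching robust terms.
    u0 = us_hat - k1 @ s - (k2 + phi_hat * I) @ sgn(s)."""
    n = len(s)
    return us_hat - k1 @ s - (k2 + phi_hat * np.eye(n)) @ np.sign(s)

def robust_update(phi_hat, s, beta_phi=1.0, dt=1e-3):
    """Euler step of the adaptive robust gain (30): dphi/dt = beta_phi * ||s||."""
    return phi_hat + beta_phi * np.linalg.norm(s) * dt
```

The switching term always pushes against the sign of s, which is what makes the reachability condition sᵀṡ ≤ 0 attainable; in practice the signum is often smoothed (e.g., by a saturation function) to reduce chattering.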

Theorem 2 Consider the nominal MRR system (3), the SM surface (6), and its time derivative (7). The tracking error of the MRR system reaches the SM surface and stays on it thereafter under the developed SMAOC law (29).
Proof Choose a Lyapunov function candidate as

L_2 = (1/2)s^T s + (1/(2β_φ))φ̃², (31)

where φ̃ = φ − φ̂. Introducing the SMAOC law (29) into the time derivative of (31) yields L̇_2. From Assumption 1, there exist two unknown positive constants D_f and D_g such that ‖f(x)‖ ≤ D_f and ‖g(x)‖ ≤ D_g. Using Young's inequality, it follows that

L̇_2 ≤ −δ‖s‖, (33)

where δ = λ_min(k_2) − D_f − D_g‖û_s‖. This implies that the system tracking errors reach the SM surface and remain on it provided δ ≥ 0. From L̇_2 ≤ 0, we can see that s and ṡ are bounded. L̇_2 ≤ −δ‖s‖ means that ∫_0^t ‖s‖dτ exists. Since L_2(0) is bounded and L_2 is monotonically decreasing with a lower bound, lim_{t→∞} ∫_0^t ‖s‖dτ is also bounded. Then, s(t) is asymptotically stable by Barbalat's Lemma, i.e., lim_{t→∞} s(t) = 0. Furthermore, e(t) converges to zero asymptotically. Therefore, the system states arrive at the SM surface in finite time. This completes the proof.

Sliding mode-based online fault compensator
Based on the analysis of the fault-free case of MRRs, an online fault compensator is developed to ensure that the closed-loop system remains stable when actuator failures occur, i.e., f_a ≠ 0. By introducing the SMAOC law (let u = u_0), the faulty MRR system (2) becomes

ẋ_1 = x_2,
ẋ_2 = f(x) + g(x)(u_0 + f_a). (34)

According to (8), J*(s) ≥ 0 with J*(0) = 0, so J*(s) is a positive definite function. Then, its time derivative is

J̇*(s) = (∇J*(s))^T ṡ. (35)

Combining (7) with (34), one obtains

J̇*(s) = (∇J*(s))^T (f(x) + g(x)(u_0 + f_a) − q̈_d + Λė). (36)

Considering (9) and (14), and assuming there exists a positive constant φ_M such that ‖φ̂‖ ≤ φ_M, we have

J̇*(s) ≤ −ζ‖s‖ + (∇J*(s))^T g(x) f_a, (37)

where ζ = λ_min(k_2) − φ_M. We can conclude that whether J̇*(s) is negative or not depends on f_a. Therefore, it is necessary to design an online fault compensator to guarantee the stability of the closed-loop MRR system with actuator failures. Thus, the SMOFCC law for the MRR system (2) is designed as

u = u_0 − f̂_a, (38)

where f̂_a is the estimate of the unknown actuator failure, which is adaptively updated by the law (39). According to the aforementioned design procedure, the block diagram of the designed SMOFCC is shown in Fig. 1.
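The compensation idea of (38) can be demonstrated on a toy scalar loop. Since the paper's exact update law (39) is not recoverable from this text, the sketch assumes a simple gradient-type update driven by the SM signal, f̂̇_a = β_a s, purely for illustration.

```python
import numpy as np

def simulate_compensator(fa=0.8, k=5.0, beta_a=20.0, dt=1e-3, T=5.0):
    """Toy scalar loop: sdot = -k*s + (fa - fa_hat), fa_hat_dot = beta_a*s.
    The update law is an assumed stand-in for the paper's law (39)."""
    s, fa_hat = 0.0, 0.0
    for _ in range(int(T / dt)):
        sdot = -k * s + (fa - fa_hat)   # residual fault drives the surface
        s += dt * sdot
        fa_hat += dt * beta_a * s       # compensator integrates the surface
    return s, fa_hat
```

The (s, f̂_a) error dynamics form a damped second-order system whose only equilibrium is s = 0, f̂_a = f_a, so the estimate converges to the constant fault amplitude without any FDI stage, which is the mechanism Fig. 1 exploits.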

Remark 2
We note that the SMOFCC scheme (38) is developed based on the online fault compensation control technique [35], rather than the state observer technique [32]. It is worth noting that the fault compensator (39) can not only estimate the failure but also compensate the NN approximation error, whereas a state-based fault observer can estimate the failure only.

Stability analysis
Theorem 3 Consider the faulty MRR system (2), the CNN (15) with the updating law (21), the cost function (8), and the online compensator f̂_a in (39). The closed-loop faulty MRR system is guaranteed to be asymptotically stable under the developed SMOFCC policy (38).

Remark 3
The difference between model-free control and model-based control lies in whether the dynamic model of the controlled plant is required. In this paper, the SMOFCC scheme is designed based on known system dynamics, but it can be extended to the model-free case when the system dynamics are unavailable. To achieve this goal, one strategy is to employ an observer [15] or identifier [48] to estimate the system dynamics and then use the estimate in the controller design. Alternatively, one can develop a purely model-free control method, i.e., design the controller directly from system input-output data [28].

Simulation study and results analysis
In this section, a 2-DOF MRR (see configuration b in [3]) is employed to comparatively verify the effectiveness of the theoretical results of the SMOFCC. Desired trajectories are specified for the two joint modules, and an unknown additive actuator failure is introduced at t = 30 s. To approximate the cost function (8), a CNN is employed; the simulation parameters are listed in Table 1.
To show the superiority of the proposed scheme, we compare the control performance of the developed SMOFCC scheme with the existing optimal control scheme based on tracking error feedback only in [3]. Figures 2, 3, 4 and 5 illustrate the simulation results under the proposed control strategy, and Figs. 6, 7, 8 and 9 depict the simulation results under the control method in [3].
The designed compensator (39) is employed to estimate the fault amplitude online. From Figs. 2 and 6, we can observe that the estimated failure tracks the actual failure within less than 1 s under the developed SMOFCC scheme, while it takes nearly 10 s with the control scheme in [3]. Moreover, the SMOFCC obtains a smaller overshoot; that is, the fault amplitude estimated by the proposed compensator (39) has a smaller bias and a higher accuracy. Figure 3 shows that the actual trajectories track the desired ones within 3 s under the proposed scheme, a faster convergence rate than that in [3]. Meanwhile, compared with Fig. 7, Fig. 3 shows that the SMOFCC provides faster convergence at the beginning of the run in the fault-free scenario. We can see from Fig. 4 that the tracking errors gradually decrease and reach steady state after 4 s, which confirms the above results more intuitively. Figures 5 and 9 illustrate the control inputs of the two control methods, respectively. The control input of joint 1 under the SMOFCC shows only a slight change after the actuator failure occurs at t = 30 s, owing to the online fault compensation. From the above simulation results, the SMOFCC achieves a faster convergence rate, a smaller overshoot, and higher accuracy. This is because the SMOFCC scheme introduces the SMC technique, whose SM surface combines proportional and derivative actions on the tracking error [49]: the proportional action improves the convergence rate and reduces the system error, while the derivative action reduces overshoot and settling time. Besides, the FDI unit is removed thanks to the online fault estimation, so the fault diagnosis time is greatly reduced and good fault tolerance is obtained despite the occurrence of the actuator failure. Furthermore, the closed-loop MRR system is asymptotically stable, rather than UUB.
The system states can be recovered by the online fault compensation after the actuator failure occurs. Thus, the tracking performance under the designed SMOFCC is superior to that in [3]. In summary, the proposed control scheme achieves better tracking and fault tolerance performance for MRRs than the method in [3] by introducing the SM surface.

Conclusion
In this paper, we propose the SMOFCC scheme, which extends ADP-based control with the SMC technique to solve the FTC problem of MRRs with unknown actuator failures. The SMOFCC consists of the SM-based iterative controller, an adaptive robust term, and an online fault compensator. Thanks to the SMC technique, the requirement for a prior nominal controller that relies on an accurate dynamic model is relaxed. Moreover, the closed-loop MRR system is ensured to be asymptotically stable, rather than UUB. Based on the online estimation of actuator failures, the proposed SMOFCC scheme removes the FDI unit. Comparative simulation results show that the developed scheme provides faster convergence and smaller overshoot than existing optimal control methods developed based on tracking error feedback only. In future work, the approximate optimal FTC problems for MRRs with other fault scenarios and noises will be further considered.

Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.