Adaptive dynamic programming enhanced admittance control for robots with environment interaction and actuator saturation

This paper focuses on the optimal tracking control problem for robot systems with environment interaction and actuator saturation. A control scheme combining admittance adaptation and adaptive dynamic programming (ADP) is developed. The unknown environment is modelled as a linear system, and an admittance controller is derived to achieve compliant behaviour of the robot. In the ADP framework, the cost function is defined in a non-quadratic form, and a critic network based on a radial basis function neural network (RBFNN) is designed to obtain an approximate optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation, which guarantees optimal trajectory tracking. The system stability is analysed via the Lyapunov theorem, and simulations demonstrate the effectiveness of the proposed strategy.


Introduction
In recent decades, robots have been widely applied in industrial automation, for example as assembly, handling and welding robots. They can not only cooperate with human partners on certain tasks, but also complete some tasks independently, or even replace human beings in hazardous environments with high temperature, pressure or radiation. However, in many practical applications, robots unavoidably interact with the external environment, which not only affects execution of the work, but also directly threatens the safety of human partners and the robots themselves. Consequently, interaction control between the robot and the environment has become an important research topic.
It is noted that two main approaches are applied in current robotics research to ensure compliant behaviour, i.e., hybrid position/force control proposed by Raibert and Craig (1981) and impedance control proposed by Hogan (1981). The former requires decomposition into position and force subspaces and control-law switching during implementation. Since the dynamic coupling between the robot and the external environment is not considered, the accuracy of this approach is difficult to guarantee. Comparatively, the latter establishes the relationship between the robot and the environment, and achieves compliant behaviour by adjusting the mechanical impedance to a target value when interaction occurs, which guarantees interaction safety. Impedance control has two execution methods according to the controller causality, i.e., impedance control and admittance control. For an impedance control system, the external force imposed by the environment can be obtained from the desired trajectory and the impedance model, while for an admittance control system, the modified motion trajectory can be derived from the measured interaction force and the expected admittance model. Therefore, we adopt admittance control to deal with the robot–environment interaction problem.
The interaction force and the admittance model are significant parts of admittance control. If interaction between the robot and the environment occurs, the interaction force can be measured by force sensors mounted at the end-effector of the robot. However, due to the complexity of the environment, it is often very hard to obtain the desired admittance model, which is critical for the admittance control system. In addition, a fixed model cannot satisfy the requirements of all situations. Consequently, Braun et al. (2012) took human–robot cooperation as an example and proposed that it is essential to adopt a variable admittance model to improve system efficiency. For variable admittance control, iterative learning has been studied in the robot intelligent control field to derive admittance parameters that adapt to an unknown environment. To complete a wall-following task, Cohen and Flash (1991) proposed an impedance learning strategy with an associative network. Tsuji et al. (1996) introduced neural networks into impedance control to tune the model parameters. But the iterative learning approach requires the robot to perform the same task repeatedly, which is not feasible in some practical applications. Researchers have therefore adopted adaptation methods to solve this problem, such as Love and Book (2004), Uemura and Kawamura (2009), Stanisic and Fernández (2012), Landi et al. (2017) and Yao et al. (2018).
Tracking control is an important research topic in the robot intelligent control area. In current studies, many control methods have been applied to robot systems. Cervantes and Alvarez-Ramirez (2001) and Parra-Vega et al. (2003) applied classic proportional-integral-derivative (PID) control to robot systems with satisfactory tracking performance. PID control is often used in the industrial field owing to its simple structure and good performance, but for complex systems it is very difficult to choose appropriate PID parameters, which normally depends on the experience of the operator. In recent years, neural network (NN) control has been investigated and applied to robot systems because of its strong approximation capability for unknown systems. In Zhang et al. (2018), NN control was employed to improve the tracking performance of a robot system with uncertainties. In Yang et al. (2019), an NN-based controller combined with admittance adaptation was proposed to tackle the robot–environment interaction problem. However, these control methods only deal with the stabilization problem of the system without considering optimal control. Based on optimal control theory, we expect to find a control strategy that enables the system to reach the target in an optimal manner. To achieve this goal, it is usually required to minimize a specified cost function by solving the Hamilton–Jacobi–Bellman (HJB) equation. The HJB equation for a nonlinear system is a nonlinear partial differential equation, so its analytical solution is non-trivial to derive. Dynamic programming, proposed by Bellman (1957), provides a useful method for solving the HJB equation. However, since this method is based on a backward numerical process, it suffers from the well-known curse of dimensionality as the system dimension increases.
To overcome this problem, Werbos (1992) proposed the adaptive dynamic programming (ADP) strategy, which uses NNs to approximate the cost function forward in time and then obtain the solution of the HJB equation. During the past few years, great efforts have been made on ADP to deal with control issues for nonlinear systems (Liu et al. 2014; Jiang and Jiang 2015), such as systems with dynamic uncertainties and disturbances (Cui et al. 2017).
In practical control systems, actuator saturation is a common phenomenon, which may degrade system performance or even cause instability. Therefore, it is essential and challenging to derive optimal control strategies for nonlinear systems with actuator saturation. Wenzhi and Selmic (2006) proposed an NN-based feed-forward saturation compensation strategy for nonlinear systems in Brunovsky canonical form. In Wen et al. (2011), the Nussbaum function was employed to compensate for the nonlinear term caused by input saturation. To handle the control issue for nonlinear systems with unknown saturation, auxiliary systems were proposed in He et al. (2016) and Peng et al. (2020) to tackle actuator saturation, and in Zhao et al. (2018) a control strategy consisting of an ADP-based nominal control and an NN-based compensator was proposed. In Abu-Khalaf and Lewis (2005), the HJB equation was formulated with a non-quadratic cost function and an NN least-squares method was proposed to obtain the solution.
In Peng et al. (2020), robot–environment interaction and actuator saturation are considered, while optimal control is not. However, for robot systems, it is worthwhile to investigate how to realize tracking control in an optimal manner. Therefore, based on our previous work, the optimal tracking control issue for robot systems with environment interaction and actuator saturation is studied in this paper. Inspired by Abu-Khalaf and Lewis (2005), Lyshevski (1998) and Jiang and Jiang (2012), a control scheme based on admittance control and the ADP method is employed to improve the control performance of robot systems. The main contributions of this paper are summarized as follows: (i) To solve the interaction problem, the unknown environment is regarded as a linear system, and an admittance adaptation approach based on an iterative linear quadratic regulator (LQR) is adopted to obtain the compliant behaviour of the robot.
(ii) To tackle the optimal tracking problem, an ADP-based controller is designed. The cost function is defined in a non-quadratic form. A critic network with an RBFNN is developed to approximate the minimum cost in the HJB equation, from which the corresponding optimal control is obtained.
The rest of this paper is arranged as follows. In Sect. 2, the robot system with actuator saturation and the environment dynamics are described, and the control objective is provided. In Sect. 3, the control strategy based on admittance adaptation and an ADP-based optimal controller is proposed. In Sect. 4, simulation studies are performed on a 2-DOF planar manipulator. In Sect. 5, the conclusion is drawn. The system stability is discussed and proved in the Appendix.

Robot dynamics
The dynamics of an n-link robot manipulator subject to actuator saturation are described as

M(q)q̈ + C(q, q̇)q̇ + G(q) = τ, τ ∈ Ω, (1)

where q ∈ ℝⁿ, q̇ ∈ ℝⁿ and q̈ ∈ ℝⁿ denote the position, velocity and acceleration vectors in the joint space of the robot, respectively; τ, Ω and A denote the joint torque, the admissible control set and the constant saturation bound, respectively, with Ω = {τ ∈ ℝⁿ : |τᵢ| ≤ A}. For the sake of brevity, we use M, C and G to denote the known inertia matrix M(q) ∈ ℝⁿˣⁿ, Coriolis/centrifugal matrix C(q, q̇) ∈ ℝⁿˣⁿ and gravity vector G(q) ∈ ℝⁿ, respectively.
If we define the reference trajectory as q_r ∈ ℝⁿ, the tracking error q_e ∈ ℝⁿ is given as q_e = q − q_r. Define the sliding motion surface as ξ = q̇_e + Λq_e, where Λ ∈ ℝⁿˣⁿ is a constant positive-definite matrix; this yields (2). According to (1) and (2), the error dynamics (3) can be derived. Consequently, we obtain the following system:

ξ̇ = f(ξ) + g(ξ)τ, (4)

where f : ℝⁿ → ℝⁿ and g : ℝⁿ → ℝⁿˣⁿ are nonlinear functions determined by M, C, G and the reference trajectory, described in (5).
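As a minimal numerical sketch of the sliding variable defined above (in Python, with hypothetical gains and trajectories; only the definition ξ = q̇_e + Λq_e from the text is used):

```python
import numpy as np

def sliding_variable(q, dq, q_r, dq_r, Lam):
    """Sliding variable xi = dq_e + Lam @ q_e with tracking error q_e = q - q_r."""
    q_e = q - q_r
    dq_e = dq - dq_r
    return dq_e + Lam @ q_e

# toy 2-DOF check: xi vanishes when the tracking is exact
Lam = np.diag([2.0, 2.0])                      # hypothetical positive-definite gain
q_r = np.array([0.5, -0.3]); dq_r = np.array([0.1, 0.0])
xi = sliding_variable(q_r.copy(), dq_r.copy(), q_r, dq_r, Lam)
print(np.allclose(xi, 0.0))  # True
```

Driving ξ to zero drives both the position error q_e and velocity error q̇_e to zero, which is why the controller is designed on the ξ-dynamics (4) rather than directly on q_e.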

Environment dynamics
In this paper, we consider an unknown interaction environment regarded as a damping-stiffness model, as in Ge et al. (2014):

C_E ẋ + G_E x = −F, (6)

where C_E and G_E are the unknown damping and stiffness of the environment, respectively, F represents the interaction force measured by the force sensor, and x is the end-effector position of the robot in Cartesian space.
We define x_d as the corresponding desired trajectory, and U_d ∈ ℝᵐˣᵐ is a known matrix; x_d is then generated by

ẋ_d = U_d x_d. (7)

Consequently, defining ζ = [x, x_d]ᵀ, the joint dynamics of the environment and the desired trajectory can be derived as the linear system (8), whose system matrices are assembled from C_E, G_E and U_d.
Therefore, (8) can be regarded as a linear system in which F is the control input and ζ is the controlled state. F = −K_e ζ is the corresponding optimal feedback control law, and the objective is to minimize the cost function given in (9). From (9), we can see that the purpose of modifying the trajectory x_d is to balance the interaction force F against the tracking error x_e = x − x_d, which can be realized by adjusting the user-defined matrices Q_E1 and R_E.
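The construction of the augmented system can be sketched numerically. The following Python fragment assembles the matrices of (8) from the scalar environment model C_E ẋ + G_E x = −F and the trajectory generator ẋ_d = U_d x_d, together with a state weight built from the tracking error x_e = x − x_d; all numeric values are hypothetical, not the paper's:

```python
import numpy as np

# Hypothetical scalar parameters (illustration only).
C_E, G_E, U_d, Q_E1 = 0.1, 1.0, -0.5, 10.0

# Augmented state zeta = [x, x_d]^T:
#   dx   = -(G_E/C_E) x - (1/C_E) F   (from C_E*dx + G_E*x = -F)
#   dx_d = U_d x_d
A = np.array([[-G_E / C_E, 0.0],
              [0.0,        U_d]])
B = np.array([[-1.0 / C_E],
              [0.0]])

# Tracking error x_e = x - x_d = M @ zeta, so the state weight is M^T Q_E1 M.
M = np.array([[1.0, -1.0]])
Q = Q_E1 * (M.T @ M)
print(A.shape, B.shape, Q)
```

With these matrices, minimizing (9) becomes a standard LQR problem in (ζ, F), which is exactly the viewpoint exploited in the next section.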
The robot dynamics with a saturated actuator and the unknown environment dynamics have been described in this section. Next, an ADP-enhanced admittance control scheme will be designed to ensure compliant behaviour and optimal trajectory tracking under robot–environment interaction.

Control strategy
As shown in Fig. 1, the control scheme designed in this section, inspired by Zhan et al. (2020), consists of three parts: an optimal trajectory modifier using admittance control to modify the user-desired trajectory x_d into the modified trajectory x_r; a closed-loop inverse kinematics (CLIK) solver to transform x_r in Cartesian space into q_r in joint space; and an optimal trajectory tracking controller based on ADP, whose output torque acts on the robot manipulator to ensure optimal tracking performance.

Trajectory modifier using admittance control
By transformation, the cost function (9) can be rewritten in the following form, whose system counterpart is consistent with system (8).
It is noted that solving (10) can be regarded as a process similar to the LQR problem. The algebraic Riccati equation (ARE) associated with (9) and (10) is given in (11), and in this subsection an algorithm proposed by Jiang and Jiang (2012) is employed to solve the ARE and obtain the feedback gain K_e in (11). We first assemble the matrices from sampled signals as in (12), where n, m and d denote the dimensions of ζ and F and the number of samples, respectively, p_ij and ζ_i represent entries of P and ζ, respectively, ⊗ represents the Kronecker product, and p collects the independent entries of P. Let ‖·‖ and vec(·) denote the 2-norm and the column vectorization, respectively, and let k and I_n ∈ ℝⁿˣⁿ denote the iteration index and the identity matrix, respectively. If the sampled data set is large enough that the rank condition in (13) is satisfied, K_e can be obtained by iteratively calculating (14) until ‖p⁽ᵏ⁾ − p⁽ᵏ⁻¹⁾‖ < ε, where ε is an acceptable tolerance.
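The fixed point that the data-driven iteration (14) converges to can be illustrated with its model-based analogue, Kleinman's policy iteration for the continuous-time ARE. The sketch below (Python, hypothetical matrices) performs the same policy-evaluation/policy-improvement loop, but uses the known (A, B) and Kronecker vectorization in place of the sampled-data least-squares step:

```python
import numpy as np

def kleinman_lqr(A, B, Q, R, K0, iters=30):
    """Model-based policy iteration (Kleinman) for the continuous-time ARE.
    The data-driven algorithm in the text converges to the same gain K_e
    using sampled trajectories instead of explicit knowledge of (A, B)."""
    n = A.shape[0]
    K = K0  # must be stabilizing
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: solve Ak^T P + P Ak + Q + K^T R K = 0
        # via Kronecker vectorization, mirroring the least-squares step (14).
        L = np.kron(np.eye(n), Ak.T) + np.kron(Ak.T, np.eye(n))
        P = np.linalg.solve(L, -(Q + K.T @ R @ K).reshape(-1)).reshape(n, n)
        # Policy improvement.
        K = np.linalg.solve(R, B.T @ P)
    return K, P

# Hypothetical stable environment-like system; K0 = 0 is stabilizing here.
A = np.array([[-10.0, 0.0], [0.0, -0.5]])
B = np.array([[-10.0], [0.0]])
Q = 10.0 * np.array([[1.0, -1.0], [-1.0, 1.0]])
R = np.array([[1.0]])
K, P = kleinman_lqr(A, B, Q, R, np.zeros((1, 2)))
res = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
print(np.abs(res).max())  # ARE residual, ~0 after convergence
```

The data-driven version replaces the policy-evaluation solve with a least-squares fit over trajectory data, which is why the rank condition (13) on the sampled signals is needed.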
Once the optimal feedback gain K_e is obtained, the modified trajectory x_r, which is to be tracked and equals x in (15), can be calculated by (16), where K_e1 and K_e2 are compatible sub-blocks of K_e.

CLIK solver
We adopt the CLIK algorithm proposed by Siciliano (1990) to transform the reference trajectory x_r in Cartesian space into q_r in joint space. Let φ(·) and K_f represent the forward kinematics and a positive-definite user-defined matrix, respectively. Define e := φ(q_r) − x_r, impose ė = −K_f e, and let ẋ = J_co q̇ with J_co = ∂φ(q)/∂q; then q̇_r follows as in (17). Integrating both sides of this equation, q_r can be obtained as in (18). Note that ε is used to prevent the singularity problem, and it should also be small enough to preserve the accuracy of the solution.
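A minimal CLIK sketch for a 2-link planar arm follows (Python; link lengths, gain K_f and the damped pseudo-inverse constant are hypothetical choices standing in for the paper's parameters, with the damping playing the singularity-avoidance role of ε mentioned above):

```python
import numpy as np

L1, L2 = 1.0, 1.0  # assumed link lengths

def fk(q):
    """Forward kinematics phi(q) of a 2-link planar arm."""
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0] + q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0] + q[1])])

def jac(q):
    """Jacobian J_co = d phi / d q."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def clik_step(q, x_r, dx_r, Kf, dt, damp=1e-4):
    """One CLIK integration step: dq = J^+ (dx_r - Kf e), e = phi(q) - x_r,
    using a damped pseudo-inverse to stay well-posed near singularities."""
    e = fk(q) - x_r
    J = jac(q)
    J_pinv = J.T @ np.linalg.inv(J @ J.T + damp*np.eye(2))
    return q + dt * (J_pinv @ (dx_r - Kf @ e))

# track a fixed reachable point; the task-space error should shrink to ~0
q = np.array([0.3, 0.8])
x_r = np.array([1.2, 0.8]); dx_r = np.zeros(2)
Kf = 5.0 * np.eye(2)
for _ in range(2000):
    q = clik_step(q, x_r, dx_r, Kf, dt=0.01)
print(np.linalg.norm(fk(q) - x_r) < 1e-3)  # True
```

The feedback term −K_f e makes the integration self-correcting: drift from the numerical integration of q̇_r decays at the rate set by K_f instead of accumulating.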

Optimal control using ADP
The objective of this section is to find a stabilizing control input for the robot system (4) that minimizes the defined cost function. According to optimal control theory, the optimal feedback control of system (4) can be obtained by solving the HJB equation in the ADP framework. The structure diagram of the ADP-based tracking controller is given in Fig. 2. We assume that system (4) is controllable and that the nonlinear functions f(ξ) and g(ξ) are Lipschitz continuous and differentiable in ℝ²ⁿ. To deal with the actuator saturation of the robot system, inspired by Abu-Khalaf and Lewis (2005) and Lyshevski (1998), we define the cost function as

J(ξ(t)) = ∫ₜ^∞ U(ξ(s), τ(ξ(s))) ds, (19)

where

U(ξ, τ) = ξᵀQξ + 2∫₀^τ (A Ψ⁻¹(v/A))ᵀ R dv. (20), (21)

It is noted that Q ∈ ℝⁿˣⁿ in (20) is symmetric positive definite. In (21), Ψ is a strictly monotonic odd function whose first derivative is bounded by a constant B, and R is also a symmetric positive definite matrix. Therefore, U(ξ(s), τ(ξ(s))) is positive definite. Without loss of generality, we select Ψ(·) = tanh(·) and R = rI_n, with r a positive constant and I_n the identity matrix of dimension n.
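For the scalar case with Ψ = tanh, the saturation term of the cost (21) can be checked numerically against its closed form, W(u) = 2rA·u·atanh(u/A) + rA²·ln(1 − (u/A)²), which follows by integrating atanh by parts. The Python sketch below uses hypothetical values for A and r:

```python
import numpy as np

A_sat, r = 2.0, 1.0  # hypothetical saturation bound and R = r (scalar case)

def sat_cost_numeric(u, n=20001):
    """Trapezoidal evaluation of W(u) = 2 * int_0^u r * A * atanh(v/A) dv,
    the non-quadratic saturation term of (21) with Psi = tanh."""
    v = np.linspace(0.0, u, n)
    f = 2.0 * r * A_sat * np.arctanh(v / A_sat)
    return float(np.sum((f[:-1] + f[1:]) * np.diff(v)) / 2.0)

def sat_cost_closed(u):
    """Closed form: 2*r*A*u*atanh(u/A) + r*A^2*log(1 - (u/A)^2)."""
    return 2*r*A_sat*u*np.arctanh(u/A_sat) + r*A_sat**2*np.log(1 - (u/A_sat)**2)

u = 1.5
print(sat_cost_numeric(u), sat_cost_closed(u))  # two values agree
```

The integrand atanh(v/A) blows up as v → A, so this cost penalizes controls approaching the bound increasingly heavily, which is what forces the optimal control to respect |τᵢ| ≤ A.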
If J(ξ(t)) defined in (19) is continuously differentiable, then taking the time derivative of (19) yields the following nonlinear Lyapunov equation, with J(0) = 0, which is an infinitesimal form of (19):

0 = U(ξ, τ(ξ)) + (∇J(ξ))ᵀ ( f(ξ) + g(ξ)τ(ξ) ), (22)
where J(ξ) denotes J(ξ(t)) and ∇(·) ≜ ∂(·)/∂ξ denotes the partial derivative, for convenience. Therefore, the Hamiltonian function and the optimal cost function are described in (23) and (24), and the HJB equation is derived as in (25). Suppose that the minimum in (25) exists and is unique; then, from H(ξ, τ(ξ), ∇J*(ξ)) = 0, we can obtain the optimal control

τ*(ξ) = −A tanh( (1/(2A)) R⁻¹ gᵀ(ξ) ∇J*(ξ) ). (26)

Substituting (26) into (22), another form of the HJB equation related to ∇J*(ξ) is derived as (27). Then, combining (26) and (27), (28) can be rewritten as (30). However, (30) is a nonlinear partial differential equation with respect to J*(ξ), and it is very difficult, if not impossible, to obtain J*(ξ) from it analytically.
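The structure of the saturated optimal control (26) can be shown in a few lines. The sketch below (Python, R = rI and hypothetical values) illustrates that, by construction of the tanh, the resulting control never exceeds the actuator bound regardless of how large the cost gradient becomes:

```python
import numpy as np

def optimal_control(grad_J, g, A_sat, r):
    """Saturated optimal control tau* = -A * tanh( g^T grad_J / (2*r*A) ),
    obtained by minimizing the Hamiltonian under the non-quadratic cost.
    The tanh guarantees |tau_i| <= A componentwise."""
    return -A_sat * np.tanh(g.T @ grad_J / (2.0 * r * A_sat))

# even for an enormous gradient, the control saturates smoothly at +/- A
A_sat, r = 2.0, 1.0          # hypothetical bound and weight
g = np.eye(2)                 # hypothetical input matrix
tau = optimal_control(np.array([1e6, -1e6]), g, A_sat, r)
print(tau)  # approximately [-2.  2.]
```

This is why no separate anti-windup or clipping logic is needed: the bound is embedded in the optimality condition itself.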
Suppose J*(ξ) is continuously differentiable; then it can be constructed by an RBFNN and described as

J*(ξ) = wᵀS(ξ) + ε(ξ), (31)

where w ∈ ℝˡ and S : ℝ²ⁿ → ℝˡ represent the ideal constant weight vector and the activation function, respectively, and l and ε(ξ) denote the number of hidden-layer nodes and the unknown approximation error of the critic NN, respectively. Consequently, the derivative of (31) with respect to ξ can be obtained as in (32).
To train the critic NN, inspired by Liu et al. (2017) and Yang et al. (2013), a suitable updating law for the estimated weight ŵ is designed, which minimizes the objective function E_c = ½E_H² and also ensures that ŵ converges to w.
Here, V_s(ξ) is a polynomial in the state variable ξ, which can be appropriately selected, for example V_s(ξ) = ½ξᵀΛ_k ξ for some positive-definite Λ_k.
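The critic's two ingredients, the activation vector S(ξ) and its Jacobian ∂S/∂ξ (needed for ∇Ĵ(ξ) = (∂S/∂ξ)ᵀŵ), can be sketched with Gaussian radial basis functions. All centers, widths and weights below are hypothetical placeholders:

```python
import numpy as np

def rbf_features(xi, centers, width):
    """Gaussian RBF activations S(xi): S_j = exp(-||xi - c_j||^2 / width^2)."""
    d2 = ((xi - centers)**2).sum(axis=1)
    return np.exp(-d2 / width**2)

def rbf_jacobian(xi, centers, width):
    """Jacobian dS/dxi (shape l x dim), so grad_J_hat = (dS/dxi)^T @ w_hat."""
    S = rbf_features(xi, centers, width)
    return (-2.0 / width**2) * S[:, None] * (xi - centers)

# critic estimate J_hat = w^T S(xi) and its gradient at a sample state
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([0.5, 1.0, -0.3, 0.2])   # hypothetical weights
xi = np.array([0.4, 0.6])
S = rbf_features(xi, centers, 1.0)
J_hat = w @ S
grad_J = rbf_jacobian(xi, centers, 1.0).T @ w
print(S.shape, grad_J.shape)  # (4,) (2,)
```

In the controller, grad_J is exactly the quantity fed into the saturated control law in place of the unknown ∇J*(ξ).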

Remark 1
The updating law for ŵ in (45) consists of two parts: the first term is based on the standard gradient-descent algorithm, and the remaining terms are introduced to ensure the stability of the robot system during the critic NN learning process. Note that in (46), if (∇V_s(ξ))ᵀ( f(ξ) − A g(ξ) tanh(B₂(ξ)) ) ≥ 0, the system tends to be unstable; then h = 1 and the second term in (46) is activated, which improves the learning process. Therefore, the requirement of an initial stabilizing control is relaxed. From (40) and (45), we can see that if ξ = 0 and f(ξ) = 0, then Ĥ(ξ, τ̂(ξ), ∇Ĵ(ξ)) = 0. If F₂ = F₁ᵀ, then ẇ̂ = 0, the critic NN is not updated, and the optimal control may not be obtained. Consequently, persistent excitation is required.

Stability analysis
We will discuss the stability of the robot system and give a detailed proof that the estimated weight error w̃ and the system state ξ are uniformly ultimately bounded. The necessary assumption is given as follows.

Assumption There exist known positive constants w_m, g_m and g_M such that ‖w‖ ≤ w_m, and g(ξ) in (4) is bounded over a compact set, i.e., g_m ≤ ‖g(ξ)‖ ≤ g_M.
Theorem Consider the robot system (1) with actuator saturation, the corresponding HJB equation (30) and the Assumption. If the control law is designed as (39) and the critic NN weights are updated according to (45), then the critic NN weight approximation error w̃ and the state ξ are guaranteed to be uniformly ultimately bounded (UUB).
Proof See the Appendix. ◻

Simulation study

Simulation settings
A two-link manipulator, constructed with the robotics toolbox of Corke (2017) and shown in Fig. 3, is employed to verify the proposed control strategy; its dynamic parameters are given in Table 1. The simulation runs in Matlab 2018a with an ode3 solver and a fixed time step of 0.01 s. The robot manipulator is required to track a reference trajectory while simultaneously interacting with a virtual environment governed by (48), where C_E = 0.1, G_E = 1.0, x₀ denotes the contour of an object and F denotes the reactive force due to penetration into the object. For simplicity and generality, only the trajectory along the x-axis is modified and disturbed by the external interaction forces. The parameters of the proposed control scheme, including those of the "Optimal Trajectory Modifier" block in Fig. 1 and the weighting matrices in (10), are then selected accordingly.
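The virtual contact environment can be sketched as follows (Python). The sign and penetration conventions here are one plausible reading of the damping-stiffness model with contour x₀ and are assumed for illustration, not taken from the paper:

```python
import numpy as np  # imported for consistency with the other sketches

C_E, G_E = 0.1, 1.0  # damping and stiffness values from the simulation settings

def contact_force(x, dx, x0):
    """Reactive environment force: assumed convention
    F = -(C_E*dx + G_E*(x - x0)) while the end-effector penetrates the
    object (x > x0), and zero when there is no contact."""
    pen = x - x0
    return -(C_E * dx + G_E * pen) if pen > 0 else 0.0

# small penetration with positive velocity -> small opposing (negative) force
print(contact_force(1.05, 0.2, 1.0))
```

The piecewise structure (zero force out of contact) is what makes the true environment differ from the linear model (6) used for design, which is the source of the sub-optimality discussed in the results below.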

Results analysis
The control performance is shown in Fig. 4. At the beginning of the control process there is a large transient error, since the weights of the RBFNN have not yet converged. However, before the trajectory starts to be modified at t = 4.2 s, the tracking error has been reduced to an acceptable range, and subsequently the actual trajectory gradually converges to the desired trajectory. Fig. 5 gives the control signals during the control process: the control input stays within the limits of the actuator, and the weights of the RBFNN eventually converge to constant values. These observations demonstrate the effectiveness of the ADP-based controller under the saturation effect.
To show the effectiveness of the optimal admittance adaptation control, the control performance under two different feedback gains K_e, which affect the trajectory modification in (16), is compared: K_e^opt is obtained by assuming that the dynamic parameters of the environment in (6) are exactly known, while K_e^pro is calculated by the algorithm presented in (14). Note that, unlike the virtual environment used in (48), the environment dynamics (6) adopted for the theoretical design do not take the contour x₀ of the environment into consideration; thus K_e^opt is in fact sub-optimal. The results are shown in Fig. 6: both the tracking error and the value of the cost function (9) under K_e^pro are smaller than those under K_e^opt, which shows the superiority of the proposed method when the dynamics of the environment are unknown.

Conclusion
In this paper, the optimal tracking control issue for robot systems with environment interaction and actuator saturation is addressed. An ADP-enhanced admittance adaptation control scheme is developed. The unknown environment is considered as a linear system, and admittance adaptation control ensures the compliant behaviour of the robot. In the ADP-based controller, to guarantee optimal tracking performance, an RBFNN is used to approximate the minimum cost function and derive the optimal control from the HJB equation. The system stability is analysed, and simulation studies are performed to demonstrate the effectiveness of this control scheme.
Other input constraints, such as dead zones and hysteresis, as well as dynamic uncertainties, are also very common in actual robotic systems. These will not only reduce system performance but may also affect system stability. Consequently, optimal control under the ADP framework in the presence of such constraints and dynamic uncertainties will be considered in our future work.

Stability analysis
This appendix demonstrates the stability of the ADP-based controller proposed in this paper for robot systems with actuator saturation. The Lyapunov candidate is selected as in Liu et al. (2017), see (49). From (49) and (39), the derivative of V(ξ) is derived as (50). Next, we calculate the last term in (50); note (51). From (40) and (51), we obtain (52), where T(ξ) = sgn(B₂(ξ)) − tanh(B₂(ξ)).
where λ_min(·) denotes the minimum eigenvalue of a matrix and b_m is the upper bound of ‖W₂‖. Case One: h = 0, that is, (∇V_s(ξ))ᵀ( f(ξ) − A g(ξ) tanh(B₂(ξ)) ) < 0. Since ‖ξ‖ > 0, there exists a constant a_s such that 0 < a_s ≤ ‖ξ̇‖ implies (∇V_s(ξ))ᵀξ̇ ≤ −a_s‖∇V_s(ξ)‖. Consequently, we obtain (61); from (61), if one of the stated conditions is satisfied, then V̇(ξ) < 0. Case Two: h = 1, that is, (∇V_s(ξ))ᵀ( f(ξ) − A g(ξ) tanh(B₂(ξ)) ) ≥ 0; then (60) becomes (68), and from (68), if one of the stated conditions is satisfied, then V̇(ξ) < 0 again holds. According to the Lyapunov theorem, combining Case One and Case Two, it is concluded that the NN weight approximation error w̃ and the function V_s(ξ) are UUB. Since V_s(ξ) is a selected polynomial in ξ, the state ξ is also UUB. This completes the stability analysis.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.