Adaptive dynamic programming-based feature tracking control of visual servoing manipulators with unknown dynamics

This paper investigates an adaptive dynamic programming (ADP)-based feature tracking control method for visual servoing (VS) manipulators with unknown dynamics. The major advantage of ADP-based optimal control is that the visual tracking problem is converted into a feature tracking error control problem with an optimal cost function. Moreover, an adaptive neural network observer is developed to approximate the overall uncertainties, which are then used to construct an improved cost function. By establishing a critic neural network, the Hamilton–Jacobi–Bellman (HJB) equation is solved, and the approximate optimal error control policy is derived. Using Lyapunov theory, the closed-loop VS manipulator system is proven to be uniformly ultimately bounded under the developed ADP-based feature tracking control strategy. Finally, simulation results under various situations demonstrate that the proposed method achieves higher tracking accuracy than other methods and satisfies energy-optimality requirements.


Introduction
To comply with the requirements of modern manufacturing for efficiency, visualization and wireless communication [1][2][3], manipulators equipped with different sensors can adapt to extreme ambient conditions by means of non-contact detection. Manipulators with visual sensors imitate human vision, which allows positions and orientations to be measured without contact and fed back to the controller. At present, visual servoing (VS) manipulators have wide application potential in many scenarios such as disaster rescue, medical detection, and space exploration [4][5][6]. It is well known that the tracking control strategy is required to provide higher precision and lower consumption in VS systems.

Correspondence: Xiaolin Ren (xlren1985@ccut.edu.cn), Hongwen Li (lihongwen@ciomp.ac.cn)

Generally, the goal of feature trajectory tracking control is to drive the system outputs to track specified desired trajectories in the VS system. Hence, visual feedback signals have been used as significant information in robotics to tackle positioning or motion control in unstructured environments. The visual tracking problem for manipulators has been studied over the past few years, and a wide range of technologies have been explored.
Kang et al. [7] adopted a reinforcement learning method to adaptively adjust the servoing gain to improve the convergence rate and stability. Li et al. [8] combined proportional derivative (PD) control with sliding mode control (SMC) to tackle the disturbance and uncertainties on a 6-degree-of-freedom (DOF) VS manipulator. Sharma et al. [9] proposed a fractional-order SMC method to drive vehicle motion using visual information from the image plane. Furthermore, considering the uncertainties, they designed a new adaptive rule to adjust the sliding surface parameters to ensure the finite-time stability of the system. Qiu et al. [10] presented a depth-independent interaction matrix based on the model predictive control method by taking the input and output constraints into account.
Unlike the above kinematic VS control problems, dynamic VS control problems are usually solved by establishing a composite Jacobian matrix that maps the image space to the robot joint space. Based on the obtained system parameters, an effective controller can be designed. For example, Wang et al. [11] presented a new adaptive algorithm based on the estimated image depth and proved global stability by the Lyapunov method. Li et al. [12] addressed the controller design problem for an uncalibrated camera-manipulator system to ensure finite-time convergence. In addition, some scholars have paid attention to the large measurement errors caused by external interference or system modeling deviation. Considering the parameter uncertainties in manipulator kinematics and dynamics, Cheah et al. [13] proposed an adaptive regression strategy to estimate the Jacobian matrix adaptively. Hua et al. [14] adopted an immersion and invariance observer to identify an uncalibrated VS system without measuring the joint velocity, on the basis of the depth-independent matrix. Wang et al. [15] designed a new nonlinear observer to dynamically track the motion of a target in Cartesian space. Wang et al. [16] proposed a novel adaptive observer controller that employed the feature velocity term contained in the unknown kinematics. The advantage of the proposed image-space observer lay in its simple structure for handling uncertainties; thus, it avoided the over-parametrization found in existing works. Leite et al. [17] developed a cascade control strategy based on an indirect/direct adaptive method, which considered uncertainties in the robot kinematics and dynamics of the visual tracking problem. Considering the output nonlinearity and unknown dynamics, Wang et al. [18] investigated adaptive neural network control for a VS manipulator system whose dynamic model is not required to be linearly decomposable. Zhang et al. [19] developed an adaptive neural network controller with a barrier Lyapunov function (BLF) to overcome nonlinearities and visibility constraint problems.
In addition, the key to optimal control of nonlinear systems is to design a suitable controller that handles input/output constraints, external disturbances, uncertainties, etc. [20][21][22][23]. Over the past few years, the adaptive dynamic programming (ADP) algorithm proposed by Werbos [24] has been extensively developed into optimal control schemes for robots and manipulators to enhance control performance and reduce the energy consumption of the controller [25]. Kong et al. [26] introduced an approximate optimal strategy to resolve the nonlinear saturation problem of n-DOF manipulators. In [27], an adaptive fuzzy neural network control method with impedance learning was presented for robots with constraints. In [28], optimal coordination control was applied to multiple robots following expected trajectories by means of reinforcement learning. Tang et al. [29] employed a reinforcement learning-based adaptive optimal control method to realize the optimal tracking of n-DOF manipulators. Li et al. [30] established the nonlinear discrete-time dynamic model of wheeled mobile robots, where reinforcement learning and the ADP method were adopted to tackle the tracking problem for systems with skidding and slipping constraints. In [31], an artificial potential field scheme combined with the ADP method was proposed for path planning of a bio-mimetic robot fish, where heuristic learning programming was applied to obtain the position and angle. Lian et al. [32] presented a receding-horizon dual heuristic programming algorithm for tracking control of wheeled mobile robots and developed a backstepping kinematic controller. Zhan et al. [33] proposed an ADP-based control approach to deal with the tracking problem for robots with environment interactions. Li et al. [34] proposed a policy iteration-based fault compensation control for modular reconfigurable robots subject to actuator failures. Zhao et al. [35] developed an event-triggered ADP algorithm for decentralized tracking control, which can reduce the communication frequency and extend the service life of mechanical and electronic devices. Dong et al. [36] designed a novel force/position control scheme for reconfigurable manipulators based on a zero-sum optimal ADP decentralized control strategy, considering the influence of unknown and interconnected dynamics. Although the application of ADP-based optimal control in robotics has progressed in recent years, optimal feature tracking control for VS systems, which is expected in practical systems, is still an open problem.

Motivation and contribution
In recent years, many kinematics-based visual controllers have been proposed under the assumption that the VS manipulator has an accurate positioning device with negligible dynamics [37][38][39]. However, from the control perspective, it is difficult to ensure dynamic performance and stable control when the nonlinearities are neglected in kinematic control, owing to both parameter uncertainties in the robot dynamics and errors in camera calibration. Therefore, controller design is a challenging task when the influence of control error and system stability is considered, especially in robot positioning or trajectory tracking control [12,40]. Unfortunately, there is little discussion of the difficulties of VS manipulator systems, especially with uncertainties in intrinsic parameters, camera calibration errors, external disturbance, friction, etc. From the aforementioned literature, we conclude that the difficulty in designing controllers lies in how to handle unmodeled dynamics and external disturbance without the linearity-in-parameters property. Furthermore, it is desirable to optimize the performance index in VS manipulator systems.
Inspired by the above literature, this paper investigates an ADP-based feature tracking controller for VS manipulators subject to unknown dynamics, taking an optimal performance index into account. Based on the radial basis function (RBF) estimate of the uncertain dynamics of the VS manipulator model, an adaptive NN observer is proposed to identify the uncertainties (e.g., unmodeled dynamics, external disturbance, joint friction, etc.) in real time. The cost function is improved by inserting the estimated uncertainties, and the visual tracking problem is converted into a feature error control problem. Then, the optimal feature tracking control is derived directly, and the stability of the VS manipulator system is guaranteed using the Lyapunov stability theorem. Finally, to show the robustness and effectiveness of the designed controller, a 3-DOF (degree-of-freedom) eye-to-hand (ETH) manipulator is used in simulation.
The main contributions of the presented scheme can be summarized as follows.
1. The proposed feature tracking control strategy acts directly on the image features, which makes it more feasible and intuitive. Thus, the controller designed from the camera-manipulator model does not need a regression matrix and avoids complicated calculations.
2. This is the first time that the ADP technique is developed for feature-based visual tracking control of VS manipulator systems with unknown dynamics. Unlike existing visual tracking control approaches, the critic NN-based controller is designed in an optimal manner, which saves energy and is significant in practice.
3. The major advantage of the improved cost function is that the estimated uncertainties are introduced and fully considered in the controller design. Simultaneously, the closed-loop VS manipulator system is guaranteed to be uniformly ultimately bounded (UUB) under the proposed ADP control scheme.
The remainder of this paper is organized as follows. In "Preliminaries and problem statement", the basic preliminaries and dynamic model are presented. In "ADP-based feature tracking controller", the unknown dynamics of VS manipulator systems is approximated by an adaptive NN observer, and the optimal error controller is designed in detail. Then, the stability is analyzed. In "Simulation tests", simulation examples are provided to illustrate the effectiveness of the proposed control scheme. Finally, a brief conclusion is given in "Conclusion".

Camera-robot kinematics model
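In this paper, the ETH structure shown in Fig. 1 is selected for the VS system, and an $n$-DOF VS manipulator is employed to construct the forward kinematics. Denote the image coordinate of the feature point as $f_{uv} = [u, v]^T$. The mapping from the feature point to the robot position [14] can be expressed as

$$D_{epth}(t)\begin{bmatrix} f_{uv}(t) \\ 1 \end{bmatrix} = M_c \begin{bmatrix} r(t) \\ 1 \end{bmatrix}, \tag{1}$$

where $r(t) \in \mathbb{R}^3$ is the Cartesian coordinate of the robot end-effector with respect to the base frame, $D_{epth}(t) \in \mathbb{R}$ is the depth of the feature point in the camera frame, and $M_c \in \mathbb{R}^{3\times 4}$ is the perspective projection matrix, which can be expressed as

$$M_c = M_{in} M_{ex},$$

where $M_{in} \in \mathbb{R}^{3\times 4}$ is the intrinsic matrix of the camera and $M_{ex} \in \mathbb{R}^{4\times 4}$ is the homogeneous transformation matrix computed via forward kinematics, which also represents the extrinsic matrix. By separating $f_{uv}(t)$ from (1), we can obtain

$$f_{uv}(t) = \frac{1}{D_{epth}(t)}\left(M_{sub}\, r(t) + \begin{bmatrix} m_{14} \\ m_{24} \end{bmatrix}\right), \tag{2}$$

where $M_{sub} \in \mathbb{R}^{2\times 3}$ is the sub-matrix of the perspective projection matrix $M_c$, which is given by

$$M_{sub} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \end{bmatrix},$$

where $m_{ij}$ is the $ij$th component of $M_c$. The depth of the feature point can be given by

$$D_{epth}(t) = M_D^T r(t) + m_{34}, \tag{3}$$

where $M_D = [m_{31}, m_{32}, m_{33}]^T$. Assume that $D_{epth}(t)$ is positive and bounded. By differentiating (2) and (3), one obtains

$$\dot{f}_{uv}(t) = \frac{1}{D_{epth}(t)}\left(M_{sub} - f_{uv}(t) M_D^T\right)\dot{r}(t) = J_I \dot{r}(t), \tag{4}$$

where $J_I \in \mathbb{R}^{2s\times 3}$ is the feature Jacobian matrix (or interaction matrix) and $s$ is the number of feature points. Let $q(t) \in \mathbb{R}^n$ be the joint angle vector. From the robot kinematics, the velocity relationship between joint space and Cartesian space can be expressed as

$$\dot{r}(t) = J_R \dot{q}(t), \tag{5}$$

where $J_R \in \mathbb{R}^{3\times n}$ is the robot Jacobian matrix. Combining (4) with (5), we obtain

$$\dot{f}_{uv}(t) = J_I J_R \dot{q}(t) = J_{com} \dot{q}(t), \tag{6}$$

where $J_{com} \in \mathbb{R}^{2s\times n}$ denotes the compound Jacobian matrix. We can rewrite (6) as

$$\dot{q}(t) = J_{com}^{+} \dot{f}_{uv}(t), \tag{7}$$

where $J_{com}^{+} = (J_{com}^T J_{com})^{-1} J_{com}^T$ is the pseudo-inverse of the compound Jacobian matrix. In practice, the manipulator is required to perform a servoing task in a reachable finite task space [41]. To avoid singularity of the Jacobian matrix, $M_c$ should be full rank. Hence, $J_{com}$ is full rank and its pseudo-inverse exists; a detailed illustration can be found in [42,43].

By differentiating (7), the acceleration of the joint angle $q(t)$ is formulated as

$$\ddot{q}(t) = J_{com}^{+} \ddot{f}_{uv}(t) + \dot{J}_{com}^{+} \dot{f}_{uv}(t). \tag{8}$$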
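As a numerical sanity check of the kinematic relations above, the following minimal sketch (not the authors' implementation) builds the single-point feature Jacobian from a perspective projection matrix and compares it with a finite-difference derivative of the projection. The matrix `M_C`, the point `r`, and all function names are hypothetical values chosen for illustration, and the closed form used is the standard derivative of the perspective projection, assumed here to correspond to $J_I$.

```python
# Illustrative sketch only: M_C and r are made-up values, not taken from the paper.

def project(M_c, r):
    """Return (f_uv, depth) for Cartesian point r under the 3x4 projection M_c."""
    rh = list(r) + [1.0]                         # homogeneous coordinates [r; 1]
    rows = [sum(m * x for m, x in zip(row, rh)) for row in M_c]
    depth = rows[2]                              # third row gives the depth
    return [rows[0] / depth, rows[1] / depth], depth

def feature_jacobian(M_c, r):
    """J_I = (M_sub - f_uv * M_D^T) / depth, a 2x3 matrix for one feature point."""
    f_uv, depth = project(M_c, r)
    M_D = M_c[2][:3]
    return [[(M_c[i][j] - f_uv[i] * M_D[j]) / depth for j in range(3)]
            for i in range(2)]

def numeric_jacobian(M_c, r, h=1e-6):
    """Central finite-difference approximation of d f_uv / d r."""
    J = [[0.0] * 3 for _ in range(2)]
    for j in range(3):
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        fp, _ = project(M_c, rp)
        fm, _ = project(M_c, rm)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

M_C = [[800.0, 0.0, 320.0, 10.0],
       [0.0, 800.0, 240.0, 20.0],
       [0.0, 0.0, 1.0, 2.0]]
r = [0.1, -0.2, 1.5]

J_analytic = feature_jacobian(M_C, r)
J_numeric = numeric_jacobian(M_C, r)
max_gap = max(abs(a - b) for ra, rn in zip(J_analytic, J_numeric)
              for a, b in zip(ra, rn))
```

Multiplying such a $J_I$ by a robot Jacobian $J_R$ would give the compound Jacobian $J_{com}$; the robot model is omitted here because it depends on the specific manipulator.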

Camera-manipulator dynamic model
Consider a general $n$-link manipulator, whose dynamic model can be mathematically formulated as (9), where $N(q) \in \mathbb{R}^{n\times n}$ is the inertia matrix, $B(q,\dot{q}) \in \mathbb{R}^{n\times n}$ is the centrifugal and Coriolis matrix, $G(q) \in \mathbb{R}^{n}$ is the gravitational term, $A(q) \in \mathbb{R}^{n}$ denotes the friction term, $F_d \in \mathbb{R}^{n}$ indicates the external disturbance, and $\tau \in \mathbb{R}^{n}$ denotes the output torque. Combining (7)-(9), the dynamics of VS manipulators can be expressed as (10). Multiplying both sides of (10) by $(J_{com}^{+})^T$, the dynamics of the VS manipulators is expressed in the workspace as (11). Due to the uncertain kinematic parameters and unmodeled dynamics, the actual parameters of the system can be decomposed into a nominal part and uncertainties, so (11) can be written as (12). By separating the uncertainties from the dynamics of the camera-manipulator model, (12) can be reformulated as (13), where $D(f_{uv})$ denotes the uncertainties. Before designing and analyzing the optimal feature tracking controller, the camera-manipulator dynamic system (13) is assumed to satisfy the following properties.

Property 1
The inertia matrix $N_o(f_{uv})$ is symmetric and positive definite, and satisfies $\lambda_1 I \le N_o(f_{uv}) \le \lambda_2 I$, where $\lambda_1$ and $\lambda_2$ are positive constants.

Property 2
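The time derivative of the inertia matrix $N_o(f_{uv})$ and the centripetal and Coriolis matrix $B_o(f_{uv},\dot{f}_{uv})$ satisfy the skew-symmetric relationship $\xi^T\bigl(\dot{N}_o(f_{uv}) - 2B_o(f_{uv},\dot{f}_{uv})\bigr)\xi = 0$ for any vector $\xi \in \mathbb{R}^{2s}$.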

Assumption 1 The friction torque
From the properties above, the VS manipulator system can be rewritten to facilitate the ADP design. By transforming the dynamic model (13), the state space expression of the VS system is obtained as (15), where $x$ is the system state vector, $y$ is the output vector, and $k(x)$ and $g(x)$ are the corresponding system functions.

Assumption 3 $k(x)$ and $g(x)$ are locally Lipschitz and continuous in their arguments with $k(0) = 0$.

In (13), the input and output of the dynamic model are mapped from the direct form (i.e., $\tau_o \to f_{uv}, \dot{f}_{uv}, \ddot{f}_{uv}$) to the indirect form (i.e., $\tau \to \dot{q}, \ddot{q} \to \dot{r} \to f_{uv}$, see (4), (6) and (9)). Moreover, a camera-manipulator dynamic model is established by taking the uncertainties $D(f_{uv})$ into account. As a result, the linearity-in-parameters property cannot be exploited in VS manipulator systems. In this paper, the ADP-based control approach is presented to solve the feature tracking control problem of VS manipulator systems with uncertainties. The proposed scheme drives the feature tracking error of the closed-loop VS manipulator system toward zero, i.e., the actual trajectories follow their desired trajectories.

Optimal visual control
As we know, the aim of the optimal feature tracking control is to design an effective tracking control policy which follows the desired feature trajectory. To achieve this objective, the feature tracking control can be obtained by combining the desired visual tracking control and feature tracking error control.

Assumption 4
The desired feature trajectory $f_{uvd}$, the desired feature velocity $\dot{f}_{uvd}$ and the desired feature acceleration $\ddot{f}_{uvd}$ are all bounded and known.

The desired feature trajectory can be described as (16), where $\tau_{f_{uvd}}$ denotes the desired control torque. Then, the desired visual tracking controller can be obtained as (17). From the state space expression of system (15), the feature tracking error dynamics can be expressed as (18), where $e_f$ indicates the feature error and $\dot{e}_f$ denotes its time derivative. For the state space expression of the camera-manipulator system (15), the optimal objective is to derive the control law by minimizing the infinite-horizon cost function (19), where $P(e_f, \tau_{fe}) = e_f^T Q e_f + \tau_{fe}^T R \tau_{fe}$ denotes the utility function, $P(e_f, \tau_{fe}) \ge 0$ with $P(0, 0) = 0$, $\tau_{fe} = \tau_o - \tau_{f_{uvd}}$ is the control input error, $Q \in \mathbb{R}^{2s\times 2s}$ and $R \in \mathbb{R}^{2s\times 2s}$ are positive definite matrices, $\hat{D}(t)$ is the estimate of the uncertainties, and $\alpha > 0$ is an unknown constant.
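To make the utility term concrete, here is a minimal sketch that evaluates $P(e_f, \tau_{fe}) = e_f^T Q e_f + \tau_{fe}^T R \tau_{fe}$ for one feature point ($s = 1$). The diagonal weights `Q` and `R` and the sample vectors are hypothetical; the additional uncertainty-dependent term of the improved cost function is not reproduced here.

```python
def quad_form(v, M):
    """Return v^T M v for a vector v and a square matrix M (nested lists)."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

def utility(e_f, tau_fe, Q, R):
    """Utility P(e_f, tau_fe) = e_f^T Q e_f + tau_fe^T R tau_fe."""
    return quad_form(e_f, Q) + quad_form(tau_fe, R)

# Hypothetical positive definite weights for a single feature point (2s = 2).
Q = [[2.0, 0.0], [0.0, 2.0]]
R = [[0.1, 0.0], [0.0, 0.1]]

p_zero = utility([0.0, 0.0], [0.0, 0.0], Q, R)   # zero error, zero control
p_sample = utility([3.0, -4.0], [1.0, 2.0], Q, R)
```

Increasing the entries of `R` relative to `Q` penalizes control effort more heavily, which is the usual way such a cost trades tracking accuracy against energy consumption.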
Definition 1 [44] For the dynamic system (15), a control policy $\tau_{fe}$ is admissible with respect to the cost function (19) on a compact set $\Omega$ if $\tau_{fe}$ is continuous on $\Omega$ with $\tau_{fe}(0) = 0$, $\tau_{fe}$ stabilizes the system on $\Omega$, and $U(e_f)$ is finite for all $e_f \in \Omega$.
Given a set of admissible control policies $\tau_{fe} \in \Xi(\Omega)$, the infinitesimal version of (19) is the so-called Lyapunov equation (20), where $U(0) = 0$ and $\nabla U(e_f) = \partial U(e_f)/\partial e_f$ is the partial derivative of $U(e_f)$ with respect to $e_f$. The Hamiltonian and the improved cost function are given by (21) and (22), respectively. Thus, the HJB equation can be solved for the optimal cost function $U^*(e_f)$, whose gradient is $\nabla U^*(e_f) = \partial U^*(e_f)/\partial e_f$. If $U^*(e_f)$ is continuously differentiable, the optimal feature tracking error controller of the VS system is derived as (23). According to (21) and (23), we can obtain (24).
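The step from the Hamiltonian to the optimal error controller follows the usual stationarity argument. The sketch below assumes, purely for illustration, that the error dynamics are affine in the control, $\dot{e}_f = F(e_f) + G(e_f)\tau_{fe}$, with $F$ and $G$ as placeholder symbols that are not the paper's notation, that $R$ is symmetric, and that any additional term $c$ in the cost integrand does not depend on $\tau_{fe}$.

```latex
\begin{align*}
H(e_f, \tau_{fe}, \nabla U)
  &= e_f^{T} Q e_f + \tau_{fe}^{T} R \tau_{fe} + c
     + \nabla U^{T}(e_f)\bigl(F(e_f) + G(e_f)\tau_{fe}\bigr), \\
\frac{\partial H}{\partial \tau_{fe}}
  &= 2 R \tau_{fe} + G^{T}(e_f)\nabla U(e_f) = 0, \\
\tau_{fe}^{*}
  &= -\tfrac{1}{2} R^{-1} G^{T}(e_f)\nabla U^{*}(e_f).
\end{align*}
```

Because $R$ is positive definite, the Hamiltonian is strictly convex in $\tau_{fe}$, so this stationary point is the minimizer.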

Adaptive neural network observer design
The uncertainties are estimated by an adaptive NN observer, in which $\Gamma(x, x_{fo_{uv}}) = k_e(x, x_{fo_{uv}}) + g_e(x, x_{fo_{uv}})(\tau_o - D(x))$, and $k_e(x, x_{fo_{uv}}) = k(x) - k(x_{fo_{uv}})$ and $g_e(x, x_{fo_{uv}}) = g(x) - g(x_{fo_{uv}})$ are the observation errors of $k(x)$ and $g(x)$, respectively. According to Assumption 3, $\varepsilon$ is a positive constant such that $\|g(x_{fo_{uv}})\| = \varepsilon$.
To estimate the uncertainties $D(x)$, an RBFNN is constructed as

$$D(x) = W_D^T \Phi(x) + \chi_D,$$

where $W_D \in \mathbb{R}^{l_1\times 2s}$ denotes the ideal weight matrix, $\Phi(x) \in \mathbb{R}^{l_1}$ denotes the NN activation function, $l_1$ indicates the number of neurons in the hidden layer, and $\chi_D$ indicates the NN approximation error. Let $\hat{W}_D$ be the estimate of $W_D$. The estimate of $D(x)$, denoted $\hat{D}(x_{fo_{uv}})$, can be expressed as

$$\hat{D}(x_{fo_{uv}}) = \hat{W}_D^T \Phi(x_{fo_{uv}}),$$

where $\hat{W}_D$ is updated by an adaptive law in which $\mu$ is a positive definite matrix. From (28) and (29), one obtains an error expression in which $\tilde{W}_D = W_D - \hat{W}_D$ is the weight estimation error and $\tilde{\Phi}(x, x_{fo_{uv}}) = \Phi(x) - \Phi(x_{fo_{uv}})$ is the estimation error of the activation function.
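The forward pass of such an RBF network can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian form with a shared `width`, the centers, and the weight values are all assumptions, and the adaptive weight update law is not reproduced.

```python
import math

def gaussian_features(x, centers, width):
    """Gaussian RBF activations Phi(x), one value per hidden neuron."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
            for c in centers]

def rbf_output(W, phi):
    """Network output W^T Phi, with W stored as l1 rows of 2s columns."""
    n_out = len(W[0])
    return [sum(W[k][j] * phi[k] for k in range(len(phi))) for j in range(n_out)]

# Hypothetical network: l1 = 3 neurons, 2s = 2 outputs, state dimension 2.
centers = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
W_hat = [[0.5, -0.5], [1.0, 2.0], [0.0, 0.3]]

phi = gaussian_features([0.0, 0.0], centers, width=1.0)
d_hat = rbf_output(W_hat, phi)   # estimate of the uncertainty vector
```

Storing the weights as an $l_1 \times 2s$ array matches the stated dimension of $W_D$, so the output $W^T\Phi$ has one entry per image coordinate.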

Assumption 6
The local observation error $W_e = W_D^T \tilde{\Phi}(x, x_{fo_{uv}}) + \chi_D$ is norm-bounded as $\|W_e\| \le \omega_2$, where $\omega_2 > 0$ is an unknown constant.

Theorem 1 For the VS manipulator systems with uncertainties, the proposed adaptive NN observer can ensure the observation error to be UUB with the help of the NN updating law (30).
Proof Choose a Lyapunov function candidate as (32). The time derivative of (32) is given by (33), where $\lambda_{\min}(\beta)$ is the minimum eigenvalue of the matrix $\beta$. Thus, substituting (29) into (33), it can be seen that $\dot{L}_1 \le 0$ when $O_e$ lies outside a compact set. According to Lyapunov's direct method, the state observation error is guaranteed to be UUB. This concludes the proof.

Critic NN and implementation
As an excellent tool for learning nonlinear functions, the NN is widely used to approximate the cost function (22). Thereby, the improved cost function can be expressed by a critic NN on the compact set $\Omega_2$, which is given by

$$U(e_f) = W_U^T \Phi_U(e_f) + \chi_U, \tag{35}$$

where $W_U \in \mathbb{R}^{l_2}$ denotes the ideal weight vector, $\Phi_U(e_f) \in \mathbb{R}^{l_2}$ indicates the NN basis function, $l_2$ denotes the number of neurons in the hidden layer, and $\chi_U$ is the NN approximation error. The partial derivative of $U(e_f)$ with respect to $e_f$ is

$$\nabla U(e_f) = \bigl(\nabla\Phi_U(e_f)\bigr)^T W_U + \nabla\chi_U, \tag{36}$$

where $\nabla\Phi_U(e_f) = \partial\Phi_U(e_f)/\partial e_f$ and $\nabla\chi_U$ are the partial derivatives of the basis function $\Phi_U(e_f)$ and the NN approximation error $\chi_U$, respectively. A critic NN is utilized to approximate the improved cost function as

$$\hat{U}(e_f) = \hat{W}_U^T \Phi_U(e_f). \tag{37}$$

Thus, the partial derivative of $\hat{U}(e_f)$ with respect to $e_f$ is

$$\nabla\hat{U}(e_f) = \bigl(\nabla\Phi_U(e_f)\bigr)^T \hat{W}_U. \tag{38}$$

Considering (23), the ideal optimal feature tracking error control policy can be described by (39). Thus, according to (37) and (38), the approximate optimal feature tracking error control is given by (40). For the uncertain system (15), considering (20) and (36), one can obtain (41). Therefore, the Hamiltonian can be expressed by (42), where $E_{UH} = -(\nabla\chi_U)^T \dot{e}_f$ is the approximation residual. The approximate Hamiltonian (43) is derived in the same manner. The error function is defined as (44), where $\tilde{W}_U = W_U - \hat{W}_U$ is the weight estimation error.

Assumption 7
The NN function $\delta = \nabla\Phi_U(e_f)\dot{e}_f$ is norm-bounded as $\|\delta\| \le \delta_e$, where $\delta_e$ is a positive constant.
To adjust the critic NN weight vector $\hat{W}_U$, we minimize the objective function $E_{obj} = \frac{1}{2} E_U^T E_U$ with the updating law (45), where $\mu_U$ is the learning rate of the critic NN. Hence, considering (44) and (45), one can obtain the updating law of the weight estimation error as (46).

Theorem 2 For the uncertain camera-manipulator system, the weight vector approximation error of the critic NN can be guaranteed to be UUB with the updating law (45).
Proof Choose a Lyapunov function candidate as (47). The time derivative of (47) is given by (48). According to Young's inequality, we can obtain (49). Therefore, $\dot{Z}_2 < 0$ when the weight approximation error $\tilde{W}_U$ lies outside a compact set. Thus, the weight approximation error can be guaranteed to be UUB. This concludes the proof.
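The gradient-descent idea behind the critic update can be illustrated with a small discrete-time sketch. It is not the paper's update law: it assumes a scalar residual of the form $E_U = P + \hat{W}_U^T\delta$ (utility plus a term linear in the weights), applies an explicit Euler step of $\dot{\hat{W}}_U = -\mu_U\,\partial E_{obj}/\partial\hat{W}_U$, and uses hypothetical values for `delta`, `utility`, `mu` and `dt`.

```python
def critic_step(W_hat, delta, utility, mu, dt):
    """One Euler step of gradient descent on E_obj = 0.5 * E_U**2.

    Assumes the residual is E_U = utility + W_hat^T delta, so that
    dE_obj/dW_hat = E_U * delta.
    """
    E_U = utility + sum(w * d for w, d in zip(W_hat, delta))
    W_new = [w - dt * mu * E_U * d for w, d in zip(W_hat, delta)]
    return W_new, E_U

# Hypothetical data: l2 = 3 critic neurons, one fixed sample (delta, utility).
W_hat = [0.0, 0.0, 0.0]
delta = [0.5, -1.0, 0.25]
utility = 2.0

residuals = []
for _ in range(200):
    W_hat, E_U = critic_step(W_hat, delta, utility, mu=1.0, dt=0.05)
    residuals.append(abs(E_U))
```

With a single fixed sample the residual shrinks geometrically, but the weights are only determined along the direction of `delta`; in a closed-loop setting the regressor varies with the error trajectory, which is what allows the weight error itself to remain bounded.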

Stability analysis
Unlike existing visual tracking control methods which neglected the optimal control performance, this paper improves the cost function with the information from an adaptive NN observer. Furthermore, via the ADP approach, we develop a novel optimal feature tracking error control method that optimizes the control performance and ensures the system stability.
The optimal feature tracking controller, which is composed of the desired tracking controller $\tau_{fd}$ and the feature tracking error controller $\tau_{fe}$, is derived as (50).

Theorem 3 Consider the system dynamics of the VS manipulator (15) and the improved cost function (19). The closed-loop VS system is UUB under the optimal tracking control policy (50).
Proof Choose a Lyapunov function candidate as (51). Considering (15), (16) and (24), the time derivative of (51) can be expressed and bounded. Assuming $\|\tau_{fd}\| \le \zeta_1$ and $\|D(x) - \hat{D}(x)\| \le \zeta_2$, where $\zeta_1$ and $\zeta_2$ are positive constants, it can be seen that $\dot{Z}_3 < 0$ when $e_f$ lies outside a compact set, provided that the accompanying conditions hold.

Simulation tests
In this section, we employ a 3-DOF humanoid manipulator with one feature point marked on the end-effector for simulation tests [45,46]. The performance of the proposed ADP-based feature tracking control is evaluated in two cases, i.e., without and with uncertainties. The 3-DOF manipulator system and the control parameters are presented in Tables 1 and 2. The second component of the desired feature trajectory is defined as $f_d(2) = 65 + 20\cos(t)$.
In the adaptive NN observer, a Gaussian-type function is selected as the activation function. As shown in Fig. 2, the actual feature trajectories follow their desired ones from different initial feature points. The image tracking errors are displayed in Fig. 3 to illustrate the visual tracking performance intuitively. We can see that the trajectory starting from the initial point $f_2$ has a faster convergence rate than the others. As shown in Fig. 4, the velocity trajectory curves of the VS manipulator system are smooth and continuous except for a slight oscillation at the beginning. The feature curves on the image plane are shown in Fig. 5. The feature tracking trajectories and their error curves indicate that the VS system is asymptotically stable.

Case 2: VS system with uncertainties
To test the robustness of the proposed method, we consider a simple servoing task with different uncertainties introduced.

To further exhibit the performance of the proposed ADP-based controller, comparison results with the adaptive neural network (ANN) scheme [47] and the adaptive sliding mode control (ASMC) scheme [48] are also provided. The uncertainty is set as a constant vector. In Fig. 12, it can be seen that the error curve of the ASMC scheme has the fastest convergence rate despite an obvious fluctuation (Fig. 12b). Furthermore, the error curve of the ANN scheme has a small overshoot compared with our method. The feature tracking responses in Fig. 12c exhibit no oscillation or overshoot and show smooth transient performance. The comparison of the image feature positions of the three methods on the image plane is shown in Fig. 13. The tracking results over a complete circle period validate the accuracy of our method. To quantify the tracking accuracy, three performance indices are defined: $E_{max}$ denotes the maximum absolute value of the image feature error, $E_{min}$ denotes the minimum absolute value of the image feature error, and MSE denotes the mean square error of the image feature error. The quantitative comparison results after 2 s for the proposed ADP-based scheme, the ANN scheme and the ASMC scheme are listed in Table 3, where underlining marks the minimum value in each row. The results clearly indicate that the proposed ADP-based scheme has higher tracking accuracy than the other two methods.
It can be concluded that the proposed scheme accomplishes the tracking tasks well in the presence of uncertainties.
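The three performance indices described above can be computed from a recorded error sequence as in the following minimal sketch. The exact formulas used in the paper are not reproduced; this assumes that the MSE is the mean of the squared errors over the samples, and the sample values are hypothetical.

```python
def tracking_indices(errors):
    """Return (E_max, E_min, MSE) of a sequence of image feature errors."""
    abs_err = [abs(e) for e in errors]
    mse = sum(e * e for e in errors) / len(errors)
    return max(abs_err), min(abs_err), mse

# Hypothetical error samples in pixels, e.g. recorded after the transient.
samples = [0.5, -1.5, 0.25, -0.25]
e_max, e_min, mse = tracking_indices(samples)
```

Evaluating the indices only on samples taken after the initial transient, as the comparison after 2 s does, keeps the start-up error from dominating the maximum and the mean.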

Conclusion
In this paper, a feature tracking control scheme for VS manipulator with uncertainties based on ADP has been proposed.
With the uncertainties effectively estimated by the adaptive NN observer, an improved cost function is designed to account for the influence of system uncertainties. The improved HJB equation is solved by a critic neural network, and the approximate optimal feature tracking error controller is derived directly. Thus, the feature tracking controller is obtained by combining the optimal feature error controller and the desired controller. Moreover, the VS system is guaranteed to be UUB based on Lyapunov stability analysis. Simulation results illustrate the effectiveness of the proposed feature tracking control scheme and show that the proposed controller can successfully control VS manipulators, which are regarded as highly nonlinear dynamic systems.
In this study, we investigate the visual servoing control problem for manipulators subject to the unknown dynamics with energy cost optimization. In our future work, the dynamic control of manipulators with time delay, uncertainties in Jacobian matrix, and depth information, as well as VS control with image processing are potential research topics.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.