Dynamic Movement Primitives Based Robot Skills Learning

In this article, a robot skills learning framework is developed, which considers both motion modeling and execution. In order to enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into the DMP framework to enhance its generality such that complicated tasks can also be performed by multi-joint manipulators. The DMP connection method is used to make an accurate and smooth transition in position and velocity space when connecting complex motion sequences. In addition, motions are categorized by goal and duration. It is worth mentioning that an adaptive neural networks (NNs) control method is proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which improves the reliability of the skills learning system. Experiments on the Baxter robot verify the effectiveness of the proposed method.


Introduction
With the development of technology, robots are gradually being widely applied in manufacturing services [1−7] . Motion planning is important for a wide range of robotics research and applications [8−11] . At present, the professional knowledge required in robotic motion planning sets a high threshold for users, which greatly limits the further applications of robots [12,13] . Learning from demonstrations (LfD) is becoming a key instrument in improving the learning performance of robots [14−16] . In other words, robots can imitate motions from human tutors after learning actions from demonstrations. However, there still exist some problems to be solved. One major problem is how to learn a complex task with long sequences of motion. Besides, there is an urgent need to enhance the learning ability of robots to deal with the uncertainty existing in the robotic model.
An intelligent robot should not only repeat the behaviour of a demonstration, but also apply it to a new situation. Therefore, an effective motion model is requisite for motion learning and generalization. Hidden Markov models (HMMs) [17], Gaussian mixture models (GMMs) [18], and dynamic movement primitives (DMPs) [19] are widely used methodologies for LfD. Recently, more focus has been given to DMPs because of their strengths, such as: 1) guaranteed global stability; 2) efficient learning and generalization ability; 3) the ability to scale in space and time; and 4) the ability to incorporate coupling terms [19−20]. DMPs describe sequences of a particular movement by modeling attractive behaviors of autonomous nonlinear dynamic systems (DS) combined with statistical learning methods. The essence of DMPs is a high-priority dynamic system (spring-damper system) modulated by a learnable autonomous forcing term [21]. Consequently, global stability is ensured, and smooth trajectories can be generated. These flexible movement primitives have been applied to a large number of robotic applications, e.g., peg-in-hole tasks [22], gluing tasks [23], etc. Other studies show that skills can be learned from a single demonstration or multiple demonstrations [24,25]. In addition, reinforcement learning is employed to perform high-dimensional robotic tasks [26]. Up to now, DMPs have become a powerful algorithm in LfD and show great potential in robotic applications.
Humans are capable of implementing complex movement sequences naturally and coherently. Most of the research has focused on optimizing the DMP mechanism or coupling external information, such as force sensors and cameras. Few discussions about joining several movement sequences are reported [27]. The teaching procedure for multistage movement sequences on multi-joint manipulators is more complicated. For some specific tasks, we have to split the demonstration into several phases and demonstrate them independently, which further indicates the importance of joining movement sequences. For the original DMP system, the main disadvantage is that the velocity at the end of the generated trajectory is close to zero. This drawback will cause incoherence between movements, and the speed difference may affect the stability of the manipulator. In [28], partial contraction theory [29] is coupled to make the DMP systems transit smoothly. However, this approach has defects in accuracy when reproducing trajectories.
Previous research has established that neural networks (NNs) have excellent capabilities in function approximation [30−32] and optimization [33−36]. Thus, NNs have been widely used to deal with the uncertainties in robotic dynamics. In [37], adaptive neural networks are proposed to effectively handle the system uncertainties and input saturation. In [38], the trajectory tracking problem of an $n$-link manipulator is solved by an adaptive neural network control combined with a radial basis function (RBF). In [39,40], a neural networks-based adaptive control is proposed, which effectively removes the adverse effect of unknown models in robotic manipulators.
In this paper, to make robots learn skills accurately from demonstrations, we modify the original DMP formulation and introduce a new joining method that reproduces the target trajectory with high accuracy in both the position and the velocity profile and produces smooth and natural transitions in position space as well as in velocity space. Although the method developed in [24] allows the robot to learn skills from multiple demonstrations, it does not address the problem of DMP joining, which implies that the tasks performed by the DMPs in [24] are relatively simple. We apply the modified DMPs to very complex trajectories and address the problem of dynamically joining them using overlapping kernels, which is different from the method proposed in [24]. Moreover, compared with [27], we integrate neural network approximation into the DMPs-based trajectory tracking control such that robots can be controlled with high accuracy, which enhances the reliability of the robot skill learning framework in some complex tasks. Based on the above considerations, the main contributions of the article are summarized as follows:
1) To take advantage of the performance of DS, a new joining method is introduced based on the original DMP formulation, which can produce a smooth target trajectory with high accuracy in the position and velocity profiles.
2) A neural network-based learning method is proposed to approximate unknown models, which improves the tracking accuracy and the imitation performance of the closed-loop system.

Basic model of motion generation
This section briefly introduces a method to generate motion through DMPs and presents a modification that joins several DMPs into a single DMP, which makes the transition smooth and natural.

Modification of original DMPs
According to the brain activation theory, human motion skills are divided into periodic and point-to-point movements. Any discrete or rhythmic movement can be encoded using DMPs. In this paper, we utilize discrete DMPs to generate point-to-point movements.
DMPs can encode each degree of freedom (DOF) separately. The motion of a robotic arm with seven DOFs in joint space can be regarded as seven one-dimensional trajectories, and each trajectory can be encoded by one DMP model individually. Here, we only consider the problem of motion generation in joint space. The original DMP system is composed of a set of differential equations, which includes the canonical system and the transformation system. The transformation system is written in terms of the position, velocity, and acceleration of the joint, the temporal scaling factor $\tau$, two constant gains, a known goal position $g$, a nonlinear function $f$ which is continuous and bounded, and a delayed goal function $r$.
The function $f$ is a weighted sum of Gaussian basis functions $\psi_i$, where $\sigma_i$ and $c_i$ are the width and center of the $i$-th Gaussian kernel, respectively, and $\omega_i$ and $N$ are the weights and the number of kernels, respectively. A scaling term guarantees appropriate scaling of $f$ when the start position or the end position changes. The function $f$ is not related to time, and it depends on the canonical system described as
$$\tau \dot{v} = -\alpha_e v \tag{6}$$
where $\alpha_e$ is a pre-defined positive constant and $v$ is the phase variable considered as the exponential decay term.
The nonlinear system (2) is similar to a spring-damper system, and the two constant gains are chosen to guarantee that the system becomes critically damped. When the state of the canonical system converges to zero, the whole DMP model is stable and the state of the transformation system converges to the goal position [19]. System (3) is an exponential decay system, and its state plays the role of moderating the initial acceleration of the transformation system, as a large initial acceleration should be avoided in actual robot experiments.
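To make the structure of the original formulation concrete, the following minimal Python sketch integrates a one-DOF discrete DMP with the Euler method. It assumes the common textbook form of the transformation system (a critically damped spring-damper plus a forcing term scaled by the distance between start and goal) and, for brevity, attracts the state to the goal directly instead of to the delayed goal function. The function name `rollout_dmp`, the gain values, and the kernel parameterization are illustrative assumptions, not values taken from this paper.

```python
import math

def rollout_dmp(weights, centers, widths, y0, g, tau=1.0, dt=0.005,
                alpha_z=25.0, alpha_e=4.0, steps=400):
    """Euler-integrate a one-DOF discrete DMP (assumed textbook form)."""
    beta_z = alpha_z / 4.0          # assumed choice that gives critical damping
    y, z, v = y0, 0.0, 1.0          # position, scaled velocity, phase variable
    trajectory = []
    for _ in range(steps):
        # Gaussian kernels evaluated on the phase variable v
        psi = [math.exp(-(v - c) ** 2 / (2.0 * s ** 2))
               for c, s in zip(centers, widths)]
        # weighted kernel sum, gated by the phase and scaled by (g - y0)
        f = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-12)
        f *= v * (g - y0)
        dz = (alpha_z * (beta_z * (g - y) - z) + f) / tau   # transformation system
        dy = z / tau
        dv = -alpha_e * v / tau                              # canonical system (6)
        z += dz * dt
        y += dy * dt
        v += dv * dt
        trajectory.append(y)
    return trajectory

# With all weights at zero the forcing term vanishes and y settles at the goal.
traj = rollout_dmp([0.0] * 5, [1.0, 0.75, 0.5, 0.25, 0.0], [0.2] * 5, 0.0, 1.0)
```

Non-zero weights shape the path between start and goal, while the decaying phase variable ensures that the forcing term fades out and the spring-damper part dominates near the end of the movement.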
In the original DMP system described by (1)−(3), the delayed goal function $r$ and the exponential decay function $v$ are phase-defined. For the sake of joining several DMPs, we use a modified DMP model, where overlapping Gaussian kernels on a scalable time axis are used. The delayed goal function $r$ is revised following [27] and is defined in terms of the duration of the movement, the sampling rate, and the start position $s$ of the movement trajectory. In this paper, we sampled the manipulator with a sampling frequency of 200 Hz, which corresponds to a sampling interval of 5 ms. We also used a sigmoidal decay function to replace the exponential decay function [27]; it is parameterized by a positive constant. A slightly different nonlinear function $f$ is used, in which $\alpha_w$ is a positive scaling factor and each kernel has its own width. The centers of the kernels are set evenly in time between the start and end points of the movement trajectory. The modified DMP model is time-defined, which makes this approach more similar to splines. Besides, it preserves the advantages of the original method, such as generalization and robustness.
Since the state of the DMP model will converge to the goal $g$, we can scale the movement trajectory in space by changing the start point $s$, goal point $g$, and general scaling factor $\alpha_w$. Meanwhile, the duration of the motion can be scaled by changing the time constant $\tau$. Any shape of the movement trajectory can be generated by learning the weights $\omega_i$ from demonstrations. In this paper, we use locally weighted regression, which will be introduced in the next subsection.
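The difference between the phase-defined and time-defined variants can be illustrated with a short sketch. The closed forms below are assumptions chosen for illustration: a logistic curve centred on the end of the movement stands in for the sigmoidal decay, and a linear ramp from start to goal stands in for the revised delayed goal function. The names `sigmoidal_phase`, `delayed_goal`, and the parameter `steepness` are hypothetical and do not come from the paper.

```python
import math

def exponential_phase(t, tau=1.0, alpha_e=4.0):
    """Closed-form solution of the exponential canonical system with v(0) = 1."""
    return math.exp(-alpha_e * t / tau)

def sigmoidal_phase(t, T, steepness=20.0):
    """Assumed logistic decay: close to 1 during the movement, 0.5 at t = T."""
    return 1.0 / (1.0 + math.exp(steepness * (t - T)))

def delayed_goal(t, T, s, g):
    """Assumed linear ramp from start s to goal g over duration T, then hold."""
    return g if t >= T else s + (g - s) * t / T

T = 3.0  # movement duration in seconds, matching the 3 s recordings
for t in (0.0, 1.5, 3.0):
    print(t, exponential_phase(t), sigmoidal_phase(t, T), delayed_goal(t, T, 0.0, 2.0))
```

Halfway through the movement the exponential phase has already dropped below one percent, whereas the sigmoidal phase is still close to one, so kernels placed late on the time axis keep their influence until the movement is nearly finished.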

Learning of the modified DMP model
Trajectories can be learned from demonstrations through various learning algorithms, such as locally weighted regression (LWR), reinforcement learning (RL), or other machine learning techniques. Advanced learning algorithms can improve the accuracy, but also increase the complexity of the system.
For a specific task, a demonstration is given as a recorded position trajectory, from which we can obtain the desired velocity and acceleration. The force term is comprised of a weighted summation of basis functions that are activated through time, and thus an optimization technique like locally weighted regression is chosen to select the weights over the basis functions such that the force term matches the desired function computed from the demonstration. Locally weighted regression is set up to minimize the weighted squared error between the two, and its solution is available in closed form. LWR has a fast one-shot learning procedure and can be easily implemented. However, only the weights of the kernels can be learned; the number of Gaussian kernels and their centers and widths must be specified in advance. In the case of multiple degrees of freedom, for example, a robotic arm with 7-DOF, we can share one canonical system among all DOFs, and the weights of each DOF can be learned independently. Besides, the weights for multiple demonstrations can be taken as the average of the weights of each demonstration.
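The one-shot nature of LWR can be sketched as follows. For each kernel, the weight is the ratio of two kernel-weighted sums, which is the standard closed-form minimizer of a locally weighted squared error. The regressor `xs` (the quantity that multiplies the weight in the force term), the time-based Gaussian kernels, and all function names are assumptions made for illustration.

```python
import math

def gaussian(t, center, sigma):
    """Gaussian kernel activation at time t."""
    return math.exp(-(t - center) ** 2 / (2.0 * sigma ** 2))

def lwr_weights(xs, f_target, times, centers, sigma):
    """Fit one weight per kernel by locally weighted regression.

    xs       : regressor value at each sample (assumed scaling of the force term)
    f_target : desired force term at each sample, computed from the demonstration
    """
    weights = []
    for c in centers:
        num = 0.0
        den = 0.0
        for x, f, t in zip(xs, f_target, times):
            psi = gaussian(t, c, sigma)
            num += psi * x * f
            den += psi * x * x
        weights.append(num / (den + 1e-12))   # small constant avoids division by zero
    return weights

# Sanity check: if the target is exactly 2 * regressor, every weight is 2.
times = [i * 0.005 for i in range(200)]       # 200 Hz sampling, 1 s of data
w = lwr_weights([1.0] * 200, [2.0] * 200, times, [0.0, 0.5, 1.0], 0.1)
```

Because each weight is computed independently, the same routine can be run once per DOF, and weights obtained from several demonstrations can be averaged element-wise.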

Joining DMPs
The most straightforward way to join two trajectories is to place them end-to-end. In this way, the terminal velocity of the first movement trajectory will be close to zero, which means that there is a short pause between two motions. This simple approach might work for most situations, but the drawback is that the transition between two trajectories is unnatural. A novel method to connect multiple DMPs is used such that the transition path is natural and smooth. In this approach, we construct a set of overlapping kernels $\psi_i$ defined by centers $c_i$ and widths $\sigma_i$ and rearrange the kernels for the entire trajectory.
The centers $c_i$ are placed according to the duration of each individual DMP and the duration of the entire trajectory. The widths of the kernels are scaled by the duration of the joined trajectory, and the weights of the individual DMPs are carried over to the rearranged kernels. The goal function $r'$ and the sigmoidal decay function $v$ of the joined trajectory are described by (17), which involves the number of DMPs being joined. The only difference between (8) and (17) is that we use instead of . By overlapping the kernels on the time axis, we can join several DMPs and preserve the features of the traditional DMPs [27].
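The kernel rearrangement described above can be sketched as follows. The sketch assumes that the kernels of each DMP are spread evenly over that DMP's own time slot, that the time axis is normalized by the total duration, and that each width is rescaled by the ratio of the individual duration to the total duration. These rules and the name `join_kernels` are illustrative assumptions and are not claimed to reproduce the paper's exact formulas.

```python
def join_kernels(durations, weights_per_dmp, base_sigma):
    """Rearrange the kernels of several DMPs on one normalized time axis.

    durations       : duration of each DMP in seconds
    weights_per_dmp : list of weight lists, one per DMP (at least 2 weights each)
    base_sigma      : kernel width used when each DMP was trained on its own
    """
    total = sum(durations)
    centers, widths, weights = [], [], []
    offset = 0.0
    for T_k, w_k in zip(durations, weights_per_dmp):
        n = len(w_k)
        if n < 2:
            raise ValueError("each DMP needs at least two kernels")
        for i, w in enumerate(w_k):
            # evenly spaced inside this DMP's slot, normalized by total time
            centers.append((offset + T_k * i / (n - 1)) / total)
            # width shrinks in proportion to the DMP's share of the total time
            widths.append(base_sigma * T_k / total)
            weights.append(w)
        offset += T_k
    return centers, widths, weights

# Two 3 s primitives with three kernels each, as in a two-step demonstration.
c, s, w = join_kernels([3.0, 3.0], [[1, 2, 3], [4, 5, 6]], 0.1)
```

In this sketch the last kernel of the first DMP and the first kernel of the second DMP share the same center, so both contribute to the forcing term around the junction instead of the first movement decaying to rest before the second one starts.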

Adaptive neural network control of a robotic manipulator
In practical applications, robots will be affected by dynamic environments and there are many uncertainties due to unknown system dynamics. In this section, to enhance the learning performance of the whole framework, an NNs-based controller is designed to approximate the uncertainties and guarantee the accuracy of the motion execution.

Dynamics description
The dynamics of the robotic manipulator in joint space are described as
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = u \tag{18}$$
where $q$, $\dot{q}$, and $\ddot{q}$ denote the position, velocity, and acceleration vectors, respectively, $M(q)$ is the positive-definite inertia matrix, $C(q,\dot{q})$ is the Coriolis and centripetal torque matrix, $G(q)$ denotes the gravitational force, and $u$ denotes the control torque.
From this point onwards, to simplify the notation, the time and state dependence of the system will be omitted. Let $x_1 = q$ and $x_2 = \dot{q}$, and (18) is rewritten as
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = M^{-1}\left(u - C x_2 - G\right). \tag{19}$$
The objective of this section is to design an adaptive controller such that the system variable $x_1$ tracks the reference trajectory $x_d$, which is obtained from the DMP model, to achieve high-accuracy motion execution.

Radial basis function neural networks (RBFNNs)
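Artificial NNs are widely used to approximate a continuous function $h(Z)$ in the form of $\hat{W}^{T} S(Z)$, where $Z$ is the input vector, $\hat{W} \in \mathbb{R}^{l}$ denotes the weight vector, $l$ is the number of neural network nodes, and $S(Z) = [s_1(Z), \ldots, s_l(Z)]^{T}$ is the vector of basis functions. Any continuous function can be approximated in the form of $h(Z) = W^{*T} S(Z) + \epsilon(Z)$, where $W^{*}$ is the ideal weight vector and $\epsilon(Z)$ is the approximation error assumed to be bounded, i.e., $|\epsilon(Z)| \le \bar{\epsilon}$ for $Z \in \Omega_Z$ and $\bar{\epsilon} > 0$. $W^{*}$ is defined as the weight vector that minimizes the approximation error for all $Z \in \Omega_Z$ [41−43].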
The operator " " is defined as where and .
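The approximation scheme can be made concrete with a small sketch of an RBFNN forward pass. It assumes the commonly used Gaussian basis function, the exponential of the negative squared distance to a node center divided by the squared width, with one shared width for all nodes. The function names and the shared-width choice are illustrative assumptions.

```python
import math

def rbf_vector(z, centers, width):
    """Evaluate the Gaussian basis functions for the input vector z."""
    return [
        math.exp(-sum((zi - ci) ** 2 for zi, ci in zip(z, center)) / width ** 2)
        for center in centers
    ]

def rbfnn_output(weights, z, centers, width):
    """Network output: inner product of the weight vector and the basis vector."""
    s = rbf_vector(z, centers, width)
    return sum(w * si for w, si in zip(weights, s))

# Two nodes in a two-dimensional input space.
nodes = [[0.0, 0.0], [1.0, 0.0]]
basis = rbf_vector([0.0, 0.0], nodes, 1.0)
value = rbfnn_output([2.0, 0.0], [0.0, 0.0], nodes, 1.0)
```

A node responds most strongly when the input coincides with its center and fades with distance, which is why the centers are usually spread over the region in which the input is expected to move.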

Control design
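In this paper, adaptive NNs control is designed to approximate the unknown dynamics and adapt interactions based on full-state feedback. The position tracking error is defined as $z_1 = x_1 - x_d$, where $x_d$ is the reference trajectory, whose time derivative is $\dot{z}_1 = x_2 - \dot{x}_d$. The second error variable is defined as $z_2 = x_2 - \alpha_1$, with $\alpha_1$ being a virtual control designed as
$$\alpha_1 = -K_1 z_1 + \dot{x}_d \tag{21}$$
where $K_1$ is the gain matrix; then we have $\dot{z}_1 = z_2 - K_1 z_1$. Differentiating $z_2$, we have $\dot{z}_2 = \dot{x}_2 - \dot{\alpha}_1$. Considering a Lyapunov function $V_1 = \frac{1}{2} z_1^{T} z_1$ and taking the time derivative of $V_1$, we have
$$\dot{V}_1 = -z_1^{T} K_1 z_1 + z_1^{T} z_2 \tag{22}$$
where $z_1^{T} z_2$ is a cross term that will be canceled later. A Lyapunov function is introduced as $V_2 = V_1 + \frac{1}{2} z_2^{T} M z_2$, where $M$ is the inertia matrix. Differentiating $V_2$ with respect to time and considering (22), we obtain
$$\dot{V}_2 = -z_1^{T} K_1 z_1 + z_1^{T} z_2 + z_2^{T} M \dot{z}_2 + \tfrac{1}{2} z_2^{T} \dot{M} z_2. \tag{23}$$
According to [24], we know that $\dot{M} - 2C$ is a skew-symmetric matrix, where $C$ is the Coriolis and centripetal matrix, and therefore $z^{T}(\dot{M} - 2C)z = 0$ for any $z \in \mathbb{R}^{n}$. Based on this property, with $G$ the gravitational force and $u$ the control torque, (23) is rewritten as
$$\dot{V}_2 = -z_1^{T} K_1 z_1 + z_1^{T} z_2 + z_2^{T}\left(u - M\dot{\alpha}_1 - C\alpha_1 - G\right). \tag{24}$$
The model-based controller can be designed as
$$u = -z_1 - K_2 z_2 + M\dot{\alpha}_1 + C\alpha_1 + G \tag{25}$$
where $K_2$ is a positive-definite matrix. Substituting (25) into (24) yields
$$\dot{V}_2 = -z_1^{T} K_1 z_1 - z_2^{T} K_2 z_2. \tag{26}$$
From (26), it is obtained that $\dot{V}_2 \le 0$ as $K_1 > 0$ and $K_2 > 0$, which implies that $z_1$ and $z_2$ can asymptotically converge to zero. However, the model-based controller (25) requires the robotic model to be completely known, while this condition is difficult to achieve in practice. In the subsequent design, it is assumed that the model parameters $M$, $C$, and $G$ are unknown. To deal with this challenge, we utilize neural networks to approximate the uncertainty to enhance the accuracy of the tracking control under the framework of robot skills learning using DMPs. According to Section 3.2, neural networks approximation is specified as
$$M\dot{\alpha}_1 + C\alpha_1 + G = W^{*T} S(Z) + \epsilon(Z) \tag{27}$$
where $Z = [x_1^{T}, x_2^{T}, \alpha_1^{T}, \dot{\alpha}_1^{T}]^{T}$ is the input of neural networks, and $\epsilon(Z)$ is the estimation error assumed to be norm-bounded, denoted by $\|\epsilon(Z)\| \le \bar{\epsilon}$ with $\bar{\epsilon} > 0$. The neural networks-based controller is designed as
$$u = -z_1 - K_2 z_2 + \hat{W}^{T} S(Z) \tag{28}$$
where $\hat{W}$ is the estimate of the ideal weight $W^{*}$. The estimated error is defined as $\tilde{W} = \hat{W} - W^{*}$. The adaptive law for $\hat{W}$ is designed as
$$\dot{\hat{W}}_i = -\Gamma_i\left(S_i(Z)\, z_{2,i} + \sigma_i \hat{W}_i\right), \quad i = 1, \ldots, n \tag{29}$$
where $\Gamma_i$ is a positive-definite matrix, and $\sigma_i$ is a small constant.
If the terms $\sigma_i \hat{W}_i$ are removed from (29), since it is impossible to make the approximation error approach zero despite taking arbitrary numbers of nodes of neural networks, $\hat{W}_i$ tends to become large and even divergent in the presence of external disturbances, which hinders the robustness of the controlled system.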
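The interplay between the control law and the weight update can be sketched for a single joint. The sketch assumes a control torque made of feedback on two error variables plus the network output, and a weight update driven by the basis functions, the second error variable, and a leakage term proportional to the current weights. The sign conventions, the scalar gains, and the name `nn_control_step` are assumptions for illustration only.

```python
def nn_control_step(w_hat, s_vec, z1, z2, k2, gamma, sigma, dt):
    """One control cycle for a single joint (assumed sign conventions).

    w_hat : current weight estimates      s_vec : basis function values
    z1    : position tracking error       z2    : second error variable
    k2    : feedback gain                 gamma : adaptation gain
    sigma : small leakage constant        dt    : control period in seconds
    """
    # feedback on both errors plus the learned compensation torque
    torque = -z1 - k2 * z2 + sum(w * s for w, s in zip(w_hat, s_vec))
    # Euler step of the adaptive law, including the leakage term sigma * w
    w_next = [w + dt * (-gamma * (s * z2 + sigma * w))
              for w, s in zip(w_hat, s_vec)]
    return torque, w_next

torque, w_next = nn_control_step([1.0, 0.0], [0.5, 0.5], 0.1, 0.2,
                                 k2=10.0, gamma=100.0, sigma=0.02, dt=0.005)
# With zero tracking error only the leakage term acts and the weight shrinks.
_, w_leak = nn_control_step([1.0], [0.5], 0.0, 0.0,
                            k2=10.0, gamma=100.0, sigma=0.02, dt=0.005)
```

The second call shows the purpose of the leakage term: when the errors are zero, the weights are pulled gently towards zero instead of drifting, which keeps them bounded when the approximation error or a disturbance would otherwise keep driving the update.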
Theorem 1. For the system described by (18), under the action of adaptive neural networks control (28)  □
A summary of this paper is as follows: 1) A DMPs-based approach is introduced to learn and generate skills, and movement sequences are concatenated dynamically with smooth transitions; and 2) under the framework of the backstepping technique, an NNs-based adaptive controller is designed for tracking control to ensure the performance of motion execution. The detailed algorithm structure of this paper is presented in Fig. 1.

The experiment in this paper is comprised of two parts, i.e., trajectory generation and trajectory tracking. The humanoid Baxter robot with two 7-DOF arms is used in the experiment. As shown in Fig. 2, each arm is composed of two shoulder joints, two elbow joints and three wrist joints, named S0, S1, E0, E1, W0, W1, and W2, respectively. The position, velocity and torque information of each joint can be acquired in real-time. The zero-G mode of the Baxter robot enables us to drag the arm freely by grasping the cup [24].

Joining of dynamic movement primitives
In the experiments, we validate the joining of multiple DMPs. For the pouring water task, as shown in Fig. 3, the whole demonstration can be divided into two steps: 1) dragging the manipulator to the target position and 2) pouring water into a cup. Demonstrating both steps in one go could be difficult for a redundant manipulator under the influence of gravity. These two steps define two separate movement primitives. During the teaching demonstration, the joints S0, S1, E0, E1, W0, and W1 on the left arm are dragged in Baxter's zero-G mode. Both steps of teaching are recorded at the sampling frequency of 200 Hz for about 3 s, and the recorded trajectories are used as learning samples for the DMPs. To deal with the noise produced by the vibration of the Baxter manipulator, we used cubic spline interpolation on the recorded trajectories to make them smooth. The parameters of the modified DMP model are set as , , , and s, and the width of the -th Gaussian kernel is set as , where is the number of the basis functions for each DMP. Generally, more basis functions and a smaller Gaussian variance improve the accuracy of learning, but over-fitting the trajectory is unnecessary.
The results for repetition and generalization are shown in Fig. 4, from which we can see that the modified DMPs have a remarkable learning performance. After each DMP is trained, we can not only reproduce the demonstration trajectories, but also change the start point or generalize them to new goals. With the generalization ability, movements to the other cup can be produced by modulating the target position. Besides, we plan both movements to 3 s (600 steps) for the convenience of experimentation, which also shows that the modified DMP can easily adjust the duration of movements. Meanwhile, the generalized trajectories of each joint angle retain the shape of the original trajectories. For the original joining method, after the two steps of movement are trained separately, we run the second DMP at the end of the first DMP (the end point of the first DMP is used as the start point of the second DMP). By doing this, we can easily generalize both of the movements to new goals and produce a new connected movement. The modified DMPs work in such a way that the second DMP starts when the first one is about to end. Meanwhile, the overlapping kernels produce continuous angle and velocity profiles between them. Note that the parameters of the original DMP model are set to the same values as those of the modified one. Fig. 5 shows the results for joining two DMPs in the position profile, and it can be observed that the behavior of the modified DMPs is similar to the original. The results for joining two DMPs in the velocity profile are presented in Fig. 6, where the black circles denote the junction points between two original DMPs. From Fig. 6, it is found that there exists a jump in the velocity profile at the junction point, which implies that the obtained velocity profile is not smooth, leading to an acceleration surge for tracking control. This places a great burden on the manipulator and brings a large tracking error. Compared with the original method, the modified one can produce a smooth velocity profile, which is beneficial for the tracking control of robotic manipulators.
Fig. 1 Algorithm structure based on DMPs and NNs control

Conclusion on joining movement sequences
The DMPs are a general framework to model goal-directed behaviors. In practice, it is very difficult to give the multi-joint manipulator a reliable imitation performance by teaching it only once, especially when dealing with complex tasks. A possible way is step-by-step instruction, in which the movements are ordered into a sequence. This principle is able to model complex motions by concatenating sequences on the basis of movement primitives. However, before reaching the endpoint, a long and shallow attenuation exists in the velocity profile of the original DMP model, which is due to the exponential decay canonical system in the original DMP model. The original canonical system converges quickly due to the exponential function, which means that most of the basis functions are activated at first, and then the activation slows down gradually. However, this shortcoming can be avoided if the manipulator is controlled to move at the demonstrated velocity without a pause. It should be pointed out that with sigmoidal decay canonical systems, the basis functions can be activated more evenly by phase variables. Moreover, the canonical systems of two DMPs can be transited stably. However, there exist jumps and peaks in the velocity profile for the original method, which is the primary disadvantage. The transition can be made more natural and smooth by using overlapping kernels in the modified joining method, and the implementation is also easy by rearranging the kernels. In addition, the modified joining method can also be applied to tasks that require reaching a way-point at a specified time or with a predefined velocity.
In order to show the advantage of the modified DMPs, the relative root mean square (RMS) error for the velocity profile is evaluated using (30), which depends on the sampling time and on the velocity under the original or modified DMP methods. According to (30), errors of the velocity profile for the original and modified method are given in Table 1. It can be clearly seen from Table 1 that compared to the original DMPs, the RMS error of each joint for the modified DMPs is smaller, which implies that the velocity profile under the modified DMPs is smoother and more natural. On the basis of the above discussions, it is concluded that the joining method based on the modified DMPs has a better performance.
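As an illustration of how a smoothness score of this kind behaves, the sketch below computes the RMS of the sample-to-sample change in velocity. This is an assumed stand-in chosen because it uses only the sampled velocity; it is not necessarily the definition used in (30), and the name `rms_velocity_change` is hypothetical.

```python
import math

def rms_velocity_change(velocity):
    """RMS of consecutive differences of a sampled velocity profile."""
    diffs = [b - a for a, b in zip(velocity, velocity[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

smooth = [0.0, 0.1, 0.2, 0.3]      # steady acceleration
jumpy = [0.0, 0.1, 0.9, 1.0]       # sudden jump, as at a junction point
smooth_score = rms_velocity_change(smooth)
jumpy_score = rms_velocity_change(jumpy)
```

A single jump between two samples dominates the score, so a profile with a discontinuity at the junction scores noticeably worse than one that accelerates steadily.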

Tracking performance with NNs control
This part of the experiment validates the trajectory tracking ability of the NNs control in the presence of unknown dynamics caused by the payload. The experiment setup is shown in Fig. 7. Baxter's left gripper grabs a bottle of water, and the trajectory tracked by the Baxter robotic manipulator in joint space is obtained from Section 4.1. The parameters for the NNs control are configured as follows.
and . The centers of the neural nodes are chosen evenly within the limits of the joint position and velocity, and the adaptive gain is set as . Moreover, for comparison, a test based on PD control is carried out.
The experimental results are shown in Figs. 8 and 9. It is found in Figs. 8 and 9 that the tracking errors based on the PD control are larger than those based on the NNs control. In the experiment of the NNs control, the tracking error is significantly reduced because of the compensation torque provided by the NNs to estimate uncertainty. Therefore, we can conclude that the NNs-based control performs better.
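For reference, a joint-space PD baseline of the kind used in such comparisons can be sketched as follows. The paper does not spell out its PD law, so the common form below (proportional action on the position error plus derivative action on the velocity error, per joint) and the gain values are assumptions.

```python
def pd_torque(q_ref, dq_ref, q, dq, kp, kd):
    """Per-joint PD torque from position and velocity tracking errors."""
    return [
        kp_i * (qr - qi) + kd_i * (dqr - dqi)
        for qr, dqr, qi, dqi, kp_i, kd_i in zip(q_ref, dq_ref, q, dq, kp, kd)
    ]

# One joint: 0.5 rad behind the reference and moving 0.1 rad/s too fast.
torque_pd = pd_torque([1.0], [0.0], [0.5], [0.1], [10.0], [2.0])
```

Unlike the NN-based controller, this law contains no term that compensates for gravity or for an unknown payload, so a steady error remains whenever a constant load torque has to be balanced by the proportional term alone.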

Conclusions
This paper presents a general framework for robots to learn and generalize motions. Besides, a staged teaching and DMP joining strategy is proposed to learn complex movement sequences. Compared to the original approach, modified DMPs can produce a smooth transition between primitive movements. Furthermore, an NNs-based control policy is proposed to ensure the tracking performance of trajectories generated from the motion model. For further studies, it is possible to apply finite-time control [44,45] to robot learning from demonstration or event-triggered control [46] to trajectory tracking control for robots with unknown actuator failures [47] and output constraints [48].

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.