DMPs-based skill learning for redundant dual-arm robotic synchronized cooperative manipulation

Dual-arm robot manipulation is applicable to many domains, such as industrial, medical, and home service scenarios. Learning from demonstration is a highly effective paradigm for robot learning, in which a robot learns directly from human actions and can then act autonomously in new tasks, avoiding complicated analytical calculations for motion programming. However, the learned skills are not easy to generalize to new cases with special constraints, such as a varying limit on the relative distance between the robot end effectors during human-like cooperative manipulation. In this paper, we propose a dynamic movement primitives (DMPs)-based skill learning framework for redundant dual-arm robots. The method adds a coupling acceleration term to the DMPs function, inspired by the transient performance control of Barrier Lyapunov Functions (BLFs). The additional coupling acceleration term is calculated from the constant joint distance and the varying relative distance limitations of the end effectors during object-approaching actions. In addition, we combine the actions generated in joint space with a redundancy solution for the dual-arm robot to complete human-like manipulation. Simulations in Matlab and Gazebo verify the effectiveness of the proposed method.


Introduction
With the development of artificial intelligence (AI) and modern engineering technology, robots are widely used in industrial and military domains [1,2] to complete dexterous manipulation tasks such as grasping and holding [3] objects. Compared with a single-arm robot, a dual-arm robot can complete more complex tasks and carry heavier objects through the cooperative actions of its two arms [4], which has drawn a great deal of attention in academia and industry [5][6][7].
Unlike traditional analytical methods, which spend a lot of time on system modelling, trajectory planning, and force control, learning from demonstration (LfD) is a new paradigm in which robots acquire new skills by imitating an expert. LfD is easy to extend and adapt to novel situations and has drawn increasing attention in recent years [8][9][10][11]. Some researchers have used dual-arm robots for robotic skill learning and training, benefiting from the isomorphic structure of the two arms: the operator uses one arm to record robotic motions, and the other arm is used as an actuator [12][13][14][15]. There is a wide range of robotic skill learning methods, such as the Hidden Markov model (HMM), Gaussian mixture model (GMM) and Gaussian mixture regression (GMR), and dynamic movement primitives (DMPs). Compared with HMM and GMM-GMR, DMPs are easier to interpret, are linear in their parameters, and are robust and continuous.
Due to these advantages, DMPs have been widely used for dual-arm robot skill learning. Kulvicius et al. proposed sensory feedback together with a predictive learning mechanism that allows tightly coupled dual-agent systems to learn an adaptive, sensor-driven interaction based on DMPs [16]. Gams et al. argued that, to obtain a smoother interaction, the original DMPs function should be modified by adding not only an acceleration term but also a velocity term. They therefore coupled originally independent robotic trajectories by expanding the DMPs framework, which enables tightly coupled bimanual execution of cooperative tasks [17]. Zhao et al. presented a reinforcement learning (RL) algorithm, policy improvement with path integrals for sequences of DMPs (SDMPs), to learn and adjust recorded trajectories of dual-arm robot cooperative manipulation [18]. Colome et al. studied simultaneously learning a DMP-characterized robot motion and the joint couplings through linear dimensionality reduction (DR), which provides valuable qualitative information leading to a reduced and intuitive algebraic description of such motion [19]. Lee et al. integrated a similar method for cooperative aerial transportation with the rapidly-exploring random tree star (RRT*) algorithm, enabling cooperative aerial manipulators to carry a common object while reducing the interaction force between multiple robots and avoiding obstacles in unstructured environments [20].
In the methods above, the original DMPs function is modified by adding coupling terms calculated from relative distance or force errors, which change the path and ensure that the relative distance tracking errors converge to 0. However, the dynamic performance of the trajectories during the moving process, such as how to flexibly enlarge or reduce the relative distance and how to avoid obstacles, is not considered. If only a fixed relative distance limitation is used for the dual arms, object-approaching skills such as changes in speed and contact forces are ignored.
In this paper, we integrate the control strategy of Barrier Lyapunov Functions (BLFs) with DMPs so that the generated trajectories satisfy a predesigned transient performance for the relative distance of the robot end effectors. A trajectory generalized by DMPs is determined by three variables: the start point, the end point, and the sampling time interval. Moreover, measured data containing errors and noise are processed (e.g., aligned and filtered) before skill learning, yet even with processed data, the data-driven learned results may violate physical limitations. The proposed integration of BLFs and DMPs addresses this problem. Similar ideas combining control and motion planning methods for manipulation have been explored in previous work [21][22][23].
Additionally, we combine the actions generated in joint space with the redundancy solution of a redundant dual-arm robot to perform human-like operations. Although human-like coordinative learning in Cartesian and joint space for a redundant dual-arm robot has been studied by Qu et al. [24], we propose a different approach that defines a "swivel angle" and combines the null-space method with the results of DMPs under distance constraints. Using dual-arm demonstration data acquired with a Kinect, experiments in Matlab and Gazebo verify the effectiveness of the proposed method.
The rest of the paper is organized as follows. "Problem description" briefly introduces DMPs and the problems of skill learning for redundant dual-arm robots. "BLFs-based improved DMPs for human-like skill learning and redundancy resolution" presents the DMPs and BLFs framework and the three related calculation modules. "Experiment" presents three experiments that verify the effectiveness and applicability of the proposed method. Finally, "Conclusion" summarizes the paper.

General DMPs model
The DMPs model was first proposed in [25] and later updated by Ijspeert et al. [26]. It is expressed as

$$\tau\dot{v} = \alpha_z\big(\beta_z(g - x) - v\big) + f(s), \qquad \tau\dot{x} = v, \tag{1}$$

where $\alpha_z, \beta_z > 0$ are the coefficients of the second-order linear part of (1), which ensures that the generated trajectory converges to the unique attractor point $x = g$, $v = 0$; $f(s) = \boldsymbol{\omega}^T \Psi(s)$ is a forcing function formed as a linear combination of nonlinear radial basis functions, with $\boldsymbol{\omega} = [w_1, w_2, \ldots, w_n]^T$, $\Psi(s) = [\psi_1, \psi_2, \ldots, \psi_n]^T$ and

$$\psi_k(s) = \exp\big(-h_k (s - c_k)^2\big),$$

where $c_k$ and $h_k > 0$ are the centers and widths of the radial basis functions, respectively. $\tau > 0$ is a timing parameter that adjusts the speed before the movement is executed, and $s$ is a phase variable that removes the explicit time dependency of $f(s)$. The dynamics of $s$ are given by the canonical system

$$\tau\dot{s} = -\alpha_s s, \qquad \alpha_s > 0.$$

The phase $s$ is implicitly related to time, so the convergence time can be modified by changing $\tau$, and $\boldsymbol{\omega}$ can be learned by supervised learning algorithms, e.g., locally weighted regression (LWR). The learning process minimizes the error function

$$E = \sum_{t} \big(f_{Tar}(s_t) - f(s_t)\big)^2,$$

where $f(s)$ is the forcing function in (1), and $f_{Tar}(s)$ is the target value of $f(s)$ computed from the demonstrated trajectory $x_d$:

$$f_{Tar}(s) = \tau^2 \ddot{x}_d - \alpha_z\big(\beta_z(g - x_d) - \tau\dot{x}_d\big).$$
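The model above can be illustrated with a minimal one-dimensional sketch. It assumes the transformation system (1), unnormalized Gaussian basis functions $\psi_k$, and the canonical system $\tau\dot s = -\alpha_s s$; the gains $\alpha_z = 25$, $\beta_z = 6.25$, $\alpha_s = 4$, the number of basis functions, and the class name `DMP1D` are illustrative choices, not values from this paper. For brevity, the weights are fitted by ordinary least squares on the error function $E$ rather than by LWR.

```python
import numpy as np


class DMP1D:
    """Minimal 1-D discrete DMP (illustrative sketch, not the paper's code).

    Transformation system (1): tau*dv = a_z*(b_z*(g - x) - v) + f(s), tau*dx = v
    Canonical system (assumed): tau*ds = -a_s*s, i.e. s(t) = exp(-a_s*t/tau)
    Forcing term: f(s) = w^T Psi(s), Psi_k(s) = exp(-h_k*(s - c_k)^2)
    """

    def __init__(self, n_basis=30, a_z=25.0, b_z=6.25, a_s=4.0):
        self.a_z, self.b_z, self.a_s = a_z, b_z, a_s
        # Centres evenly spaced in time, mapped into the decaying phase s
        self.c = np.exp(-a_s * np.linspace(0.0, 1.0, n_basis))
        gaps = np.abs(np.diff(self.c))
        self.h = 1.0 / np.append(gaps, gaps[-1]) ** 2   # widths from centre spacing
        self.w = np.zeros(n_basis)

    def psi(self, s):
        s = np.atleast_1d(s)[:, None]
        return np.exp(-self.h * (s - self.c) ** 2)      # shape (len(s), n_basis)

    def fit(self, x_demo, dt):
        """Learn w from one demonstration by minimising sum (f_Tar - f)^2."""
        self.tau = (len(x_demo) - 1) * dt               # movement duration
        self.x0, self.g = x_demo[0], x_demo[-1]
        xd = np.gradient(x_demo, dt)
        xdd = np.gradient(xd, dt)
        s = np.exp(-self.a_s * np.arange(len(x_demo)) * dt / self.tau)
        f_tar = self.tau**2 * xdd - self.a_z * (self.b_z * (self.g - x_demo) - self.tau * xd)
        # Ordinary least squares in place of LWR
        self.w, *_ = np.linalg.lstsq(self.psi(s), f_tar, rcond=None)

    def rollout(self, dt, duration=None, g=None):
        """Euler-integrate (1); passing a new g generalizes to a new goal."""
        g = self.g if g is None else g
        steps = int(round((duration or self.tau) / dt)) + 1
        x, v = self.x0, 0.0
        out = np.empty(steps)
        for k in range(steps):
            out[k] = x
            s = np.exp(-self.a_s * k * dt / self.tau)   # closed-form canonical system
            f = self.psi(s)[0] @ self.w
            dv = (self.a_z * (self.b_z * (g - x) - v) + f) / self.tau
            x, v = x + v / self.tau * dt, v + dv * dt
        return out


# Minimum-jerk demonstration from 0.1 to 0.3 over 1 s
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
demo = 0.1 + 0.2 * (10 * t**3 - 15 * t**4 + 6 * t**5)
dmp = DMP1D()
dmp.fit(demo, dt)
traj = dmp.rollout(dt)                              # reproduce the demonstration
long_run = dmp.rollout(dt, duration=3.0)            # run past the demo duration
new_goal = dmp.rollout(dt, duration=3.0, g=0.4)     # generalize to a new goal
```

Because $f(s)$ vanishes as $s \to 0$, every rollout ends at its goal regardless of the learned weights, which is the attractor property stated after (1).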

DMP-based dual-arm robot manipulation
To go from human demonstrations to skills learned and generalized by robots, we address the following three problems: joint distance restriction, redundant joint resolution, and relative distance limitations (Fig. 1).
The joint distance restriction is caused by the bone lengths between adjacent joints, such as the elbow and the wrist or the elbow and the shoulder, and is therefore constant. Redundant joint resolution plans the robot arm joints to achieve human-like motions, and the relative distance limitations impose constraints on the robot end effectors.
Meanwhile, raw data are always acquired together with measurement noise. If such data are used directly for skill learning, the learned results will be affected by noise and measurement errors. Therefore, signals such as EMG measurements are processed before learning, and images and videos affected by occlusions are reconstructed, which may introduce new uncertainties and errors. The example in Fig. 2 shows that the measuring errors of the hand positions are larger than those of the shoulders and elbows. Based on prior knowledge, such as the size of the object, the bone length between the elbow and the shoulder, or the exact position of the object, we can replan the joints and the end-effector positions to suit new cases. Here, we treat these known conditions as fixed constraints and propose a new DMP-based framework for dual-arm robot cooperative skill learning that considers the above three problems.
Robotic human-like manipulation has been studied for several decades. Using the nullspace method for redundant robot arms, robots can avoid collisions with their own arms and with external obstacles through multiple joint motion planning results. Because the skills for the robot end effectors and joints are generated in both Cartesian and joint space, previous studies such as [9] and [13], which consider only the joint space or only the Cartesian space, are not applicable. We combine the constrained skills with the nullspace method for skill generalization of redundant dual-arm robots.

BLFs-based improved DMPs for human-like skill learning and redundancy resolution
In this section, we propose three solutions, one per subsection, for the relative distance limitation, joint distance restriction, and redundant joint resolution described above. The mathematical symbols used in the following paragraphs are listed in Table 1.

Integrated BLFs and DMPs skills learning for relative distance limitation
Interactive actions of robot end effectors can be seen as the combined effect of relative distance and posture changes. In cooperative actions such as folding clothes or grasping and placing objects, the relative distance changes continuously during interaction with environmental objects. Excessive relative tracking errors may cause operational failures, such as losing control of the object or colliding with an obstacle.
Given the desired relative distance $d_j(t)$ of the robot end effectors, we set predesigned error boundaries $\rho_{ij}$, $i = 1, 2$, which must not be violated at any time during the cooperative manipulation. The relationship between the $j$th hand $w_j$ and its cooperative role is then constrained by (5).

[Table 1: notation, including the variables of the left and right arms (or robot arms), the cooperative role of the $j$th arm, and the calculations in the first and second rounds.]

The DMPs model can be rewritten as the strict-feedback nonlinear system (6), in which $u$ plays the same role as the forcing function $f(s)$ in (1), whereas in (1) of [27] it represents the control input of the system. We then define the error variables $z_{i1}$ and $z_{i2}$, where $\alpha_2$ is a function to be designed, and build the Lyapunov candidate $V_c$. The difference of $V_c$ can be calculated as in (8).

Theorem 1 Consider the DMPs function described by (6) under the condition (5). If the initial conditions are such that (9) is satisfied, then the output constraint is never violated and all closed-loop signals are bounded.
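To make the barrier idea concrete, the sketch below evaluates one common log-type barrier Lyapunov candidate. It is an assumption, not the paper's $V_c$: the asymmetric band $[\rho_{\text{low}}, \rho_{\text{up}}]$ is re-centred by subtracting its midpoint (the proof refers to a term of the form $z_{i1} - (\rho_{i2j} + \rho_{i1j})/2$), a symmetric log barrier is applied with half-width $k_b$, and a quadratic term in $z_2$ is added. The function name `shifted_blf` and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def shifted_blf(z1, z2, rho_low, rho_up):
    """Assumed log-type barrier Lyapunov candidate on a shifted tracking error.

    z_hat = z1 - (rho_low + rho_up) / 2 re-centres the band [rho_low, rho_up],
    k_b = (rho_up - rho_low) / 2 is its half-width, and
    V = 0.5 * ln(k_b^2 / (k_b^2 - z_hat^2)) + 0.5 * z2^2.
    Returns +inf when z1 is on or outside the band (the barrier).
    """
    z_hat = z1 - 0.5 * (rho_low + rho_up)
    k_b = 0.5 * (rho_up - rho_low)
    if abs(z_hat) >= k_b:
        return np.inf
    return 0.5 * np.log(k_b**2 / (k_b**2 - z_hat**2)) + 0.5 * z2**2


# Asymmetric band: the error may go 1 cm below or 2 cm above the desired distance
values = [shifted_blf(z, 0.0, -0.01, 0.02) for z in (0.005, 0.01, 0.015, 0.0199)]
```

The candidate is zero at the centre of the band and grows without bound as $z_1$ approaches either boundary, which is what prevents a controller that keeps $V$ bounded from ever crossing the constraint.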
Proof Substituting the expression of $z_{i2}$ and (6) into (8), we obtain (10), where $k_2 > 0$ is a positive constant. Since $k_z > 0$ and the terms $z_{i1} - (\rho_{i2j} + \rho_{i1j})/2$ and $z_{i2} d_z$ are not always 0, a sufficient condition for $\dot V_c < 0$ is (11). This yields the expressions of $\alpha_2$, $u_{ij}$ and $k_n$ in (9). According to Lemma 1 in [27], $z_{i1}$ is kept within the predesigned boundaries, and the input is

$$u_{ij} = f(s) + \Delta u_{ij}, \tag{12}$$

where $\Delta u_{ij} = k_n z_{i2} + \dot z_{i2} d_z$. Hence, $u_{ij}$ is calculated in two steps: first, the forcing function $f(s)$ is obtained using (3); second, $z_{i1}$ and $z_{i2}$ are calculated at each step and the resulting $\Delta u_{ij}$ is added in (12). The output $y$ is therefore determined jointly by $f(s)$ and the $z_{i2}$ generated by the BLFs function.

Remark 1 Similar to the methods in [16][17][18][19][20], adding the term $\Delta u_{ij}$ modifies the original path point $x$ (or output $y$) so that it satisfies the constraints in (5). However, the proposed method in (9) is more general than previous special designs, which only constrain the end effectors or the point shape of a multi-agent formation. The following subsection extends this method to the case of joint distance restriction.
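The step from "$V_c$ stays bounded" to "$z_{i1}$ never leaves the band", which the proof delegates to Lemma 1 of [27], can be written out in one line. This is a sketch under assumptions, not the paper's derivation: the barrier part of $V_c$ is taken to be the symmetric log-type function $V_1$ of the shifted error $\hat z = z_{i1} - (\rho_{i1j} + \rho_{i2j})/2$ with constant half-width $k_b = (\rho_{i2j} - \rho_{i1j})/2$, $V_c = V_1 + \tfrac{1}{2} z_{i2}^2$ is non-increasing, and the initial error lies strictly inside the band so that $V_c(0)$ is finite:

$$
V_1 = \frac{1}{2}\ln\frac{k_b^2}{k_b^2-\hat z^2} \le V_c(t) \le V_c(0)
\;\Longrightarrow\;
k_b^2-\hat z^2 \ge k_b^2\, e^{-2V_c(0)}
\;\Longrightarrow\;
|\hat z| \le k_b\sqrt{1-e^{-2V_c(0)}} < k_b ,
$$

so $\rho_{i1j} < z_{i1}(t) < \rho_{i2j}$ for all $t$, with a margin that shrinks as the initial error approaches a boundary.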

Integrated BLFs and DMPs skills learning with joint distance restriction
The challenge for the joint distance restriction is to modify the distance between adjacent joints, such as the elbow and the shoulder or the elbow and the wrist. Following the definitions in Table 2, the ranges of the distance errors (like $\rho_{ij}$ for the relative distance limitation) are set as $\rho_e$ and $\rho_s$. Similar to (5), we use the inequalities (13) to reshape the elbow and shoulder positions. However, as mentioned in "Problem description", if we want to replan the positions of the elbow and the shoulder, we should take the real measurements of the elbow and the shoulder as a reference, but these are limited by the relative distance conditions of the hands. Therefore, we reshape the elbow and shoulder distances to satisfy both (5) and (13) by calculating a common result for the two conditions. Additionally, as shown in Fig. 2, the measuring errors of the hands are larger than those of the elbows, and there are measuring mistakes due to occlusions. The errors are therefore processed in the following two steps:
1. Satisfy the limitations of the hands' motions $^{1}w_j$ as in (14), where $^{1}w_j$ denotes the results of the first round. Equation (14) is used to filter the hand data first, using the elbow measurements.
2. Apply synchronized constraints to the hands, elbows, and shoulders as in (15) to (17), where $^{2}w_j$ denotes the results of the second round; Eqs. (15) to (17) consider the limitations of both the relative distance and the joint distance to rebuild the trajectory.
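Step 1 above can be pictured with a simple geometric stand-in. The sketch below is a hypothetical substitute for (14), not the paper's BLF-based filter: each measured hand sample is slid along the elbow-to-hand ray until the forearm length lies in $[d_e - \rho_e, d_e + \rho_e]$. The defaults $d_e = 0.28$ m and $\rho_e = 0.01$ m are the values used in the experiments; the function name `clamp_link_length` and the synthetic data are assumptions.

```python
import numpy as np


def clamp_link_length(elbow, hand, d_e=0.28, rho_e=0.01):
    """Hypothetical step-1 filter: slide each hand sample along the elbow->hand ray
    so that |hand - elbow| lies in [d_e - rho_e, d_e + rho_e].

    elbow, hand: arrays of shape (N, 3), one row per sample (metres).
    """
    vec = hand - elbow
    dist = np.linalg.norm(vec, axis=1, keepdims=True)
    target = np.clip(dist, d_e - rho_e, d_e + rho_e)
    # Zero-length vectors have no direction; those samples are left unchanged
    scale = np.divide(target, dist, out=np.ones_like(dist), where=dist > 0)
    return elbow + vec * scale


# Synthetic "measured" forearms whose length spreads over (0.12, 0.32) m
rng = np.random.default_rng(0)
elbow = rng.normal(scale=0.05, size=(200, 3))
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
raw_len = rng.uniform(0.12, 0.32, size=(200, 1))
hand = elbow + dirs * raw_len
filtered = clamp_link_length(elbow, hand)
new_len = np.linalg.norm(filtered - elbow, axis=1)
```

Samples that already satisfy the length band are returned unchanged, so the filter only touches measurements that violate the known bone length.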

Remark 2
The calculation basis for (15) and (16) is to find a common result that satisfies both inequalities. However, sometimes the two conditions have no overlapping region. In this case, we consider the joint distance restriction to have a higher priority than the relative distance limitation: the joint distance inequality is considered first, and then the relative distance limitation of arm $j$ is modified so that the condition $\rho_{i1j}(k) \le z_{i1}(k) \le \rho_{i2j}(k)$ holds at the $k$th calculation.

Remark 3
Similar to the methods in [16][17][18][19][20], the skill learning process is handled first, and then the trajectory is generalized by adding the new term $\Delta u_{ij}$. Here, we first learn and generalize the angle skills for the elbow and the shoulder without any limitations. Then the integrated BLFs and DMPs skill learning is used to modify the generalized trajectory to satisfy inequalities (15) to (17). The improved DMPs retain the properties of normal DMPs, with dynamic performance determined by the start and end points and the sampling interval.
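The priority rule of Remark 2 can be sketched for a single coordinate at one calculation step. This is a hypothetical illustration: each constraint is assumed to have already been converted into an admissible interval `(low, high)` for that coordinate, and the function name `admissible_interval` is not from the paper.

```python
def admissible_interval(joint_iv, rel_iv):
    """Hypothetical sketch of the Remark 2 priority rule for one coordinate.

    joint_iv, rel_iv: (low, high) admissible ranges implied by the joint-distance
    and relative-distance constraints at step k.
    Returns (interval, relaxed): relaxed is True when the relative-distance
    limitation had to give way to the joint-distance restriction.
    """
    low, high = max(joint_iv[0], rel_iv[0]), min(joint_iv[1], rel_iv[1])
    if low <= high:
        return (low, high), False   # a common region exists: satisfy both
    # No overlap: the joint distance has priority, relative bounds are relaxed
    return tuple(joint_iv), True


overlap = admissible_interval((0.27, 0.29), (0.28, 0.31))
conflict = admissible_interval((0.27, 0.29), (0.30, 0.31))
```

Returning the relaxation flag makes it explicit when the relative-distance limitation has to be modified, which is the situation Remark 2 describes.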
Based on the integrated BLFs and DMPs skill learning method, the general calculation procedure is as follows. In Fig. 3, the forcing function $f(s)$ in (12) of the DMPs and the constraints of the BLFs (such as (5) and (13)-(17)) are first designed separately. After initializing $\Delta u_{ij}$ in (12), $u_{ij}$ is calculated to generate the new trajectory, and the new position and velocity information is used to update $\Delta u_{ij}$ for the next iteration until the destination is reached.
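The iterative procedure can be sketched for two one-dimensional hands whose relative distance $r = x_l - x_r$ must stay within $\pm k_b$ of a constant desired distance $d$. Several pieces are swapped in as assumptions: the coupling uses a standard backstepping law with a log-type BLF (in the style of [27]) rather than the paper's (9); the band is constant instead of time-varying; the forcing terms are hypothetical closed-form functions of $s$ rather than learned ones; $\tau = 1$; and all gains are illustrative. $\Delta u$ is split symmetrically between the arms, so the mean motion of the hands follows the unmodified DMPs and only the relative motion is constrained.

```python
import numpy as np

A_Z, B_Z = 25.0, 6.25          # DMP gains of (1), tau = 1 for brevity
DT, T = 1e-3, 2.0              # integration step and horizon (s)
K1, K2, KB = 5.0, 10.0, 0.02   # hypothetical BLF gains and error half-band (m)


def forcing(t):
    """Hypothetical forcing terms f_l(s), f_r(s) with s = exp(-4t); their
    difference would drag the hands apart if left uncorrected."""
    s = np.exp(-4.0 * t)
    return 60.0 * s, -20.0 * s


def simulate(g_l, g_r, x_l, x_r, d=0.3, constrained=True):
    """Two 1-D DMPs (left/right hand) with a BLF coupling on r = x_l - x_r."""
    v_l = v_r = 0.0
    log = []
    for k in range(int(round(T / DT))):
        log.append((x_l, x_r))
        f_l, f_r = forcing(k * DT)
        # Step 1: nominal DMP accelerations with u = f(s)
        a_l = A_Z * (B_Z * (g_l - x_l) - v_l) + f_l
        a_r = A_Z * (B_Z * (g_r - x_r) - v_r) + f_r
        du = 0.0
        if constrained:
            # Step 2: backstepping with a log-BLF on z1 = r - d (d constant here)
            z1 = (x_l - x_r) - d
            z2 = (v_l - v_r) + K1 * z1            # z2 = r_dot - alpha, alpha = -K1*z1
            r_dd = -K1 * (v_l - v_r) - K2 * z2 - z1 / (KB**2 - z1**2)
            du = r_dd - (a_l - a_r)               # coupling term Delta u
        # Split Delta u symmetrically so the mean (common-mode) motion is unchanged
        a_l, a_r = a_l + du / 2.0, a_r - du / 2.0
        v_l, v_r = v_l + a_l * DT, v_r + a_r * DT     # semi-implicit Euler
        x_l, x_r = x_l + v_l * DT, x_r + v_r * DT
    return np.array(log)


con = simulate(0.65, 0.35, 0.515, 0.2)
free = simulate(0.65, 0.35, 0.515, 0.2, constrained=False)
z_con = con[:, 0] - con[:, 1] - 0.3
z_free = free[:, 0] - free[:, 1] - 0.3
```

Without the coupling term the relative-distance error leaves the band by an order of magnitude; with it the error stays inside while both hands still reach their goals.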

Robotic human-like redundancy resolution
For replanning robotic actions based on skills learned from demonstrations, previous studies have proposed human-like swivel motions that exploit the redundant degrees of freedom of the manipulator [28,29]. After rebuilding the positions of the hand, elbow, and shoulder, we can generalize the joints and end effectors of the redundant robots. Following the description in [28], the arm plane is the plane through the three joint points of the wrist, elbow, and shoulder, and the reference plane is set as the plane vertical to the human body. We define the swivel angle $\varphi_j$, $j = l, r$, as the angle between the arm plane and the reference plane, as shown in Fig. 4.
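The swivel angle definition can be computed from three joint positions. In this sketch the reference plane is passed in through its normal vector, because which normal represents "the plane vertical to the human body" depends on the body frame and is an assumption here; the result is the unsigned angle between the two planes. The function name `swivel_angle` and the example joint positions are hypothetical.

```python
import numpy as np


def swivel_angle(shoulder, elbow, wrist, ref_normal):
    """Unsigned angle (rad, in [0, pi/2]) between the arm plane through shoulder,
    elbow and wrist and a reference plane given by its normal vector.
    Undefined (nan) when the arm is fully stretched, since the plane degenerates."""
    n_arm = np.cross(elbow - shoulder, wrist - elbow)       # SE x EW
    cos_a = abs(n_arm @ ref_normal) / (np.linalg.norm(n_arm) * np.linalg.norm(ref_normal))
    return np.arccos(np.clip(cos_a, 0.0, 1.0))


# Upper arm hanging down (0.22 m), forearm pointing forward (0.28 m)
s = np.array([0.0, 0.0, 0.0])
e = np.array([0.0, 0.0, -0.22])
w = np.array([0.28, 0.0, -0.22])
```

A signed swivel angle, measured about the shoulder-wrist axis, would additionally need an orientation convention for that axis.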
Let the joint velocities of the 7-DoF redundant robot arm be $\dot\theta_j \in \mathbb{R}^{7\times 1}$, $j = l, r$, and let the generalized end-effector positions be $^{2}\bar{w}_j \in \mathbb{R}^{3\times 1}$, $j = l, r$. The distance between the hand and the wrist is ignored for two-arm holding actions, so $\bar{h}_j \approx {}^{2}\bar{w}_j$ (in practice, because of object occlusion, the hand positions cannot all be measured well). By extending $\bar{h}_j$ with the orientations of the end effectors to $h_j$, we have $\dot h_j = J_j \dot\theta_j$, where $J_j \in \mathbb{R}^{6\times 7}$ is a Jacobian matrix. Following nullspace projection, we calculate the redundancy solution of the redundant robot arm, in which $J_{Ej} \in \mathbb{R}^{3\times 7}$ is the Jacobian matrix from the elbow of the robot to the robot base and also the mapping between the swivel angle and the joint velocities [28]. The velocity direction of the swivel motion is defined from $\overrightarrow{SE} = e_j - s_j$, the vector from the shoulder to the elbow, and $\overrightarrow{EW} = {}^{2}\bar{w}_j - e_j$, the vector from the elbow to the wrist, for the generalized human demonstrations. This vector is then used for robot joint planning and for the calculation shown in Fig. 1.
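A nullspace redundancy resolution of this kind can be sketched as follows. The exact equation is not reproduced here, so the form below is an assumption: the end-effector task is solved with the pseudoinverse of $J_j$, and a desired elbow velocity along the swivel direction is mapped through the pseudoinverse of $J_{Ej}$ and projected into the nullspace of $J_j$. The swivel direction is assumed to be the unit normal of the arm plane built from $\overrightarrow{SE}$ and $\overrightarrow{EW}$, since the elbow moves perpendicular to that plane when the arm swivels. Random Jacobians stand in for a real 7-DoF arm model; all names are hypothetical.

```python
import numpy as np


def swivel_direction(shoulder, elbow, wrist):
    """Assumed swivel velocity direction of the elbow: the unit normal of the arm
    plane, built from SE = elbow - shoulder and EW = wrist - elbow."""
    n = np.cross(elbow - shoulder, wrist - elbow)
    return n / np.linalg.norm(n)


def redundancy_resolution(J, J_E, h_dot, xi, swivel_rate):
    """Hypothetical nullspace resolution for one 7-DoF arm:
    theta_dot = J^+ h_dot + (I - J^+ J) J_E^+ (xi * swivel_rate).
    J: 6x7 end-effector Jacobian, J_E: 3x7 elbow Jacobian, h_dot: 6-vector."""
    J_pinv = np.linalg.pinv(J)
    null_proj = np.eye(J.shape[1]) - J_pinv @ J     # projector onto ker(J)
    elbow_term = np.linalg.pinv(J_E) @ (xi * swivel_rate)
    return J_pinv @ h_dot + null_proj @ elbow_term


# Random full-rank Jacobians stand in for a real 7-DoF arm model
rng = np.random.default_rng(1)
J, J_E, h_dot = rng.normal(size=(6, 7)), rng.normal(size=(3, 7)), rng.normal(size=6)
s, e, w = np.zeros(3), np.array([0.0, 0.0, -0.22]), np.array([0.28, 0.0, -0.22])
xi = swivel_direction(s, e, w)
q_swivel = redundancy_resolution(J, J_E, h_dot, xi, swivel_rate=0.5)
q_plain = redundancy_resolution(J, J_E, h_dot, xi, swivel_rate=0.0)
```

The end-effector task is met exactly because the swivel term lies in the nullspace of $J_j$; the elbow reaches the commanded swivel velocity only to the extent that the one-dimensional nullspace of a 7-DoF arm allows.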

[Fig. 3 block diagram: Demonstrations; f(s) learning (DMPs); Constraints and New requirements (BLFs); u(s) calculation; New trajectory (x, v).]

Experiment
In the experiment, we acquire demonstration data with a Kinect and verify the manipulation effect on a virtual model in Gazebo. The skeleton data of a task of holding and …

Figures 6 and 7 show the results based on the initialization of the hand positions using (14). Figure 6 presents the changes in the trajectories and skeletons of the original and modified DMPs. Figure 7 presents the tracking errors with respect to the predesigned reference trajectory satisfying the constraints in (14) to (17). Combining Fig. 6 with Fig. 7, we can see that the trajectories of both the left and right arms are changed to avoid violating (14) to (17). As the blue lines (results of the original DMPs) in Fig. 7 show, the error amplitudes on each axis vary from − 0.1 to about 0.05, so the final distance between the elbow and the wrist varies within a large range of (0.12, 0.25), which seriously contradicts the physical fact that this distance is constant.
The measured distance between the elbow and the hand contact point (palm) is about 0.28 m, i.e., $d_e(\infty) = 0.28$. Setting $\rho_e = 0.01$, the generalized results show that the errors with respect to the desired position decrease to the range (−0.01, 0.01) after the first few steps, and the distance converges to around the desired value of 0.28 for both hands. Because the initial distance errors are large, about 15 iterations are needed to reduce them to satisfy the conditions in (14); the result is further processed in step 2 using (15) to (17). Figure 8a, b presents the original distances between the joints (the elbow and wrist, and the elbow and shoulder). The joint distance varies within the range (0.13, 0.32) for both the elbow-shoulder and elbow-wrist pairs of both arms. Using the proposed two-step method, we obtain a fixed distance for the joint links and a varying relative distance for the dual hands' manipulation, both of which converge to a stable interval (Figs. 9a, b, 10).
Since we set the desired joint distances as $d_e \equiv 0.28$ and $d_s \equiv 0.22$ from actual measurements, with an error range of 0.01, the results in Fig. 9 verify the effectiveness of the proposed method even for large initial errors. The performance functions for the upper and lower boundaries are set as functions of $k$, where $k$ denotes the sampling step. Additionally, we compare the results with the method in [17], setting the desired distance as $d_j(t) \equiv 0.3$. Figure 10 shows the measured relative distances of the two human hands (blue lines). The black dashed lines show the boundaries of the relative distance, which decrease from 0.32 to the interval (0.29, 0.31). The method in [17] makes the relative distance decrease quickly and keeps it around 0.305, but the relative …

Figure 11 presents the simulation process, in which two Franka robots are controlled under PD control to move the object along the trajectories generated in Matlab. During the simulation, we allow the object and the robot end effectors a certain degree of deformation to counteract the influence of the relative distance tracking errors.
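The exact performance functions are not reproduced above, so the sketch below assumes the exponential form commonly used for prescribed transient performance, $\rho(k) = (\rho_0 - \rho_\infty)e^{-ak} + \rho_\infty$. The upper boundary uses the endpoints given for the comparison with [17] (0.32 decaying to 0.31); the lower boundary's start value of 0.28, the rate $a = 0.05$, and the function name `performance_bound` are hypothetical.

```python
import numpy as np


def performance_bound(k, start, end, rate=0.05):
    """Hypothetical exponential performance function of the sampling step k:
    rho(k) = (start - end) * exp(-rate * k) + end."""
    return (start - end) * np.exp(-rate * np.asarray(k, dtype=float)) + end


k = np.arange(200)
upper = performance_bound(k, 0.32, 0.31)   # endpoints from the comparison with [17]
lower = performance_bound(k, 0.28, 0.29)   # start value 0.28 is an assumption
```

Both boundaries converge monotonically toward the final interval (0.29, 0.31) while always bracketing the desired distance of 0.3, which is the shape the dashed lines in Fig. 10 describe.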

Conclusion
In this paper, we proposed a new DMP-based skill learning and generalization framework for redundant dual-arm cooperative manipulation. The framework has three functions: skill learning and generalization under the relative distance limitation, trajectory replanning under the joint distance restriction, and redundancy resolution for multi-DoF robots based on the generalized dual-arm skills. The first two skill/trajectory learning and generalization methods are based on the integration of BLFs and DMPs. Using demonstration data acquired by a Kinect, the effectiveness of the proposed framework is verified on a task of holding and placing an object in Matlab and Gazebo simulations. Each technical method is validated and explained by the simulation results. Future work will implement the framework on a real robotic system to complete skill learning autonomously.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.