Enhancing learning capabilities of movement primitives under distributed probabilistic framework for flexible assembly tasks

This paper presents a novel probabilistic distributed framework based on movement primitives for flexible robot assembly. Since modern advanced industrial cells usually deal with highly reconfigurable tasks rather than fixed via-point trajectories, the industrial robots used in these applications must be capable of adapting and learning new in-demand skills without programming experts. We therefore propose a probabilistic framework that can accommodate various learning abilities, each trained with a different movement-primitive dataset. Derived from the Bayesian Committee Machine, this framework infers new adapted trajectories from the weighted contributions of each training dataset. To verify the feasibility of the proposed imitation learning framework, a simulation comparison with the state-of-the-art movement learning framework, task-parametrised GMM, is conducted. Several key aspects, such as generalisation capability, learning accuracy and computation expense, are discussed and compared. Moreover, two real-world experiments, i.e. rivet picking and nutplate picking, are carried out with the YuMi collaborative robot to verify the application feasibility in industrial assembly manufacturing.


Introduction
In modern advanced manufacturing, industrial robots are widely used in assembly tasks such as peg-in-hole [7,27], slide-in-the-groove [18], bolt screwing [11,12] and pick-and-place [10,24]. Owing to high-precision sensors, advanced drive techniques and excellent mechanical structures, industrial robots can successfully deal with known objects within a well-structured assembly environment. However, current industrial robots can hardly handle complex assembly processes or adapt to unexpected changes. To maintain robust control performance, industrial robots are usually programmed to follow fixed trajectories, especially for large-workpiece assembly. For flexible manufacturing applications, industrial robots are required to perform several tasks with various end-effectors in different assembly environments. Auxiliary measurement devices, e.g. machine vision systems and metrology [15], can therefore provide a reference target for the robots. Nevertheless, they can only be applied in a certain region of interest, which limits the generality of retrieving novel trajectories.
Generally, the core idea of assembly is to generate ordered operations consisting of a set of movement primitives, which could bring individual components together to produce a novel product. Similarly, an excellent operator does have the prime skills in terms of performing assembly tasks, which promotes a feasible scenario for robots to learn from human demonstration.
In the context of learning from demonstration, several algorithms, e.g. probabilistic movement primitives (ProMP) [17] and dynamic movement primitives (DMP) [20], have been proposed to generate desired trajectories under different modulations. ProMP and DMP introduce weight coefficients to describe basis functions and to govern explicit dynamic equations, respectively. As these are time-driven algorithms, the weight parameters of the basis functions are learned towards an optimal function value without addressing high-dimensional inputs.
In order to address high-dimensional issues and avoid specifying explicit trajectory equations, the Gaussian Mixture Model (GMM) [3] is applied to model demonstrations probabilistically as several Gaussian distributions fitted with the EM algorithm. Combined with Gaussian Mixture Regression (GMR) [4], novel predicted trajectories are derived from a weighted conditional Gaussian distribution. However, the capability of generating trajectories is limited by the similarity (Euclidean distance in the covariance function) [26] between the demonstration and the desired input. Similar kernel-based frameworks, such as movement primitives with multi-output Gaussian processes [6] and Kernelised Movement Primitives [5], can be seen as variations of GMM/GMR that take advantage of the kernel function to retrieve more flexible trajectories.
Reinforcement learning is considered as an alternative for adapting to new tasks according to an optimisation reward. In [22], Policy Improvement with Path Integrals (PI²) is used to refine the movement primitives of DMP. A modified version of PI² based on Monte Carlo sampling is introduced in [21] to enhance the learning performance. Additionally, reinforcement learning algorithms such as the natural actor-critic [19] have been applied to automatically select the centres of GMM clusters. Nevertheless, the learning procedure based on sampling optimisation can be time-consuming.
Although robots are usually expected to generate feasible trajectories in a wide range of circumstances, human demonstrations can only provide limited sets of learning instances. Therefore, in addition to the abovementioned imitation learning algorithms, several modified versions have been proposed with more advanced properties that enhance the capability of generating adapted trajectories. In [14], a probabilistic human-robot interaction methodology based on ProMP is proposed for collaboration with an operator. Moreover, spring-damper dynamic behaviour under impedance control is discussed in [9]. A task-parametrised formulation extended from GMM is presented in [2], which models movement behaviours with a set of task parameters, thereby improving generalisation capability.
The remainder of the paper is organised as follows: after the introduction, an overview of the distributed probabilistic framework is presented in Sect. 2; Additionally, Sect. 3 outlines the individual movement primitive learning of GMM clustering and GMR regression with the EM algorithm; in Sect. 4, the multiple movement primitives learning under the distributed regression framework is addressed; Sect. 5 presents the comparison between the task-parametrised GMM and our proposed learning framework, along with several assembly tasks using ABB YuMi robot in order to verify the application feasibility; finally, the conclusion is reported in Sect. 6.

Distributed probabilistic framework: an overview
Nearly all movement-primitive imitation learning methods focus on the adaptation and modulation of a single human demonstration template. Typically, these human demonstrations are captured under specific conditions, such as obstacle constraints, limited sensor devices or a redundant manipulator. These factors hinder the reconfiguration or retrieval of novel trajectories for a different task setting. Therefore, in this paper, we propose a novel distributed probabilistic framework for enhancing the learning capabilities across different movement primitives. More specifically, as illustrated in Fig. 1, this framework accommodates different movement primitives by storing the task parameters along with the primitive parameters obtained by GMM and GMR. Both sets of parameters are then used to establish a nonlinear mapping based on the Gaussian process.
Furthermore, the Bayesian Committee Machine is employed as a probabilistic fusion machine that automatically weights the trained movement primitives when retrieving a new movement primitive from their combination (adaptation). The core idea of our proposed framework is that it preserves the individual functions and features of each movement primitive while flexibly outputting novel motions that meet the demands of the task environment.
To improve the readability of this paper, we highlight our contributions as follows:
1. We propose a novel distributed probabilistic framework that accommodates various movement-primitive datasets in an overall regression structure.
2. Based on Evidence Maximisation, the hyper-parameters of the Gaussian process regression model between the task parameters and the GMM parameters are optimised automatically.
3. Derived from the Bayesian Committee Machine, new task trajectories are predicted from the weighted contributions of all trained Gaussian process regression models of the corresponding movement-primitive datasets.
4. To demonstrate the application feasibility of our proposed distributed probabilistic framework, the task-parametrised GMM methodology is compared with our proposed framework, and the application feasibility is further verified through real-world experiments.

Individual movement primitive learning
We start Sect. 3.1 by briefly introducing the learning process of encoding human demonstrations with GMM clustering and retrieving trajectories using GMR regression [3]. Moreover, the model learning of the movement primitives with the EM algorithm is given in Sect. 3.2.

Human demonstration encoding
Basically, the i-th human demonstration can be defined as a dataset $\{\xi_I, \xi_O\}_i$, where $\xi_I \in \mathbb{R}^{I}$ is the time input variable and $\xi_O \in \mathbb{R}^{O}$ lies in either task space or joint space. Encoded by a GMM with $K$ Gaussian components, a datapoint $\xi = [\xi_I; \xi_O]$ of $D$ dimensions is described probabilistically as

$$p(\xi) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\xi;\, \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1,$$

where $\mu_k$ and $\Sigma_k$ are the mean and covariance of the Gaussian distribution $\mathcal{N}(\xi; \mu_k, \Sigma_k)$ and $\pi_k$ is its prior. Considering the input and output components separately,

$$\mu_k = \begin{bmatrix} \mu_k^{I} \\ \mu_k^{O} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{II} & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^{OO} \end{bmatrix},$$

the predicted distribution of the $k$-th component, $p(\xi_O \mid \xi_I, k) \sim \mathcal{N}(\hat{\xi}_k, \hat{\Sigma}_k)$, is defined as

$$\hat{\xi}_k = \mu_k^{O} + \Sigma_k^{OI} (\Sigma_k^{II})^{-1} (\xi_I - \mu_k^{I}), \qquad \hat{\Sigma}_k = \Sigma_k^{OO} - \Sigma_k^{OI} (\Sigma_k^{II})^{-1} \Sigma_k^{IO}.$$

If we take the complete GMM into consideration, the predicted distribution can be rewritten as

$$p(\xi_O \mid \xi_I) = \sum_{k=1}^{K} h_k\, \mathcal{N}(\xi_O;\, \hat{\xi}_k, \hat{\Sigma}_k),$$

where $h_k$ is the posterior that decides the responsibility of the $k$-th Gaussian distribution:

$$h_k = \frac{\pi_k\, \mathcal{N}(\xi_I;\, \mu_k^{I}, \Sigma_k^{II})}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\xi_I;\, \mu_j^{I}, \Sigma_j^{II})}.$$

According to the linear combination properties of Gaussian distributions, the conditional distribution is estimated as a single Gaussian. Given $\xi_I$, the expectation and covariance of $\xi_O$ are approximated as

$$\hat{\xi}_O = \sum_{k=1}^{K} h_k\, \hat{\xi}_k, \qquad \hat{\Sigma}_O = \sum_{k=1}^{K} h_k\, \hat{\Sigma}_k.$$
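The GMR conditioning above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function names are ours, and the final single-Gaussian approximation uses the same responsibility-weighted sums for the mean and covariance as the equations above.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate a multivariate Gaussian density at x."""
    x, mu = np.atleast_1d(x), np.atleast_1d(mu)
    sigma = np.atleast_2d(sigma)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** x.size * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

def gmr(x_in, priors, means, covs, dim_in):
    """Gaussian Mixture Regression: condition a GMM on its input block."""
    K = len(priors)
    h = np.empty(K)
    mu_hat, sigma_hat = [], []
    for k in range(K):
        mu_i, mu_o = means[k][:dim_in], means[k][dim_in:]
        s_ii = covs[k][:dim_in, :dim_in]
        s_oi = covs[k][dim_in:, :dim_in]
        s_oo = covs[k][dim_in:, dim_in:]
        # responsibility h_k of component k for this input point
        h[k] = priors[k] * gaussian_pdf(x_in, mu_i, s_ii)
        gain = s_oi @ np.linalg.inv(s_ii)
        mu_hat.append(mu_o + gain @ (np.atleast_1d(x_in) - mu_i))
        sigma_hat.append(s_oo - gain @ s_oi.T)
    h /= h.sum()
    # single-Gaussian approximation: responsibility-weighted moments
    mean = sum(hk * m for hk, m in zip(h, mu_hat))
    cov = sum(hk * s for hk, s in zip(h, sigma_hat))
    return mean, cov
```

For a one-component GMM this reduces to the familiar conditional-Gaussian formula, which makes the helper easy to sanity-check by hand.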

Model learning with the EM algorithm
We utilise the EM (Expectation Maximisation) algorithm [16] for GMM training, an iterative algorithm that maximises the posterior estimation of the parameters of a statistical model. According to Jensen's inequality and the KL (Kullback-Leibler) divergence, the EM algorithm consists of two steps, i.e. an expectation step and a maximisation step. Defining the parameters of the GMM model as $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$, the expectation step evaluates the following objective function at step $g$:

$$Q(\Theta \mid \Theta^{(g)}) = \sum_{i} \sum_{k=1}^{K} p(k \mid x_i, \Theta^{(g)}) \log p(x_i, k \mid \Theta), \qquad (4)$$

with $x_i$ the training data. In the maximisation step, the parameter $\Theta^{(g+1)}$ is obtained by maximising $Q(\Theta \mid \Theta^{(g)})$:

$$\Theta^{(g+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(g)}).$$

For initialisation, the K-means algorithm [8] is utilised to choose the initial parameters of the Gaussian distributions; the EM algorithm then iterates, with closed-form updates at each step, until convergence. A graphical illustration of encoding the human demonstrations is given in Fig. 2.
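The K-means initialisation followed by EM iterations can be sketched as below. This is a compact illustrative implementation under our own naming, with a small diagonal regulariser added to the covariances for numerical stability (an assumption of ours, not stated in the paper):

```python
import numpy as np

def fit_gmm_em(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component full-covariance GMM with EM, K-means-initialised."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # --- K-means initialisation of the component means ---
    centers = X[rng.choice(n, K, replace=False)]
    for _ in range(10):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(K)])
    pi = np.array([(labels == k).mean() for k in range(K)])
    mu = centers.copy()
    cov = np.array([np.cov(X[labels == k].T) + 1e-6 * np.eye(d) for k in range(K)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities via the log-sum-exp trick
        logp = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(cov[k])
            maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(cov[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        m = logp.max(1, keepdims=True)
        w = np.exp(logp - m)
        h = w / w.sum(1, keepdims=True)
        ll = float((m[:, 0] + np.log(w.sum(1))).sum())
        if ll - prev_ll < tol:          # converged
            break
        prev_ll = ll
        # M-step: closed-form updates of priors, means, covariances
        Nk = h.sum(0)
        pi = Nk / n
        mu = (h.T @ X) / Nk[:, None]
        cov = np.array([((h[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
                        + 1e-6 * np.eye(d) for k in range(K)])
    return pi, mu, cov
```

On well-separated synthetic data the recovered means land on the cluster centres and the priors sum to one, matching the constraints in Sect. 3.1.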

Probabilistic distributed framework
According to the analysis of the individual movement primitive in Sect. 3, a learned individual primitive model based on several human demonstrations can be represented by the GMM parameters $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. Inspired by [2], if a connection between the GMM parameters and task-specific features is established, an individual primitive model can generalise further. Therefore, we introduce a Gaussian process regression model that maps the Cartesian task parameters to the GMM parameters in Sect. 4.1. The probabilistic distributed learning framework for multiple movement primitives is then detailed in Sect. 4.2.

Task-parametrised model
In order to encode the relationship between the task parameters $Q$ and the GMM parameters $\Theta$, we consider a regression model based on a Gaussian process with Gaussian white noise $\omega$ of variance $\sigma_x^2$. The regression model is fully specified by the mean function $m_f(\cdot)$ and the positive semi-definite covariance function $k_f(\cdot, \cdot)$. The kernel covariance is defined as

$$k_f(q, q') = \sigma_f^2 \exp\!\left(-\tfrac{1}{2} (q - q')^{\top} \Lambda^{-1} (q - q')\right) + \sigma_x^2\, \delta_{qq'},$$

with the length-scales $\Lambda = \mathrm{diag}(l_1^2, \ldots, l_n^2)$, the signal variance $\sigma_f^2$ and the noise variance $\sigma_x^2$, which together form the GP hyper-parameters $\theta = \{l_i, \sigma_f, \sigma_x\}$.
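A minimal sketch of the squared-exponential kernel with per-dimension length-scales defined above (the function name is ours; the noise term $\sigma_x^2 \delta_{qq'}$ is added separately on the diagonal when forming the training covariance):

```python
import numpy as np

def se_kernel(A, B, lengthscales, sigma_f):
    """ARD squared-exponential covariance k_f between row-wise inputs A, B."""
    L = np.asarray(lengthscales, dtype=float)
    # scaled pairwise differences, one length-scale per input dimension
    diff = A[:, None, :] / L - B[None, :, :] / L
    return sigma_f ** 2 * np.exp(-0.5 * (diff ** 2).sum(-1))
```

By construction the kernel matrix is symmetric and its diagonal equals $\sigma_f^2$, which is a convenient sanity check.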
Given desired task parameters $Q_d$, the new GMM parameters derived from the conditional probability of the Gaussian distribution are defined as

$$\Theta_d \mid Q_d \sim \mathcal{N}\big(m_f(Q_d),\, k_f(Q_d, Q_d)\big).$$

[Fig. 2 caption: in the initialisation step, seven Gaussian distribution models are initialised with K-means; after 48 training steps with the EM algorithm, the expectation of the GMM converges to the predefined interval, and the trajectory is then retrieved using GMR.]

In [2], the covariance $k_f(Q_d, Q_d)$ of the conditional probability is neglected and only the mean $m_f(Q_d)$ is used to retrieve novel trajectories; as the hyper-parameters are not optimised there, the covariance may have a negative value. In the proposed framework, the covariance $k_f(Q_d, Q_d)$ is treated as crucial information that indicates the confidence interval used in the data fusion. The hyper-parameters of the Gaussian process model are therefore optimised, so that the covariance carries a meaningful value reflecting the relationship among different GMM parameters.
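The conditional prediction $\mathcal{N}(m_f(Q_d), k_f(Q_d, Q_d))$ is the standard GP posterior; a minimal numpy sketch under our own naming (not the authors' code) is:

```python
import numpy as np

def gp_posterior(X, Y, X_star, lengthscales, sigma_f, sigma_x):
    """GP posterior mean m_f and covariance k_f at test inputs X_star."""
    L = np.asarray(lengthscales, dtype=float)

    def k(A, B):
        d = A[:, None, :] / L - B[None, :, :] / L
        return sigma_f ** 2 * np.exp(-0.5 * (d ** 2).sum(-1))

    # noisy training covariance K_sigma = K + sigma_x^2 I
    K_sigma = k(X, X) + sigma_x ** 2 * np.eye(len(X))
    K_s = k(X, X_star)
    mean = K_s.T @ np.linalg.solve(K_sigma, Y)
    cov = k(X_star, X_star) - K_s.T @ np.linalg.solve(K_sigma, K_s)
    return mean, cov
```

With a small noise variance the posterior mean interpolates the training targets and the posterior variance at a training input shrinks towards the noise level, which is exactly the confidence information the fusion step relies on.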
After choosing a flat prior $p(\theta)$, the posterior distribution is proportional to the marginal likelihood only:

$$p(\theta \mid Q, \Theta) \propto p(\Theta \mid Q, \theta).$$

To optimise the vector of hyper-parameters $\theta$, we follow the recommendation from [13]. In particular, the log-marginal likelihood is given as

$$\log p(\Theta \mid Q, \theta) = -\tfrac{1}{2} \Theta^{\top} K_\sigma^{-1} \Theta - \tfrac{1}{2} \log |K_\sigma| - \tfrac{n}{2} \log 2\pi,$$

where $K_\sigma = K + \sigma_x^2 I$. The hyper-parameters are therefore set by maximising the marginal likelihood, using the partial derivatives w.r.t. the hyper-parameters $\theta_i$ [26]:

$$\frac{\partial}{\partial \theta_i} \log p(\Theta \mid Q, \theta) = \frac{1}{2} \Theta^{\top} K_\sigma^{-1} \frac{\partial K_\sigma}{\partial \theta_i} K_\sigma^{-1} \Theta - \frac{1}{2} \mathrm{tr}\!\left(K_\sigma^{-1} \frac{\partial K_\sigma}{\partial \theta_i}\right).$$

The two terms of the log-marginal likelihood are usually referred to as the data-fit term and the model-complexity term; the gradient technique seeks a trade-off between the two.
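Evidence Maximisation can be sketched by minimising the negative log-marginal likelihood over the hyper-parameters in log space (so they stay positive). The toy data, bounds and function names below are our own illustrative assumptions; a Cholesky factorisation gives both the quadratic data-fit term and the log-determinant complexity term:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_theta, X, y):
    """-log p(y | X, theta) for a GP with SE kernel and noise variance."""
    log_ls, log_sf, log_sn = log_theta[:-2], log_theta[-2], log_theta[-1]
    L = np.exp(log_ls)
    diff = X[:, None, :] / L - X[None, :, :] / L
    K = np.exp(2 * log_sf) * np.exp(-0.5 * (diff ** 2).sum(-1))
    K_sigma = K + np.exp(2 * log_sn) * np.eye(len(X))
    c = np.linalg.cholesky(K_sigma)
    alpha = np.linalg.solve(c.T, np.linalg.solve(c, y))
    # data-fit term + model-complexity term + normalising constant
    return 0.5 * y @ alpha + np.log(np.diag(c)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

# Evidence Maximisation on a toy 1-D dataset (illustrative)
X = np.linspace(0, 1, 25)[:, None]
y = np.sin(4 * np.pi * X[:, 0])
theta0 = np.zeros(3)                       # [log l, log sigma_f, log sigma_x]
res = minimize(neg_log_evidence, theta0, args=(X, y),
               method='L-BFGS-B', bounds=[(-3.0, 3.0)] * 3)
```

The bounds keep the noise variance away from zero, avoiding an ill-conditioned $K_\sigma$ during the search.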

Distributed learning
The obtained covariance $k_f(Q_d, Q_d)$ of the Gaussian distribution gives the confidence interval of the predictions, which can be seen as a measure of the reliability of the Gaussian process regression. In this paper, the covariance of the prediction is used as a data fusion indicator.
Owing to the independence assumption, the marginal likelihood can be factorised into individual terms,

$$p(\Theta \mid Q, \theta) \approx \prod_{k=1}^{M} p_k(\Theta_k \mid Q_k, \theta_k),$$

where each factor $p_k$ depends on the $k$-th individual GP regression model discussed in Sect. 4.1.
The following details how the M individual primitive models are combined into an overall prediction with the Bayesian Committee Machine (BCM) [23]. Notably, the BCM explicitly incorporates the GP prior p(f) when making predictions.
Given $M$ individual primitive models with training data $\mathcal{D}_1, \ldots, \mathcal{D}_M$ and an input $x_*$, the posterior predictive distribution is defined as

$$p(f_* \mid x_*, \mathcal{D}_1, \ldots, \mathcal{D}_M) \approx \frac{\prod_{k=1}^{M} p(f_* \mid x_*, \mathcal{D}_k)}{p(f_*)^{M-1}}.$$

The mean and the precision are then, respectively,

$$\mu_* = \sigma_*^2 \sum_{k=1}^{M} \sigma_k^{-2}\, \mu_k, \qquad \sigma_*^{-2} = \sum_{k=1}^{M} \sigma_k^{-2} - (M - 1)\, \sigma_{**}^{-2},$$

with $\sigma_{**}^{2}$ the prior variance of $p(f_*)$.
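The BCM combination rule above can be sketched directly. `bcm_fuse` is an illustrative name of ours; it fuses scalar per-expert predictions at a single test input, subtracting the over-counted prior precisions exactly as in the equation:

```python
import numpy as np

def bcm_fuse(means, variances, prior_var):
    """Bayesian Committee Machine fusion of M expert GP predictions."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(means)
    # combined precision: sum of expert precisions minus (M-1) prior precisions
    precision = (1.0 / variances).sum() - (M - 1) / prior_var
    mean = (means / variances).sum() / precision
    return mean, 1.0 / precision
```

A single expert is returned unchanged, and confident experts (small predictive variance) dominate the fused mean, which is the weighting behaviour analysed in Sect. 5.1.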

Experiments
In order to verify the feasibility of the proposed probabilistic framework, several experiments are implemented in this section. In Sect. 5.1, the task-parametrised GMM is compared with our framework in terms of several aspects, such as generalisation capability, accuracy and computation expense. Furthermore, two assembly tasks, rivet and nutplate picking, are given in Sect. 5.2 to demonstrate real-world application feasibility.

Comparison with task-parametrised GMM
The task-parametrised GMM is a powerful tool for retrieving trajectories in a variety of tasks, such as movement-primitive reproduction, via-point adaptation and modulation. The proposed probabilistic distributed framework aims to combine and simultaneously accommodate various movement primitives in an overall scenario, thereby augmenting the generalisation capability and making full use of every single movement primitive. In this subsection, we compare the task-parametrised GMM and our proposed framework in terms of generalisation capability, accuracy and robustness, and computation expense.

Generalisation capability. To explore the generalisation capability, twelve movement-primitive datasets are generated randomly, as shown in Fig. 3. Each dataset accommodates four movement primitives with three GMM components. Moreover, the origin frame and task frame are recorded in pink and green, respectively, for further analysis. It is worth pointing out that each dataset can essentially be seen as a task-parametrised GMM model.
Four different task frames are presented for testing the generalisation capability, as shown in Fig. 4. Particularly, for a desired task frame in green, every dataset retrieves its own predicted trajectory as given in Fig. 4a-d. Moreover, three GMM components are displayed with the mean in black dot and the covariance in blue, yellow and purple ellipses.
As shown in Fig. 4, although all the movement datasets give predictions, some of these predictions do not match the desired task frame in terms of position and orientation. This is because a single movement dataset has limited generalisation capability: if the desired task frame is too far from the task frames of the data sample, the task-parametrised GMM will have a poor retrieval performance.
Our proposed distributed framework takes all the predictions from the datasets into consideration and fuses the trajectories on a GMM-parameter level. In addition, this probabilistic framework could bear poor prediction derived from several datasets, and meanwhile, output satisfying results as presented in Fig. 5.
Accuracy and robustness. To provide a more comprehensive analysis, the weights and prediction intervals of every dataset, derived from Eqs. 20 and 21 respectively, are presented in Fig. 6. Moreover, the prediction accuracies of each primitive dataset and of our proposed distributed framework are compared in Fig. 7.
As shown in Fig. 6, the confidence interval in the red bar shows the prediction range of each movement dataset. If the confidence interval is large, the corresponding movement dataset loses confidence in predicting novel trajectories; conversely, if the confidence interval is narrow, the movement dataset has more faith in its own prediction. Consequently, in our proposed distributed framework, a large confidence interval corresponds to a low weight, as presented in blue in Fig. 6, and vice versa.
As shown in Fig. 7a, the prediction error of each primitive dataset is nearly proportional to the confidence intervals in Fig. 6a. Similar behaviour can be observed in the other three simulation groups, i.e. Figs. 7b and 6b, Figs. 7c and 6c, and Figs. 7d and 6d. This is why the confidence intervals of each primitive dataset are used to quantitatively determine the weights applied in Eq. 20. In addition, according to the four prediction errors given in Fig. 7a-d, the proposed distributed framework (shown in green) gives better prediction accuracy than each individual primitive dataset (shown in yellow).
We would like to point out that a very small confidence interval does not always lead to better prediction results. Sometimes a narrow confidence interval indicates that the algorithm is overly aggressive, while a large confidence interval may result in conservative predictions. It is therefore crucial to balance uncertainty against over-fitting and to maintain robustness.
Computation expense. Another crucial property we add to the distributed framework is the optimisation of the Gaussian process hyper-parameters with Evidence Maximisation. However, training the hyper-parameters requires an additional computation expense of $O(n^3)$, and prediction costs $O(n^2)$ if the trained parameters are cached, with $n$ the size of the training dataset. Therefore, for our proposed framework with $m$ primitive datasets, the overall computation expense is $O(m n^3 + m n^2)$. For more information on the computation expense of the distributed framework, we refer to our previous work [25].

Assembly tasks
After addressing the key issues of our proposed distributed framework in Sect. 5.1, this subsection verifies its feasibility with real-world experiments, namely rivet picking and nutplate picking. As presented in Fig. 9, we test our proposed framework with the ABB YuMi robot. The YuMi is a two-arm collaborative robot with an industrial camera mounted on the wrist of the right arm and a payload of 0.5 kg per arm. To extend the functionality of the YuMi, a gripper is mounted on each arm.

Rivet picking. The first experiment is picking rivets from the rivet block shown in Fig. 10. We collect twelve groups of human demonstrations as given in Fig. 8, along with the trajectories in Fig. 11a. As shown in Fig. 11a, the collected demonstrations have some inaccuracies: in particular, the demonstrations are not smooth enough, and some of them may not be successfully inserted into the holes of the rivet block.

[Fig. 4 caption: the retrieved trajectories of the movement-primitive datasets. The generalisation capability is tested with four different task frames given in green; the initial frame is shown in pink, and the three GMM components of each movement primitive are presented in blue, yellow and purple. Each retrieved trajectory can be seen as the prediction of a task-parametrised GMM. Fig. 5 caption: the retrieved trajectories of our proposed distributed framework; for the twelve movement-primitive datasets of Fig. 4a-d, the framework accommodates all primitive datasets together and predicts novel trajectories for the desired task frame.]
Each primitive-dataset group is trained with three GMM components using the EM algorithm, as presented in Fig. 11b. Moreover, as shown in Fig. 11b, if the training data are scattered, the GMM ellipsoid is large and the Gaussian process model has a wide distribution, and vice versa.
In addition, we construct a Gaussian process regression between the task frames of the twelve primitive datasets and the corresponding GMM model parameters. Under our proposed distributed framework, twelve novel trajectories are inferred in Fig. 11c, along with the desired task frames (black and green square markers) and the origin frames (black and yellow square markers).
To reveal further details, the predicted trajectories are grouped, three consecutive trajectories at a time, into the four subfigures of Fig. 11d-g. Additionally, the confidence intervals derived from GMR are plotted in green.
The prediction errors are given in Fig. 13. All the prediction errors are below 0.35 mm, and most are below 0.2 mm, which is the reference assembly precision in aerospace manufacturing. However, the prediction error of the ninth hole is larger than 0.2 mm and that of the second is even higher than 0.3 mm. This is mainly caused by the accuracy of the human demonstrations: if all the demonstrations are far from the desired target, the retrieval performance will be poor. We would like to point out that the above precision is accurate enough for picking applications such as rivet picking. Theoretically, the proposed distributed framework can keep the prediction errors below 0.2 mm given more accurate demonstrations.

[Fig. 9 caption: the experimental platform. The proposed distributed framework is tested with the ABB collaborative robot YuMi; the rivet block is on the left side of the YuMi, while the picking board is located on the right side. Fig. 10 caption: the top view of the rivet block, which is designed to locate the rivets; the diameter of each hole is 3 mm.]

Nutplate picking. The second experiment is nutplate picking, implemented with machine vision techniques using a Cognex In-Sight smart camera. The initial experimental setup includes checkerboard calibration to the YuMi robot and feature extraction, achieved with the PatMax Patterns function [1]. In addition, the exposure time is adapted to the ambient lighting condition.
Similarly, we obtain several human demonstrations as presented in Fig. 12a. These demonstrations are then trained with GMM models, and novel trajectories are retrieved under the proposed distributed probabilistic framework.
For the picking guidance, the positions and orientations of the nutplates are located using machine vision, as given in Table 1 and in the left subfigure of Fig. 14. Additional information on the learned trajectories is given in Fig. 15. The picking processes of the human demonstration and of the nutplate in the middle are recorded in Fig. 12a and b, respectively. We would like to point out that only the target positions are learned by the proposed distributed framework; the orientations are sent directly to the picking program. Orientation learning will be addressed in our future work. To further analyse the picking accuracy, several experiments are implemented with the same experimental setting.
Combining these nutplate picking experiments, the average prediction errors with error bars are presented in Fig. 16. The average errors are below 0.3 mm, which includes the implementation error, the machine vision error and the error of our proposed framework. The machine vision error stems from lens distortion and the ambient lighting condition. With the compensation of the gripper, the YuMi can successfully perform nutplate picking tasks.

Conclusion
In this paper, we propose a novel distributed probabilistic framework that accommodates various movement primitives together and retrieves novel trajectories in a weight-based scenario. The core idea of this framework is not only to generate new movement primitives given task parameters, but also to provide a feasible way to store various primitives and to select or modulate them according to different demands. The human demonstrations for establishing the primitive datasets are encoded with GMM and GMR. Moreover, the regression model between the task parameters and the primitive parameters is obtained by the Gaussian process and can be automatically optimised with Evidence Maximisation. Given the desired task frame, the retrieved trajectories are predicted using the Bayesian Committee Machine. The assembly task experiments, rivet and nutplate picking, show the application feasibility of our proposed framework. Our future work will focus on orientation learning.

[Fig. 14 caption: the desired task frames and gripper picking records. The desired task frames are derived from machine vision corresponding to the information in Table 1; the gripper picking actions are recorded on the right side of the figure.]

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.