Iterative learning of human partner’s desired trajectory for proactive human–robot collaboration

A period-varying iterative learning control scheme is proposed for a robotic manipulator to learn a target trajectory that is planned by a human partner but unknown to the robot, which is a typical scenario in many applications. The proposed method updates the robot’s reference trajectory in an iterative manner to minimize the interaction force applied by the human. Although a repetitive human–robot collaboration task is considered, the task period is subject to uncertainty introduced by the human. To address this issue, a novel learning mechanism is proposed to achieve the control objective. Theoretical analysis is performed to prove the performance of the learning algorithm and robot controller. Selective simulations and experiments on a robotic arm are carried out to show the effectiveness of the proposed method in human–robot collaboration.


Introduction
Human-robot interaction (HRI) is a rapidly expanding research field with a great need for robotic assistance in various applications such as assembly, teleoperated surgery, heavy load transport and so on [1][2][3][4][5][6]. HRI has been mainly studied in two complementary branches, such as divisible shared control (DSC) that has been discussed in [7][8][9] and interactive shared control (ISC) in [10][11][12]. DSC enables clear division of the human partner's and the robot's subtasks so that they can work independently, while ISC allows for flexible intervention to improve the collaborative performance which is the focus of this paper. How to design a simple, effective and safe robotic controller has been a focus in the field of HRI [13], especially in the subfield of physical human-robot interaction (pHRI).
A critical yet typical problem in pHRI is how to obtain the human's desired trajectory or motion intention, as the knowledge of this information enables a robot to adapt its behaviour according to the human partner and to complete a task cooperatively [14,15]. In the literature of pHRI, researchers have tried to extract this information from the human partner by utilizing the haptic information, especially the interaction force that can be measured from the force/torque sensor at the interaction point. In [10], neural networks are used to build up a mapping between the human's desired trajectory and the interaction force. In [11], the robot's desired trajectory is deformed according to the interaction force, while the human's desired trajectory is not explicitly estimated. A different approach is adopted in [12], which develops a controller for lifting a beam with human-robot collaboration, using a probabilistic framework based on Gaussian Mixture Models (GMM) and Gaussian Mixture Regression (GMR) to encode force/ torque at the end-effector with the demonstrated velocity. After learning the robot could perform the collaborative lifting by conditioning the GMM on the perceived force/ torque and generate the velocity response using GMR. The strategy of using interaction force as stimuli is also applied in [16] for teaching multiple solution trajectories by means of a Hidden Markov Model (HMM) and GMR. The perceived force/torque could be displayed to the human partner by using a haptic device, and the human partner demonstrates motion response. In addition, without using a force sensor, [17] proposes a human intent detection 1 3 method by observing the impact of the physical interaction on the robot's control effort. From the control point of view, these works use the estimated human information to alter the robot's controller, but they do not evaluate how it affects the performance of the closed-loop system. This is important as human is involved in the system so the system stability should be ensured at all instances.
Impedance control provides a useful framework for analyzing a pHRI system [18], which is adopted as a basis of many works. In [19][20][21], adaptive control is developed to deal with uncertainty of the parameters in robot dynamics to achieve a target impedance model. In [22], a desired impedance model is learned for a robot through repetitive interaction with its environment. It is important to note that impedance control is useful for not only regulating the interaction between the robot and the environment, but also for establishing a human motor control model. Although describing an individual human's behavior in physical interaction with an environment is challenging, impedance control is found to be an effective and robust computational model [23,24]. With a unified model of impedance control for robot control and human motor control, it is natural to use it for skill transfer from the human to the robot. In [25], by measuring the sensory outputs from the robot and the corresponding motor commands generated by the human tutor, a proper mapping between the selected muscle EMG and robot's end-effector impedance is established. Moreover, the coupled HRI system can be explicitly described within the same framework of impedance control, so it becomes possible to evaluate the system performance with estimation of the human state and adaptation of the robot's controller. In the literature, a similar idea is found for impedance-based force control [20], where the robot is under impedance control and its environment is treated as a spring model. In this way, the robot-environment system can be analyzed within the same control framework. Differently, in pHRI, the human dynamics cannot be described as a passive system by a spring model. In this paper, a human model with an actively planned trajectory is investigated.
On the other hand, it is found that many pHRI processes are repetitive, so it is a natural idea to introduce iterative learning control (ILC) for the robot to gain skills from the human partner. In the past decades, ILC has been developed for improving performance of systems that carry out repetitive operations [26][27][28]. Its main goal is to generate a control that tracks a given reference or rejects a periodic disturbance [29]. However, in the pHRI problem under study in this paper, the human's desired trajectory is unknown to the robot so the control objective is different from the traditional ILC. It will be shown that the control problem is formulated in a way that is different from conventional trajectory tracking, although ILC is employed.
Furthermore, due to human variance during repetitive pHRI, the same period cannot be guaranteed for every iteration. Therefore, this paper develops a period-varying ILC that ensures learning convergence in the presence of human uncertainty. In the literature of ILC, some recent works [30][31][32][33][34][35] investigate the problem of period-varying ILC which lay the foundation of our controller. In particular, [33] proposes an ILC scheme with an iteration-average operator for discretetime linear systems where the iteration period could be randomly varying. [34] introduces an ILC scheme based on an iteratively moving average operator for nonlinear dynamic systems with randomly varying iterative periods. Different from the above two methods, [35] applies the adaptive design framework for continuous-time parametric nonlinear systems with varying iteration periods. Similar to other ILC methods as discussed above, these three methods also require a predefined desired output trajectory. In this paper, the human's unknown desired trajectory will be first learned by using a human limb model and by formulating a zero force regulation problem. Then, it will be tracked by the robot so it can reduce human's control effort.
Through the above discussion, this paper has the following contributions: 1. Based on a simplified human limb model, estimation of the human partner's desired trajectory is formulated as a force regulation problem. 2. Different from traditional ILC which has been mainly used for trajectory tracking, we develop ILC to achieve force regulation and estimation of human trajectory, enabling the robot to proactively cooperate with the human through repetitive pHRI. 3. A period-varying ILC is developed to deal with the problem of interactions with uncertain time durations due to human variance, so it can be applied to a wider range of pHRI applications.
The rest of the paper is arranged as follows: Sect. 2 introduces the main problem formulation, including the system description and the novel control objective. In Sect. 3, a non-standard ILC is designed to solve the problem of varying iterative periods and theoretical analysis is conducted to show the learning convergence and system stability. Furthermore, the proposed algorithm is testified by simulations and experiments in Sect. 4. Finally, conclusions are drawn in Sect. 5.

System description
In this paper, we consider a typical human-robot collaboration scenario that is composed of a robotic manipulator and its human partner. The human partner guides the robotic manipulator along a trajectory to complete a task, e.g. painting a surface with a certain curvature. This trajectory is determined by the human but cannot be preprogrammed to the robotic manipulator. The interaction force between the human hand and the robotic manipulator is measured by a force sensor at the end-effector of the robotic manipulator. The dynamics model of the n-degrees-of-freedom (n-DOF) robotic manipulator is given as below: where ,̇,̈∈ ℝ n represent the joint position, velocity and acceleration vectors, respectively; H( ) ∈ ℝ n×n is the symmetric positive definite mass matrix; C( ,̇)̇∈ ℝ n , G( ) ∈ ℝ n denote the torques due to centrifugal and gravity; u ∈ ℝ n is the joint torque applied by robotic actuators; J ∈ ℝ n×n is the Jacobian matrix that relates the joint velocity to the linear and angular velocities of the end-effector; and F h ∈ ℝ n is the robot/human interaction force that can be measured by a force sensor.
It is often desirable to describe the manipulator dynamics in the Cartesian space for the convenience of analysis, when the interaction takes place at the end-effector. The robot dynamics in the Cartesian space are given by where X ∈ ℝ n represents the position of the end effector. The velocity and acceleration satisfy Then, we obtain According to [20], dynamics of a robot's environment can be described by a spring model. This corresponds to the equilibrium point control model that describes human motor control [36], which is given below: where K h is the n × n equivalent stiffness matrix and X h is the desired trajectory of the human. With Eqs. (2) and (5), we have a full description of the system dynamics and are ready to discuss the control objective.

Control objective
In traditional control tasks, the reference trajectory of the robot is available for the control design. However, in this paper, it should be designed according to the desired trajectory that is determined by the human partner and unknown to the robot. In the rest of the paper, we propose an approach to estimate the human partner's desired trajectory and control the robot to track it.
For this purpose, impedance control is adopted with the following target impedance model: where X r is the robot's reference trajectory. M d , C d and G d are inertia, damping and stiffness matrices that are specified by the control designer, respectively.
To make Eq. (2) identical to Eq. (6), the robot's joint torque is given by u = J T F with To estimate the human partner's desired trajectory, we observe from Eq. (5) that the interaction force F h becomes 0 when the human's desired trajectory is perfectly tracked, i.e. X = X h . This indicates that we can minimize F h so that the robot's trajectory X gets close to X h as much as possible. Based on this idea, the control objective becomes The next question is how to design the reference trajectory X r to achieve the above control objective. As discussed in the Introduction, this reference trajectory can be learned through repetitive interaction between the robot and the human. In particular, we will adopt ILC to iteratively update X r so that the control objective F h = 0 is eventually achieved.
The control flow is illustrated in Fig. 1, where i represents the number of iterations. The "Impedance Control" part shows that the robot dynamics (2) are controlled by the robot controller (7), which has inputs X r i and F h i measured by a force sensor in the "Measurement and Storage" part. In the "Update Reference Position" part, (5) is an internal human model that relates the position information and interaction force F h i , which is also used to derive the dynamics model (12). Then, the developed ILC (26) uses the dynamics model (12) to update the reference trajectory X r i+1 and store it for the use in the next iteration, which will be detailed in the following section (Fig. 1).

System model transformation
For clarity of presentation, in the subsequent analysis we consider each Cartesian variable independently and replace M d , In this way, the target impedance model in a single direction becomes By using the Laplace transform of (9), we have Using the environment model (5), x(s) can be described by Substituting for x(s) in (10) yields The above equation combines the robot's impedance model and the human motor control model. By defining . In order to apply ILC, this statespace form is further written for each iteration as below: where t ∈ [0, T i ] and i ∈ [1, N] is the iteration index. N is the total number of iterations and T i is the trial length of the ith iteration. C = [1, 1] and e i (t) , i (t) , i (t) denote input, state and output of the system (14), respectively. Note that T i may be different in a different iteration, as the human introduces uncertainty to the interaction. Therefore, traditional ILC with a fixed period cannot apply to this case. In the following subsection, we will develop an ILC with varying periods. Moreover, it is important to emphasize that the control objective is to make the interaction force F h be zero, indicating that the human partner's desired trajectory will be tracked. This is denoted as lim i→∞ f h i (t) = 0 and lim i→∞ e i (t) = 0 in the framework of ILC. Thus, for the system (12), e d (t) = 0 and d (t) = 0 are the desired input and output, respectively.

ILC design
The main problem about the ILC design for the system in Eq. (14) is that the period T i is uncertain and usually different from a fixed period T d . In order to address it, we first give some definitions and assumptions that will be used in the later design.

Definition 1 [⋅] , [⋅] and [⋅]
denote the occurrence probability of the event, the expectation of a random variable and the probability distribution function.

Assumption 1 Assume that T i is a stochastic variable, and its probability distribution function is
where 0 ≤ p(t) ≤ 1 is a known continuous function. T min and T max are the minimum and maximum of T i that can be obtained through multiple iterative experiments.
, which indicates that the initial state is the same at every iteration.
If T i = T d and initial conditions are consistent, an open loop D-type ILC for the system (14) is usually designed as (16) e i+1 (t) = e i (t) +̇i(t) Fig. 1 Control flow [0, T d ] , and is the learning rate. Considering that x h (t) does not change with the iteration, i.e. x h i (t) = x h i+1 (t) , Eq. (16) can be rewritten as which gives an updating law for x r . When the actual period T i ≠ T d , which is the case of the non-standard ILC, the updating law (17) has to be redesigned.
Since T i changes with iteration, two cases need to be discussed: T i < T d and T i ≥ T d . On the one hand, T i < T d means that the ith iteration ends before the desired period, so both the output i (t) and i (t) on the time interval [T i , T d ] are missing, which thus cannot be used for learning. On the other hand, T i ≥ T d means that the ith iteration is still running after the desired period, the data from (T i , T d ] are redundant and useless. In order to deal with those missing data or redundant data in different cases, a sequence of stochastic variables is defined to satisfy Binomial distribution, so that a newly defined force error i (t) is introduced to improve the traditional ILC.
The main improvement process has the following four steps: 1. Define the stochastic variable i (t) to satisfy the Binomial distribution.
In this paper, we define i (t) , t ∈ [0, T max ] as a stochastic variable satisfying Binomial distribution and taking binary values 1 and 0. i (t) = 1 indicates that the control process of (14) can arrive at time t in the ith iteration. The probability of i (t) = 1 is q(t), where 0 < q(t) ≤ 1 is a distribution probability function of time t. i (t) = 0 indicates that the control process of (14) cannot arrive at time t, which occurs with a distribution probability of 1 − q(t).

Compute the probability [ i (t) = 1] and the expectation
Because of T min ≤ T i ≤ T max the control process (14) will not stop before T min . i (t) = 1 is the inevitable event when t ∈ [0, T min ] , which implies that q(t) = 1 , ∀t ∈ [0, T min ) . For the scenario of t ∈ [T min , T max ] , the event i (t) = 1 represents that the control process (14) stops at or after time t, which means that T i ≥ t .
[T i = t] = 0 is applied, thus Combining Eqs. (15) and (18), we can obtain that Because i (t) satisfies the Binomial distribution and taking binary values 1 and 0, the expectation In order to compensate for the missing data and reduce the error of single learning, an iteratively moving average operator is introduced, as below for a sequence e i−m (t), e i−m+1 (t), e i−m+2 (t) … , e i (t) with m ≥ 1 , which includes only the recent m + 1 iterations that could provide more accurate control information for learning process. The ILC updating law is given as follows: where the learning rates j ∈ , j = 0, 1, 2, 3 … , m and e i (t) = 0, i < 0 . Considering Eq. (17), we develop the following updating law for x r : With the above design, the main result of this paper is presented in the following theorem and its proof is in the Appendix.

Remark 1
While the probability distribution of T i is unknown, it could be estimated based on preliminary experiments. Thus, the probability distribution function F T i (t) in Assumption 1 is known and q(t) is available for controller design that can be calculated by Eq. (19).

Simulation I
In this section, we consider a single joint application scenario with an exoskeleton robot for gait tracking through iterative learning. Related works have been done for lowerlimb exoskeleton for gait tracking and rehabilitation, such as [37,38]. As shown in Fig. 2a, in the simulation we suppose that the interaction force could be measured by the force sensor mounted on the exoskeleton.
The desired joint trajectory of the user's gait is set as stiffness is set as k h = 18 N/m. In the Cartesian space, we take into account the change of the endpoint position in the X direction. The link length is 0.5 m, so according to the mapping between joint angles and endpoint position, we can obtain that the desired endpoint position with time is . The reference trajector y before lear ning is x r 0 (t) = (−0.5, 0) m, t ∈ [0, 10] and the initial position is (−0.5, 0) m for all iteration trails. The learning period of each iteration will change because of the user's influence, so the periods of the 4th, 5th, 6th, 7th, 8th and 10th are different from T d . And we assume that the period obeys a normal distribution on the time range [6,10] s with a mean of 8 and a standard deviation of 0.2. As is shown in Fig. 2b-d, the end of each iteration is marked with a dot. In order to obtain better simulation results, we take the first three results as the iteratively moving average operator's elements, so m is set as 3 and 0 = 1 = 2 = 3 = 0.25.
The tracking performance of the trajectory and joint angle are shown in Fig. 2b, c. It can be seen from the two figures that the actual position and joint angle trajectory (the solid lines in different colors) track the desired trajectory (the curve shown by the dotted red line) rapidly and precisely. In addition, the interaction force converges to zero after 20 learning iterations which can be observed from Fig. 2d. These results show that the whole learning process is also a skill transfer process. When the robot learns the user's gait, it becomes easier for the user to cooperate with the exoskeleton robot.

Simulation II
Different from simulation I, in this section, a 2-DOF manipulator is used to verify the proposed period-varying ILC algorithm. A typical trajectory learning process is considered in this simulation which can emulate an application of object manipulation, as is shown in Fig. 3a. The robot moves in the X − Y plane with an initial position (0, 0)m. With the guidance of a human partner, the robot gradually learns to track the desired trajectory that is determined by the human partner. It is assumed that the interaction force is only exerted in the Y direction and the robot's speed in the X direction is 0.1 m/s. The desired trajectory in the Cartesian space is x h = sin0.2 t Without loss of generality, the input of the initial iteration is x r 0 (t) = [0, 0] , t ∈ [0, 10] s. In this simulation, we assume that the iteration period obeys the Gaussian distribution with a mean of 10 and a standard deviation of 0.2. We further set m = 3 and 0 = 1 = 2 = 3 = 0.2 . The algorithm runs for 20 iterations.
The tracking performance of the trajectory is shown in Fig. 3b, where the lengths of the 4th, 5th, 6th, 8th and 10th periods are different from the desired one. It is obvious that the robot can gradually reach the human's desired trajectory x h . The interaction forces at the 4th, 5th, 6th, 8th, 10th, 15th and 20th are shown in Fig. 3c. It is observed that the interaction force is reduced to zero in the 20th iteration, so the robot fully learns the desired trajectory and it can eventually complete the task alone.
In addition, we change the value of k h to represent the stiffness of different human users, varying from 10 to 28 N/m, i.e. k h user1 = 10 N/m, k h user2 = 12 N/m … k h user10 = 28 N/m. According to the above task and parameters, simulations are designed to show the tracking performance of ten different users. Furthermore, we introduce traditional impedance control with zero stiffness for comparison. In order to facilitate comparison of the results, the values of inertia and damping parameters are the same as in the proposed ILC. For the proposed ILC, we record the different users' total iteration numbers, maximum errors and maximum interaction forces after convergence. For the impedance control, the maximum errors and maximum interaction forces are recorded. The tracking performance and interaction force of impedance control with different users are shown in Fig. 4a, b where different colors represent different users, while it can be found that different users' tracking trajectories almost overlap. With the proposed method, the number of iterations is different for each user, which decreases as k h increases in an appropriate range as shown in Fig. 5a. From Fig. 5b, c, it can be found that the proposed ILC achieves smaller maximum error and interaction force compared with impedance control for different users. Thus, the proposed method improves the tracking accuracy and dramatically reduces the interaction force so less human effort is required.
Simulation I and II demonstrate that the proposed method for tracking repeated trajectories is feasible. However, parameters of the desired impedance model and learning rates of the ILC are different, as the choice of parameters will affect the accuracy of tracking and the speed of convergence. Based on the desired impedance parameters in reference [22], we adjust them appropriately and find that they should not be chosen too large, otherwise the system will become unstable. For iterative learning rates, they must satisfy the condition in Theorem 1. It is worth noting that large learning rates i lead to faster convergence but larger tracking errors, whereas small learning rates will make the iteration number increase. The influence of the impedance parameters and the iterative learning rates have been extensively studied in the literature [22,33,34], respectively, so this paper does not discuss it further.

Experiment verification
In this section, to verify the validity of the proposed algorithm we carry out an experiment on the Sawyer robot, which is a 7-DOF manipulator equipped with a motor encoder and a torque sensor in each joint. As is shown in Fig. 6, a human user guides the robot manipulator to track a circular trajectory according to his intent but this trajectory is unknown to the robot. This experiment setup emulates an industrial application where the robot learns a motion from the human partner through repetitive collaboration. In the experiment, joints 1 and 2 are used which make the experimental platform similar to that in the simulation in Fig. 3a.
The experiment process is shown in Fig. 6, with a circular trajectory that starts from A(0.26, 0) m and has a diameter of 0.4 m. We set the desired period as T d = 20 s that is estimated by preliminary experiments. In the X-axis direction, we set where T i is the period of each iteration determined by the human user. In the Y-axis direction, there is a vertical interaction force on the robot's end-effecter. One key issue of the experiment is to measure the interaction force, since there is no force sensor that can directly measure the force at the end-effector. Therefore, we convert the joint torque that can be measured by the torque sensor at each joint to obtain the approximate interaction force. The parameters used in the experiment are summarized in Table 1. Because the physical meaning of each parameter is introduced above, we do not repeat them here.
During the process of the experiment, due to continuous change of human-robot interaction force and the interference of friction in each joint, the learning period T i changes accordingly. We record the learning periods of typical iterations in Table 2. As is shown in Fig. 7a, we present the change of the interaction force over the iterations. It is worth noting that due to the force measurement noise, the measured data must be smoothed through a filter. And then we take the interaction force into the designed ILC algorithm to constantly update the robot's reference trajectory. In order to control the motors, we convert the end-effector positions to joint angles through inverse kinematics. To further illustrate the effectiveness of the proposed algorithm, we also fit the filtered force data to get better tracking performance as is shown in Fig. 7b. Finally, the position change of the endeffector in the Y-direction and the trajectory in the vertical plane are given in Fig. 7d, f. From Fig. 7c, e, it can be seen that the robotic manipulator accomplishes the trajectory tracking task under the effect of the actual interaction force, and the maximum error in the 15th iteration is less than 1 cm. In addition, we get a better tracking performance through the fitted interaction force as in Fig. 7d, f. Correspondingly, the interaction force gradually decreases to 0 and the trajectory gradually converges. These results clearly demonstrate that the robot is able to track unknown human user's desired trajectory and reduce human force, in presence of time-varying periods.
The proposed method is based on impedance control where the interaction force is modulated by refining the robot's reference trajectory. However, traditional impedance control generates a virtual reference trajectory but its real reference trajectory does not change. In comparison, the proposed ILC updates the robot's real reference trajectory and enables the robot to proactively move to the human partner's desired trajectory with less effort.

Conclusions
Unlike iterative learning control in conventional applications such as motion control, the reference trajectory cannot be programmed for the robot in the coupled system of human-robot interaction. Therefore, we introduce a simplified interaction force model to estimate the human partner's desired trajectory and apply the filtered interaction force to update the robot's reference trajectory. Stability of the non-standard iterative learning control system has been shown to be guaranteed by rigorous analysis, where the condition of a fixed period is relaxed for the iterative learning control design. The validity of the proposed method has been verified through two simulations of gait tracking and trajectory learning. A further experiment of trajectory learning on a physical robot proves its feasibility. These results show that the proposed method has a potential to reduce human effort in human-robot collaboration and transfer human skills to robots, which can be applied to applications such as heavy load transport in automotive industry and compliant machining.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.

Appendix
Due to the control objectives of f h (t) = 0 and e(t) = 0 , we have f h d (t) = 0 and d (t) = 0 , then we obtain From Eqs. (25) and (21), the following relationship is obtained