Iterative learning-based path control for robot-assisted upper-limb rehabilitation

In robot-assisted rehabilitation, the performance of robotic assistance depends on the human user's dynamics, which are subject to uncertainties. To enhance rehabilitation performance, and in particular to provide a constant level of assistance, we separate the task space into two subspaces in which a combined scheme of adaptive impedance control and trajectory learning is developed. Human movement speed varies from person to person and cannot be predefined for the robot. Therefore, in the direction of human movement, an iterative trajectory learning approach is developed to update the robot's reference according to the human movement and to achieve the desired interaction force between the robot and the human user. In the direction normal to the task trajectory, unintentional human forces may deteriorate the trajectory tracking performance. Therefore, an impedance adaptation method is used to compensate for the unknown human force and prevent the human user from drifting away from the updated robot reference trajectory. The proposed scheme was tested in experiments that emulated three upper-limb rehabilitation modes: zero interaction force, assistive and resistive. Experimental results showed that the desired assistance level could be achieved despite uncertain human dynamics.


Introduction
According to the World Health Organization (WHO), around eight million people suffer from upper-limb motor dysfunction each year [1]. One of the major means of recovery is post-stroke rehabilitation [2], which uses various training modes according to the patient's recovery stage [3,4]. Over the last few decades, robot-assisted rehabilitation (RAR) has gained considerable interest and has proved effective in addressing motor dysfunction [5].
In RAR, the regulation of physical human-robot interaction (pHRI) plays a key role in the recovery of human patients [6]; more technically, it affects the stability and performance of HRI systems [7–9]. Therefore, various control techniques have been introduced for RAR, including impedance control, position control and force control [10–13]. Owing to its inherent robustness, impedance control has been explored extensively in the literature on pHRI and particularly RAR [6,14–16]. One challenge of impedance-control-related approaches is how to obtain optimal impedance parameters, which determine the relationship between the interaction force and position [17,18]. By choosing its impedance parameters, the robot can provide corresponding levels of assistance to human users. For example, stiff interaction with high impedance is desirable to assist human users with little arm function. Conversely, excessive assistance from the robot does not promote the recovery of human users. These problems have motivated research on variable impedance control, which aims to design adaptive impedance approaches that maintain a desired level of pHRI [19,20]. An adaptive impedance controller was developed in [21], where surface electromyography (sEMG) signals were used to obtain the optimal reference impedance parameters for an upper-limb robotic exoskeleton. In [22], minimal-intervention-based admittance control was developed to improve patients' degree of participation and maximize the effects of motor function training. In this paper, we develop an adaptive impedance method for RAR that guarantees the robot's tracking capability in the presence of external disturbances, including interaction forces generated by unintentional human movements.
Besides impedance parameters, the robot's reference trajectory is another design factor that can be used to regulate pHRI. Early research usually relied on predefined reference trajectories [23]. More recent research has investigated updating the robot's reference trajectory to improve pHRI [24,25]. In [26,27], the rehabilitation robot updated its trajectory or followed the target trajectory in response to changes in the human partner's interaction profiles, such as force and torque. In [28], a therapist-in-the-loop framework was introduced to adjust the desired trajectory when it is unsuitable for the patient. Despite these works, a systematic framework to automatically update the robot's reference trajectory in the presence of uncertain human dynamics is still missing [29]. In this paper, we explore iterative learning control (ILC), given the repetitive nature of rehabilitation tasks.
ILC is a well-established control approach for repetitive tasks, and it has been used to cope with uncertainties and unknown dynamics in various motion control systems. In [30], ILC was used to model human learning in repetitive tasks. In [31], an ILC-based online linear quadratic regulator was proposed to determine the optimal weight matrix for trajectory tracking. In [32], a passivity-based ILC approach was developed to guarantee the convergence of the tracking error. In this paper, we propose a novel use of ILC in the rehabilitation robot's controller design: the robot's reference trajectory for the next cycle is updated according to the interaction force in the current one. In the presence of uncertainty and without requiring knowledge of the human dynamics, the proposed method achieves a constant level of assistance, specified by a predefined desired interaction force, as the rehabilitation exercise is repeated. As the proposed approach is based on ILC, its learning convergence can be proved explicitly, which is essential to ensure the desired level of interaction.
Since a human user needs assistance along their movement direction but constraint in the other directions to track the task path accurately, we divide the task space into two subspaces with different control strategies. In the 2-dimensional case, the aforementioned adaptive impedance control is implemented to constrain the human user to a predefined task path, while trajectory learning is implemented along the task path to provide a desired level of assistance to the human user. For this purpose, we adopt the coordinate transformation method from contouring control [33–35], where the robot's reference frame is attached to its own reference trajectory, with one axis along the trajectory and the other normal to it. The proposed approach therefore provides both assistance and constraint to the human movement in the context of RAR.
The main contributions of this paper are summarized below.
- Adaptive impedance control is introduced to ensure the tracking performance of the rehabilitation robot. An update law is developed to regulate the robot's impedance parameters to cope with unknown disturbances from the external environment and the human user's unintentional movement.
- Trajectory learning is proposed to provide a constant level of assistance with a desired force to the human user, in the presence of uncertain and unknown human dynamics. A learning law is developed that uses the interaction force to update the robot's reference trajectory. Owing to the repetitive nature of rehabilitation exercises and ILC, learning convergence can be achieved without knowledge of the human dynamics, which typically differ from one user to another.
- A task frame attached to the robot's reference trajectory is defined, so that the two control strategies above can be implemented in two separate subspaces to provide both assistance and constraint to the human movement. This borrows the idea of contouring control from motion control systems and opens a new direction for controller design in pHRI applications.
Compared to related works in the literature, the novelties of the proposed approach are threefold: an impedance adaptation method is proposed to compensate for unknown human force and assist the human user in following the task trajectory; a new trajectory learning method is developed to achieve a desired assistance force, which addresses the problem of unknown human movement speed; and an online motion planning framework is proposed that allows the robot to achieve two independent control objectives in two subspaces. Section 2 presents the problem formulation and preliminaries on coordinate transformation. Section 3 describes the proposed controller with adaptive impedance control and trajectory learning in two subspaces. The experimental results are presented in Sect. 4. Finally, conclusions and future work are summarized in Sect. 5. For the reader's convenience, the notation is summarized in Table 1.

Problem formulation and preliminaries

Problem formulation
An upper-limb rehabilitation scenario is illustrated in Fig. 1, where a human hand holds a handle (the robot's end-point) to carry out a predefined exercise, e.g. following a circular path. The robot can apply assistance forces to the human hand, and the level of assistance can be predefined according to the user's recovery stage, e.g. a large assistance force for a user who can barely move their arm and a small one for a user who can partially complete the task. While the robot has prior knowledge of the task path, it does not know the human user's movement pattern, e.g. their movement speed.
In this scenario, we mainly consider two objectives that a typical rehabilitation robot should achieve. First, the robot should provide a desired level of assistance to the human user along the predefined path, quantified by the interaction force between the robot and the human hand. Second, the robot should help the human user stay on the path when their hand drifts away, e.g. due to hand tremor. These two control objectives can be pursued in two separate subspaces defined with respect to the predefined path. As shown for the 2-dimensional case in Fig. 2, the robot's task space can be described by a coordinate frame attached to the predefined path, with one axis normal to the path and the other tangential to it.
With the robot's task frame defined, two controllers are designed, one for each direction: a position controller in the direction normal to the path and a force controller along the path. However, designing these two controllers is nontrivial. For the position controller, the robot is subject to external disturbances and unintentional human movement, so we develop an adaptive impedance controller to address these issues. For the force controller, as human movement and dynamics differ between individuals, we propose an ILC-based learning method that updates the robot's reference trajectory according to the interaction force, without requiring a model of the human and relying instead on the repetitive nature of the rehabilitation exercise.

Preliminaries
This section introduces the preliminaries about the coordinate transformation from a world frame to the frame attached to the task path, which will facilitate the controller design in the following section.

Contouring error
We start by introducing the contouring error, which has mainly been studied in the motion control literature [36]. Ignoring the orientation of the robot's end-point, its actual position in the world frame is defined as
$$X_o = [x_o,\ y_o,\ z_o]^T$$
The robot's predefined desired position in the world frame is
$$X_{od} = [x_{od},\ y_{od},\ z_{od}]^T$$
Thus, the tracking error in the world frame is
$$e_o = X_o - X_{od}$$
The contouring error $e_{oc}$ is the minimal distance between the actual position and the desired path, defined as
$$e_{oc} = \min_{\tau}\ \|X_o - X_{od}(\tau)\|$$
From this definition, computing the contouring error is nontrivial, and in many cases it has no analytic solution. In this paper, we adopt a first-order method to approximate the contouring error [36], as detailed in the following subsection.
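The minimal-distance definition of the contouring error can be made concrete with a brute-force estimate. The Python sketch below samples a path densely and returns the smallest distance to the actual position; it is an illustration only, not the first-order approximation of [36] adopted in this paper, and the circular path, its radius and the sampling density are assumptions. It also shows how far the contouring error and the tracking error can diverge: with the desired point a quarter cycle ahead, $\|e_o\|$ is about 15.6 cm although the end-point is only 2 cm from the path.

```python
import math

def contouring_error(x_o, path, samples=10_000):
    """Brute-force estimate of the contouring error e_oc: the minimal distance
    from the actual position x_o to a parametric path.

    x_o     -- (x, y) actual end-point position in the world frame
    path    -- callable mapping s in [0, 1] to a point (x, y) on the path
    samples -- number of path samples; accuracy improves as it grows
    """
    best = math.inf
    for i in range(samples + 1):
        px, py = path(i / samples)
        best = min(best, math.hypot(x_o[0] - px, x_o[1] - py))
    return best


# Hypothetical example: circular path of radius A = 0.1 m centred at the origin.
A = 0.1

def circle(s):
    return (A * math.cos(2 * math.pi * s), A * math.sin(2 * math.pi * s))

x_o = (0.12, 0.0)    # end-point 2 cm outside the circle
x_od = circle(0.25)  # desired point a quarter cycle ahead, about (0, 0.1)

print(round(contouring_error(x_o, circle), 6))                   # 0.02
print(round(math.hypot(x_o[0] - x_od[0], x_o[1] - x_od[1]), 4))  # 0.1562 (tracking error norm)
```

The cost grows linearly with the number of samples and would have to be paid at every control step, which is one reason a first-order approximation is attractive in real time.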

Coordinate frame transformation
Given the desired trajectory $X_{od}$, the following unit vectors can be computed:
$$\mathbf{m} = \frac{\dot X_{od}}{\|\dot X_{od}\|},\qquad \mathbf{n} = \frac{\dot{\mathbf{m}}}{\|\dot{\mathbf{m}}\|},\qquad \mathbf{p} = \mathbf{m}\times\mathbf{n}$$
where $\mathbf{m}$ is the unit tangent vector, $\mathbf{n}$ the unit normal vector and $\mathbf{p}$ the unit binormal vector. A transformation matrix is then obtained as
$$R = [\mathbf{m},\ \mathbf{n},\ \mathbf{p}]^T$$
which transforms $e_o$ from the world frame to the task frame:
$$e = R\,e_o$$
where $e$ is the tracking error in the task frame corresponding to $e_o$. When the desired position $X_{od}$ is close to the actual position $X_o$, the contouring error $e_{oc}$ can be approximated by the normal and binormal components of $e$. In the 2-dimensional case, the contouring error can be approximated by the projection of the tracking error onto the normal direction, i.e.
$$e_{oc} \approx |e_n|,\qquad e_n = \mathbf{n}^T e_o$$
where $e_n$ is the tracking error in the normal direction, as a component of
$$e = [e_t,\ e_n]^T$$
with $e_t$ the tracking error in the tangential direction.
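In the planar case, the transformation reduces to projecting $e_o$ onto the unit tangent and normal of the desired trajectory. The sketch below computes $(e_t, e_n)$ from the desired position and velocity; the function name, the left-normal sign convention and the example values are assumptions for illustration.

```python
import math

def task_frame_errors(x_o, x_od, x_od_dot):
    """Project the world-frame tracking error e_o = x_o - x_od onto the 2-D task frame.

    x_o      -- actual end-point position (x, y)
    x_od     -- desired position on the path (x, y)
    x_od_dot -- desired velocity at x_od, used to obtain the unit tangent m
    Returns (e_t, e_n), i.e. e = R e_o with R = [m, n]^T.
    Assumption: n is m rotated by +90 degrees (left normal), a common 2-D
    convention that may differ in sign from the Frenet normal.
    """
    speed = math.hypot(x_od_dot[0], x_od_dot[1])
    if speed == 0.0:
        raise ValueError("tangent is undefined when the desired velocity is zero")
    m = (x_od_dot[0] / speed, x_od_dot[1] / speed)  # unit tangent
    n = (-m[1], m[0])                                 # unit normal (left of m)
    e_o = (x_o[0] - x_od[0], x_o[1] - x_od[1])        # world-frame tracking error
    e_t = m[0] * e_o[0] + m[1] * e_o[1]               # tangential component
    e_n = n[0] * e_o[0] + n[1] * e_o[1]               # normal component
    return e_t, e_n


# Circle of radius A = 0.1 m at omega = 2*pi rad/s, evaluated at t = 0 (hypothetical values).
A, omega = 0.1, 2 * math.pi
x_od, x_od_dot = (A, 0.0), (0.0, A * omega)
e_t, e_n = task_frame_errors((0.12, 0.005), x_od, x_od_dot)
print(round(e_t, 6), round(e_n, 6))  # 0.005 -0.02
```

For this counter-clockwise circle the left normal points toward the centre, so an end-point outside the circle yields a negative $e_n$. The exact distance to the circle is about 2.01 cm, so $|e_n| = 2$ cm is a close first-order approximation while $X_{od}$ stays near $X_o$.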
As the contouring error is approximated by the tracking error in the normal direction, we can design a position controller to reduce this error so that the human movement will be constrained to the desired path. In the tangential direction, a force controller can be designed to achieve a desired level of assistance to the human user. In this way, the robot's task space is divided into two subspaces, with two controllers to be developed independently.

System dynamics
The dynamics model of a planar rehabilitation robot is given as
$$M_d\ddot X + B_d\dot X = u + f_h + f_d \quad (10)$$
where $M_d$ and $B_d$ are the positive definite inertia and damping matrices, respectively, $u$ is the robot's control input, and $f_h$ and $f_d$ are the human force applied to the robot and the disturbance force, respectively. For analysis purposes, the human force can be modelled as
$$f_h = K_h(X_{hd} - X) \quad (11)$$
where $K_h$ is the human arm stiffness and $X_{hd}$ is the desired position of the human arm's end-point. Since the human user performs repetitive rehabilitation exercises, $X_{hd}$ corresponds to the predefined task path and is therefore assumed to be periodic with task duration $T$. Note that these parameters are not used in the robot's controller. We also consider a disturbance force due to the external environment or the human's unintentional movement, modelled as
$$f_d = K_{d1}(X - X_d) + K_{d2}(\dot X - \dot X_d) \quad (12)$$
where $K_{d1}$ and $K_{d2}$ are unknown constant matrices and $X_d$ is the desired trajectory of the robot's end-effector. This model shows that the disturbance force drives the robot away from its reference trajectory.
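The destabilizing nature of the disturbance model can be checked with a one-dimensional simulation. The sketch below assumes that the robot applies only a feed-forward term cancelling its nominal dynamics along $X_d$, that no human force acts, and that the disturbance takes the form $f_d = k_{d1}e + k_{d2}\dot e$ with $e = X - X_d$; the function name, mass, damping, disturbance gains and step size are hypothetical.

```python
def disturbance_only_error(e0, m=1.0, b=5.0, k_d1=100.0, k_d2=0.0,
                           dt=1e-4, duration=0.5):
    """Scalar error dynamics m*e'' + b*e' = f_d with f_d = k_d1*e + k_d2*e'.

    Assumes feed-forward cancels the nominal dynamics along X_d and no human
    force acts, so e = X - X_d is driven by the disturbance alone.
    Integrated with explicit Euler; all parameter values are hypothetical.
    """
    e, e_dot = e0, 0.0
    for _ in range(int(round(duration / dt))):
        f_d = k_d1 * e + k_d2 * e_dot                   # assumed disturbance form
        e_ddot = (f_d - b * e_dot) / m
        e, e_dot = e + dt * e_dot, e_dot + dt * e_ddot  # explicit Euler step
    return e


print(disturbance_only_error(0.01) > 0.1)      # True: a 1 cm error grows past 10 cm
print(disturbance_only_error(0.01, k_d1=0.0))  # 0.01: without the disturbance the error stays put
```

With $k_{d1} = 100$ N/m and $b = 5$ Ns/m the error dynamics have a positive real eigenvalue (about 7.8 s⁻¹), so the initial error grows more than tenfold within half a second unless feedback intervenes.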

Robot controller
The robot's controller is designed as
$$u = u_1 + u_2 + u_3 \quad (13)$$
where $u_1$ is the feed-forward term compensating for the robot's dynamics, $u_2$ is the feedback term guaranteeing stability in the absence of disturbance, and $u_3$ is the adaptive impedance term dealing with the unknown disturbance $f_d$. They are, respectively, designed as
$$u_1 = M_d\ddot X_d + B_d\dot X_d \quad (14)$$
$$u_2 = -K_p e - K_d\dot e \quad (15)$$
where $e = X - X_d$ and $K_p$ and $K_d$ are the robot's feedback gains, and
$$u_3 = -K e - D\dot e \quad (16)$$
where $K$ and $D$ are the robot's adaptive stiffness and damping matrices. Combining Eqs. (10)–(16), the dynamics of the closed-loop system can be written as in Eq. (17). To obtain the stiffness $K$ and damping $D$, we consider a Lyapunov function candidate $J_c$, Eq. (19), in which $h_k$ and $h_d$ are positive parameters that adjust the adaptation of stiffness and damping, respectively. By considering the time derivative of (19) (the detailed derivation is given in the Appendix), $K$ and $D$ are updated according to Eq. (20). With the coordinate transformation in Sect. 2.2, the system dynamics are divided into a subspace along the predefined path and another normal to it. Where no confusion arises, we use the subscripts "t" and "n" to denote tangential and normal components of matrices and variables, respectively.
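To illustrate how the feedback and adaptive impedance terms work together along one task-frame axis, the sketch below simulates the scalar error dynamics against the disturbance model. Several pieces are assumptions: $u_1$ is taken to cancel the nominal dynamics along $X_d$, the forms $u_2 = -k_p e - k_d\dot e$ and $u_3 = -K e - D\dot e$ are assumed, no human force acts, and the adaptation law $\dot K = h_k e^2$, $\dot D = h_d\dot e^2$ is a hypothetical gradient-type rule, not the update law (20) of this paper. Only $k_p = 300$, $k_d = 100$, $h_k = h_d = 1000$ and the zero initial impedance follow the experimental settings; the other values and the function name are hypothetical.

```python
def adaptive_impedance_error(e0, m=1.0, b=5.0, k_p=300.0, k_d=100.0,
                             k_d1=100.0, k_d2=20.0, h_k=1000.0, h_d=1000.0,
                             dt=1e-4, duration=5.0):
    """Scalar closed loop along one task-frame axis with u = u1 + u2 + u3.

    u1 is assumed to cancel the nominal dynamics along X_d, so e = X - X_d obeys
    m*e'' + b*e' = u2 + u3 + f_d, with
      u2  = -k_p*e - k_d*e'    (assumed fixed PD feedback)
      u3  = -K*e   - D*e'      (assumed adaptive impedance term)
      f_d = k_d1*e + k_d2*e'   (assumed disturbance form)
    HYPOTHETICAL adaptation law, not the paper's Eq. (20):
      K' = h_k*e**2,  D' = h_d*e'**2
    """
    e, e_dot = e0, 0.0
    K, D = 0.0, 0.0  # impedance initialised to zero, as in the experiments
    for _ in range(int(round(duration / dt))):
        u2 = -k_p * e - k_d * e_dot
        u3 = -K * e - D * e_dot
        f_d = k_d1 * e + k_d2 * e_dot
        e_ddot = (u2 + u3 + f_d - b * e_dot) / m
        K, D = K + dt * h_k * e * e, D + dt * h_d * e_dot * e_dot  # adapt impedance
        e, e_dot = e + dt * e_dot, e_dot + dt * e_ddot             # explicit Euler step
    return e, K, D


e_final, K_final, D_final = adaptive_impedance_error(0.02)
print(abs(e_final) < 1e-5, K_final > 0.0, D_final > 0.0)  # True True True
```

Under this hypothetical rule, $K$ and $D$ can only increase while an error persists, so they stiffen the loop whenever the disturbance pushes the robot off its reference; the paper's own law (20) should be substituted for any faithful reproduction.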

Trajectory learning
In the direction along the predefined task path, we want to achieve a constant level of assistance from the robot to the human, defined by a desired force $f_{hdt}$. Considering the tangential component of the human force model in Eq. (11), we have
$$f_{hdt} = K_{ht}(X_{hdt} - X_{vt}) \quad (21)$$
where $X_{vt}$ is a virtual trajectory that generates the desired force $f_{hdt}$. Since the human's parameters $K_{ht}$ and $X_{hdt}$ are unknown, $X_{vt}$ cannot be computed directly; we therefore develop a trajectory learning method to obtain it. Combining Eqs. (11) and (21), we obtain
$$\Delta f_{hdt} = K_{ht}\,\Delta X_{vt} \quad (22)$$
with
$$\Delta f_{hdt} = f_{hdt} - f_{ht},\qquad \Delta X_{vt} = X_t - X_{vt} \quad (23)$$
where $f_{ht} = K_{ht}(X_{hdt} - X_t)$ is the actual tangential interaction force. Although $X_{vt}$ is unknown to the robot, Eq. (22) shows that $\Delta f_{hdt}$ is proportional to $\Delta X_{vt}$. Inspired by this observation, a learning law is designed to obtain the robot's desired trajectory $X_{dt}$:
$$X_{dt}(t) = X_{dt}(t-T) - \phi\,\Delta f_{hdt}(t-T) \quad (24)$$
where $\phi$ is a positive learning rate. In other words, the desired trajectory is learned by minimizing the error between the desired force $f_{hdt}$ and the actual one $f_{ht}$. The learning of $X_{dt}$ converges when $f_{ht} = f_{hdt}$, i.e. when the desired interaction force in the tangential direction is achieved.
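The trajectory learning can be exercised offline with a spring-like simulated user. The sketch below assumes perfect inner-loop tracking ($X_t = X_{dt}$), the tangential spring model $f_{ht} = K_{ht}(X_{hdt} - X_t)$ with a hypothetical stiffness $K_{ht} = 200$ N/m and a preferred human speed 10% below the initial reference, and a cycle-to-cycle update of the form $X_{dt}(t) = X_{dt}(t-T) + \phi\,[f_{ht}(t-T) - f_{hdt}]$ with $\phi = 0.004$ taken from the experimental settings (its unit, m/N, is an assumption). The function name and all other values are hypothetical.

```python
import math

def learn_tangential_trajectory(f_hdt, k_ht=200.0, phi=0.004, cycles=10,
                                samples=100, A=0.1, omega=2 * math.pi,
                                omega_h=0.9 * 2 * math.pi):
    """Iterate the cycle-to-cycle learning law along the path (arc length in m).

    Assumptions (all hypothetical): the inner loop tracks X_dt perfectly
    (X_t = X_dt); the human behaves like a spring of stiffness k_ht pulling
    toward an unknown preferred arc length moving at speed A*omega_h; one
    cycle lasts T = 2*pi/omega seconds and is sampled at `samples` points.
    Returns the learned reference and the max force error seen in each cycle.
    """
    T = 2 * math.pi / omega
    t = [k * T / samples for k in range(samples)]
    x_hdt = [A * omega_h * tk for tk in t]  # human's desired arc length (unknown to robot)
    x_dt = [A * omega * tk for tk in t]     # initial robot reference along the path
    history = []
    for _ in range(cycles):
        f_ht = [k_ht * (xh - xd) for xh, xd in zip(x_hdt, x_dt)]  # measured force this cycle
        history.append(max(abs(f - f_hdt) for f in f_ht))
        # Next cycle's reference: move ahead where the human pushes harder than desired.
        x_dt = [xd + phi * (f - f_hdt) for xd, f in zip(x_dt, f_ht)]
    return x_dt, history


x_dt, history = learn_tangential_trajectory(f_hdt=-2.0)  # assistive mode, -2 N
print([round(h, 4) for h in history[:3]])  # [10.4407, 2.0881, 0.4176]
```

With these values each cycle shrinks the force error by the factor $1 - \phi K_{ht} = 0.2$. The learned reference settles at $X_{hdt} - f_{hdt}/K_{ht}$, i.e. 1 cm ahead of the simulated user for $f_{hdt} = -2$ N, so the robot leads and pulls the hand, consistent with a negative desired force in the assistive mode. In this model the iteration diverges if $\phi K_{ht} \ge 2$, so the learning rate limits the user stiffness the scheme can tolerate.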
In summary, the proposed controller is designed with reference to a coordinate frame attached to the predefined task path. Adaptive impedance control is developed to guarantee the tracking performance in both normal and tangential directions. Trajectory learning is designed to provide a constant level of assistance by achieving the desired interaction force in the tangential direction. The proposed control scheme is presented in Fig. 3, and its performance analysis is given in the Appendix.

Experimental setup
As shown in Fig. 4, the experimental platform consists of an H-MAN robot (ARTICARES Pte Ltd), a force sensor and a control computer. H-MAN physically interacts with the human user through its handle in a planar workspace. An ATI Mini-40 force/torque sensor is mounted on the handle of H-MAN, and its measurements are relayed to the control computer through an ATI Net Box. All devices send their data to the control computer over the Transmission Control Protocol (TCP).
Choosing the right exercise/training modes according to the human patients' recovery stage is important [37]. Upper-limb rehabilitation training modes can be divided into four main categories: passive, assistive, active and resistive modes [38]. In our experiments, three different modes were considered to represent typical tasks: zero interaction force mode, assistive mode with a negative desired force and resistive mode with a positive desired force. During the experiments, the human user was asked to follow a predefined path. At the same time, the robot's control objective was to guarantee the tracking of the predefined path and to provide a constant level of assistance to the human user along the path.
The robot's initial desired trajectory, shown in Fig. 5, is defined as
$$X_d(t) = [A\cos(\omega t),\ A\sin(\omega t)]^T \quad (25)$$
where $A = 10$ cm and $\omega = 2\pi$ rad/s. This trajectory is updated in each cycle by Eq. (24) with $\phi = 0.004$. The robot's feedback gains are set as $K_p = 300$ N/m and $K_d = 100$ Ns/m. The adaptive impedance parameters $K$ and $D$ are initialized to 0 and updated with factors $h_k = 1000$ and $h_d = 1000$, respectively, to ensure smooth interaction during the task. The desired interaction force in the tangential direction, $f_{hdt}$, is set differently in each mode, as detailed below.
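The parameter values above fix the cycle length $T$ by which the learning update steps back and the nominal speed that the user has to match. A short sketch, assuming the circle is traversed counter-clockwise from $(A, 0)$ (only the radius and $\omega$ are given in the text; `initial_reference` is a hypothetical name):

```python
import math

A = 0.10             # radius of the circular path [m] (A = 10 cm)
omega = 2 * math.pi  # angular frequency [rad/s]

def initial_reference(t):
    """Hypothetical parametrisation of the initial circular reference:
    counter-clockwise from (A, 0). Only the radius and omega come from the text."""
    return (A * math.cos(omega * t), A * math.sin(omega * t))

T = 2 * math.pi / omega  # cycle duration used by the learning update [s]
speed = A * omega        # tangential speed of the initial reference [m/s]
print(T, round(speed, 3))  # 1.0 0.628
```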

Zero interaction force mode
The experimental results of the zero interaction force mode are shown in Fig. 6. In this mode, the desired interaction force was set to 0 N; ideally, after trajectory learning, the human user does not feel any force while exercising along the given circular path. To this end, trajectory learning was used to update the robot's desired trajectory in the tangential direction until the required interaction was achieved. Figure 6a shows the actual position of the robot in each cycle, converging to the predefined circular path. Figure 6b shows that the tracking error in the normal direction decreases to small values, indicating that the robot helps the human user stick to the predefined path. Figure 6c, d show that the velocity in the tangential direction changes to match the human speed, so that the interaction force iteratively converges to about 0 N. Figure 6e, f show the impedance adaptation in the normal and tangential directions, which keeps the human movement close to the circular path. Although the disturbance modelled by Eq. (12) is unknown in the experiments, the impedance adaptation results show how the robot automatically updates its parameters to deal with the unknown disturbance and guarantee the tracking performance.

Assistive mode
The assistive mode is usually used in the initial stage of recovery from motor dysfunction. Experimental results in this mode are presented in Fig. 7. Figure 7a shows how the human user followed the robot's lead and completed the path-following task. It can be seen that the robot did not return to the initial position because of the slow movement of the human user. Nevertheless, the position error in the normal direction was significantly reduced, as illustrated in Fig. 7b. In this mode, the robot's desired trajectory in the tangential direction was also updated automatically, so that the required velocity and interaction force of −2 N were achieved, as shown in Fig. 7c, d. Figure 7e shows converged stiffness values similar to those in the zero interaction force mode, whereas Fig. 7f shows higher converged stiffness values than its counterpart in the zero interaction force mode. Because of the desired interaction force of −2 N, a higher stiffness is required to ensure trajectory tracking in the presence of the human force. These results demonstrate how adaptive impedance control automatically handles different cases to ensure the tracking performance.

Resistive mode
In RAR, the resistive mode plays a key role in promoting motor learning at a later stage of recovery. This mode was emulated by setting a desired interaction force of 3 N, and the results are shown in Fig. 8. Similarly, Fig. 8a–f illustrate, respectively, the robot's actual position during the exercise, the normal direction position error reduced to a small value through impedance adaptation, the velocity changing to match the human speed, the tangential direction force converging to the desired value, and the stiffness parameters converging to steady values. Compared to the other two modes, the robot's velocity converges to a smaller value, indicating that the robot resists the human user's attempt to reach their desired speed.

Comparative analysis
To further demonstrate the effectiveness of the proposed approach, experiments with fixed impedance control were performed for the three training modes above. The robot's desired trajectory for fixed impedance control is the same as in Fig. 5, and its fixed impedance gains are set as $K_p = 300$ N/m and $K_d = 100$ Ns/m. The desired interaction force in each mode is modulated by changing $\omega$ in Eq. (25). Before the experiments, $\omega$ was estimated while the human user moved with the robot deactivated, and the estimated value $\omega = 2\pi$ rad/s was used for the zero interaction force mode. For the assistive mode, $\omega$ is multiplied by a constant 1.07, and for the resistive mode, $\omega$ is divided by 1.07. Figure 9 shows the results with fixed impedance control in the three training modes. From Fig. 9a, d, fixed impedance control yields a larger normal position error and a diverging tangential force compared to the results in Fig. 6b, d. Similarly, convergence to the desired interaction force is not achieved in the assistive and resistive modes (Fig. 9e, f), and the normal position errors (Fig. 9b, c) are larger than those in Figs. 7 and 8. In conclusion, compared to fixed impedance control, iterative learning-based path control achieves convergence of the interaction force, while impedance adaptation ensures path following.
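Because the fixed-impedance baseline has no learning, the force level in each mode can only be set indirectly through the reference frequency. The arithmetic, sketched below with the values stated above, changes the cycle period by roughly 7% in each direction; the "runs ahead"/"lags behind" annotations are our reading of the sign convention (negative force for assistive, positive for resistive), not a statement from the experiments.

```python
import math

omega_0 = 2 * math.pi  # estimated free-movement frequency [rad/s]
scale = 1.07           # modulation constant used in the comparison

omega_by_mode = {
    "zero interaction force": omega_0,
    "assistive": omega_0 * scale,  # reference runs ahead of the user
    "resistive": omega_0 / scale,  # reference lags behind the user
}
for mode, w in omega_by_mode.items():
    period = 2 * math.pi / w
    print(f"{mode}: omega = {w:.3f} rad/s, period = {period:.3f} s")
```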

Multiple trials
Our approach is based on the assumption of consistent human movement. Although this assumption can be partly fulfilled by asking the human subject to repeat the same movement in each cycle, uncertainties are inevitable. In this subsection, more experiments with multiple trials were carried out to examine the robustness of the proposed approach. […] and ensure small normal direction errors (subfigure (a)) in all modes. In addition, the average values of the interaction force show that the different desired forces have been achieved (subfigure (b)) in each mode. In particular, Fig. 10b shows an interaction force between 0.5 and 1.1 N in the zero interaction force mode; the nonzero force is likely due to uncompensated friction. Figure 11b shows an interaction force between −2.1 and −1.9 N, which is close to the desired value of −2 N in the assistive mode. Figure 12b shows an interaction force between 2.6 and 3.1 N, which is also around the desired value of 3 N in the resistive mode. Finally, we use one-way analysis of variance (ANOVA) to verify the differences in performance measures between the three modes. The means and standard deviations of the normal direction position errors and tangential direction forces were computed from 200 data points per mode over five trials. Figure 13 shows that the tracking error in the normal direction differs significantly between the zero interaction force mode and the other two modes. The larger tracking errors are due to nonzero interaction forces, but their mean values are below 0.3 cm, which is acceptable. Figure 13 also shows that the interaction force in the tangential direction differs significantly between modes, reflecting the different desired levels of interaction. These results show that, despite uncertainties in human movement, our approach achieves relatively consistent performance.
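The one-way ANOVA used for Fig. 13 compares the between-mode variance of a performance measure with its within-mode variance. The Python sketch below computes the F statistic and its degrees of freedom; the experimental data are not reproduced here, so the groups in the example are toy numbers, and `one_way_anova` is a hypothetical helper name.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over several groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    # Variation of samples around their own group mean.
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy numbers only (not the experimental data): three groups of three values.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

In practice, `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value.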

Conclusions
In this paper, a combined scheme of adaptive impedance control and trajectory learning has been proposed and implemented in a coordinate frame attached to a predefined path for RAR. To deal with the influence of unknown disturbances from the environment (including the human user) and the robotic system, we have proposed an update law to adapt the robot's impedance. Considering the different dynamics of human users, a trajectory learning algorithm has been developed to provide a constant level of assistance in the tangential direction for repetitive training tasks. The robot's trajectory can be updated based on the measured interaction force, without requiring a model of the human dynamics. The proposed approach has been validated by comparative experiments in different interaction modes, and further analysis with multiple trials has demonstrated its robustness. Future work includes testing and improving the proposed approach in real clinical trials with human patients. Moreover, multiple robots assisting in completing a common task, i.e. the distributed collaborative optimization problem, will also be studied for RAR [39,40].
Appendix
[…] achieving a desired interaction force, i.e. when $t\to\infty$, $f_{ht} = f_{hdt}$. Let us first consider the adaptive impedance control, by defining Eq. (26), where $J_c$ has been defined in (19) and $J_e$ is given by Eq. (27). Taking the time derivative of (19), we obtain Eq. (28). Taking the time derivative of (27) and considering the closed-loop dynamics (17), we obtain Eq. (29). By combining Eqs. (28) and (29), we obtain Eq. (30). Therefore, when $t\to\infty$, $\dot e\to 0$, and according to Eq. (17), $e\to 0$. For the trajectory learning in the tangential direction, another Lyapunov function candidate $J_{rt}$ is defined. Then, the difference of $J_{rt}$ between two cycles is
$$\Delta J_{rt} = J_{rt}(t) - J_{rt}(t-T) \quad (32)$$
After further expanding Eq. (32), we have […]
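The Lyapunov argument for the learning law is truncated above. As a simplified check, assume perfect inner-loop tracking ($X_t = X_{dt}$), the tangential spring model $f_{ht} = K_{ht}(X_{hdt} - X_t)$ with constant scalar $K_{ht} > 0$, the periodicity $X_{hdt}(t) = X_{hdt}(t-T)$ stated for the human's desired trajectory, and a learning law of the form $X_{dt}(t) = X_{dt}(t-T) + \phi\,[f_{ht}(t-T) - f_{hdt}]$. These assumptions are ours; under them the cycle-to-cycle force error obeys a simple contraction:

```latex
\begin{aligned}
f_{ht}(t) - f_{hdt}
  &= K_{ht}\bigl(X_{hdt}(t) - X_{dt}(t)\bigr) - f_{hdt} \\
  &= K_{ht}\bigl(X_{hdt}(t-T) - X_{dt}(t-T)\bigr)
     - \phi K_{ht}\bigl(f_{ht}(t-T) - f_{hdt}\bigr) - f_{hdt} \\
  &= \bigl(1 - \phi K_{ht}\bigr)\bigl(f_{ht}(t-T) - f_{hdt}\bigr).
\end{aligned}
```

Hence $|f_{ht} - f_{hdt}|$ decreases geometrically from cycle to cycle if and only if $0 < \phi K_{ht} < 2$, and $f_{ht} \to f_{hdt}$ as the number of cycles grows. This is a simplified check, not the Lyapunov argument of the paper.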

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.