Intent inference in shared-control teleoperation system in consideration of user behavior

In shared-control teleoperation, rather than directly executing a user's input, a robot system assists the user with partial autonomy to reduce the user's workload and improve efficiency. Effective assistance is a challenging task, as it requires correctly inferring the user intent, which includes predicting the user's goal among all possible candidates as well as inferring the user's preferred movement in the next step. In this paper, we present a probabilistic formulation for inferring the user intent that takes user behavior into consideration. In our approach, the user behavior is learned from demonstrations and then incorporated into goal prediction and path planning. Using the maximum entropy principle, two goal prediction methods are tailored according to similarity metrics between the user's short-term movements and the learned user behavior. We validated the proposed approaches with a user study that examined the performance of our goal prediction methods in approaching tasks in multiple-goal scenarios. The results show that our approaches perform well in user goal prediction and are able to respond quickly to dynamic changes in the user's goal. Comparative analysis shows that the proposed approaches outperform existing methods, especially in scenarios with goal ambiguity.


Introduction
Teleoperation is an effective way for robotic systems to perform complex tasks in unstructured environments [1]. In a direct teleoperation system, the real-time input command from the user via an interface is simply mapped to the actuation signal that controls the remote robotic manipulator in task space. While direct teleoperation exploits the user's capabilities and intelligence to interact with the remote environment, it can impose a heavy workload on the user, who must concentrate intensely on monitoring the details of the remote motion. This is especially true for the many repetitive and tedious point-to-point motions in space, such as approaching a target object and moving to the desired location in a "pick and place" task. Direct teleoperation is also limited by inadequacies of the interface, input noise, and communication delays, which can lead to inefficient and poor task performance, or even failure.
Shared-control teleoperation systems blend user input and robotic autonomy, thus alleviating the user's burden and improving the efficiency of task execution. To assist effectively, a shared-control system must tackle two fundamental problems: (1) intent inference, that is, predicting the user's goal as well as planning the path that the user would like to take towards the predicted goal, and (2) formalizing the arbitration policy that blends the user input and the prediction. The two problems are successive steps in shared control, and correct intent inference is a prerequisite for effective assistance. Many approaches have been investigated to tackle the intent inference problem, but few existing works consider user behavior in goal prediction, and it remains unclear whether the predicted path to the goal is the one the user prefers. For example, in a grasping task, the user has specific decisions and preferences about how to approach the objects. As shown in Fig. 1a, to grasp object G, the user may prefer path j over path i, even though path i appears more optimal: shorter and less time-consuming. An assistance policy that proposes a path the user does not prefer might backfire on the overall performance and result in laborious, noisy, and time-consuming movements. User behavior may also affect goal prediction, as in the scenario illustrated in Fig. 1b: object G2 is mistakenly considered the intended object before the user passes it, whereas object G1 is actually the target. Therefore, considering user behavior in intent inference is important and can contribute to efficient assistance.
In this work, we focus on the intent inference problem in shared-control teleoperation. In particular, we investigated user behavior and considered it both in goal prediction and in planning the path to the goal. We took a primitive motion, the point-to-point teleoperation task, as an example in this study, and assumed that users prefer motion paths that follow their own behavior. Our approach modeled the user behavior and generated action paths via learning from demonstration (LfD). A simulation system (Fig. 3a) was developed to demonstrate the approaching task in virtual reality. The recorded point-to-point motions were modeled as nonlinear autonomous dynamical systems (DS) with a single attractor at the goal. The learned DS-based user behavior model was then used to compute similarity metrics between the user's short-term path and the user behavior. At each time step of teleoperation, the maximum entropy principle was employed to model a probability distribution over the candidate goals based on the similarity metrics. A smooth action path to the most likely goal was generated from the DS-based user behavior model. Figure 2 illustrates our system schema. Experimental results indicated that, by taking the user behavior into account, our approaches perform well in goal prediction. In particular, our methods could respond quickly to goal changes or perturbations.
The contributions of this work include:
• We considered the user behavior in intent inference under the shared-control framework and developed a simulated teleoperation system to sample and model the user's motion behavior.
• For goal prediction, we proposed two metrics, path distance and directionality deviation, to measure the similarity of the user's executed path to the modeled user behavior. Under the proposed similarity metrics, two short-term memory based goal prediction approaches, short-term path distance (SPD) and short-term directionality deviation (SDD), were proposed using the maximum entropy principle.
Fig. 1 User behavior in the approaching task. a Different paths approaching the goals: path j is the preferred one, but path i is the shortest one. b User behavior affects the goal prediction
Fig. 2 System schema. Core components include user behavior learning from demonstrations and user intent inference for shared autonomy assistance

Related work
The first case of assistive teleoperation was presented by Goertz [2] in the task of turning cranks. Since then, a great variety of methods have been proposed to introduce autonomy into teleoperation systems [3]. For example, Aigner and McCarragher proposed blending user input with a potential field to avoid obstacles [4]. Gopinath et al. developed an interactive interface that allows the user to optimize the arbitration parameters for assistance during teleoperation [5]. A predict-then-blend framework for shared-control teleoperation was discussed by Dragan and Srinivasa [6], providing a unifying view of assistance. The predict-then-blend framework was also used in other studies [7,8]. The goal prediction problem, the first task in intent inference, has been widely investigated in various human-robot interaction scenarios. Examples include goal prediction in a shared-control teleoperation system considering eye gaze behavior [9], short-term goal prediction using Bayesian reasoning [10], and task type estimation from 2D mouse inputs [11]. Many methods address the goal prediction problem with machine learning tools, such as hidden Markov models [12,13] and Bayesian inference [14]. Recently, Rakita et al. developed a shared-control-based bimanual manipulation system. They implemented a sequence-to-sequence recurrent neural network architecture to infer the class of bimanual action, which can be considered a goal prediction problem with four candidate goals [15]. Li et al. examined three classical machine learning models (neural network, support vector machine, and Bayesian network) and determined the best one for inferring manipulation intent from the grasping configuration [16]. The well-established principle of maximum entropy has also been widely adopted for goal prediction, usually based on the historical observations of user input [6,17].
In this work, we also used the principle of maximum entropy for goal prediction. However, unlike most previous methods, which use the whole history of user input, we exploited only short-term observations for goal prediction in order to improve efficiency and adapt to dynamic changes in the task environment.
Planning the path that the user prefers for reaching the predicted goal is the other task in intent inference. Most motion planners, such as probabilistic roadmaps (PRMs) [18,19] and rapidly-exploring random trees (RRTs) [20,21], are effective for planning the path towards the predicted goal in shared autonomy. Alternative approaches use trajectory optimizers to obtain the intended path [22,23]. These methods can provide optimal solutions to path planning, but the resulting path may not follow the motion strategy the user prefers for reaching the goal. This phenomenon was reported by Dragan and Srinivasa [6] in their user study. The LfD framework offers an alternative: reproducing motions from demonstrated samples, and promising results have been achieved in recent years. Pastor et al. used dynamical movement primitives (DMPs) [24] to learn a pick-and-place operation and a water-serving task [25]. Calinon et al. proposed representing the demonstrated motions with a Gaussian mixture model (GMM) and reproducing the motion by Gaussian mixture regression (GMR) [26]. Havoutis et al. introduced a task-parameterized hidden semi-Markov model (TP-HSMM) to learn task representations and generate robot motion from a few demonstrations of ROV teleoperation tasks [27]. With this algorithm, the operator worked more efficiently, and the ROV could execute the learned tasks autonomously even when communication was lost. Most recently, Zhang et al. built a virtual reality teleoperation system to collect high-quality demonstrations and proposed a deep neural network architecture for learning tasks from RGBD images and robot arm pose data [28]. In this work, we developed a simulated teleoperation system for capturing user data and employed an LfD approach to learn the user behavior model. We drew on multiple works within the LfD framework that use dynamical systems (DS) [29,30] to represent motion.
In particular, our implementation uses the stable estimator of dynamical systems (SEDS) approach, which learns the parameters of a time-independent dynamical system under constraints that ensure global asymptotic stability [31].

User behavior modeling via learning from demonstrations
Within the LfD paradigm, the user behavior can be learned from, and reproduced based on, observations of task demonstrations. Here we briefly present how the motions in the approaching task are modeled as an autonomous dynamical system (DS) with a single attractor at the target.
Let $\xi \in \mathbb{R}^d$ be the state variable denoting the observation of the user demonstration (i.e., the position of the user's hand in Cartesian space). Consider $N$ demonstrations of the approaching task in which the state vector and its velocity are recorded at every system time step, $\{\xi^{t,n}, \dot{\xi}^{t,n}\}$ for all $t \in [0, T^n]$ and $n \in [1, N]$, where $T^n$ is the total number of data points in the $n$-th demonstration. These instances are assumed to be generated by a latent function that can be formulated as a first-order autonomous ordinary differential equation (ODE)

$$\dot{\xi} = f(\xi; \theta) + \epsilon,$$

where $f : \mathbb{R}^d \to \mathbb{R}^d$ is a nonlinear, continuously differentiable function with a single equilibrium point at the attractor $\xi^*$, i.e., $\dot{\xi}^* = f(\xi^*; \theta) = 0$; $f$ is parameterized by $\theta$, and $\epsilon$ denotes white Gaussian noise. Optimal values of $\theta$ can be obtained from the set of demonstrations using different statistical approaches, such as Gaussian process regression (GPR), locally weighted projection regression (LWPR), and Gaussian mixture regression (GMR). In this work, we employed the widely used GMR to estimate $f$ by encoding the demonstrated paths with a mixture of $K$ Gaussian components, where $K$ can be determined with the Bayesian information criterion (BIC) [30,31]. Under mixture modeling, each recorded point $(\xi^{t,n}, \dot{\xi}^{t,n})$ is assumed to be drawn from the joint probability distribution

$$P(\xi^{t,n}, \dot{\xi}^{t,n}) = \sum_{k=1}^{K} P(k)\, P(\xi^{t,n}, \dot{\xi}^{t,n} \mid k),$$

where $P(k) = \pi_k$ is the prior, satisfying $\sum_{k=1}^{K} \pi_k = 1$, and $P(\xi^{t,n}, \dot{\xi}^{t,n} \mid k) = \mathcal{N}(\xi^{t,n}, \dot{\xi}^{t,n}; \mu_k, \Sigma_k)$ is the conditional probability density function with mean $\mu_k \in \mathbb{R}^{2d}$ and covariance matrix $\Sigma_k \in \mathbb{R}^{2d \times 2d}$. The set $\theta = \{\pi_1, \ldots, \pi_K;\ \mu_1, \ldots, \mu_K;\ \Sigma_1, \ldots, \Sigma_K\}$ contains the complete parameters required to estimate $f$.
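Partitioning each component as $\mu_k = (\mu^k_{\xi}; \mu^k_{\dot\xi})$ and $\Sigma_k = \begin{pmatrix} \Sigma^k_{\xi} & \Sigma^k_{\xi\dot\xi} \\ \Sigma^k_{\dot\xi\xi} & \Sigma^k_{\dot\xi} \end{pmatrix}$ and taking the mean of the conditional density $P(\dot{\xi} \mid \xi)$ yields the estimate of the latent function:

$$\hat{f}(\xi) = \mathbb{E}[\dot{\xi} \mid \xi] = \sum_{k=1}^{K} h_k(\xi)\left(\mu^k_{\dot\xi} + \Sigma^k_{\dot\xi\xi}(\Sigma^k_{\xi})^{-1}(\xi - \mu^k_{\xi})\right), \qquad h_k(\xi) = \frac{\pi_k\, \mathcal{N}(\xi; \mu^k_{\xi}, \Sigma^k_{\xi})}{\sum_{i=1}^{K} \pi_i\, \mathcal{N}(\xi; \mu^i_{\xi}, \Sigma^i_{\xi})}.$$

To ensure that $\hat{f}(\xi)$ is globally asymptotically stable at the target, the stable estimator of dynamical systems (SEDS) approach [31] can be used to compute optimal values of $\theta$ by maximizing the likelihood of the demonstrated instances under stability constraints:

$$\min_{\theta}\ J(\theta) = -\frac{1}{\mathcal{T}} \sum_{n=1}^{N} \sum_{t=0}^{T^n} \log P(\xi^{t,n}, \dot{\xi}^{t,n} \mid \theta)$$

subject to, for every $k$, $b^k = -A^k \xi^*$, $A^k + (A^k)^{\top} \prec 0$, $\Sigma_k \succ 0$, and $0 < \pi_k \le 1$ with $\sum_{k} \pi_k = 1$, where $A^k = \Sigma^k_{\dot\xi\xi}(\Sigma^k_{\xi})^{-1}$, $b^k = \mu^k_{\dot\xi} - A^k \mu^k_{\xi}$, and $\mathcal{T}$ is the total number of training points. The estimated model characterizes the path the user takes to reach a goal from any point in the task space. Next, we explain how this basic user behavior model is exploited to infer the user intent.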
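A minimal NumPy sketch of the GMR estimate of f may help make the regression step concrete. It assumes the mixture parameters (priors, means, covariances) are already available, for example from a SEDS fit; it does not perform the SEDS optimization itself. The function names, the ordering convention (the first d entries of each mean are the position part ξ, the last d the velocity part ξ̇), and the per-component stability check are illustrative choices, not the authors' code.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    # Solve instead of inverting explicitly, for numerical stability.
    mahal = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm


def gmr_velocity(xi, priors, means, covs):
    """GMR estimate f_hat(xi) = E[xi_dot | xi] from a joint GMM over (xi, xi_dot).

    priors: (K,), means: (K, 2d), covs: (K, 2d, 2d); the first d dims are xi.
    """
    xi = np.asarray(xi, dtype=float)
    d = xi.shape[0]
    K = priors.shape[0]
    weights = np.empty(K)
    local = np.empty((K, d))
    for k in range(K):
        mu_x, mu_v = means[k, :d], means[k, d:]
        s_xx = covs[k, :d, :d]
        s_vx = covs[k, d:, :d]
        weights[k] = priors[k] * gaussian_pdf(xi, mu_x, s_xx)
        # Conditional mean of xi_dot given xi for component k.
        local[k] = mu_v + s_vx @ np.linalg.solve(s_xx, xi - mu_x)
    h = weights / weights.sum()  # responsibilities h_k(xi)
    return h @ local


def satisfies_seds_stability(means, covs, target):
    """Check b^k = -A^k xi* and A^k + A^k^T negative definite for every component."""
    d = target.shape[0]
    for mu, cov in zip(means, covs):
        A = cov[d:, :d] @ np.linalg.inv(cov[:d, :d])
        b = mu[d:] - A @ mu[:d]
        if not np.allclose(b, -A @ target):
            return False
        if np.max(np.linalg.eigvalsh(A + A.T)) >= 0.0:
            return False
    return True
```

With a single component whose velocity-position covariance block is the negative identity, the sketch reduces to the linear system ξ̇ = −ξ, which is a convenient sanity check before plugging in fitted parameters.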

User intention inference and assistance
The problem of inferring the user intent can be decomposed into two successive steps: (1) predicting the user's goal given the available data; (2) planning a user-preferred motion to the predicted goal.

Goal prediction
One of the critical cues for goal prediction is where the user came from, so prediction can be memory-based, taking into account the historical path of the user inputs. Motivated by previous work on goal prediction from the user's historical path, we present short-term memory based goal prediction methods that adapt to dynamic changes and avoid the high computational cost of a large memory. Rather than using all historical observations of the user input, our short-term memory based methods consider only the latest $N$ observations. Let $U_0$ denote the observation of the user's input at the starting motion step, $U_T$ the current observation at motion step $T$, $U_{T-(N-1)} \to U_T$ the short-term sequence of observations from $U_{T-(N-1)}$ to $U_T$, and $\Omega$ the set of approachable goals in the task environment. The problem of predicting the intended goal $G^*$ given $U_{T-(N-1)} \to U_T$ can be formulated as

$$G^* = \arg\max_{G \in \Omega} P(G \mid U_{T-(N-1)} \to U_T).$$

We implement an intuitive idea to predict the user goal: a goal is more likely to be the intended one if the user's short-term movement $U_{T-(N-1)} \to U_T$ is more similar to the way the user prefers to move towards that goal. That is, we assume that the user approaches a goal following dynamics similar to the behavior in the demonstrations. Two metrics are proposed to describe the similarity $C_G(U_{T-(N-1)} \to U_T)$ for goals $G \in \Omega$ given $U_{T-(N-1)} \to U_T$, where a smaller value indicates a higher similarity: (1) path distance; (2) directionality deviation.
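The short-term memory idea can be illustrated with a fixed-length buffer: only the latest N observations are kept, and the goal with the lowest dissimilarity C_G over that window is returned (with a positive λ in the maximum entropy model, this is the same goal that receives the highest probability). This is a sketch under our own assumptions; the class name, the goal dictionary, and the toy dissimilarity `toy_progress` are hypothetical placeholders, not the SPD or SDD metrics.

```python
from collections import deque

import numpy as np


class ShortTermGoalPredictor:
    """Goal prediction that keeps only the latest N user-input observations."""

    def __init__(self, n_steps, goals, dissimilarity):
        self.window = deque(maxlen=n_steps)  # older observations are dropped automatically
        self.goals = goals                   # hypothetical: dict of goal name -> position array
        self.dissimilarity = dissimilarity   # C_G(window, goal_position); lower = more similar

    def observe(self, u):
        """Append the newest observation U_T."""
        self.window.append(np.asarray(u, dtype=float))

    def predict(self):
        """Return the goal whose C_G over the current window is smallest."""
        path = np.stack(list(self.window))
        costs = {name: self.dissimilarity(path, pos) for name, pos in self.goals.items()}
        return min(costs, key=costs.get)


def toy_progress(path, goal):
    """Placeholder C_G: change in distance to the goal across the window (negative = approaching)."""
    return np.linalg.norm(path[-1] - goal) - np.linalg.norm(path[0] - goal)
```

Because `deque(maxlen=N)` discards the oldest entry on every append, the cost of each prediction stays bounded by N regardless of how long the teleoperation session runs.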

Path distance
The user behavior model $\hat{f}(\xi)$ can be used to predict the path the user would prefer from any point to any approachable goal in the task environment. Let $\tau^*_{U_{T-(N-1)} \to G}$ denote the predicted path from $U_{T-(N-1)}$ to $G$, and $\tau^*_{U_T \to G}$ the predicted path from the current observation $U_T$ to $G$. We concatenate the short-term user path $U_{T-(N-1)} \to U_T$ with $\tau^*_{U_T \to G}$ into a complete path to $G$ through $U_T$, denoted $U_{T-(N-1)} \to U_T \to G$. The path distance indicating the similarity is then defined as

$$C_G(U_{T-(N-1)} \to U_T) = D_{\mathrm{DTW}}\left(U_{T-(N-1)} \to U_T \to G,\ \tau^*_{U_{T-(N-1)} \to G}\right),$$

where $D_{\mathrm{DTW}}$ is the dynamic time warping distance (DTWD) between the two paths. The path distance metric measures how closely the observed user path follows the path the user prefers towards a goal. Under this metric, the method predicts the goal with the smallest path distance.
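The path distance metric can be sketched as follows, under several assumptions of ours: the learned DS `f` maps the offset from the goal, ξ − P_G, to a velocity (attractor at the origin), predicted paths are produced by forward-Euler rollouts with a fixed step `dt` and horizon `steps`, and DTW uses a Euclidean local cost. All function names and default values are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance with a Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def rollout(f, start, goal, dt=0.05, steps=100):
    """Forward-Euler integration of the DS with its attractor shifted to `goal`."""
    path = [np.asarray(start, dtype=float)]
    for _ in range(steps):
        x = path[-1]
        path.append(x + dt * f(x - goal))
    return np.stack(path)


def path_distance(window, goal, f, **kw):
    """C_G: DTW between (observed window + DS continuation) and the DS path from the window start."""
    preferred = rollout(f, window[0], goal, **kw)    # tau*_{U_{T-(N-1)} -> G}
    continuation = rollout(f, window[-1], goal, **kw)  # tau*_{U_T -> G}
    concatenated = np.vstack([window, continuation[1:]])  # drop the duplicated U_T
    return dtw_distance(concatenated, preferred)
```

A linear DS such as f(x) = −x is enough to exercise the sketch; in practice `f` would be the SEDS/GMR estimate. The quadratic DTW loop is fine for short horizons but would need a vectorized or windowed variant for long paths.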

Directionality deviation
The directionality deviation metric explores the similarity between the motion directions of the observed user movements and those the user prefers. Given the data of the latest $N$ motion steps, a goal is considered less likely to be the intended one when the cumulative directionality deviation between the observed user movement velocity and the preferred velocity towards that goal is larger. At each motion step, the velocity that follows the user behavior towards $G$ can be estimated with the model $\hat{f}(\xi)$, and the similarity $C_G(U_{T-(N-1)} \to U_T)$ indicated by the directionality deviation metric is defined as the deviation between the observed and the preferred velocity directions, accumulated over the latest $N$ motion steps.

Using the maximum entropy principle (MEP), the probability of each goal, and hence the optimal prediction $G^*$, can be deduced from the similarity $C_G(U_{T-(N-1)} \to U_T)$:

$$P(G \mid U_{T-(N-1)} \to U_T) = \frac{\exp\left(-\lambda\, C_G(U_{T-(N-1)} \to U_T)\right)}{\sum_{G' \in \Omega} \exp\left(-\lambda\, C_{G'}(U_{T-(N-1)} \to U_T)\right)}, \qquad G^* = \arg\max_{G \in \Omega} P(G \mid U_{T-(N-1)} \to U_T),$$

where $\lambda$ is an adjustable parameter. The optimal value of $\lambda$ can be derived as $\lambda_{\mathrm{opt}} = -\ln \delta / C_{G,\min}$, where $\delta$ is a small positive constant (such as $\delta = 0.01$) [32] and $C_{G,\min} = \min_{G \in \Omega} C_G(U_{T-(N-1)} \to U_T)$. Both similarity metrics are adopted for goal prediction: we name the method based on the path distance metric short-term path distance (SPD), and the one based on the directionality deviation metric short-term directionality deviation (SDD).
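The sketch below shows one way to compute the two quantities in this subsection. The angle-based deviation (the arccos of the normalized dot product between the finite-difference velocity and the DS velocity towards the goal) is our own illustrative choice of "directionality deviation", not necessarily the exact definition used in this work. The maximum entropy part follows P(G) ∝ exp(−λ C_G) with λ_opt = −ln(small constant) / C_G,min, using 0.01 as in the text; the parameter name `small_const` and the handling of a zero minimum cost are our assumptions.

```python
import numpy as np


def directionality_deviation(window, goal, f):
    """Cumulative angle (rad) between observed and DS-preferred velocities over the window.

    The angle is an illustrative deviation measure; velocities are finite differences.
    """
    window = np.asarray(window, dtype=float)
    total = 0.0
    for x, v in zip(window[:-1], np.diff(window, axis=0)):
        pref = f(x - goal)  # preferred velocity towards `goal` (attractor shifted to the goal)
        denom = np.linalg.norm(v) * np.linalg.norm(pref)
        if denom == 0.0:
            continue  # direction undefined for a zero velocity; skip this step
        cos = np.clip(v @ pref / denom, -1.0, 1.0)
        total += float(np.arccos(cos))
    return total


def mep_probabilities(costs, small_const=0.01):
    """P(G) proportional to exp(-lambda * C_G), with lambda_opt = -ln(small_const) / C_min."""
    costs = np.asarray(costs, dtype=float)
    c_min = costs.min()
    if c_min <= 0.0:
        # lambda_opt is unbounded; in the limit all mass goes to the minimum-cost goal(s)
        p = (costs == c_min).astype(float)
        return p / p.sum()
    lam = -np.log(small_const) / c_min
    logits = -lam * (costs - c_min)  # shifting by C_min cancels out after normalization
    p = np.exp(logits)
    return p / p.sum()
```

With λ_opt, the best goal's unnormalized weight is exactly the small constant, so a goal whose cost is twice the minimum ends up 100 times less likely when the constant is 0.01.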

Evaluating confidence
Assistance based on a wrong goal prediction might backfire on the overall performance. To provide effective assistance, we take into account the confidence in the goal prediction, $c(G^*)$, measured as the difference between the probability of the predicted goal and that of the next most probable candidate [6]:

$$c(G^*) = P(G^* \mid U_{T-(N-1)} \to U_T) - \max_{G \in \Omega \setminus \{G^*\}} P(G \mid U_{T-(N-1)} \to U_T).$$
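In code, this confidence measure reduces to comparing the two largest goal probabilities. A minimal sketch, assuming the probabilities are given as an array over the candidate goals; treating the single-goal case as having a confidence equal to that goal's probability is our own convention.

```python
import numpy as np


def prediction_confidence(probs):
    """c(G*): probability of the most likely goal minus that of the runner-up."""
    p = np.sort(np.asarray(probs, dtype=float))[::-1]  # descending order
    runner_up = p[1] if p.size > 1 else 0.0  # single-goal case: our own convention
    return float(p[0] - runner_up)
```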

Motion planning based on user behavior
Based on the user behavior model learned from demonstrations, a user-preferred motion can easily be planned at every motion step once the goal has been predicted. At motion step $T$, given the predicted goal $G^*$ located at $P_{G^*}$, the motion in the next step, $p_T$, is obtained from the user behavior model with its attractor placed at $G^*$, so that the planned motion converges to $G^*$.
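One plausible realization of this planning step is a single forward-Euler step of the learned DS, queried on the offset from the predicted goal so that the attractor sits at P_G*. The step size `dt`, the Euler integration, and the function name `plan_next_step` are assumptions for illustration; the text does not specify how p_T is integrated.

```python
import numpy as np


def plan_next_step(u_t, goal_pos, f, dt):
    """Hypothetical p_T: one forward-Euler step of the DS with its attractor moved to goal_pos.

    Assumes f was learned with the attractor at the origin, so it is queried on the offset.
    """
    u_t = np.asarray(u_t, dtype=float)
    goal_pos = np.asarray(goal_pos, dtype=float)
    return u_t + dt * f(u_t - goal_pos)
```

Calling it repeatedly from the current position traces the whole user-preferred path; because the DS is globally asymptotically stable, the iterates converge to the goal for a sufficiently small `dt`.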

Assistance under shared-control framework
To formulate assistance, a blending function can be implemented under the shared-control paradigm [6,8]:

$$u^* = \alpha\, p_T + (1 - \alpha)\, u_T,$$

where $u_T$ denotes the user input, $p_T$ the planned motion to the predicted goal, $u^*$ the shared-control command sent to the robot, and $\alpha \in [0, 1]$ a blending factor that decides how much autonomous control is blended into the task. In the implementation, $\alpha$ is a piecewise linear function of the confidence in the goal prediction. Below the switching threshold $c_{\min}$ of the confidence $c(G^*)$, assistance is inactive, i.e., $\alpha = 0$; above $c_{\max}$, $\alpha$ reaches its maximum $\alpha_{\max}$ and remains constant:

$$\alpha = \begin{cases} 0, & c(G^*) < c_{\min}, \\ \alpha_{\max}\,\dfrac{c(G^*) - c_{\min}}{c_{\max} - c_{\min}}, & c_{\min} \le c(G^*) \le c_{\max}, \\ \alpha_{\max}, & c(G^*) > c_{\max}. \end{cases}$$
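The blending rule and a piecewise-linear α can be sketched as below. The linear ramp between the two confidence thresholds is our reading of "piecewise linear"; the threshold and α_max values in the check are arbitrary placeholders, not the values used in the experiments.

```python
import numpy as np


def blending_factor(conf, c_min, c_max, alpha_max):
    """Piecewise-linear alpha: 0 below c_min, alpha_max above c_max, linear in between."""
    if conf <= c_min:
        return 0.0
    if conf >= c_max:
        return alpha_max
    return alpha_max * (conf - c_min) / (c_max - c_min)


def blend(u_t, p_t, alpha):
    """Shared-control command u* = alpha * p_T + (1 - alpha) * u_T."""
    return alpha * np.asarray(p_t, dtype=float) + (1.0 - alpha) * np.asarray(u_t, dtype=float)
```

Keeping α_max below 1 leaves the user with some direct authority even when the prediction is confident, which is a common design choice in predict-then-blend systems.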

Experimental results
In this section, we report a user study in an approaching scenario, aimed at analyzing the prediction methods on teleoperation data. The experiments were performed with an in-house simulated teleoperation system implemented in the Unity3D game engine. An HTC VIVE headset provided the immersive display, and its controllers, with buttons, were configured to track the user's hand pose as input and to control the gripper. With the simulation system, user approaching demonstrations were first recorded. From the recorded data (3D position, 50 Hz), the user behavior model was learned in MATLAB using SEDS [31]. The learned user behavior model was then incorporated into the implementation of the proposed SPD and SDD methods.

User behavior learned from demonstrations
In the simulation system, the user demonstrated the approaching task shown in Fig. 3a. The task was to pick up small cubes located at different positions on the table with the gripper, transport them, and place them into a bin. The paths of the gripper motion from the different cube locations to the bins were recorded. Figure 3b shows the 10 demonstrated paths (dotted red lines) obtained from a single user; the direction of motion is indicated by arrows. The user behavior model learned via SEDS [31] was able to generate paths (thick blue lines) following the same dynamics from new starting positions, as shown in Fig. 3c. The learned DS model was also robust to a change of goal during task execution. Figure 3d illustrates a case in which the model adapted to a new goal: the original path (thin blue line) was planned to Goal 1, and a new path was generated when the intended goal changed to Goal 2.

Goal prediction study
We evaluated the proposed goal prediction methods, SPD and SDD, and compared them with the amnesic and memory-based prediction methods [6]. Goal prediction was computed during teleoperation without assistance. The amnesic and memory-based approaches handle historical information differently, and both ignore the user behavior.

Amnesic prediction
The amnesic approach predicts the intended goal based on the distance to the goal: the closest goal gets the highest probability. This approach considers only the current observation and neglects all historical information.
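The amnesic rule amounts to a nearest-goal lookup on the current position. A minimal sketch, assuming Euclidean distance and goals given as a list of positions (the function name is ours):

```python
import numpy as np


def amnesic_prediction(x, goals):
    """Index of the goal closest to the current position x; history is ignored."""
    x = np.asarray(x, dtype=float)
    dists = [np.linalg.norm(x - np.asarray(g, dtype=float)) for g in goals]
    return int(np.argmin(dists))
```

Because the history is discarded, the prediction flips as soon as the user passes closer to another goal.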

Memory-based prediction
The memory-based approach takes into account all historical observations of the trajectory to predict the most likely goal. Based on the principle of maximum entropy, the predicted goal $g^*$ is given as

$$g^* = \arg\max_{g} \frac{\exp\left(-c_g(s \to x) - c^*_g(x \to g)\right)}{\exp\left(-c^*_g(s \to g)\right)}\, P(g),$$

where $s$ is the starting point, $x$ the current point, and $g$ the goal; $c_g(s \to x)$ is the cost of the history trajectory between $s$ and $x$, $c^*_g(x \to g)$ the cost of the optimal trajectory between $x$ and $g$, $c^*_g(s \to g)$ the cost of the optimal trajectory between $s$ and $g$, and $P(g)$ the prior of $g$.
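To make the memory-based formula concrete, the sketch below evaluates it under the simplest cost assumption: the cost of a trajectory is its Euclidean length, so the optimal cost between two points is their straight-line distance. This cost choice, the uniform default prior, and the function names are assumptions for illustration; scores are computed in log space and normalized over the candidate goals for numerical stability.

```python
import numpy as np


def path_length(path):
    """Sum of Euclidean segment lengths of a polyline."""
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))


def memory_based_prediction(history, goals, priors=None):
    """Return (index of g*, normalized goal probabilities) with straight-line optimal costs."""
    history = np.asarray(history, dtype=float)
    goals = np.asarray(goals, dtype=float)
    s, x = history[0], history[-1]
    c_sx = path_length(history)  # cost of the observed trajectory s -> x
    if priors is None:
        priors = np.full(len(goals), 1.0 / len(goals))
    log_scores = np.array([
        -c_sx - np.linalg.norm(g - x) + np.linalg.norm(g - s)  # log of the ratio of exponentials
        for g in goals
    ]) + np.log(np.asarray(priors, dtype=float))
    p = np.exp(log_scores - log_scores.max())
    p /= p.sum()
    return int(np.argmax(p)), p
```

Unlike the amnesic rule, the score penalizes detours: any extra length in the observed trajectory lowers the score of goals for which that detour was unnecessary.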
We compared the performance of the four approaches in two scenarios: (i) no change of goal, where the user kept a single goal from start to end, and (ii) change of goal, where the user changed the intended goal while approaching. The study involved two goals: cubes located at different positions had to be picked up and placed into one of two potential bins, the blue bin (Goal 1) and the red bin (Goal 2), as shown in Fig. 4. For each cube, the user was instructed to (i) pick up the cube and place it into the red bin and (ii) perform an additional trial in which the target bin was changed from the red bin to the blue one during the approach. The time stamp of the goal change was recorded via a button press. Starting from the initial position of the task, all four approaches were used to predict the goal at each motion step, and the predictions were recorded.
We assessed the performance of the four methods by the percentage of time during which the predictions were correct. Figure 5 shows the performance of the four prediction methods in the approaching tasks. Overall, SDD outperformed the other approaches in tasks both with and without goal change. According to the statistical analysis, the amnesic method performed worst among the four methods (p < 0.001); its prediction was wrong most of the time. Both proposed methods, SPD and SDD, performed slightly better than the memory-based method, but the difference was not significant. However, the performance of SPD and SDD was not greatly affected by a change of the user's intended goal during task execution: compared with the memory-based method, the short-term memory based methods SPD (p < 0.01) and SDD (p < 0.001) were more robust in responding to dynamic goal changes. Further analysis confirmed that, when the intended goal changed, SPD and SDD predicted the new goal faster than the memory-based method (p < 0.001), with an average lag before a correct prediction of fewer than 10 steps, as shown in Fig. 6. Figure 7 presents a concrete example of goal prediction in the scenario with goal change. After the user changed the intended goal at the 40th motion step, SDD predicted the goal change within 6 motion steps and SPD within 8, whereas the memory-based method performed worst, lagging by 13 motion steps. This is because the memory-based method uses the whole history of trajectory observations for goal prediction, while SPD and SDD use only short-term trajectory data; when the intended goal changes, the memory-based method needs more steps to predict the new goal, which causes the delayed response.
Fig. 4 Schematic diagram of the user task: picking cubes into two bins
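The two evaluation quantities used in this study, the percentage of steps with a correct prediction and the lag after a goal change, can be computed from a per-step log of predictions. The sketch below is one possible definition, assuming the lag is counted until the prediction first matches the new goal; the function names and the toy sequences in the check are hypothetical, not data from the study.

```python
import numpy as np


def accuracy(predictions, truth):
    """Fraction of motion steps at which the predicted goal equals the true goal."""
    predictions, truth = np.asarray(predictions), np.asarray(truth)
    return float(np.mean(predictions == truth))


def lag_after_change(predictions, new_goal, change_step):
    """Motion steps from the goal change until the prediction first equals the new goal."""
    for t in range(change_step, len(predictions)):
        if predictions[t] == new_goal:
            return t - change_step
    return None  # the new goal was never predicted
```

A stricter variant would count the lag until the prediction stays on the new goal for the rest of the trial, which penalizes methods that briefly flicker to the correct goal.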

Exploratory cases
We showed that our goal prediction methods also work in more challenging scenarios, first presented by Dragan and Srinivasa [6] and illustrated in Fig. 8a and c. Through theoretical analysis and experimental trials, Dragan and Srinivasa identified limitations of the memory-based prediction method: the prediction was incorrectly biased towards the farther goal when the goals were collinear with the start point, or towards the rightmost goal. Unlike the memory-based method, our prediction approaches (SPD and SDD) consider the user behavior. We tested the memory-based, SPD, and SDD methods in the scenario in which two goals were collinear with the start point, as shown in Fig. 8a. The results in Fig. 8b indicate that the memory-based method was biased towards the farther goal (Goal 2) during the approaching motion, while SPD and SDD predicted the goal correctly. We also conducted an exploratory experiment in another scenario, with an unintended goal to the right of the intended one, as shown in Fig. 8c. According to the prediction results in Fig. 8d, while the user was heading to the intended goal (Goal 1), the memory-based method consistently predicted Goal 2 until the user passed it. The SPD and SDD methods were robust in this scenario as well and provided correct predictions along the whole path, thanks to the consideration of user behavior learned from demonstration.

Conclusion and future work
In this work, we presented a mathematical formulation for intent inference in shared-control teleoperation that considers the user behavior. A simulated teleoperation system was developed to sample and learn the user's approaching behavior. Using the learned behavior model, a user-preferred path can be generated for any given goal. We proposed the SPD and SDD methods to tackle the goal prediction problem. User studies were conducted to examine the performance of goal prediction in the approaching task. The results showed that our approaches achieve sound goal prediction performance even in dynamic situations, e.g., when the intended goal changes during the approaching task. Exploratory experiments were also conducted in special scenarios in which the existing memory-based approach tends to fail; our methods adapted to those scenarios because a better model of user behavior was learned. Overall, the proposed system was able to provide efficient prediction in consideration of user behavior.
Fig. 7 A concrete example of goal prediction in the scenario with goal change. a At the 40th motion step, the user changed the intended goal from Goal 2 to Goal 1. b Prediction results during the user approaching motion
In teleoperation tasks, user behavior is one of the critical factors that should be taken into account. In the presented work, the results have shown that the blended input based on intent inference could directly affect the user's behavior during task execution. The current work exploited a static, offline-learned user behavior model. In future work, the user's real-time behavior will be taken into account to update the offline-learned model and obtain more robust intent inference. In addition, the current method models the user behavior in a simple scenario, i.e., with no obstacles in the workspace. Where the robot is deployed, obstacles are quite likely to be present, so how obstacles affect intent inference with the current algorithm is an open question for future studies. Other interesting research directions following the current work include: (1) studying the feasibility of obtaining a generalized behavior model across different users, and (2) considering more goal variables in intent inference, e.g., including orientation and shape as goal variables.
Fig. 8 Goal prediction in two exploratory cases. a Scenario 1 with goals collinear with the initial point: the user performed an approaching motion to Goal 1. b Prediction results during the user approaching motion to Goal 1 in scenario 1. c Scenario 2 with an unintended goal on the right: the user performed an approaching motion to Goal 1. d Prediction results during the user approaching motion to Goal 1 in scenario 2
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.