Learning from demonstration for semi-autonomous teleoperation


Teleoperation in domains such as deep-sea or space often requires the completion of a set of recurrent tasks. We present a framework that uses a probabilistic approach to learn from demonstration models of manipulation tasks. We show how such a framework can be used in an underwater ROV teleoperation context to assist the operator. The learned representation can be used to resolve inconsistencies between the operator’s and the robot’s space in a structured manner, and as a fall-back system to perform previously learned tasks autonomously when teleoperation is not possible. We evaluate our framework with a realistic ROV task on a teleoperation mock-up with a group of volunteers, showing a significant decrease in time to complete the task when our approach is used. In addition, we illustrate how the system can execute previously learned tasks autonomously when the communication with the operator is lost.


Manipulator arms are often used as a proxy for human manual labour in environments that are hostile. Such tasks might involve manipulating equipment in deep underwater facilities or handling materials that can be dangerous to handle by hand. Often such systems are controlled directly by a human operator who receives visual feedback from the teleoperated manipulator workspace and accordingly controls the degrees of freedom (DoFs) of the arm.

Such direct teleoperation can be cognitively very demanding for the operator. Teleoperation tasks are often very structured, but the robot operating environment seldom changes dramatically. Most variability in such tasks comes from the change of the pose of the manipulators and the pose of the items that are to be manipulated. For example, imagine an underwater ROV that needs to connect a set of hydraulic hoses to a distribution panel at a deep-sea facility. The task of connecting each hose is intrinsically the same, but the parametrization of each task instance changes. This way, for each connection the ROV might assume a different position while the connector for each hose would be at a different pose on the distribution panel.

Long range teleoperation is often the only solution for most dangerous environments, including space and deep-sea. In such cases the communication bandwidth imposes a hard constraint on the efficiency of the system. For example, streaming live video feeds from the teleoperated manipulator workspace imposes large delays to the completion of the set of tasks at hand.

Fig. 1

Overview of the approach. The left side presents a traditional teleoperation scenario where the operator is situated on the vessel that supports the ROV at sea. The operator receives a set of video feeds from the cameras of the ROV, as well as the ROV’s telemetry and joint configuration data. Based on this, the operator then directly commands the desired configuration for the robotic manipulator. The right side presents our new paradigm of mixed teleoperation. Here the operator commands a virtual ROV in a virtual environment (onshore), set up according to the model of the environment that the ROV will operate in, at sea (offshore). The arms of the ROV are controlled locally by a cognitive engine that does not require any large data (video feeds, telemetry) to be communicated. Both systems, onshore and offshore, are locally controlled while both sides have a local parametrization of the corresponding variables. Only minimal communication between the two sides is now required to adjust the local control loops. (i.e. operator state and coordinate system importances, in contrast to video feeds, telemetry, joint states, etc.)

Fig. 2

Current stage of the DexROV project. Top: On the left, the simulated DexROV, equipped with manipulator and gripper. On the right, the simulated underwater testing rig, with a set of standardized ROV handles. The inset image shows the DexROV project onshore control center. Bottom: Left, the prototype exoskeleton that will be used as an input interface for the VR system. Right, the developed ROV with prototype manipulators and prototype underwater vision system visible between the arms

In this paper, we present a probabilistic method that can assist an operator in performing a teleoperation task efficiently, through learning a representation of the task by demonstration. We present a structured way to deal with discrepancies between operator space and robot space and a way of autonomously executing the task in case of communication break-down (see Figs. 1 and 2 for an overview). We use a task-parametrized Hidden semi-Markov model (TP-HSMM) to learn task representations, and generate motions by sampling the model in combination with an optimal control formulation, namely with linear quadratic tracking (LQT) with a double integrator system. Our approach leverages the key advantages of two models, the flexibility of a task-parametrized mixture model encoding (Calinon 2016) alongside with the ability of the HSMM to accurately capture motion duration and generate novel trajectories.

Throughout this manuscript we will use the term mixed teleoperation to refer to the behavior of the system. We chose this term as, we believe, it accurately describes that the behavior on the robot’s side is a combination of the operator’s input and the learned task model. Mixed teleoperation corresponds to the similar terms, often found in the literature, of assisted teleoperation, semi-autonomous teleoperation or shared autonomy.

Our contribution in this work is twofold. First, we present the task-parametrized extension of our previous work (Havoutis et al. 2016), that allows learned movements to be used in novel and changing situations. Second, we show how such a probabilistic model, learned by demonstration, can be used in a teleoperation context to disambiguate between different task parametrizations, making the operators’ work easier and faster, and how the system can be used as a fall-back in case of communication break-down.

Underwater ROVs capable of teleoperated manipulation are typically tethered to the support vessel. Consequently, the communication bottleneck occurs between the support vessel and the onshore control center, typically connected over a satellite link.

Current field ROVs that target teleoperated manipulation use simple extending gripper arms to secure the ROV on a designated rail before the operator begins the manipulation phase. This way, the operators do not need to constantly adjust the control of the ROV body and can focus on the manipulation task at hand. The underwater vehicle dynamics, external disturbances of the underwater environment or the control of the ROV body are beyond the scope of this paper.

The rest of the paper is structured as follows. We discuss previous research in Sect. 2 and we present our approach in Sect. 3. In Sect. 4 we present two illustrative planar examples with 2 and 3 frames to aid in the reader’s understanding of the approach. Section 5 describes the learning from demonstration procedure and the evaluation trials with a group of volunteers. Last Sect. 6 summarizes this work and presents directions for future work.

Related work

The use of learning techniques in the context of ROV teleoperation is novel (Palomeras et al. 2016). The traditional teleoperation approach requires the operator to directly command the different actuators of the ROV. The automation efforts in this field have largely been concerned with the control of the body of the ROV, e.g., how to home to a desired position or how to counter the effect of currents, while ROV manipulators are directly teleoperated.

In contrast, developing learning approaches to model and reproduce skills given motion examples has been a central topic of robotics research. A number of such learning from demonstration (LfD) approaches exist to date. One of the most widely used approaches is using Dynamic Movement Primitives (DMPs) (Ijspeert et al. 2013). DMPs are dynamical systems with simple convergence behaviour that are coupled with a learned non-linear function that modulates their output. This way, DMPs can provide adaptive motion representations that are easy to implement. For example, the approach in Palomeras et al. (2016) uses DMPs to model and synthesise a valve-turning motion for a 4-DoF arm. One drawback of standard DMPs is that a sequence of radial basis activation functions needs to be allocated manually (usually spread equally in time with a shared variance), and that each DoF of the system is separately described (synchronized by a phase variable), sometimes leading to sets of DMPs that have difficulty in capturing joint synergies when few basis functions are used.

A number of extensions to the DMP framework have been proposed in the current literature. Gams et al. (2014) extends the DMP framework by adding a modulation term that allows interaction with objects and the environment. Learning of this coupling term is performed with an iterative learning control algorithm. Other extensions include the work of Ude et al. (2014) and Pastor et al. (2011), with the former encoding orientation trajectories as DMPs, and the latter using a model of previous sensory information to modulate the target of learned DMPs.

An alternative LfD approach is to encode sets of motions as a Gaussian mixture model (GMM) and use Gaussian Mixture Regression (GMR) to regenerate motion. It was shown in Calinon et al. (2012) that a DMP can be recast as a GMR problem for a GMM with diagonal covariance, and that its natural extension to a GMM with full covariances can retrieve local coordination patterns. With this probabilistic form of DMP, the basis functions can also be learned from the demonstrations. GMM/GMR have been successful in representing sets of demonstrated examples that are time-indexed, this way using time as the Gaussian conditioning variable to perform the regression, see Calinon (2016) for an overview. Such systems are often used for learning models from a set of demonstrations but are somewhat restrictive in their generalization capability. An extension to the traditional GMM-based learning approach is the parametrization of the problem with a set of different coordinate systems (Calinon 2016). In this setting, learning is performed in multiple coordinate systems, whose information is fused through products of Gaussians used to generate the final motion. Our work in Zeestraten et al. (2017) shows how one can learn position and orientation trajectories of bimanual tasks and how the resulting task-parametrized GMM can generalize to different orientations of objects or tools.

Hidden Markov Models (HMMs) have been used in robotics in a number of approaches. For example, Lee and Ott proposed to combine HMM with GMR to cope with the poor duration modeling properties of standard HMMs (Lee and Ott 2010; Lee et al. 2010). Similarly, Chan et al. (2013) used GMR and HMM as part of a constrained manipulator visual servoing controller. Bowen and Alterovitz presented the combination of an HMM (for task representation) with a sampling-based motion planner to produce (asymptotically) optimal plans (Bowen and Alterovitz 2014). Kulic et al. (2008) used HMMs to incrementally group together human whole-body motions, using hierarchical agglomerative clustering, based on their relative distances in HMM space.

Fig. 3

Sketch of our approach in a 2-dimensional case with 2 coordinate systems and two Gaussians per coordinate system. The red dot represents the endpoint of the motion. On the left, the setup that is available to the operator. The solid line is the motion performed up to now (until time t) and the black square is the current state of the operator’s end-effector. On the right, the (remote) space where the robot performs the task. Note that the coordinate system parametrization between the two spaces is different. The solid line represents the motion performed until time t while the red square represents the current state or the robot’s end-effector. The coordinate system importance, \(h^{(j)}_t\), is computed on the operator space and passed to the robot space. The red dotted line is the motion predicted by the model from the current robot state. The gray square in the robot space represents what the robot end-effector state would be with a direct teleoperation behavior, which would only poorly match the current situation in the robot space (Color figure online)

Often, the use of HMM-based approaches in robotics applications is limited by the simplistic state duration modeling that HMMs provide. Other signal processing related disciplines, such as speech synthesis, have developed a number of models that seek to model state duration information more explicitly (for an overview see Rabiner 1989). One such model is the Hidden Semi-Markov Model (HSMM), see Yu and Kobayashi (2006) for a review. Recently we experimented with the use of HSMM in robot applications, by contrasting it to a set of different graphical model based approaches (Calinon et al. 2011). HSMMs are relevant for robot motion generation because they model the transitions and the durations of state activations, thus providing a relative time instead of a global time representation. In Tanwani and Calinon (2016), we exploited this local duration representation for autonomously learning and reproducing in new situations the tasks of opening a valve and moving objects while avoiding obstacles. In Havoutis and Calinon (2016), we showed how a similar task representation can be used to learn assistive behaviours of recurrent tasks. In Havoutis et al. (2016) and Havoutis and Calinon (2017), we showed how such a model can be learned in an online manner, and be used in a teleoperation scenario with failing communication, to semi-autonomously perform an ROV task (hot-stabbing) using an MPC formulation for motion generation.

This paper is extending this line of research by exploiting the strengths of the task-parametrized representation in a mixed teleoperation setting. While in Havoutis and Calinon (2017), we focused on learning a TP-HSMM online in a supervisory teleoperation setting, we present here a different but complementary approach in a shared control setting. By relying on task-adaptive statistical models of movements and a batch training with an infinite horizon LQT formulation, we show how such a teleoperation strategy can be used to resolve differences between local and remote environment configurations. With such decoupling of the operator’s and the robot’s spaces in the statistical representation, we demonstrate that the proposed approach can replace the conventional use of video streams in teleoperation, with a minimal exchange of activation weights as communication overhead. We also extend our analysis with a user study, highlighting the benefit of using our model both in terms of time to complete the task and resulting trajectory quality.


Here we present the TP-HSMM representation used throughout this work. This model is an extension of a TP-GMM in which the HSMM also represents the temporal evolution of the motion. Our model is trained from a set of demonstrations with varying task parameters.

The task parameters can be regarded as coordinate systems relevant to the task at hand. For example, consider the task of pushing a button with a manipulator. One task parameter can be the pose of the manipulator base in a globally defined coordinate system, while a second task parameter can be the pose of the button in the global coordinate system. Varying the pose of the button and/or the pose of the base of the manipulator would result in a different adaptation of the motion to push the button. Task parameters in our model are represented by P coordinate systems, defined at time step t by \(\{{\varvec{b}}_{t,j},{\varvec{A}}_{t,j}\}_{j=1}^P\), representing the transformation matrix for each coordinate system (see toy example in Fig. 3).

Each demonstration \(m\in \{1,\ldots ,M\}\) contains \(T_m\) datapoints forming a dataset of N datapoints \(\{{\varvec{\xi }}_{t}\}_{t=1}^N\) with \(N=\sum _{m}^{M}T_m\). The demonstrations \({\varvec{\xi }}\in \mathbb {R}^{D\times N}\) are observed from the perspective of each of the different coordinate systems, forming P trajectory samples \({\varvec{X}}^{(j)}\in \mathbb {R}^{D\times N}\). These samples can be collected from each coordinate system individually or can be computed with

$$\begin{aligned} {\varvec{X}}^{(j)}_t = {\varvec{A}}_{t,j}^{-1} ({\varvec{\xi }}_t - {\varvec{b}}_{t,j}) . \end{aligned}$$

For example, a demonstrated motion is provided in the global coordinate system and is later projected to the local coordinate system of the manipulator base and the coordinate system of the button.

The parameters of a TP-HSMM with K components are defined by

$$\begin{aligned} \big \{\{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{\varSigma }}^{(j)}_i\}_{j=1}^P, {\varvec{\mu }}^{\scriptscriptstyle {\mathcal {D}}}_i,{\varvec{\varSigma }}^{\scriptscriptstyle {\mathcal {D}}}_i\}_{i=1}^K, a_{i,j} \big \}, \end{aligned}$$

where \(\pi _i\) are the mixing coefficients, \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{\varSigma }}^{(j)}_i\) are the center and covariance matrix of the ith Gaussian component in coordinate system j, \(\mu ^{\scriptscriptstyle {\mathcal {D}}}_i\) and \(\varSigma ^{\scriptscriptstyle {\mathcal {D}}}_i\) are univariate Gaussian distributions that directly model the duration of each state, and \(a_{i,j}\) the transition probabilities between states.

Parameter estimation

Learning of the TP-HSMM parameters is performed in two steps. First the parameters of the spatial part of the model are estimated, and second the encoding of the temporal aspect of the data is performed.

First we estimate the priors, centers and covariances, \(\{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{\varSigma }}^{(j)}_i\}_{j=1}^P, \}_{i=1}^K\), of the model with an expectation-maximization (EM) algorithm (Dempster et al. 1977) that are iteratively computed until model convergence, see “Appendix I”.

For the duration aspect of TP-HSMM, we need to estimate the center and covariance for each state duration as a distribution with parameters \(\{{\varvec{\mu }}^{\scriptscriptstyle {\mathcal {D}}}_i,{\varvec{\varSigma }}^{\scriptscriptstyle {\mathcal {D}}}_i\}_{i=1}^K\), as well as the transition probabilities. \(a_{i,j}\) can be arranged as a \(K\times K\) transition probability matrix, where each element represents the probability to move to state \(q_j\), while currently being in state \(q_i\). For all demonstrations, given the spatial model that was learned above, we can compute the state probabilities of every datapoint. This way, for each datapoint \({\varvec{\xi }}_t\) we can estimate the state \(q_j\) and the previous state \(q_{i}\). To build up the transition probabilities, \(a_{i,j}\), we keep a log of \(c_{i,j} \in \mathbb {R}^{K\times K}\), counting the number of state transitions that are not self-transitory, by performing a pass through each demonstration, namely

$$\begin{aligned} \forall \{{\varvec{\xi }}_{t-1}, {\varvec{\xi }}_t\} ~ \Rightarrow c_{i,j}&= c_{i,j} + 1, \qquad i \ne j \\ a_{i,j}&= \frac{c_{i,j}}{\sum _{j=1}^{K} c_{i,j}}. \end{aligned}$$

When computing the transition probabilities, we only need to keep track of the non self-transitory instances, as we are modeling the relative time during which the system will stay at each state. For simplicity, we use a univariate Gaussian distribution \(\mathcal {N}(\mu ^{\scriptscriptstyle {\mathcal {D}}}_i,\varSigma ^{\scriptscriptstyle {\mathcal {D}}}_i)\) to model this duration, but other distributions that would better model durations are possible. Hence we bypass the computationally expensive HSMM batch training procedure and replace it with an iterative approach keeping statistics over the state transitions, which showed in practice to be a reasonable approximation in our experiments. This way, as we add each demonstration, we keep track of each state duration and accordingly update the statistics of each state. This is done using a running statistics method to compute the mean and variance for each state duration. This requires that we only keep track of the total number of samples while we incrementally add new observations.

The total computation time for learning a model with 2 coordinate systems, by providing 6 demonstrations would amount to approximately 3 s on commodity hardware. Sampling as detailed below is much faster and can be computed online (below 1 ms).

Orientation data

We use unit quaternions to represent orientation data, as in Silvério et al. (2015). We chose this representation as it is singularity-free and uses only 4 parameters. We define a unit quaternion as \({\varvec{\epsilon }}=[v,~{\varvec{u}}^{\scriptscriptstyle \top }]^{\scriptscriptstyle \top }\), with \(v\in \mathbb {R}\) the scalar and \({\varvec{u}}\in \mathbb {R}^3\) the vector part of the representation. Accordingly, the conjugate quaternion is defined as \(\bar{{\varvec{\epsilon }}}=[v,~{\varvec{-u}}^{\scriptscriptstyle \top }]^{\scriptscriptstyle \top }\). To describe the orientation of one coordinate system with respect to another we use the quaternion product which is defined as

$$\begin{aligned} {\varvec{\epsilon }}_1 * {\varvec{\epsilon }}_2 = \left[ \begin{array}{l} v_1 v_2 - {\varvec{u}}_1^{\scriptscriptstyle \top }{\varvec{u}}_2\\ v_1{\varvec{u}}_2 + v_2{\varvec{u}}_1 + {\varvec{u}}_1 \times {\varvec{u}}_2 \end{array} \right] . \end{aligned}$$

The quaternion product can also be implemented in matrix-vector multiplication form as \({\varvec{\epsilon }}= {\varvec{\mathcal {E}}}_1 {\varvec{\epsilon }}_2\), where

$$\begin{aligned} {\varvec{\mathcal {E}}}_1 = \left[ \begin{array}{llll} v_1 &{}\quad -\,u_{1,1} &{}\quad -\,u_{1,2} &{}\quad -\,u_{1,3} \\ u_{1,1} &{}\quad v_1 &{}\quad -\,u_{1,3} &{}\quad u_{1,2} \\ u_{1,2} &{}\quad u_{1,3} &{}\quad v_1 &{}\quad -\,u_{1,1} \\ u_{1,3} &{}\quad -\,u_{1,2} &{}\quad u_{1,1} &{}\quad v_1 \\ \end{array} \right] , \end{aligned}$$

is a quaternion matrix built from the elements of the quaternion \({\varvec{\epsilon }}_1\). This way we can directly use this representation of orientations under the TP-HSMM formulation by setting \({\varvec{\xi }}_n={\varvec{\hat{\epsilon }}}_n\), \({\varvec{b}}_{n,j}=0\) and \({\varvec{A}}_{n,j}={\varvec{\mathcal {E}}}_{n,j}\) for the orientation part of the task parameters, see Silvério et al. (2015) for details. The projection of Eq. (1) then becomes \({\varvec{\mathcal {E}}}^{-1}_{n,j}{\varvec{\hat{\epsilon }}}_n\) and maps the reference orientation (demonstration) \({\varvec{\hat{\epsilon }}}_n\) to coordinate system j. This allows us to seamlessly integrate orientation data to our learning approach and encode both position and orientation of demonstrated motions.


The learned TP-HSMM is used to generate motions given a new set of coordinate system parameters \(\{{\varvec{\hat{b}}}_{t,j}, {\varvec{\hat{A}}}_{t,j}\}_{j=1}^P\). This is done in two steps. First we generate a GMM with parameters \(\{\pi _i,{\varvec{\hat{\mu }}}_{t,i},{\varvec{\hat{\varSigma }}}_{t,i}\}_{i=1}^K\) where

$$\begin{aligned}&\mathcal {N}\Big ( {\varvec{\hat{\mu }}}_{t,i} , {\varvec{\hat{\varSigma }}}_{t,i} \Big ) \;\propto \; \prod \limits _{j=1}^P \mathcal {N}\Big ( {\varvec{\hat{\mu }}}{}_{t,i}^{(j)} ,\; {\varvec{\hat{\varSigma }}}{}_{t,i}^{(j)} \Big ) , \;\mathrm {with}\nonumber \\&\quad {\varvec{\hat{\mu }}}{}^{(j)}_{t,i} = {\varvec{A}}_{t,j} {\varvec{\mu }}^{(j)}_i + {\varvec{b}}_{t,j} \;,\quad {\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i} = {\varvec{A}}_{t,j}{\varvec{\varSigma }}^{(j)}_i {\varvec{A}}_{t,j}^{\scriptscriptstyle \top }, \end{aligned}$$

where the result of the Gaussian product is given by

$$\begin{aligned} {\varvec{\hat{\varSigma }}}_{t,i} = \left( \sum \limits _{j=1}^P {{\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i}}^{-1} \right) ^{-1} ,\quad {\varvec{\hat{\mu }}}_{t,i} = {\varvec{\hat{\varSigma }}}_{t,i} \sum \limits _{j=1}^P {{\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i}}^{-1} {\varvec{\hat{\mu }}}{}^{(j)}_{t,i} . \end{aligned}$$

Next we generate the state sequence by recursively computing the probability of the datapoint \({\varvec{\xi }}_t\) to be in state i at time step t, given the partial observation \(\{{\varvec{\xi }}_1,{\varvec{\xi }}_2,\ldots ,{\varvec{\xi }}_t\}\), using the forward variable \(\alpha ^{\tiny {\text {HSMM}}}_{i,t}\) as in Rabiner (1989) with

$$\begin{aligned} \alpha ^{\tiny {\text {HSMM}}}_{i,t}&= \sum \limits _{j=1}^K \sum \limits _{d=1}^{d^{\max }} \alpha ^{\tiny {\text {HSMM}}}_{j,t-d} \; a_{j,i} \; \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i} \prod \limits _{s=t-d+1}^{t} \mathcal {N}_{s,i} , \quad \mathrm {where}\\ \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i}&= \mathcal {N}(d | \mu ^{\scriptscriptstyle {\mathcal {D}}}_i,\varSigma ^{\scriptscriptstyle {\mathcal {D}}}_i) \;\mathrm {and}\; \mathcal {N}_{s,i} = \mathcal {N}\big ({\varvec{\xi }}_s |\; {\varvec{\mu }}_i,{\varvec{\varSigma }}_i\big ) , \end{aligned}$$

that is computed with an iterative procedure, described in “Appendix II”. With this representation, a generative process can be constructed by setting equal observation probability \(\mathcal {N}_{s,i} = 1 \; \forall i\), i.e. the spatial part of the motion representation. This yields a step-wise state reference that we use in the subsequent optimal control formulation.

Optimal control trajectory generation

We formulate the trajectory generation step as an optimal control problem. We use a double integrator system as a virtual unit mass attached to the end-effector of the controlled arm. This system has the form

$$\begin{aligned} {\varvec{\dot{\xi }}}_{t+1} = \left[ \begin{array}{ll} {\varvec{0}} &{} \quad {\varvec{I}}\\ {\varvec{0}} &{} \quad {\varvec{0}}\\ \end{array} \right] {\varvec{\xi }}_t + \left[ \begin{array}{l} {\varvec{0}} \\ {\varvec{I}} \\ \end{array} \right] {\varvec{u}}_t. \end{aligned}$$

Using an infinite-horizon linear-quadratic tracking (LQT) controller, we can generate a trajectory to smoothly track the previously computed stepwise reference state sequence, \(q_1 \ldots q_T\).Footnote 1 The step-wise reference trajectory \(\mathcal {N}({\varvec{\hat{\mu }}}_{q_t}, {\varvec{\hat{\varSigma }}}_{q_t})\) is used to set up a cost function that trades-off accuracy to control effort according to the demonstrated motions (Calinon et al. 2014). The cost function to minimize at each time step t is given by

$$\begin{aligned} c({\varvec{\xi }}_t, {\varvec{u}}_t) = \sum _{t=t_0}^{\infty } ({\varvec{\xi }}_t - {\varvec{\hat{\mu }}}_{q_{t}})^{{\scriptscriptstyle \top }} {\varvec{Q}}_{t} ({\varvec{\xi }}_t - {\varvec{\hat{\mu }}}_{q_{t}}) + {\varvec{u}}_t^{{\scriptscriptstyle \top }} {\varvec{R}} {\varvec{u}}_t, \end{aligned}$$

where \({\varvec{u}}_t \in \mathbb {R}^{m}\) is the control input of the system. Setting \({\varvec{Q}}_t = {\varvec{\hat{\varSigma }}}_{q_t}^{-1}, {\varvec{R}} \succ 0, {\varvec{\xi }}_t = {[{{\varvec{x}}_t}^{{\scriptscriptstyle \top }}, \; {{\varvec{\dot{x}}}_t}^{{\scriptscriptstyle \top }}]}^{{\scriptscriptstyle \top }}\), \({\varvec{\hat{\mu }}}_{q_t} = {[{\varvec{\hat{\mu }}}_t^{x^{{\scriptscriptstyle \top }}}, \; {\varvec{\hat{\mu }}}_t^{\dot{x}^{{\scriptscriptstyle \top }}}]}^{{\scriptscriptstyle \top }}\) with \({\varvec{x}}, {\varvec{\dot{x}}}\) representing the position and velocity of the system, the optimal control input \({\varvec{u}}_t^{*}\) is obtained by solving the algebraic Riccati equation (see “Appendix III” for details), yielding full stiffness and damping matrices that are regulated in accordance to the precision required by the task, as learned by the demonstrations.

Arm control

For controlling the arm, we chose a torque control approach. This allows us to naturally regulate the compliance of the controlled arm, according to the full stiffness and damping terms computed by the above LQT solution. The torque command that the robot low-level controller tracks is

$$\begin{aligned} {\varvec{\tau }}= {\varvec{J}}^{\scriptscriptstyle \top }{\varvec{f}} + {\varvec{\tau }}_G, \end{aligned}$$

where \({\varvec{J}}\) is the arm Jacobian, \({\varvec{f}}={[{\varvec{f}}_p^{\scriptscriptstyle \top },{\varvec{f}}_o^{\scriptscriptstyle \top }]}^{\scriptscriptstyle \top }\) is a wrench and \({\varvec{\tau }}_G\) are the gravity compensation terms computed based on the manipulator model and state. Given the desired pose of the arm end-effector, \({\varvec{\hat{\xi }}}_t = [{\varvec{\hat{x}}},{\varvec{\hat{\epsilon }}}]^{\scriptscriptstyle \top }\),

$$\begin{aligned} {\varvec{f}}_p&= {\varvec{K}}^{{\scriptscriptstyle {\mathcal {P}}},x}_t({\varvec{\hat{x}}}_t-{\varvec{x}}) - {\varvec{K}}^{{\scriptscriptstyle {\mathcal {V}}},x}_t {\varvec{\dot{x}}}, \end{aligned}$$
$$\begin{aligned} {\varvec{f}}_o&= {\varvec{K}}^{{\scriptscriptstyle {\mathcal {P}}},\epsilon }_t 2 \log ({\varvec{\hat{\epsilon }}}_t * {\varvec{\epsilon }}) - {\varvec{K}}^{{\scriptscriptstyle {\mathcal {V}}},\epsilon }_t {\varvec{\omega }}, \end{aligned}$$

where \({\varvec{\omega }}\in \mathbb {R}^3\) is the angular velocity of the end-effector. The quaternion product, \({\varvec{\hat{\epsilon }}}_t * {\varvec{\epsilon }}\), computes the orientation error between desired and current orientation, and the logarithmic map, \(\log (\cdot )\), converts the quaternion error to an axis-angle difference. As outlined above, the computed LQT gains can be used when tracking a generated trajectory. When the system is at a mixed teleoperation mode, the stiffness and damping terms are heuristically set to a desired manipulator compliance.

Task model based discrepancy resolution

The ROVs operate in an environment that is often dynamic. In most operation scenarios a model of the main structures of the environment, where the ROV will be operating at, is known a priori. Nonetheless, such models are often only partially accurate.

Fig. 4

a The state on the operator’s space: (top row) considered globally and (bottom row) as seen from each of the 2 frames in the example. b Two examples of different frame importances in the robot space. In example 1 (top row), frame 1 is weighted more, while in example 2 (bottom row) frame 2 is considered more important. The global prediction in the robot side (red square) is the importance-weighted average of the per-frame predictions (Color figure online)

A naive (VR) teleoperation approach would require real-time knowledge of all the task-relevant components. This way the ROV would either need to locally recognize the relevant components or send back a video feed for this operation to be carried on-shore. As the relevant information is processed, all coordinate systems will have to be readjusted so that the operator’s space (VR) parametrization provides a consistent match with the real-world configuration that the ROV is operating in (robot space).

Our approach decouples the configuration of the operator’s (VR) space and the configuration of the real environment (robot space), where the ROV is operating at. This way we can locally resolve discrepancies/inconsistencies between the parametrizations of coordinate systems, without the need to transmit such information or readjust the operator’s (VR) environment. Hence the operator’s space (VR environment) keeps only a local configuration of the environment and the operator is performing the task in this setup. Here we make the implicit assumption that the same set of reference frames is available on both sides, a feature that is trivial to enforce programmatically given the learned task model. The ROV operates in its local configuration while we resolve the difference between the two spaces using the learned task model.

We do this by computing the probability of the state of the system to belong to each coordinate system in the model, on the operator’s side. We calculate the probability of the current state, projected on the different coordinate systems, to belong to the model of each coordinate system. This is computed as

$$\begin{aligned} P({\varvec{\xi }}_t,j)&= \sum \limits _{i=1}^K \pi _i \; \mathcal {N}\Big ({\varvec{\xi }}^{(j)}_t \Big |\; {\varvec{\mu }}^{(j)}_i,{\varvec{\varSigma }}^{(j)}_i \Big ) , \end{aligned}$$
$$\begin{aligned} h^{(j)}_t&= \frac{P({\varvec{\xi }}_t,j)}{\sum \nolimits _{j=1}^PP({\varvec{\xi }}_t,j)} , \end{aligned}$$

which can be interpreted as the relevance of each coordinate system in the current position. By treating these coordinate system probabilities as a measure of importance, we can then look at the related state of the system on the operator’s side, and reconstruct the global state in the ROV space according to this information.

In steps, this is done by first computing the frame-local state of the global state on the operator’s side, using the frames’ parameters as

$$\begin{aligned} {\varvec{\xi }}^{(j)}_t = {\varvec{A}}_{t}^{(j){\scriptscriptstyle \top }} ({\varvec{\xi }}_t - {\varvec{b}}_{t}^{(j)}) . \end{aligned}$$

for each frame \(j=1,\ldots ,P\). Next, we compute a global robot state, \({\varvec{\xi '}}^{(j)}_t\), for each of the robot frame’s with

$$\begin{aligned} {\varvec{\xi '}}^{(j)}_t = {\varvec{A'}}_{t}^{(j)} {\varvec{\xi }}^{(j)}_t + {\varvec{b'}}_{t}^{(j)} \end{aligned}$$

where \([~]'\) denotes the parameters of the robot side frames that can differ from the operator’s side. This is the per-frame estimate of the global (desired) robot state. To arrive at the final prediction we compute a weighted sum of all frames’ estimates, in the form

$$\begin{aligned} {\varvec{\xi '}}_t = \sum \limits _{j=1}^P h^{(j)}_t {\varvec{\xi '}}^{(j)}_t \end{aligned}$$

We thus transform each coordinate system state representation to the global coordinate system and compute an average, weighted by the coordinate system importance that is communicated to the robot side. Figure 4 helps explain the relevant variables and gives an intuitive presentation of the above steps.

Fig. 5

Planar examples of our developed mixed-teleoperation framework with tasks involving 2 frames (top row) and 3 frames (bottom row). All frames’ parameters can vary. In particular, the parameters of frame 2 differ in the first example (top row), while in the second example (bottom row) both frame 2 and 3 are in different configurations with respect to the operator’s space. Please see Sect. 4 for details. The accompanying video presents the full sequences for both examples

Planar examples

We use two planar examples to illustrate our approach. The task in both cases is to start from the “U” shape on the left of the panel and reach the “U” shape on the right side. In the case of 3 frames, the path needs to pass from the area between the two lines in the middle, representing a cross-section of a cylinder, as an additional constraint to the task. In both example, each shape represents a frame. The operator uses a mouse cursor to interact with the local side and provide motion demonstrations. Snapshots are presented in Fig. 5 and the accompanying video presents the full sequence.

We begin by learning the task in the local side, where the operator demonstrates a number of motions that are encoded online. In the 2-frame case 2 demonstrations are provided, while for the 3-frame case the operator demonstrates 3 motions. The models are being built online and the blue line represents the current motion prediction of the system.

The middle column of Fig. 5 shows the prediction of the model in the operator (local) and robot (robot) space while the parametrization of the frames remains identical. This way as the operator executes the motion, the robot on its side follows the same path (red square in the accompanying video). Next, we change the parametrization on the robot’s space to demonstrate the behavior of the system in a scenario where the parametrization of the corresponding frames differs. In the 2-frame example we change the configuration of frame 2 in the robot’s space, while in the 3-frame example we want to highlight that all frame parametrizations can change and we change both frame 2 and 3 in the robot space. In this example, the blue line represents the predicted motion in the now different parametrization. As the operator performs her version of the task on the local panel, the system can correctly compensate for the inconsistency. This way, on the 2-frame example, the state can successfully reach the (differently configured) target and in the 3-frame case, the remote state correctly passes through the cylinder and reaches the target, even if the configurations between the two spaces differ.

Experimental evaluation

This section describes the learning of a task model and presents two examples of how such a model can be used in the context of underwater teleoperation. We show how a model learned with our approach can be used to generate novel motions in unseen situations whenever we want the ROV to perform the task autonomously or when communication with the operator breaks down. In addition we show how the learned model can be used to map the control of the operator to the real environment parametrization of the ROV. This way we can decouple the two control spaces and perform control only locally with minimal communication overhead (i.e. operator state and coordinate system importances, in contrast to video feeds, telemetry, joint states, etc.).

Fig. 6

Example of standardised ROV-operated rotary valve. Valve handle highlighted in yellow (Color figure online)

Fig. 7

Experimental setup as a teleoperation mock-up using the Baxter robot. In mixed teleoperation mode the pose of the coordinate systems in operator and robot space are decoupled

Use case

As a use case we have selected an experimental setup that resembles an ROV rotary control valve, see Fig. 6. ROVs often need to operate such control valves, that are crucial for sealing or pressurizing parts of hydraulic circuits. As we do not have access to a real ROV valve, we use a symmetric handle as a mock-up (Fig. 7). The task in this setup is to reach the control valve (handle), placed on the robot side (right arm of the robot), using the left arm of the robot as the input device (master arm) and optionally turn it \(90^{\circ }\) clockwise. The valves on the operator’s space and the robot’s space can move freely. For the direct teleoperation case, the camera mounted in the teleoperated arm’s wrist is used to provide visual feedback to the operator.

Model learning

We use kinesthetic demonstrations to collect a set of trajectories with an operator performing a valve reaching and turning task. The trajectories consist of end-effector poses, containing a Cartesian position and a quaternion orientation part. Accordingly, two coordinate systems are set: the base of the arm and the valve. The coordinate system of the arm is set to the base of the operator’s arm with an orientation equal to the orientation of the global coordinate system. The second coordinate system, attached to the valve, is free-floating in the operator’s space and is being visually tracked using a fiducial markerFootnote 2 (Fig. 7), by using the camera embedded within the robot’s hand. This camera has wide lens and produces a stream at 25 frames per second. It is placed to look over the two fingers of the standard Baxter gripper, i.e. the gripper fingers are always visible.

Fig. 8

The TP-HSMM representation learned from the set of demonstration trajectories. a The Gaussians represented in the coordinate system of the valve (isocontour of one standard deviation), where all demonstrations converge to the handle. b The Gaussians learned in the coordinate system of the robot arm base. Note that the variance in this coordinate system is higher, leading to larger ellipsoids. c The learned state transition graph with the corresponding state duration probabilities. Note that the state colors are consistent across the images, and that the training samples are displayed in both coordinate systems in black color

Fig. 9

Pose of the robot arm end-effector during autonomous reproduction of the learned task. Black lines are training samples, the red lines show one sample from the testing set, and the blue lines show the motion generated from the TP-HSMM given the same task parametrization as the test sample (Color figure online)

We use a torque controller compensating for the effect of gravity on the arm to let the operator directly guide the arm to the position and orientation of the valve (handle). We collected 7 movements of the operator performing the task and used 5 demonstrations to learn a model with the procedure described in Sect. 3. The remaining 2 trajectories are kept for model testing and evaluation.

Model evaluation

We learn a TP-HSMM for the valve task, setting the number of states \(K=5\) empirically. A number of techniques can alternatively be used for model selection, for example with a Bayesian information criterion (Schwarz 1978) or a Dirichlet process (Rasmussen 2000). The learned TP-HSMM for the valve task is shown in Fig. 8, where both the Gaussian states per coordinate system and the state connectivity are depicted.

We first use the learned TP-HSMM with the same set of task parameters as in the demonstrations to generate a trajectory. We compare the trajectories that our model generates against the demonstrated “ground truth”, for both the training samples and the test samples that were not used in learning. The RMSE for the training set is 0.047 m and 0.31 rad, while the RMSE for the test set is 0.053 m and 0.38 rad, for position and orientation accordingly.

Figure 9 shows positions and orientation trajectories for all demonstrations in black lines, one of the test trajectories in red and a trajectory generated by the TP-HSMM with the same coordinate system parametrization in blue. This figure highlights that the learned model can generalize to novel parametrizations (not “seen” during learning) and successfully perform the encoded task in an unseen situation.

Mixed teleoperation evaluation trials

To directly demonstrate the benefit of using our approach we organized evaluation trials with a set of 5 volunteers. The volunteers have some experience using the robot arm but have no specific prior experience of the proposed teleoperation setup.

We distinguish two cases; first direct teleoperation of the robot arm using visual feedback, second mixed teleoperation where the operator performs the task locally (in the operator space) and the correspondence between operator space and robot space parametrization is handled by the proposed approach.

Fig. 10

Direct and mixed teleoperation trials from the evaluation study. a Direct teleoperation mode, the robot arm directly tracks the pose of the operator’s arm. The operator receives visual feedback of the task execution from a camera in the robot space. b Mixed teleoperation mode where the operator performs a version of the task locally, and the learned task model is used to map his/her input to the control of the robot arm. Note that, in contrast to the direct teleoperation mode, the task coordinate systems in robot and operator spaces are not coupled. In principle the operator can be performing the exact same task locally while the robot on the remote side is successfully executing a different version of the task (i.e. with other task parametrization)


All the volunteers start with 1 minute of direct teleoperation to familiarize with the use of the robot arm under gravity compensation. Next the volunteers are asked to perform the valve task in direct teleoperation, where they can directly see the robot arm. Next, a separating barrier is placed between the operator’s space and the robot’s space. The volunteers are asked to perform the valve task with visual feedback from the wrist-mounted camera on the robot side. After these familiarization steps, we begin the main set of trials.

The volunteers are asked to perform the valve reaching task from an approximately similar starting pose, while only visual feedback from the wrist camera is available through the monitor of the robot. The valve (handle) is placed each time at a random position, within reachable limits, in the robot space. The trial is considered successful once the handle is placed within the distance between the robot fingers, signaled by the trials’ supervisor. Note that no reference exists in the operator’s space at this time. This direct teleoperation trial is repeated 3 times with different configurations, where both operator arm and robot arm end-effector poses are collected.

Next a valve reference (handle) is placed at a random position in the operator’s space and the volunteers are asked to perform the valve task using this local task reference. Note that the pose of the local valve (operator space) does not correspond to the pose of the remote valve (robot space), i.e. the coordinate system parametrizations between the two spaces are different. During this period, our system uses the reference information in both spaces, the operator’s input and the learned model to disambiguate the end-effector target pose for the teleoperated arm. This way the operator is performing the task locally while the remote arm is performing a “different version” of the task successfully, using the approach described in Sect. 3.5. Again the end of each trial is signaled by the supervisor, as soon as the handle is placed between the robot fingers. As before, the mixed teleoperation trial is repeated 3 times in different configurations, where the end-effector poses of both arms are collected. During this, no visual feedback from the remote side is available to the operator, i.e. the monitor of the robot displays a black screen.

Fig. 11

Left: Average change in direction per performed motion, averaged over all trials per teleoperation mode. Right: Average time required to complete the valve task for each teleoperation mode. The red error-bars represent ± 1 standard deviation (Color figure online)


The evaluation trials, as previously described, yield 30 teleoperation trajectories, 15 trajectories for each teleoperation mode. Figure 10 presents 3 samples from each mode. Qualitatively, the direct teleoperation motions appear coarse in comparison to the mixed teleoperation motions. This reflects the strategy that all volunteers used when performing the task, with small incremental movements where each action was planned accordingly to the visual feedback from the previous action. In contrast, in the mixed teleoperation samples, where the teleoperator had direct access to a local feedback, we observed that motions are consistently smooth and fluid for all samples. The average directional change per motion and the average over the set of trials for each teleoperation mode are computed. This provides a measure of smoothness of the performed motions, the results of which are summarized in Fig. 11a.

Assisting the operator by providing immediate local feedback about the task, by completing a local version of the task, resulted in a significant improvement of the time required to complete the valve reaching task.Footnote 3 The direct teleoperation trials required on average 1 min (\(58.9 \pm 24.4\) s) to complete the task. In contrast, the volunteers completed the task in the proposed mixed teleoperation setting in under 6 seconds on average and more consistently (\(5.69 \pm 1.9\) s). These results are summarized in Fig. 11b.

Communication disruption

In addition to the significant reduction of the task execution time, we can also use the learned task model to generate motions that achieve the task at hand when the communication is disrupted. An example of this situation is presented in Fig. 12. We consider the example in which the operator is performing the valve task in the mixed teleoperation mode when the communication between the two spaces is abruptly disrupted (see the \(\times \) points on the trajectory in Fig. 12). In this situation, the robot arm on the remote side, having access to a local copy of the TP-HSMM, can still generate a motion from the current state and continue to perform the task autonomously. In turn, this feature can also be used for autonomous task execution, where the operator is presented with the generated motion and can select between teleoperation and autonomous task execution. The predicted motion can be presented either on the operator’s monitor or in a VR system. The operator can then select whether she would like to have the robot execute the motion or proceed in a mixed teleoperation fashion. Execution can be triggered through the operator’s interface, for example by confirming the action execution by clicking the appropriate button on the graphical user interface.

Fig. 12

Examples of mixed teleoperation trials where communication is disrupted. Dashed lines represent the robot end-effector motion until the communication is cut. Solid lines on the robot space are the motions that are generated from the TP-HSMM and executed by the robot autonomously


In most mixed teleoperation trials, we observed that the system presents a transient behavior around the area where the probabilities of the coordinate systems are approximately equal. This is visible in Fig. 10b and is related to the shape of the Gaussians. Broader Gaussians (e.g. using a prior on minimal covariance used as regularization term) would result in smoother changes of coordinate system weighting, but might not be suitable for tasks requiring very high precision.

The reduction of task execution time based on the trials with volunteers highlights the support that our system can provide to non-experienced users. In a real world ROV teleoperation situation, an experienced ROV pilot operates the controls of the ROV and generally, the visual feedback is more extensive, i.e. multiple cameras from different viewpoints are traditionally used. Nonetheless such feedback can only be accessible to ROV operators off-shore, as communicating multiple video streams through a satellite link is—at least currently—infeasible. To perform the teleoperation from an on-shore site, in the context of the DexROV project, video streams are not available. In contrast, we showed how the use of a learning approach can make video streams redundant by decoupling the operator’s and the robot’s spaces with minimal communication overhead.


We presented a learning by demonstration approach for manipulation tasks that can be used in a teleoperation context to remotely operate underwater ROVs. We proposed a mechanism that can be used to disambiguate differences between the operator’s and the robot’s workspaces based on a learned task model. This way the operator is performing the task using local feedback while the robot performs a different version of the same task, based on the input of the operator. In addition, we showed how the system can be used as a fall-back in case of communication break-down—something common when operating through a satellite link—and proceed to execute the remainder of the task autonomously.

In future work, we aim to investigate ways of accommodating force and torque dimensions in our demonstrations, as such information is often important for successful interaction with the environment. Along the same line, we will investigate the role of force feedback to the operator and how such loop can be closed in cases where the operator and the robot spaces differ. Finally, we are currently exploring ways of inferring the operators intent while he/she is performing a task, and choosing which task to execute on the robot side from a set of tasks in a learned task library.


  1. 1.

    A model predictive control (MPC) formulation can alternatively be employed as in Zeestraten et al. (2016), generating the state sequence in a receding horizon manner.

  2. 2.

    Please note that fiducial markers are used here to simplify the perception part of the problem. These will be replaced at a later stage with a specialized model-based underwater perception solution, that is contributed by another partner within the DexROV consortium.

  3. 3.

    Note that ultimately, the target in the project is to replace the operator’s space by an immersive VR setup rather than the teleoperation mock-up.


  1. Bowen, C., & Alterovitz, R. (2014). Closed-loop global motion planning for reactive execution of learned tasks. In IEEE/RSJ international conference on intelligent robots and systems (IROS).

  2. Calinon, S. (2016). A tutorial on task-parameterized movement learning and retrieval. Intelligent Service Robotics, 9(1), 1–29. https://doi.org/10.1007/s11370-015-0187-9.

    Article  Google Scholar 

  3. Calinon, S., Bruno, D., & Caldwell, D. G. (2014). A task-parameterized probabilistic model with minimal intervention control. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Hong Kong, China (pp. 3339–3344).

  4. Calinon, S., Li, Z., Alizadeh, T., Tsagarakis, N. G., & Caldwell, D. G. (2012). Statistical dynamical systems for skills acquisition in humanoids. In Proceedings of the IEEE international conference on humanoid robots (Humanoids), Osaka, Japan (pp. 323–329).

  5. Calinon, S., Pistillo, A., & Caldwell, D. G. (2011). Encoding the time and space constraints of a task in explicit-duration hidden Markov model. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), San Francisco, CA, USA (pp. 3413–3418).

  6. Chan, A., Croft, E. A., Little, J. J. (2013). Modeling nonconvex workspace constraints from diverse demonstration sets for constrained manipulator visual servoing. In Proceedings of the IEEE international conference on robotics and automation (ICRA) (pp. 3062–3068).

  7. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1–38.

    MathSciNet  MATH  Google Scholar 

  8. Gams, A., Nemec, B., Ijspeert, A., & Ude, A. (2014). Coupling movement primitives: Interaction with the environment and bimanual tasks. IEEE Transactions on Robotics, 30(4), 816–830.

    Article  Google Scholar 

  9. Havoutis, I., & Calinon, S. (2016). Learning assistive teleoperation behaviors from demonstration. In Proceedings of the IEEE international symposium on safety, security and rescue robotics, Lausanne, Switzerland (pp. 258–263).

  10. Havoutis, I., & Calinon, S. (2017). Supervisory teleoperation with online learning and optimal control. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Singapore (pp. 1534–1540).

  11. Havoutis, I., Tanwani, A. K., & Calinon, S. (2016). Online incremental learning of manipulation tasks for semi-autonomous teleoperation. In IEEE/RSJ international conference on intelligent robots and systems (IROS), workshop on closed-loop grasping and manipulation: Challenges and progress, Daejeon, Korea.

  12. Ijspeert, A., Nakanishi, J., Pastor, P., Hoffmann, H., & Schaal, S. (2013). Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2), 328–373.

    MathSciNet  Article  MATH  Google Scholar 

  13. Kulic, D., Takano, W., & Nakamura, Y. (2008). Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains. The International Journal of Robotics Research, 27(7), 761–784.

    Article  Google Scholar 

  14. Laub, A. J. (1979). A Schur method for solving algebraic Riccati equations. IEEE Transactions on automatic control, 24(6), 913–921.

    MathSciNet  Article  MATH  Google Scholar 

  15. Lee, D., & Ott, C. (2010). Incremental motion primitive learning by physical coaching using impedance control. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan (pp. 4133–4140).

  16. Lee, D., Ott, C., & Nakamura, Y. (2010). Mimetic communication model with compliant physical contact in human–humanoid interaction. The International Journal of Robotics Research, 29(13), 1684–1704.

    Article  Google Scholar 

  17. Palomeras, N., Carrera, A., Hurtós, N., Karras, G. C., Bechlioulis, C. P., Cashmore, M., et al. (2016). Toward persistent autonomous intervention in a subsea panel. Autonomous Robots, 40(7), 1279–1306.

    Article  Google Scholar 

  18. Pastor, P., Righetti, L., Kalakrishnan, M., & Schaal, S. (2011). Online movement adaptation based on previous sensor experiences. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 365–371).

  19. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–285.

    Article  Google Scholar 

  20. Rasmussen, C. E. (2000). The infinite Gaussian mixture model. In S. A. Solla, T. K. Leen, & K. R. Mueller (Eds.), Advances in neural information processing systems (NIPS) (pp. 554–560). Cambridge: MIT Press.

    Google Scholar 

  21. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.

    MathSciNet  Article  MATH  Google Scholar 

  22. Silvério, J., Rozo, L., Calinon, S., & Caldwell, D. G. (2015). Learning bimanual end-effector poses from demonstrations using task-parameterized dynamical systems. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Hamburg, Germany (pp. 464–470).

  23. Tanwani, A., & Calinon, S. (2016). Learning robot manipulation tasks with task-parameterized semitied hidden semi-markov model. IEEE Robotics and Automation Letters, 1(1), 235–242. https://doi.org/10.1109/LRA.2016.2517825.

    Article  Google Scholar 

  24. Ude, A., Nemec, B., Petrič, T., & Morimoto, J. (2014). Orientation in cartesian space dynamic movement primitives. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Hong Kong, China (pp. 2997–3004).

  25. Yu, S. Z. (2010). Hidden semi-Markov models. Artificial Intelligence, 174, 215–243.

    MathSciNet  Article  MATH  Google Scholar 

  26. Yu, S. Z., & Kobayashi, H. (2006). Practical implementation of an efficient forward–backward algorithm for an explicit-duration hidden Markov model. IEEE Transactions on Signal Processing, 54(5), 1947–1951.

    Article  MATH  Google Scholar 

  27. Zeestraten, M. J. A., Calinon, S., & Caldwell, D. G. (2016). Variable duration movement encoding with minimal intervention control. In Proceedings of the IEEE international conference on robotics and automation (ICRA), Stockholm, Sweden (pp. 497–503).

  28. Zeestraten, M. J. A., Havoutis, I., Silvério, J., Calinon, S., & Caldwell, D. G. (2017). An approach for imitation learning on Riemannian manifolds. IEEE Robotics and Automation Letters (RA-L), 2(3), 1240–1247. https://doi.org/10.1109/LRA.2017.2657001.

    Article  Google Scholar 

Download references


Havoutis was supported by the UK EPSRC grants EP/M019918/1, EP/R026084/1 and EP/R026173/1.

Author information



Corresponding author

Correspondence to Ioannis Havoutis.

Additional information

This work was in part supported by the DexROV project through the EC Horizon 2020 programme (Grant #635491).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 15100 KB)


Appendix I

The E and M steps have the following form.


$$\begin{aligned} h_{t,i} = \frac{\pi _i \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big |\; {\varvec{\mu }}^{(j)}_i,{\varvec{\varSigma }}^{(j)}_i \Big )}{\sum _{k=1}^K \pi _k \prod \nolimits _{j=1}^P \mathcal {N}\Big ({\varvec{X}}^{(j)}_t \Big |\; {\varvec{\mu }}^{(j)}_k,{\varvec{\varSigma }}^{(j)}_k \Big )} . \end{aligned}$$


$$\begin{aligned} \pi _i&\leftarrow \frac{\sum _{t=1}^N h_{t,i}}{N} , \end{aligned}$$
$$\begin{aligned} {\varvec{\mu }}^{(j)}_i&\leftarrow \frac{ \sum _{t=1}^N h_{t,i}\; {\varvec{X}}^{(j)}_t }{\sum _{t=1}^N h_{t,i}} ,\end{aligned}$$
$$\begin{aligned} {\varvec{\varSigma }}^{(j)}_i&\leftarrow \frac{\sum _{t=1}^N h_{t,i}\; \Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i\Big ) \Big ({\varvec{X}}^{(j)}_t-{\varvec{\mu }}^{(j)}_i \Big )^{\scriptscriptstyle \top }}{\sum _{t=1}^N h_{t,i}} . \end{aligned}$$

These steps are the result of a log-likelihood maximization process subject to the constraint that the data in the different coordinate systems are drawn from the same source.

Appendix II

We compute the the forward variable, \(\alpha ^{\tiny {\text {HSMM}}}_{i,t}\) as in Rabiner (1989) with

$$\begin{aligned} \alpha ^{\tiny {\text {HSMM}}}_{i,t}&= \sum \limits _{j=1}^K \sum \limits _{d=1}^{d^{\max }} \alpha ^{\tiny {\text {HSMM}}}_{j,t-d} \; a_{j,i} \; \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i} \prod \limits _{s=t-d+1}^{t} \mathcal {N}_{s,i} , \quad \mathrm {where}\\ \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i}&= \mathcal {N}(d | \mu ^{\scriptscriptstyle {\mathcal {D}}}_i,\varSigma ^{\scriptscriptstyle {\mathcal {D}}}_i) \;\mathrm {and}\; \mathcal {N}_{s,i} = \mathcal {N}\big ({\varvec{\xi }}_s |\; {\varvec{\mu }}_i,{\varvec{\varSigma }}_i\big ) , \end{aligned}$$

with the initial iterations (for \(t<d^{\max }\)) given by

$$\begin{aligned} \alpha ^{\tiny {\text {HSMM}}}_{i,1}= & {} \varPi _i \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{1,i} \; \mathcal {N}_{1,i} , \\ \alpha ^{\tiny {\text {HSMM}}}_{i,2}= & {} \varPi _i \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{2,i} \prod _{s=1}^2 \mathcal {N}_{s,i} + \sum _{j=1}^K \alpha ^{\tiny {\text {HSMM}}}_{j,1} \; a_{j,i} \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{1,i} \mathcal {N}_{2,i} ,\\ \alpha ^{\tiny {\text {HSMM}}}_{i,3}= & {} \varPi _i \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{3,i} \prod _{s=1}^3 \mathcal {N}_{s,i} + \sum _{j=1}^K \sum \limits _{d=1}^{2} \alpha ^{\tiny {\text {HSMM}}}_{j,3-d} \; a_{j,i} \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i} \prod \limits _{s=4-d}^{3} \mathcal {N}_{s,i} , \end{aligned}$$

etc., corresponding to the update rule

$$\begin{aligned} \alpha ^{\tiny {\text {HSMM}}}_{i,t} = \varPi _i \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{t,i} \prod _{s=1}^t \mathcal {N}_{s,i} + \sum _{j=1}^K \sum \limits _{d=1}^{t-1} \alpha ^{\tiny {\text {HSMM}}}_{j,t-d} a_{j,i} \mathcal {N}^{\scriptscriptstyle {\mathcal {D}}}_{d,i} \prod \limits _{s=t-d+1}^{t} \mathcal {N}_{s,i}.\nonumber \\ \end{aligned}$$

Note that the iterations can be reformulated for efficient computation, see Yu and Kobayashi (2006) and Yu (2010) for details.

Appendix III

For the infinite horizon LQT formulation, the cost function in Eq. (7) can be minimized with the algebraic Riccati equation (ARE)

$$\begin{aligned}&{\varvec{A}}^{\scriptscriptstyle \top }{\varvec{P}} + {\varvec{P}} {\varvec{A}} - {\varvec{P}} {\varvec{B}} {\varvec{R}}^{-1}{\varvec{B}}^{\scriptscriptstyle \top }{\varvec{P}} + {\varvec{Q}} = {\varvec{0}}\nonumber \\&\quad \Leftrightarrow \begin{bmatrix}{\varvec{P}}&-{\varvec{I}} \end{bmatrix} {\varvec{H}} \begin{bmatrix}{\varvec{I}} \\ {\varvec{P}} \end{bmatrix} = {\varvec{0}} , \end{aligned}$$

with the Hamiltonian matrix

$$\begin{aligned} {\varvec{H}} = \begin{bmatrix} {\varvec{A}}&\quad -\,{\varvec{{\varvec{B}} {\varvec{R}}^{-1}{\varvec{B}}^{\scriptscriptstyle \top }}} \\ -\,{\varvec{Q}}&\quad -\,{\varvec{A^{\scriptscriptstyle \top }}} \end{bmatrix} . \end{aligned}$$

Under suitable hypotheses about symmetry, stabilizability and detectability, Eq. (21) has a unique positive semidefinite solution \({\varvec{P}}\), which can be obtained by several methods, see Laub (1979) for details.

The key is to convert the problem to a stable invariant subspace problem of the Hamiltonian matrix, i.e., finding the invariant subspace corresponding to eigenvalues of \({\varvec{H}}\) with negative real parts. This subspace can be found in several ways. With the cost \({\varvec{Q}}\) evaluated from precision matrices, a simple solution is to use an eigendecomposition approach (with ordered eigencomponents) to decompose \({\varvec{H}}\) as

$$\begin{aligned} {\varvec{H}} = {\varvec{V}} \begin{bmatrix} {\varvec{\lambda }}_1&\quad {\varvec{0}} \\ {\varvec{0}}&\quad {\varvec{\lambda }}_2 \end{bmatrix} {\varvec{V}}^{\scriptscriptstyle \top },\quad \mathrm {with}\quad {\varvec{V}} = \begin{bmatrix} {\varvec{V}}_1&\quad {\varvec{V}}_{12} \\ {\varvec{V}}_{21}&\quad {\varvec{V}}_2 \end{bmatrix} , \end{aligned}$$

where \(\begin{bmatrix}{\varvec{V}}_1\\{\varvec{V}}_{21}\end{bmatrix}\) forms the stable eigenspace of \({\varvec{H}}\). We have that \({\varvec{H}}\in \mathbb {R}^{4D\times 4D}\) for a double integrator as in Eq. (6), which precisely has 2D eigenvalues with negative real parts. Together with (21), it provides the ARE solution

$$\begin{aligned} {\varvec{P}} = {\varvec{V}}_{21} {\varvec{V}}_1^{-1}, \end{aligned}$$

which is used at each iteration to compute the tracking gains.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Havoutis, I., Calinon, S. Learning from demonstration for semi-autonomous teleoperation. Auton Robot 43, 713–726 (2019). https://doi.org/10.1007/s10514-018-9745-2

Download citation


  • Learning from demonstration
  • Teleoperation
  • Shared autonomy
  • Manipulation in ROVs