
Discovering relevant task spaces using inverse feedback control

Autonomous Robots


Learning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. However, the usual approaches do not answer the question of which representations are appropriate for generating motion for a specific task. Since it is time-consuming for a human expert to manually design the motion control representation for a task, we propose to uncover such structure from data, namely from observed motion trajectories. Inspired by Inverse Optimal Control, we present a novel method to learn a latent value function, to imitate and generalize demonstrated behavior, and to discover a task-relevant motion representation. We test our method, called Task Space Retrieval Using Inverse Feedback Control (TRIC), on several challenging high-dimensional tasks. TRIC learns the important control dimensions for a task from a few example movements and robustly generalizes to new situations.




  1. To simplify further, we use \(t_0 =1\) and \(t_T = T\).

  2. Note that the demonstrations may come from different situations, with objects placed at different locations. Since the features \(y_t^i\) may be object-relative, they cannot be recovered from \(q_t^i\) alone, a dependence we neglected in our notation. We therefore also record \(y_t^i\) and the Jacobians \(\dfrac{\partial \phi }{\partial q}(q_t^i)\) for all demonstrations. When it is clear from context that we are dealing with a single trajectory, we drop the superscript \(i\) and write \(q_t\) instead of \(q_t^i\).

  3. Here we write \(\dfrac{\partial f}{\partial y}\) instead of \(\dfrac{\partial f}{\partial \phi }\), because \(y = \phi (q)\).

  4. It is still decreasing despite the gradient sign change because of other features coupled geometrically to \(p_{3,1}^y\).

  5. Implicit surface object models can be learned from sensory data; the object surface is then the level set of a nonlinear potential function, e.g. a Gaussian process or SVR, see Steinke et al. (2005).


  • Argall, B. D., Chernova, S., Veloso, M. M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.

  • Bain, M., & Sammut, C. (1996). A framework for behavioural cloning. In Machine intelligence (Vol. 15, pp. 103–129). Oxford: Oxford University Press.

  • Berniker, M., & Kording, K. (2008). Estimating the sources of motor errors for adaptation and generalization. Nature Neuroscience, 11(12), 1454–1461.

  • Billard, A., Epars, Y., Calinon, S., Cheng, G., & Schaal, S. (2004). Discovering optimal imitation strategies. Robotics and Autonomous Systems, Special Issue: Robot Learning from Demonstration, 47(2–3), 69–77.

  • Calinon, S., & Billard, A. (2007). Incremental learning of gestures by imitation in a humanoid robot. In HRI ’07: Proceedings of the ACM/IEEE International Conference on Human–Robot Interaction (pp. 255–262).

  • Call, J., & Carpenter, M. (2002). Three sources of information in social learning. In Imitation in animals and artifacts (pp. 211–228).

  • Craig, J. J. (1989). Introduction to robotics: Mechanics and control (2nd ed.). Boston, MA: Addison-Wesley.

  • Dragiev, S., Toussaint, M., & Gienger, M. (2011). Gaussian process implicit surface for object estimation and grasping. In IEEE International Conference on Robotics and Automation (ICRA).

  • Gienger, M., Toussaint, M., Jetchev, N., Bendig, A., & Goerick, C. (2008). Optimization of fluent approach and grasp motions. In 8th IEEE-RAS International Conference on Humanoid Robots.

  • Haindl, M., Somol, P., Ververidis, D., & Kotropoulos, C. (2006). Feature selection based on mutual correlation. In CIARP (pp. 569–577).

  • Hiraki, K., Sashima, A., & Phillips, S. (1998). From egocentric to allocentric spatial behavior: A computational model of spatial development. Adaptive Behavior, 6(3–4), 371–391.

  • Ho, E. S. L., Komura, T., & Tai, C.-L. (2010). Spatial relationship preserving character motion adaptation. ACM Transactions on Graphics, 29(4), 1–8.

  • Howard, M., Klanke, S., Gienger, M., Goerick, C., & Vijayakumar, S. (2009). A novel method for learning policies from variable constraint data. Autonomous Robots, 27, 105–121.

  • Jenkins, O. C., & Matarić, M. J. (2004). A spatio-temporal extension to Isomap nonlinear dimension reduction. In 21st International Conference on Machine Learning (ICML).

  • Jetchev, N. (2012). Learning representations from motion trajectories: Analysis and applications to robot planning and control. PhD thesis, FU Berlin. Accessed 1 Aug 2012.

  • Jetchev, N., & Toussaint, M. (2011). Task space retrieval using inverse feedback control. In 28th International Conference on Machine Learning (ICML) (pp. 449–456).

  • Khansari-Zadeh, S. M., & Billard, A. (2010). BM: An iterative algorithm to learn stable non-linear dynamical systems with Gaussian mixture models. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 2381–2388).

  • Kroemer, O., Detry, R., Piater, J. H., & Peters, J. (2009). Active learning using mean shift optimization for robot grasping. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2610–2615).

  • Kroemer, O., Detry, R., Piater, J. H., & Peters, J. (2010). Grasping with vision descriptors and motor primitives. In ICINCO (Vol. 2, pp. 47–54).

  • LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. In Predicting structured data.

  • Montavon, G., Braun, M., & Müller, K.-R. (2011). Kernel analysis of deep networks. Journal of Machine Learning Research, 12, 2563–2581.

  • Muehlig, M., Gienger, M., Steil, J. J., & Goerick, C. (2009). Automatic selection of task spaces for imitation learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4996–5002).

  • Myers, C. S., & Rabiner, L. R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7), 1389–1409.

  • Nouri, A., & Littman, M. L. (2010). Dimension reduction and its application to model-based exploration in continuous spaces. Machine Learning, 81(1), 85–98.

  • Perkins, T. J., & Barto, A. G. (2002). Lyapunov design for safe reinforcement learning. Journal of Machine Learning Research, 3, 803–832.

  • Pomerleau, D. A. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, 88–97.

  • Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  • Ratliff, N., Ziebart, B., Peterson, K., Bagnell, J. A., Hebert, M., Dey, A. K., & Srinivasa, S. (2009). Inverse optimal heuristic control for imitation learning. In Proceedings of AISTATS (pp. 424–431).

  • Ratliff, N. D., Bagnell, J. A., & Zinkevich, M. A. (2006). Maximum margin planning. In 23rd International Conference on Machine Learning (ICML) (pp. 729–736).

  • Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. J. (2003). Learning movement primitives. In International Symposium on Robotics Research (pp. 561–572).

  • Siciliano, B., & Khatib, O. (Eds.). (2008). Springer handbook of robotics. Berlin: Springer.

  • Slotine, J.-J., & Li, W. (1991). Applied nonlinear control. Upper Saddle River, NJ: Prentice Hall.

  • Steinke, F., Schölkopf, B., & Blanz, V. (2005). Support vector machines for 3D shape processing. Computer Graphics Forum, 24(3), 285–294.

  • Tegin, J., Ekvall, S., Kragic, D., Wikander, J., & Iliev, B. (2009). Demonstration-based learning and control for automatic grasping. Intelligent Service Robotics, 2, 23–30.

  • Toussaint, M. (2009). Robot trajectory optimization using approximate inference. In 26th International Conference on Machine Learning (ICML) (pp. 1049–1056).

  • Toussaint, M. (2011). Robotics. University lecture. Accessed 1 Aug 2012.

  • Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.

  • Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.

  • Wagner, T., Visser, U., & Herzog, O. (2004). Egocentric qualitative spatial knowledge representation for physical robots. Robotics and Autonomous Systems, 49(1–2), 25–42.



This work was supported by the German Research Foundation (DFG), Emmy Noether fellowship TO 409/1-3, and the EU FP7 project TOMSY.

Author information



Corresponding author

Correspondence to Nikolay Jetchev.



1.1 Proof of Proposition 1 for the direction of IK-generated motion steps

Proposition 1 If \(\varrho \rightarrow \infty \) then the IK solution \(q_{t+1}\) minimizing Eq. (9) has the property that the next step \(q_{t+1} - q_{t}\) is approximately proportional to the value function gradient \(\mathcal {J}\) in a small region around \(q_t\).


Proof If \(\varrho \rightarrow \infty \), the term \(||f \circ \phi (q) - f \circ \phi (q_t) + \delta ||^2\) of Eq. (9) is weighted so heavily that \(C_{prior}\) and any other cost terms we might add become negligible. Let \(\mathcal {J}\) be the gradient of the value function \(f \circ \phi (q)\) evaluated at \(q = q_t\). Using the linearization \(f \circ \phi (q_{t+1}) = f \circ \phi (q_t) + \mathcal {J}(q_{t+1} - q_t)\), we can apply the IK equation (Toussaint 2011):

$$\begin{aligned} q_{t+1}&= q_t - \delta \mathcal {J^{\sharp }}\\ \mathcal {J^{\sharp }}&= {\left( \varrho \mathcal {J}^T\mathcal {J} + \mathbb {I} \right) }^{-1}\mathcal {J}^T\varrho \\&= \mathcal {J}^T {\left( \mathcal {J}\mathcal {J}^T + {\varrho }^{-1}\right) }^{-1} = \frac{1}{||\mathcal {J}||^2}\mathcal {J}^T \end{aligned}$$

We have used the Woodbury identity and the fact that \(\mathcal {J}\mathcal {J}^T = ||\mathcal {J}||^2\) for a 1-dimensional task variable \(y\), in which case the Jacobian is a row-vector gradient. \(\mathcal {J^{\sharp }}\) is the pseudoinverse of \(\mathcal {J}\). Thus the steps generated by our motion model are proportional to \(\mathcal {J}\) scaled by a negative scalar.\(\square \)
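The limit above can be checked numerically. The following sketch (our own illustration, not from the paper; the dimension and random Jacobian are assumptions) verifies that for a row-vector Jacobian the damped pseudoinverse \({\left( \varrho \mathcal {J}^T\mathcal {J} + \mathbb {I} \right) }^{-1}\mathcal {J}^T\varrho \) approaches \(\mathcal {J}^T/||\mathcal {J}||^2\) for large \(\varrho \):

```python
import numpy as np

# Numerical check of the pseudoinverse limit in Proposition 1.
# For a 1-D task variable, the Jacobian J of f ∘ phi is a row vector.
rng = np.random.default_rng(0)
n = 7                                   # joint-space dimension (arbitrary choice)
J = rng.standard_normal((1, n))         # row-vector Jacobian at q_t
rho = 1e6                               # large weighting, approximating rho -> inf

# Damped pseudoinverse: (rho J^T J + I)^{-1} J^T rho
damped = np.linalg.solve(rho * J.T @ J + np.eye(n), J.T * rho)
# Limit: J^T / ||J||^2   (since J J^T = ||J||^2 is a scalar)
limit = J.T / (J @ J.T)

assert np.allclose(damped, limit, atol=1e-4)
```

The discrepancy between the two expressions shrinks like \(\varrho ^{-1}\), matching the Woodbury-identity argument in the proof.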

1.2 Proof of Proposition 2 for Lyapunov attractor properties of TRIC

Proposition 2 Suppose we have trained TRIC on a single trajectory \(\{q_t, y_t\}_{t=1}^T\) and that \(f \circ \phi (q_T)\) is a minimum of the value function. Additionally, we generate motion with \(\varrho \rightarrow \infty \), i.e. very high weighting of the value function. Then the motion generated by the model in Eq. (8) fulfills the conditions of Theorem 1 and is thus asymptotically stable at the attractor subspace \(Q' = \{q': \phi (q') = \phi (q_T) = y_T \}\).


Proof Because \(\varrho \rightarrow \infty \), the term decreasing the value function \(f\) dominates the motion equation and we can ignore the effect of the other terms. Let us construct \(V(q) = f\circ \phi (q) - c_T\), where \(c_T = f\circ \phi (q_T)\). Then the Lyapunov stability conditions hold:

  • (a) holds directly from the assumption that \(c_T= f\circ \phi (q_T)\) is a minimum: any other joint state \(q\) with \(\phi (q) \ne \phi (q_T)\) has a higher value \(f\circ \phi (q)\).

  • (b) holds by the construction of \(V(q)\), which directly implies that

    $$\begin{aligned} V(q_T) = f\circ \phi (q_T) - c_T = 0 \end{aligned}$$
  • Proposition 1 applies because we assumed \(\varrho \rightarrow \infty \). This implies that the steps of the motion model are proportional to the gradient \(\mathcal {J}\). Thus the motion model of Eq. (9) decreases the value \(f\circ \phi (q_t)\) at every step, and (c) holds.

  • (d) holds because \(c_T= f\circ \phi (q_T)\) is a local minimum: once we reach a joint state \(q'\) with \(\phi (q') = \phi (q_T)\), the gradient of the value function is 0 and no further decrease is possible.\(\square \)
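The attractor behavior can be illustrated with a toy construction (ours, not the paper's experiments; the linear task map, quadratic value function, and step size are all hand-picked assumptions): gradient steps on \(f\circ \phi \) decrease \(V\) monotonically and drive \(\phi (q)\) to \(y_T\), while \(q\) itself is free to end anywhere on the attractor subspace \(Q'\).

```python
import numpy as np

# Toy illustration of Proposition 2: a 1-D task map phi, a value function f
# minimal at y_T, and motion steps along the negative gradient of f ∘ phi.
y_T = 2.0
phi = lambda q: q[0] + 0.5 * q[1]               # linear task map (assumed)
f = lambda y: (y - y_T) ** 2                    # value function, minimum at y_T
grad = lambda q: 2 * (phi(q) - y_T) * np.array([1.0, 0.5])  # grad of f ∘ phi

q = np.array([4.0, -1.0])                       # arbitrary start state
delta = 0.1                                     # step size (assumed)
values = [f(phi(q))]
for _ in range(200):
    q = q - delta * grad(q)                     # motion-model step
    values.append(f(phi(q)))

# V = f ∘ phi - c_T decreases monotonically (condition (c)) ...
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
# ... and the trajectory reaches the attractor subspace {q : phi(q) = y_T}.
assert abs(phi(q) - y_T) < 1e-6
```

Note that only \(\phi (q)\), not \(q\), is pinned down at convergence, which is exactly the subspace attractor \(Q' = \{q': \phi (q') = y_T \}\) of the proposition.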


About this article

Cite this article

Jetchev, N., Toussaint, M. Discovering relevant task spaces using inverse feedback control. Auton Robot 37, 169–189 (2014).
