Abstract
Learning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. However, the usual approaches do not answer the question of what are appropriate representations to generate motion for a specific task. Since it is time-consuming for a human expert to manually design the motion control representation for a task, we propose to uncover such structure directly from data—observed motion trajectories. Inspired by Inverse Optimal Control, we present a novel method to learn a latent value function, to imitate and generalize demonstrated behavior, and to discover a task-relevant motion representation. We test our method, called Task Space Retrieval Using Inverse Feedback Control (TRIC), on several challenging high-dimensional tasks. TRIC learns the important control dimensions for the tasks from a few example movements and robustly generalizes to new situations.
Notes
To simplify even more we use \(t_0 =1\) and \(t_T = T\).
Note that the demonstrations may come from different situations, with objects placed at different locations. Since the features \(y_t^i\) may be object-relative, they cannot be computed from \(q_t^i\) alone—which we neglected in our notation. We therefore “record” also \(y_t^i\) and the Jacobians \(\dfrac{\partial \phi }{\partial q}(q_t^i)\) for all demonstrations. If it is clear from the context that we are dealing with just one trajectory, we will skip the superscript \(i\) and write just \(q_t\) instead of \(q_t^i\).
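The recording step described above can be sketched in code. The following is a minimal illustration, not the paper's implementation; the names `numerical_jacobian` and `record_demonstration` are hypothetical, and the Jacobian is approximated by finite differences:

```python
import numpy as np

def numerical_jacobian(phi, q, eps=1e-6):
    """Finite-difference approximation of d(phi)/dq at q (shape: d_y x d_q)."""
    y0 = np.atleast_1d(phi(q))
    J = np.zeros((y0.size, q.size))
    for j in range(q.size):
        dq = np.zeros_like(q)
        dq[j] = eps
        J[:, j] = (np.atleast_1d(phi(q + dq)) - y0) / eps
    return J

def record_demonstration(phi, trajectory):
    """Store (q_t, y_t, J_t) per step, since object-relative features y_t
    cannot be recovered from q_t alone once the situation changes."""
    return [(q, np.atleast_1d(phi(q)), numerical_jacobian(phi, q))
            for q in trajectory]

# Toy object-relative feature: distance of the configuration to an object
# position p_obj that differs between situations.
p_obj = np.array([0.5, -0.2])
phi = lambda q: np.linalg.norm(q - p_obj)
demo = record_demonstration(phi, [np.array([0.0, 0.0]), np.array([0.4, -0.1])])
q0, y0, J0 = demo[0]
```

Storing the Jacobians alongside the features makes each recorded step self-contained, so later learning does not need to re-evaluate \(\phi\) in the original situation.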
Here we write \(\dfrac{\partial f}{\partial y}\) instead of \(\dfrac{\partial f}{\partial \phi }\), because \(y = \phi (q)\).
It is still decreasing despite the gradient sign change because of other features coupled geometrically to \(p_{3,1}^y\).
Implicit surface object models are learned from sensory data; the object surface itself is the zero level set of a nonlinear potential function, e.g. a Gaussian Process or SVR, see Steinke et al. (2005).
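As a concrete (hypothetical) illustration of such a learned potential, the sketch below fits a kernel ridge regressor in place of the GP/SVR of the cited work: the object surface is the zero level set of the learned function, negative inside and positive outside:

```python
import numpy as np

# Illustrative implicit-surface potential. The "object" is a unit circle;
# training targets are 0 on the surface, +1 on an outer shell, -1 on an
# inner shell, so the surface is the zero level set of f.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 40)
surface = np.c_[np.cos(theta), np.sin(theta)]

X = np.vstack([surface, 1.5 * surface, 0.5 * surface])
t = np.hstack([np.zeros(40), np.ones(40), -np.ones(40)])

def k(A, B, ell=0.5):
    """Squared-exponential kernel matrix between point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

alpha = np.linalg.solve(k(X, X) + 1e-3 * np.eye(len(X)), t)

def f(x):
    """Learned potential: ~0 on the surface, <0 inside, >0 outside."""
    return (k(np.atleast_2d(np.asarray(x, dtype=float)), X) @ alpha)[0]
```

Because the potential is smooth, its gradient gives a surface normal "for free", which is what makes such models convenient as motion features.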
References
Argall, B. D., Chernova, S., Veloso, M. M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.
Bain, M., and Sammut, C. (1996). A framework for behavioural cloning. In Machine intelligence, vol 15 (pp. 103–129). Oxford: Oxford University Press.
Berniker, M., & Kording, K. (2008). Estimating the sources of motor errors for adaptation and generalization. Nature Neuroscience, 11(12), 1454–1461.
Billard, A., Epars, Y., Calinon, S., Cheng, G., & Schaal, S. (2004). Discovering optimal imitation strategies. Robotics and Autonomous Systems, Special Issue: Robot Learning from Demonstration, 47(2–3), 69–77.
Calinon, S., and Billard, A. (2007). Incremental learning of gestures by imitation in a humanoid robot. In HRI ’07: Proceedings of the ACM/IEEE International Conference on Human–Robot Interaction (pp. 255–262).
Call, J., and Carpenter, M. (2002). Three sources of information in social learning. Imitation in animals and artifacts, pp. 211–228.
Craig, J. J. (1989). Introduction to robotics: Mechanics and control (2nd ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc. ISBN 0-201-09528-9.
Dragiev, S., Toussaint, M., and Gienger, M. (2011). Gaussian process implicit surface for object estimation and grasping. In IEEE International Conference on Robotics and Automation (ICRA).
Gienger, M., Toussaint, M., Jetchev, N., Bendig, A., and Goerick, C. (2008). Optimization of fluent approach and grasp motions. In 8th IEEE-RAS International Conference on Humanoid Robots.
Haindl, M., Somol, P., Ververidis, D., and Kotropoulos, C. (2006). Feature selection based on mutual correlation. In CIARP, pp. 569–577.
Hiraki, K., Sashima, A., & Phillips, S. (1998). From egocentric to allocentric spatial behavior: A computational model of spatial development. Adaptive Behavior, 6(3–4), 371–391.
Ho, E. S. L., Komura, T., & Tai, C.-L. (2010). Spatial relationship preserving character motion adaptation. ACM Transactions on Graphics, 29(4), 1–8.
Howard, M., Klanke, S., Gienger, M., Goerick, C., & Vijayakumar, S. (2009). A novel method for learning policies from variable constraint data. Autonomous Robots, 27, 105–121.
Jenkins, O. C., and Matarić, M. J. (2004). A spatiotemporal extension to isomap nonlinear dimension reduction. In 21st International Conference on Machine Learning (ICML).
Jetchev, N. (2012). Learning representations from motion trajectories: Analysis and applications to robot planning and control. PhD thesis, FU Berlin. Retrieved from http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000037417. Accessed 1 Aug 2012.
Jetchev, N., and Toussaint, M. (2011). Task space retrieval using inverse feedback control. In 28th International Conference on Machine Learning (ICML), pp. 449–456.
Khansari-Zadeh, S. M., and Billard, A. (2010). BM: An iterative algorithm to learn stable nonlinear dynamical systems with Gaussian mixture models. In IEEE International Conference on Robotics and Automation (ICRA), pp. 2381–2388.
Kroemer, O., Detry, R., Piater, J. H., and Peters, J. (2009). Active learning using mean shift optimization for robot grasping. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2610–2615.
Kroemer, O., Detry, R., Piater, J. H., & Peters, J. (2010). Grasping with vision descriptors and motor primitives. ICINCO, 2, 47–54.
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energybased learning. In Predicting structured data.
Montavon, G., Braun, M., & Müller, K.-R. (2011). Kernel analysis of deep networks. Journal of Machine Learning Research, 12, 2563–2581.
Muehlig, M., Gienger, M., Steil, J. J., and Goerick, C. (2009). Automatic selection of task spaces for imitation learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4996–5002.
Myers, C. S., & Rabiner, L. R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7), 1389–1409.
Nouri, A., & Littman, M. L. (2010). Dimension reduction and its application to model-based exploration in continuous spaces. Machine Learning, 81(1), 85–98.
Perkins, T. J., & Barto, A. G. (2002). Lyapunov design for safe reinforcement learning. Journal of Machine Learning Research, 3, 803–832.
Pomerleau, D. A. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, 88–97.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Ratliff, N., Ziebart, B., Peterson, K., Bagnell, J. A., Hebert, M., Dey, A. K., and Srinivasa, S. (2009). Inverse optimal heuristic control for imitation learning. In Proceedings of AISTATS, pp. 424–431.
Ratliff, N. D., Bagnell, J. A., and Zinkevich, M. A. (2006). Maximum margin planning. In 26th International Conference on Machine Learning (ICML), pp. 729–736.
Schaal, S., Peters, J., Nakanishi, J., and Ijspeert, A. J. (2003). Learning movement primitives. In International Symposium on Robotics Research, pp. 561–572.
Siciliano, B., & Khatib, O. (Eds.). (2008). Springer handbook of robotics. Berlin: Springer.
Slotine, J.-J., & Li, W. (1991). Applied nonlinear control. Upper Saddle River, NJ: Prentice Hall.
Steinke, F., Schölkopf, B., & Blanz, V. (2005). Support vector machines for 3D shape processing. Computer Graphics Forum, 24(3), 285–294.
Tegin, J., Ekvall, S., Kragic, D., Wikander, J., & Iliev, B. (2009). Demonstration-based learning and control for automatic grasping. Intelligent Service Robotics, 2, 23–30.
Toussaint, M. (2009). Robot trajectory optimization using approximate inference. In 26th International Conference on Machine Learning (ICML), pp. 1049–1056.
Toussaint, M. (2011). Robotics. University Lecture, 2011. Retrieved from http://userpage.fu-berlin.de/mtoussai/teaching/11-Robotics/. Accessed 1 Aug 2012.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.
Wagner, T., Visser, U., & Herzog, O. (2004). Egocentric qualitative spatial knowledge representation for physical robots. Robotics and Autonomous Systems, 49(1–2), 25–42.
Acknowledgments
This work was supported by the German Research Foundation (DFG), Emmy Noether fellowship TO 409/1-3, and the EU FP7 project TOMSY.
Appendix
1.1 Proof of Proposition 1 for the direction of IK-generated motion steps
Proposition 1 If \(\varrho \rightarrow \infty \) then the IK solution \(q_{t+1}\) minimizing Eq. (9) has the property that the next step \(q_{t+1} - q_{t}\) is approximately proportional to the value function gradient \(\mathcal {J}\) in a small region around \(q_t\).
Proof
If \(\varrho \rightarrow \infty \) then the term \(\left( f \circ \phi (q) - f \circ \phi (q_t) + \delta \right) ^2\) of Eq. (9) is weighted so heavily that \(C_{prior}\) and any other cost terms we might add are negligible. Let \(\mathcal {J}\) be the gradient of the value function \(f \circ \phi (q)\) evaluated at \(q = q_t\). Using the linearization \(f \circ \phi (q_{t+1}) = f \circ \phi (q_t) + \mathcal {J}(q_{t+1} - q_t)\), we can apply the IK equation (Toussaint 2011):
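The displayed equation appears to have been lost in extraction. A plausible reconstruction, assuming Eq. (9) combines a quadratic prior \(\Vert q - q_t\Vert ^2_W\) with the value-decrease term, chosen to be consistent with the Woodbury and pseudoinverse steps invoked in the remainder of the proof:

$$\begin{aligned} q_{t+1}&= \mathop {\mathrm {argmin}}_{q}\; \Vert q - q_t\Vert ^2_W + \varrho \left( \mathcal {J}(q - q_t) + \delta \right) ^2 \\&= q_t - \left( \tfrac{1}{\varrho } W + \mathcal {J}^T \mathcal {J}\right) ^{-1} \mathcal {J}^T \delta \;\xrightarrow {\varrho \rightarrow \infty }\; q_t - W^{-1} \mathcal {J}^T \left( \mathcal {J} W^{-1} \mathcal {J}^T\right) ^{-1} \delta = q_t - \mathcal {J}^{\sharp } \delta . \end{aligned}$$

With \(W = I\) and a one-dimensional task variable, \(\mathcal {J}^{\sharp } = \mathcal {J}^T (\mathcal {J} \mathcal {J}^T)^{-1} = \mathcal {J}^T / \mathcal {J}^2\), so the step equals \(\mathcal {J}\) scaled by the negative scalar \(-\delta / \mathcal {J}^2\).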
We have used the Woodbury identity and the fact that \(\mathcal {J}\mathcal {J}^T = \mathcal {J}^2\) in the case of a one-dimensional task variable \(y\) (the Jacobian is then a row vector, i.e. a gradient). \(\mathcal {J}^{\sharp }\) is called the pseudoinverse of \(\mathcal {J}\). Thus, the steps generated by our motion model are proportional to \(\mathcal {J}\) times a negative scalar.\(\square \)
1.2 Proof of Proposition 2 for Lyapunov attractor properties of TRIC
Proposition 2 Suppose we have trained TRIC on a single trajectory \(\{q_t, y_t\}_{t=1}^T\) and that \(f \circ \phi (q_T)\) is a minimum of the value function. Additionally, we generate motion with \(\varrho \rightarrow \infty \), i.e. very high weighting of the value function. Then the motion generated by the model in Eq. (8) fulfills the conditions of Theorem 1 and is thus asymptotically stable at the attractor subspace \(Q' = \{q': \phi (q') = \phi (q_T) = y_T \}\).
Proof
Because \(\varrho \rightarrow \infty \), the term decreasing the value function \(f\) dominates the motion equation and we can ignore the effect of the other terms. Let us construct \(V(q) = f\circ \phi (q) - c_T\), where \(c_T = f\circ \phi (q_T)\). Then the Lyapunov stability conditions hold:

(a) holds directly because of the assumption that \(c_T= f\circ \phi (q_T)\) is a minimum, so any other joint state \(q\) with \(\phi (q) \ne \phi (q_T)\) has a higher value \(f\circ \phi (q)\).

(b) holds by the construction of \(V(q)\) directly implying that
$$\begin{aligned} V(q_T) = f\circ \phi (q_T) - c_T = 0 \end{aligned}$$ 
Proposition 1 holds because we assumed that \(\varrho \rightarrow \infty \), which implies that the steps of the motion model are proportional to the gradient \(\mathcal {J}\) times a negative scalar. Thus the motion model of Eq. (9) makes steps that monotonically decrease the value \(f\circ \phi (q_t)\), and (c) holds.

(d) holds because \(c_T= f\circ \phi (q_T)\) is a local minimum: once we reach a joint state \(q'\) with \(\phi (q') = \phi (q_T)\), the gradient of the value function is 0 and no further decrease is possible.\(\square \)
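The Lyapunov argument can be checked numerically on a toy example. The sketch below uses a hypothetical feature map `phi` and value function `f` (stand-ins for the learned TRIC value function, not the paper's implementation) and takes gradient-proportional steps as in Proposition 1; the value \(V\) decreases toward 0 at the attractor:

```python
import numpy as np

# Toy check of the Lyapunov property with illustrative phi and f.
def phi(q):
    return np.sin(q[0]) + 0.5 * q[1] ** 2          # 1-D task feature

y_T = phi(np.array([0.3, 0.2]))                    # feature value at the demo end state

def f(y):
    return (y - y_T) ** 2                          # value function, minimal at y = y_T

def grad(q, eps=1e-6):
    """Central finite-difference gradient J of f(phi(q))."""
    g = np.zeros_like(q)
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        g[i] = (f(phi(q + dq)) - f(phi(q - dq))) / (2 * eps)
    return g

q = np.array([1.0, -1.0])
V = [f(phi(q))]                                    # here c_T = f(phi(q_T)) = 0
for _ in range(200):                               # steps proportional to -J
    q = q - 0.1 * grad(q)
    V.append(f(phi(q)))

# V decreases and approaches 0 on the attractor {q : phi(q) = y_T}
```

Note that the trajectory converges to the attractor subspace \(\{q : \phi (q) = y_T\}\), not to a single point, which is exactly the stability notion in Proposition 2.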
Cite this article
Jetchev, N., Toussaint, M. Discovering relevant task spaces using inverse feedback control. Auton Robot 37, 169–189 (2014). https://doi.org/10.1007/s10514-014-9384-1