An ensemble inverse optimal control approach for robotic task learning and adaptation

Abstract

This paper contributes a novel framework to efficiently learn cost-to-go function representations for robotic tasks with latent modes. The proposed approach relies on the principle behind ensemble methods, where improved performance is obtained by aggregating a group of simple models, each of which can be efficiently learned. The maximum-entropy approximation is adopted as an effective initialization, and the quality of this surrogate is guaranteed by a theoretical bound. Our approach also provides an alternative perspective on the popular mixture of Gaussians under the framework of inverse optimal control. We further propose to enforce a dynamics on the model ensemble, using Kalman estimation to infer and modulate model modes. This allows robots to exploit demonstration redundancy and to adapt to human interventions, especially in tasks where sensory observations are non-Markovian. The framework is demonstrated on a synthetic inverted-pendulum example and on online adaptation tasks, including robotic handwriting and mail delivery.


Notes

  1.

    Namely, by recursively evaluating \(\varvec{\varLambda }_{t} = \varvec{Q}_t + \varvec{A}_t^T\varvec{\varLambda }_{t+1}\varvec{A}_t - \varvec{A}_t^T\varvec{\varLambda }_{t+1}\varvec{B}_t(\varvec{B}_t^T\varvec{\varLambda }_{t+1}\varvec{B}_t + \varvec{R}_t)^{-1}\varvec{B}_t^T\varvec{\varLambda }_{t+1}\varvec{A}_t\) with \(\varvec{\varLambda }_T = \varvec{Q}_T\).
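The backward recursion in the footnote can be sketched in Python. This is an illustrative implementation, not the paper's code; the function name is ours, and the matrices are taken as time-invariant for brevity (per-step lists would work the same way):

```python
import numpy as np

def riccati_cost_to_go(A, B, Q, R, Q_T, T):
    """Backward Riccati recursion for the quadratic cost-to-go
    matrices Lambda_t of a finite-horizon discrete-time LQR problem.

    Returns [Lambda_0, ..., Lambda_T] with Lambda_T = Q_T.
    """
    Lam = [None] * (T + 1)
    Lam[T] = Q_T
    for t in range(T - 1, -1, -1):
        LamN = Lam[t + 1]
        # Lambda_t = Q + A^T Lam A - A^T Lam B (B^T Lam B + R)^{-1} B^T Lam A
        S = B.T @ LamN @ B + R
        gain = np.linalg.solve(S, B.T @ LamN @ A)
        Lam[t] = Q + A.T @ LamN @ A - A.T @ LamN @ B @ gain
    return Lam
```

For a scalar system with A = B = Q = R = Q_T = 1 and T = 2, the recursion gives Lambda_2 = 1, Lambda_1 = 1.5, and Lambda_0 = 1.6, which is easy to verify by hand.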


Acknowledgements

This work is partially funded by Swiss National Center of Robotics Research and national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and the doctoral Grant (ref. SFRH/BD/51933/2012) under IST-EPFL Joint Doctoral Initiative.

Author information

Correspondence to Hang Yin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (wmv 25252 KB)


Appendix: Proof for the MaxEnt approximation of probabilistic model with quadratic cost-to-go


Substituting the Gaussian passive dynamics and the quadratic cost-to-go function, we have:

$$\begin{aligned} P(\varvec{x}_{t+1}|\varvec{x}_t) = \frac{e^{-\frac{1}{2}\Vert \varvec{x}_{t+1} - f(\varvec{x}_t)\Vert _{\varvec{\varSigma }_0^{-1}} - \frac{1}{2}\Vert \varvec{x}_{t+1} - \varvec{\mu }\Vert _{\varvec{\varLambda }}}}{\int e^{-\frac{1}{2}\Vert \varvec{x}_{t+1}' - f(\varvec{x}_t)\Vert _{\varvec{\varSigma }_0^{-1}} - \frac{1}{2}\Vert \varvec{x}_{t+1}' - \varvec{\mu }\Vert _{\varvec{\varLambda }}}d\varvec{x}_{t+1}'} \end{aligned}$$
(16)

The corresponding log-likelihood can be written as

$$\begin{aligned} \mathcal {L}(\varvec{\mu }, \varvec{\varLambda }) =&-\frac{1}{2}(\varvec{x}_{t+1} - f(\varvec{x}_t))^T\varvec{\varSigma }_0^{-1}(\varvec{x}_{t+1} - f(\varvec{x}_t)) \\&- \frac{1}{2}(\varvec{x}_{t+1} - \varvec{\mu })^T\varvec{\varLambda }(\varvec{x}_{t+1} - \varvec{\mu }) \\&- \log \int \underbrace{e^{-\frac{1}{2}(\varvec{x}_{t+1}' - f(\varvec{x}_t))^T\varvec{\varSigma }_0^{-1}(\varvec{x}_{t+1}' - f(\varvec{x}_t))}}_{ \le 1 \text { and positive}} \\&\times e^{- \frac{1}{2}(\varvec{x}_{t+1}' - \varvec{\mu })^T\varvec{\varLambda }(\varvec{x}_{t+1}' - \varvec{\mu })}d\varvec{x}_{t+1}' \\ \ge&\underbrace{-\frac{1}{2}(\varvec{x}_{t+1} - f(\varvec{x}_t))^T\varvec{\varSigma }_0^{-1}(\varvec{x}_{t+1} - f(\varvec{x}_t))}_{\text {Independent of } \varvec{\mu } \text { and } \varvec{\varLambda }} \\&- \frac{1}{2}(\varvec{x}_{t+1} - \varvec{\mu })^T\varvec{\varLambda }(\varvec{x}_{t+1} - \varvec{\mu }) \\&+ \frac{d}{2}\log (2\pi ) + \frac{1}{2}\log |\varvec{\varSigma }_0| \\&\underbrace{- \frac{d}{2}\log (2\pi ) - \frac{1}{2}\log |\varvec{\varLambda }^{-1}|}_{- \log [\int e^{- \frac{1}{2}(\varvec{x}_{t+1}' - \varvec{\mu })^T\varvec{\varLambda }(\varvec{x}_{t+1}' - \varvec{\mu })}d\varvec{x}_{t+1}']} \\ =&- \frac{1}{2}(\varvec{x}_{t+1} - \varvec{\mu })^T\varvec{\varLambda }(\varvec{x}_{t+1} - \varvec{\mu }) \\&-\frac{1}{2}\log |\varvec{\varLambda }^{-1}| + \text {const} \\ =\,&\hat{\mathcal {L}}(\varvec{\mu }, \varvec{\varLambda }) \end{aligned}$$
(17)

where d denotes the state dimension. The exponential arising from the passive dynamics (third line of the equation) is a positive coefficient that is always at most one. Replacing this coefficient with one yields a simple Gaussian integral (the exponential of the negative cost-to-go function, line 7), which is always greater than or equal to the integral involving the passive dynamics. Subtracting the logarithm of this simplified integral instead therefore gives a lower bound on the original log-likelihood. The MaxEnt estimates \(\varvec{\mu } = \frac{1}{N} \sum \nolimits _{i=1}^{N} \varvec{x}_{t+1}^i\) and \(\varvec{\varLambda }^{-1} = \frac{1}{N} \sum \nolimits _{i=1}^{N} (\varvec{x}_{t+1}^i - \varvec{\mu })(\varvec{x}_{t+1}^i - \varvec{\mu })^T\) are exactly the maximizers of the likelihood lower bound \(\hat{\mathcal {L}}\). Moreover, the gap vanishes as the noise magnitude \(\Vert \varvec{\varSigma }_0\Vert \rightarrow \infty \), in which case the approximation degenerates to the MaxEnt formulation.
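The bounding step above can be checked numerically. The following one-dimensional sketch (illustrative code; all variable names and parameter values are ours, not the paper's) confirms that replacing the passive-dynamics factor by one only enlarges the normalizer, so \(\hat{\mathcal {L}} \le \mathcal {L}\):

```python
import numpy as np

# 1-D instance: passive dynamics N(f(x_t), sigma0^2) and a quadratic
# cost-to-go with parameters (mu, lam). Values are arbitrary.
f_x, sigma0 = 0.0, 1.0
mu, lam = 0.5, 2.0
x_next = 0.3  # observed successor state x_{t+1}

# Dense grid for Riemann-sum integration over the real line.
grid = np.linspace(-20.0, 20.0, 200001)
dx = grid[1] - grid[0]
passive = np.exp(-0.5 * (grid - f_x) ** 2 / sigma0 ** 2)  # <= 1 everywhere
cost = np.exp(-0.5 * lam * (grid - mu) ** 2)

log_Z = np.log(np.sum(passive * cost) * dx)  # exact normalizer
log_Z_hat = np.log(np.sum(cost) * dx)        # passive factor replaced by 1

data_term = (-0.5 * (x_next - f_x) ** 2 / sigma0 ** 2
             - 0.5 * lam * (x_next - mu) ** 2)
L = data_term - log_Z          # exact log-likelihood
L_hat = data_term - log_Z_hat  # lower bound from the appendix
assert L_hat <= L              # dropping the passive factor only lowers L
```

Since the passive-dynamics factor is strictly below one almost everywhere, log_Z < log_Z_hat and the inequality holds strictly; the simplified normalizer also matches its closed form \(\sqrt{2\pi /\lambda }\).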


About this article


Cite this article

Yin, H., Melo, F.S., Paiva, A. et al. An ensemble inverse optimal control approach for robotic task learning and adaptation. Auton Robot 43, 875–896 (2019). https://doi.org/10.1007/s10514-018-9757-y


Keywords

  • Learning from demonstrations
  • Human-robot collaboration
  • Ensemble methods
  • Inverse optimal control