
Robot learning from demonstration for path planning: A review

  • Review
  • Published in: Science China Technological Sciences

Abstract

Learning from demonstration (LfD) is an appealing way to help robots acquire new skills, and numerous papers have reported LfD methods with good performance in robotics. However, complicated robot tasks whose path planning strategies must be carefully regulated remain an open problem. Contact or non-contact constraints in specific robot tasks make path planning more difficult, because the interaction between the robot and the environment is time-varying. In this paper, we focus on the path planning of complex robot tasks in the domain of LfD and offer a novel perspective for classifying imitation learning and inverse reinforcement learning methods, based on constraints and obstacle avoidance. Finally, we summarize these methods and present promising directions for robot applications and LfD theory.
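To make the setting concrete, the sketch below illustrates one family of trajectory-level imitation methods covered by the review: a one-dimensional dynamical movement primitive (DMP) that learns a forcing term from a single demonstration and replays the path, optionally toward a new goal. This is a minimal illustration rather than the paper's own method; the gains, the number of basis functions, and the smoothstep demonstration are assumptions chosen for the example.

```python
# A minimal sketch of a one-dimensional dynamical movement primitive (DMP),
# one of the trajectory-imitation building blocks surveyed in this review.
# Gains, basis count, and the demonstration below are illustrative assumptions.
import numpy as np


class DMP1D:
    def __init__(self, n_basis=20, alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
        self.n_basis, self.alpha_z, self.beta_z, self.alpha_x = n_basis, alpha_z, beta_z, alpha_x
        # Gaussian basis centers spread along the canonical phase x in (0, 1].
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        d = np.diff(self.c)
        self.h = 1.0 / np.concatenate([d, d[-1:]]) ** 2   # basis widths
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        return np.exp(-self.h * (x - self.c) ** 2)

    def fit(self, y_demo, dt):
        """Learn forcing-term weights from a single demonstrated trajectory."""
        T = len(y_demo)
        self.y0, self.g, self.tau = y_demo[0], y_demo[-1], (T - 1) * dt
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(T) * dt / self.tau)  # canonical phase
        # Forcing term that would make the transformation system follow the demo.
        f_target = (self.tau ** 2 * ydd
                    - self.alpha_z * (self.beta_z * (self.g - y_demo) - self.tau * yd))
        xi = x * (self.g - self.y0)
        psi = np.array([self._psi(xt) for xt in x])               # (T, n_basis)
        # One weight per basis function via locally weighted regression.
        self.w = ((psi * (xi * f_target)[:, None]).sum(0)
                  / ((psi * (xi ** 2)[:, None]).sum(0) + 1e-10))

    def rollout(self, dt, goal=None):
        """Reproduce the learned motion, optionally toward a new goal."""
        g = self.g if goal is None else goal
        y, z, x = self.y0, 0.0, 1.0
        path = []
        for _ in range(int(self.tau / dt) + 1):
            psi = self._psi(x)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - self.y0)
            zd = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            z += zd * dt
            y += (z / self.tau) * dt
            x += (-self.alpha_x * x / self.tau) * dt
            path.append(y)
        return np.array(path)


if __name__ == "__main__":
    dt = 0.01
    t = np.arange(0.0, 1.0 + dt, dt)
    demo = t ** 2 * (3.0 - 2.0 * t)      # hypothetical smooth 0-to-1 demonstration
    dmp = DMP1D()
    dmp.fit(demo, dt)
    print("max reproduction error:", np.abs(dmp.rollout(dt) - demo).max())
```

Methods of this kind encode a demonstrated path as a stable attractor system, which is what makes them attractive for the constrained and obstacle-avoiding path planning problems discussed in the review.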



Author information


Corresponding author

Correspondence to ZaiNan Jiang.

Additional information

This work was supported by the National Natural Science Foundation of China (Grant No. 91848202), and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51521003).


About this article


Cite this article

Xie, Z., Zhang, Q., Jiang, Z. et al. Robot learning from demonstration for path planning: A review. Sci. China Technol. Sci. 63, 1325–1334 (2020). https://doi.org/10.1007/s11431-020-1648-4

