
Primitive-contrastive network: data-efficient self-supervised learning from robot demonstration videos


Abstract

Because collecting expert demonstrations for robots is costly, robot imitation learning suffers from demonstration insufficiency. A promising solution is self-supervised learning, which leverages pretext tasks to extract general, high-level features from a relatively small amount of data. Since imitation learning tasks are typically composed of primitives (i.e., primary skills such as grasping and reaching), learning representations of these primitives is crucial. However, existing methods represent primitives poorly, which limits their generalizability in data-scarce learning scenarios. To address this problem, we propose a novel primitive-contrastive network (PCN) and a pretext task that optimizes the distances between pseudo-primitive distributions as its learning objective. Experimental results show that the proposed PCN learns a more discriminative embedding space of primitives than existing self-supervised learning methods. Four representative robot manipulation experiments demonstrate the superior data efficiency of the proposed method.
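To make the stated learning objective concrete, the following minimal PyTorch sketch (not the authors' code) illustrates one plausible reading of the pretext task: frame embeddings that share a pseudo-primitive assignment are pulled together, while embeddings from different pseudo-primitives are pushed apart. All names here (PrimitiveEncoder, primitive_contrastive_loss, the margin value, and the use of cluster ids as pseudo-labels) are hypothetical illustrations under these assumptions, not the architecture or loss defined in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PrimitiveEncoder(nn.Module):
    """Toy frame encoder mapping demonstration frames to unit-norm embeddings."""

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def primitive_contrastive_loss(z, pseudo_labels, margin: float = 0.5):
    """Pairwise contrastive loss over pseudo-primitive assignments.

    z: (N, D) frame embeddings; pseudo_labels: (N,) cluster ids, e.g. obtained by
    unsupervised clustering of demonstration frames. Pairs from the same
    pseudo-primitive are pulled together; pairs from different pseudo-primitives
    are pushed beyond `margin` in cosine distance.
    """
    dist = 1.0 - z @ z.t()                                   # cosine distance matrix
    same = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    diag = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = dist[same & ~diag]                                 # same pseudo-primitive
    neg = F.relu(margin - dist[~same])                       # different pseudo-primitive
    pos_term = pos.mean() if pos.numel() else dist.new_zeros(())
    neg_term = neg.mean() if neg.numel() else dist.new_zeros(())
    return pos_term + neg_term


if __name__ == "__main__":
    frames = torch.randn(8, 3, 64, 64)       # a small batch of demonstration frames
    labels = torch.randint(0, 3, (8,))       # hypothetical pseudo-primitive ids
    encoder = PrimitiveEncoder()
    loss = primitive_contrastive_loss(encoder(frames), labels)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

In this sketch the pseudo-labels stand in for whatever unsupervised segmentation or clustering assigns frames to primitives; the margin-based pairwise loss is only one of several contrastive objectives that would realize "optimizing distances between pseudo-primitive distributions."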




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61671266, Grant 61836004, Grant U19B2034, and Grant 61836014 and in part by the Tsinghua-Guoqiang research program under Grant 2019GQG006.

Author information

Corresponding author

Correspondence to Feng Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, P., Yang, Z., Zhang, T. et al. Primitive-contrastive network: data-efficient self-supervised learning from robot demonstration videos. Appl Intell 52, 4258–4273 (2022). https://doi.org/10.1007/s10489-021-02527-8

