
Primitive-contrastive network: data-efficient self-supervised learning from robot demonstration videos


Abstract

Because collecting expert demonstrations for robots is costly, robot imitation learning suffers from demonstration insufficiency. A promising solution is self-supervised learning, which leverages pretext tasks to extract general, high-level features from a relatively small amount of data. Since imitation learning tasks are typically composed of primitives (i.e., primary skills such as grasping and reaching), learning representations of these primitives is crucial. However, existing methods represent primitives poorly, which limits their generalizability in data-scarce learning scenarios. To address this problem, we propose a novel primitive-contrastive network (PCN) and a pretext task that optimizes the distances between pseudo-primitive distributions as its learning objective. Experimental results show that the proposed PCN learns a more discriminative embedding space of primitives than existing self-supervised learning methods. Four representative robot manipulation experiments demonstrate the superior data efficiency of the proposed method.
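To make the stated learning objective concrete, the following minimal PyTorch sketch (not the authors' code) illustrates one plausible reading of the pretext task: frame embeddings that share a pseudo-primitive assignment are pulled together, while embeddings from different pseudo-primitives are pushed apart. All names here (PrimitiveEncoder, primitive_contrastive_loss, the margin value, and the use of cluster ids as pseudo-labels) are hypothetical illustrations under these assumptions, not the architecture or loss defined in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PrimitiveEncoder(nn.Module):
    """Toy frame encoder mapping demonstration frames to unit-norm embeddings."""

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def primitive_contrastive_loss(z, pseudo_labels, margin: float = 0.5):
    """Pairwise contrastive loss over pseudo-primitive assignments.

    z: (N, D) frame embeddings; pseudo_labels: (N,) cluster ids, e.g. obtained by
    unsupervised clustering of demonstration frames. Pairs from the same
    pseudo-primitive are pulled together; pairs from different pseudo-primitives
    are pushed beyond `margin` in cosine distance.
    """
    dist = 1.0 - z @ z.t()                                   # cosine distance matrix
    same = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    diag = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = dist[same & ~diag]                                 # same pseudo-primitive
    neg = F.relu(margin - dist[~same])                       # different pseudo-primitive
    pos_term = pos.mean() if pos.numel() else dist.new_zeros(())
    neg_term = neg.mean() if neg.numel() else dist.new_zeros(())
    return pos_term + neg_term


if __name__ == "__main__":
    frames = torch.randn(8, 3, 64, 64)       # a small batch of demonstration frames
    labels = torch.randint(0, 3, (8,))       # hypothetical pseudo-primitive ids
    encoder = PrimitiveEncoder()
    loss = primitive_contrastive_loss(encoder(frames), labels)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")

In this sketch the pseudo-labels stand in for whatever unsupervised segmentation or clustering assigns frames to primitives; the margin-based pairwise loss is only one of several contrastive objectives that would realize "optimizing distances between pseudo-primitive distributions."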




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61671266, Grant 61836004, Grant U19B2034, and Grant 61836014 and in part by the Tsinghua-Guoqiang research program under Grant 2019GQG006.

Author information

Corresponding author

Correspondence to Feng Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, P., Yang, Z., Zhang, T. et al. Primitive-contrastive network: data-efficient self-supervised learning from robot demonstration videos. Appl Intell 52, 4258–4273 (2022). https://doi.org/10.1007/s10489-021-02527-8

