Abstract
One option for teaching a robot new skills is learning from demonstration. While traditional techniques often require expensive sensors and equipment, advances in computer vision now make similar outcomes achievable at lower cost. To the best of our knowledge, no previous research has examined a robot learning to produce 3D motions from 2D data and then using that knowledge to interact with people. To this end, we designed a study in which a NAO robot imitates human behavior by reproducing motions in 3D space after viewing a small number of 2D RGB videos per motion. The goal is for the robot to acquire social interactive skills through video observation and then apply them during human-robot interaction. Five steps were taken to achieve this objective: 1) collecting a dataset, 2) human pose estimation, 3) transferring data from human space to robot space, 4) robot control, and 5) human-robot interaction. These steps were organized into two phases: robot imitation learning and human-robot social interaction. Most of the algorithms employed are deep learning-based, achieving ~96% action-recognition accuracy on our dataset. The results were also promising when implemented on the robot. Overall, this preliminary exploratory study demonstrated a proof of concept for producing 3D motions from 2D data. The approach is noteworthy because abundant online video can serve as training data, the robot can be trained quickly, and no expert is required.
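The human-to-robot transfer step in the pipeline above (step 3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes three estimated 3D keypoints (shoulder, elbow, wrist) define an elbow joint angle, which is then clamped to the robot's allowed joint range. The keypoint coordinates and the joint limits used below are illustrative assumptions (the limits are roughly those of NAO's RElbowRoll joint).

```python
import math

def joint_angle(a, b, c):
    """Angle at keypoint b (radians) between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp the cosine to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def to_robot_joint(angle, lo, hi):
    """Clamp a human joint angle into the robot's allowed range."""
    return max(lo, min(hi, angle))

# Illustrative keypoints: a right angle at the elbow.
shoulder, elbow, wrist = (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, 0.3, 0.0)
theta = joint_angle(shoulder, elbow, wrist)        # pi/2 for these points
cmd = to_robot_joint(theta, 0.0349, 1.5446)        # assumed joint limits (rad)
```

Per-frame angles computed this way would then be sent as joint commands to the robot controller; the exact joint mapping and limit values depend on the robot's documentation.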
Availability of Data and Material (data transparency)
All data from this project (e.g., videos of the sessions) are available in the Social & Cognitive Robotics Laboratory archive.
Code Availability
All the code is available in the Social & Cognitive Robotics Laboratory archive. Readers who need the code may contact the corresponding author.
References
Roveda, L., et al.: Model-Based Reinforcement Learning Variable Impedance Control for Human-Robot Collaboration. Journal of Intelligent & Robotic Systems. 100(2), 417–433 (2020). https://doi.org/10.1007/s10846-020-01183-3
Meghdari, A., Alemi, M., Zakipour, M., Kashanian, S.A.: Design and Realization of a Sign Language Educational Humanoid Robot. Journal of Intelligent & Robotic Systems. 95(1), 3–17 (2019). https://doi.org/10.1007/s10846-018-0860-2
Basiri, S., Taheri, A., Meghdari, A., Alemi, M.: Design and Implementation of a Robotic Architecture for Adaptive Teaching: a Case Study on Iranian Sign Language. Journal of Intelligent & Robotic Systems. 102(2), 48 (2021). https://doi.org/10.1007/s10846-021-01413-2
da Silva, I.J., Perico, D.H., Homem, T.P.D., da Costa Bianchi, R.A.: Deep Reinforcement Learning for a Humanoid Robot Soccer Player. Journal of Intelligent & Robotic Systems. 102(3), 69 (2021). https://doi.org/10.1007/s10846-021-01333-1
Hong, A., Igharoro, O., Liu, Y., Niroui, F., Nejat, G., Benhabib, B.: Investigating Human-Robot Teams for Learning-Based Semi-autonomous Control in Urban Search and Rescue Environments. Journal of Intelligent & Robotic Systems. 94(3), 669–686 (2019). https://doi.org/10.1007/s10846-018-0899-0
Ravichandar, H., Polydoros, A.S., Chernova, S., Billard, A.: Recent Advances in Robot Learning from Demonstration. Annual Review of Control, Robotics, and Autonomous Systems. 3, 297–330 (2020). https://doi.org/10.1146/annurev-control-100819-063206
Torabi, F., Warnell, G., Stone, P.: Recent Advances in Imitation Learning from Observation. pp. 6325–6331 (2019)
Calinon, S., Billard, A.: Incremental Learning of Gestures by Imitation in a Humanoid Robot. pp. 255–262 (2007)
Peng, X.B., Abbeel, P., Levine, S., van de Panne, M.: DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37(4), 143 (2018). https://doi.org/10.1145/3197517.3201311
Nair, A., et al.: Combining self-supervised learning and imitation for vision-based rope manipulation. pp. 2146–2153 (2017)
Pavse, B.S., Torabi, F., Hanna, J., Warnell, G., Stone, P.: RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration. IEEE Robotics and Automation Letters. 5(4), 6262–6269 (2020). https://doi.org/10.1109/LRA.2020.3010750
Torabi, F., Warnell, G., Stone, P.: Behavioral cloning from observation, presented at the Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden (2018)
Guo, X., Chang, S., Yu, M., Tesauro, G., Campbell, M.: Hybrid reinforcement learning with expert state sequences, presented at the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, Hawaii, USA, [Online]. (2019) https://doi.org/10.1609/aaai.v33i01.33013739
Edwards, A.D., Sahni, H., Schroecker, Y., Isbell, Jr C.L.: Imitating Latent Policies from Observation. CoRR, vol. abs/1805.07914. [Online]. (2018) Available: http://arxiv.org/abs/1805.07914
Zheng, C., et al.: Deep learning-based human pose estimation: A survey. arXiv preprint arXiv:2012.13392 (2020)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. arXiv preprint arXiv:1812.08008 (2018)
Pavlakos, G., Zhou, X., Derpanis, K.G., Daniilidis, K.: Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21–26 July 2017, pp. 1263–1272. (2017) https://doi.org/10.1109/CVPR.2017.139
Zhao, L., Peng, X., Tian, Y., Kapadia, M., Metaxas, D.N.: Semantic Graph Convolutional Networks for 3D Human Pose Regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, pp. 3420–3430. (2019) https://doi.org/10.1109/CVPR.2019.00354
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-End Recovery of Human Shape and Pose. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June 2018, pp. 7122–7131 (2018) https://doi.org/10.1109/CVPR.2018.00744
Kocabas, M., Athanasiou, N., Black, M.J.: VIBE: Video Inference for Human Body Pose and Shape Estimation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, pp. 5252–5262 (2020) https://doi.org/10.1109/CVPR42600.2020.00530
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34(6), 248 (2015). https://doi.org/10.1145/2816795.2818013
Pavlakos, G., et al.: Expressive Body Capture: 3D Hands, Face, and Body From a Single Image. pp. 10967–10977 (2019)
Kolotouros, N., Pavlakos, G., Black, M., Daniilidis, K.: Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop. pp. 2252–2261 (2019)
Benzine, A., Chabot, F., Luvison, B., Pham, Q., Achard C.: PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation. pp. 6855–6864 (2020)
Mehta, D., et al.: XNect: real-time multi-person 3D motion capture with a single RGB camera. ACM Trans. Graph. 39, 82:1–82:17 (2020). https://doi.org/10.1145/3386569.3392410
Zhang, Z., Niu, Y., Yan, Z., Lin, S.: Real-Time Whole-Body Imitation by Humanoid Robots and Task-Oriented Teleoperation Using an Analytical Mapping Method and Quantitative Evaluation. Appl. Sci. 8, 2005 (2018). https://doi.org/10.3390/app8102005
Koenemann, J., Burget, F., Bennewitz, M.: Real-time Imitation of Human Whole-Body Motions by Humanoids. (2014)
Zhang, L., Cheng, Z., Gan, Y., Zhu, G., Shen, P., Song, J.: Fast human whole body motion imitation algorithm for humanoid robots. pp. 1430–1435 (2016)
Shahverdi, P., Masouleh, M.T.: A simple and fast geometric kinematic solution for imitation of human arms by a NAO humanoid robot. In: 2016 4th International Conference on Robotics and Mechatronics (ICROM), 26–28 Oct. 2016, pp. 572–577. https://doi.org/10.1109/ICRoM.2016.7886806
Ren, B., Liu, M., Ding, R., Liu, H.: A survey on 3d skeleton-based action recognition using learning method. arXiv preprint arXiv:2002.05907, (2020)
Wang, L., Huynh, D.Q., Koniusz, P.: A comparative review of recent kinect-based action recognition algorithms. IEEE Trans. Image Process. 29, 15–28 (2019)
Wang, H., Wang, L.: Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks. (2017)
Liu, J., Wang, G., Hu, P., Duan, L., Kot, A.C.: Global Context-Aware Attention LSTM Networks for 3D Action Recognition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21–26 July 2017, pp. 3671–3680 (2017). https://doi.org/10.1109/CVPR.2017.391
Caetano, C., Sena, J., Brémond, F., Dos Santos, J.A., Schwartz, W.R.: Skelemotion: A new representation of skeleton joint sequences based on motion information for 3d action recognition. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), IEEE, pp. 1–8 (2019)
Shahroudy, A., Liu, J., Ng, T.-T., Wang, G.: NTU RGB+D: A large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1010–1019 (2016)
Caetano, C., Brémond, F., Schwartz, W.R.: Skeleton image representation for 3D action recognition based on tree structure and reference joints. In: 2019 32nd SIBGRAPI conference on graphics, patterns and images (SIBGRAPI), IEEE, pp. 16–23 (2019)
Shi, L., Zhang, Y., Cheng, J., Lu, H.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12026–12035 (2019)
Duan, H., Zhao, Y., Chen, K., Shao, D., Lin, D., Dai, B.: Revisiting skeleton-based action recognition. arXiv preprint arXiv:2104.13586, (2021)
Djordjevic, V., Tao, H., Song, X., He, S., Gao, W., Stojanovic, V.: Data-driven control of hydraulic servo actuator: An event-triggered adaptive dynamic programming approach. Math. Biosci. Eng. 20(5), 8561–8582 (2023)
Nedic, N., Stojanovic, V., Djordjevic, V.: Optimal control of hydraulically driven parallel robot platform based on firefly algorithm. Nonlinear Dynamics. 82, 1457–1473 (2015)
Zhou, C., Tao, H., Chen, Y., Stojanovic, V., Paszke, W.: Robust point-to-point iterative learning control for constrained systems: A minimum energy approach. Int J Robust Nonlinear Control 32(18), 10139–10161 (2022)
Taheri, A., Meghdari, A., Mahoor, M.H.: A close look at the imitation performance of children with autism and typically developing children using a robotic system. Int. J. Soc. Robot. 13, 1125–1147 (2021)
Mahmood, N., Ghorbani, N., Troje, N., Pons-Moll, G., Black, M.: AMASS: Archive of Motion Capture As Surface Shapes. pp. 5441–5450 (2019)
Lugaresi, C., et al.: Mediapipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, (2019)
Aldebaran Documentation. http://doc.aldebaran.com/
Cleveland, W.S., Devlin, S.J.: Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting. J. Am. Stat. Assoc. 83(403), 596–610 (1988). https://doi.org/10.1080/01621459.1988.10478639
Müller, M.: Dynamic time warping. Information Retrieval for Music and Motion. 2, 69–84 (2007). https://doi.org/10.1007/978-3-540-74048-3_4
Yang, Z., Li, Y., Yang, J., Luo, J.: Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences. IEEE Transactions on Circuits and Systems for Video Technology. 29(8), 2405–2415 (2019). https://doi.org/10.1109/TCSVT.2018.2864148
Xu, H., Bazavan, E., Zanfir, A., Freeman, W., Sukthankar, R., Sminchisescu, C.: GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models. pp. 6183–6192 (2020)
Pham, H.H., Salmane, H., Khoudour, L., Crouzil, A., Velastin, S.A., Zegers, P.: A unified deep framework for joint 3d pose estimation and action recognition from a single rgb camera. Sensors. 20(7), 1825 (2020)
Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3D points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 13–18 June 2010, pp. 9–14, (2010) https://doi.org/10.1109/CVPRW.2010.5543273
Yun, K., Honorio, J., Chattopadhyay, D., Berg, T.L., Samaras, D.: Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 16–21 June 2012, pp. 28–35, (2012) https://doi.org/10.1109/CVPRW.2012.6239234
Mazhar, O., Ramdani, S., Navarro, B., Passama, R., Cherubini, A.: Towards Real-Time Physical Human-Robot Interaction Using Skeleton Information and Hand Gestures. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1–5 Oct. 2018, pp. 1–6, (2018) https://doi.org/10.1109/IROS.2018.8594385
Bandi, C., Thomas, U.: Skeleton-based Action Recognition for Human-Robot Interaction using Self-Attention Mechanism. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 15–18 Dec. 2021, pp. 1–8, (2021) https://doi.org/10.1109/FG52635.2021.9666948
Song, Z., et al.: Attention-Oriented Action Recognition for Real- Time Human-Robot Interaction. In: 2020 25th International Conference on Pattern Recognition (ICPR), 10–15 Jan. 2021, pp. 7087–7094, (2021) https://doi.org/10.1109/ICPR48806.2021.9412346
Acknowledgments
This research was supported by the Sharif University of Technology. The complementary and continued support of the Social & Cognitive Robotics Laboratory by a Dr. Ali Akbar Siassi Memorial Grant is also greatly appreciated. We also thank Mrs. Shari Holderread for the English editing of the final manuscript.
Funding
This research was funded by the Sharif University of Technology (Grant No. G980517).
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Seyed Adel Alizadeh Kolagar. The first draft of the manuscript was written by Seyed Adel Alizadeh Kolagar. All authors read, revised, and approved the final manuscript. Alireza Taheri defined the project. Alireza Taheri and Ali F. Meghdari supervised the research and provided guidance/expertise in the area of AI and HRI.
Ethics declarations
Ethics Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Consent for Publication
The authors affirm that the human research participants provided informed consent to publish the images used in all the figures.
Conflict of Interest
Author Alireza Taheri has received research grants from the Sharif University of Technology. Authors Seyed Adel Alizadeh Kolagar and Ali F. Meghdari declare no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alizadeh Kolagar, S.A., Taheri, A. & Meghdari, A.F. NAO Robot Learns to Interact with Humans through Imitation Learning from Video Observation. J Intell Robot Syst 109, 4 (2023). https://doi.org/10.1007/s10846-023-01938-8