NAO Robot Learns to Interact with Humans through Imitation Learning from Video Observation

  • Regular paper
  • Published in: Journal of Intelligent & Robotic Systems (2023)

Abstract

One option for teaching a robot new skills is to use learning-from-demonstration techniques. While traditional techniques often involve expensive sensors and equipment, advances in computer vision have made it possible to achieve similar outcomes at lower cost. To the best of our knowledge, there is no previous research in which a robot learns to produce 3D motions from 2D data and then uses this knowledge to interact with people. To this end, we designed a study in which a NAO robot imitates human behavior by reproducing motions in 3D space after viewing a small number of 2D RGB videos of each motion. The goal is for the robot to learn certain social interactive skills from video observation and then apply them during human-robot interaction. Five steps were taken to achieve this objective: 1) collecting a dataset, 2) human pose estimation, 3) transferring data from the human space to the robot space, 4) robot control, and 5) human-robot interaction. These steps were grouped into two phases: robot imitation learning and human-robot social interaction. The majority of the algorithms employed are deep learning-based, achieving ~96% action-recognition accuracy on our dataset. The results were also promising when implemented on the robot. Overall, this preliminary exploratory study demonstrated a proof of concept for producing 3D motions from 2D data. The approach is noteworthy because training data is abundant online, the robot can be trained quickly, and no expert is required.
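
To make the five-step pipeline concrete, below is a minimal sketch of steps 2-4 (pose estimation, human-to-robot transfer, and robot control). It is an illustration under stated assumptions, not the authors' implementation: it assumes MediaPipe Pose for extracting 3D landmarks from a 2D RGB video and the NAOqi `qi` Python SDK for commanding the robot; the robot address and video filename are hypothetical placeholders, and only a single joint (the right elbow) is retargeted to keep the mapping legible.

```python
# Sketch of pose estimation -> human-to-robot transfer -> robot control.
# Assumptions: MediaPipe Pose, NAOqi "qi" SDK; IP and filename are placeholders.
import cv2
import numpy as np
import mediapipe as mp
import qi

def angle_at(vertex, p1, p2):
    """Angle (radians) at `vertex` formed by segments vertex->p1 and vertex->p2."""
    v1, v2 = p1 - vertex, p2 - vertex
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Connect to the robot and stiffen the arm so its joints can be driven.
session = qi.Session()
session.connect("tcp://192.168.1.10:9559")   # hypothetical NAO address
motion = session.service("ALMotion")
motion.setStiffnesses("RArm", 1.0)

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("demonstration.mp4")  # hypothetical 2D RGB video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Step 2: estimate 3D landmarks from the 2D frame.
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_world_landmarks is None:
        continue
    lm = result.pose_world_landmarks.landmark
    # MediaPipe indices: 12 = right shoulder, 14 = right elbow, 16 = right wrist.
    p = {i: np.array([lm[i].x, lm[i].y, lm[i].z]) for i in (12, 14, 16)}
    # Step 3: convert the human elbow angle to NAO's RElbowRoll convention
    # (0 rad = straight arm) and clamp it to the joint's mechanical range.
    human_elbow = angle_at(p[14], p[12], p[16])
    target = float(np.clip(np.pi - human_elbow, 0.0349, 1.5446))
    # Step 4: command the joint at 20% of its maximum speed.
    motion.setAngles("RElbowRoll", target, 0.2)

cap.release()
pose.close()
```

A full system would retarget all upper-body joints, smooth the per-frame angles into continuous trajectories before execution, and respect self-collision limits; the clamping above only guards a single joint's mechanical range.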


Availability of Data and Materials

All data from this project (e.g., videos of the sessions) are available in the Social & Cognitive Robotics Laboratory archive.

Code Availability

All code is available in the Social & Cognitive Robotics Laboratory archive. Readers who need the code may contact the corresponding author.


Acknowledgments

This research was supported by the Sharif University of Technology. The complementary and continued support of the Social & Cognitive Robotics Laboratory by a Dr. Ali Akbar Siassi Memorial Grant is also greatly appreciated. We also thank Mrs. Shari Holderread for the English editing of the final manuscript.

Funding

This research was funded by the Sharif University of Technology (Grant No. G980517).

Author information

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Seyed Adel Alizadeh Kolagar. The first draft of the manuscript was written by Seyed Adel Alizadeh Kolagar. All authors read, revised, and approved the final manuscript. Alireza Taheri defined the project. Alireza Taheri and Ali F. Meghdari supervised the research and provided guidance/expertise in the area of AI and HRI.

Corresponding author

Correspondence to Alireza Taheri.

Ethics declarations

Ethics Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

The authors affirm that the human research participants provided informed consent to publish the images used in all the figures.

Conflict of Interest

Author Alireza Taheri has received research grants from the Sharif University of Technology. Authors Seyed Adel Alizadeh Kolagar and Ali F. Meghdari declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alizadeh Kolagar, S.A., Taheri, A. & Meghdari, A.F. NAO Robot Learns to Interact with Humans through Imitation Learning from Video Observation. J Intell Robot Syst 109, 4 (2023). https://doi.org/10.1007/s10846-023-01938-8

