A Multimodal Perception and Cognition Framework and Its Application for Social Robots

Dong, Lanfang; Hu, PuZhao; Xiao, Xiao; Tang, YingChao; Mao, Meng; Li, Guoming

doi:10.1007/978-3-031-24667-8_42


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13817)

Included in the following conference series:

International Conference on Social Robotics

Abstract

With the development of artificial intelligence and computer technology, intelligent robots are becoming increasingly common in everyday life. Social robots are now deployed in a variety of scenarios, but they remain limited in anthropomorphism and personalization. In this paper, we propose an interaction framework based on multimodal perception and cognition. By gathering information on users' words, expressions, and posture, the framework enhances the cognitive system of social robots and enables more personalized interaction behaviors. We demonstrate the application of the framework in a hospital scenario.
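The abstract describes the framework only at a high level: perception of a user's words, expressions, and posture feeds a cognitive model, which then selects a personalized behavior. The following is a minimal Python sketch of that perceive, update, act loop under stated assumptions; it is not the authors' implementation. The names `Percept`, `UserModel`, `choose_behavior`, and `demo`, the label strings, and the rule-based fusion are all hypothetical, and the outputs of real perception models (speech recognition, facial-expression recognition, pose estimation) are replaced by pre-labeled strings.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Percept:
    """One time step of multimodal input (hypothetical structure).

    In a real system each field would come from a perception model;
    here they are pre-labeled strings.
    """
    words: str       # transcribed user speech
    expression: str  # facial-expression label, e.g. "happy", "sad", "neutral"
    posture: str     # posture label, e.g. "approaching", "standing", "leaving"


@dataclass
class UserModel:
    """Hypothetical per-user memory updated by the cognition layer."""
    name: str = ""
    expression_history: Counter = field(default_factory=Counter)
    turns: int = 0

    def update(self, p: Percept) -> None:
        """Fold one percept into the user's memory."""
        self.turns += 1
        self.expression_history[p.expression] += 1
        # Naive name extraction from speech; str.lower() keeps ASCII indices aligned.
        key = "my name is"
        lowered = p.words.lower()
        if key in lowered:
            tail = p.words[lowered.index(key) + len(key):].split()
            if tail:
                self.name = tail[0].strip(".,!?")

    def dominant_mood(self) -> str:
        """Most frequent expression so far (ties go to the label seen first)."""
        if not self.expression_history:
            return "neutral"
        return self.expression_history.most_common(1)[0][0]


def choose_behavior(user: UserModel, p: Percept) -> str:
    """Rule-based fusion: posture gates engagement; expression and memory personalize it."""
    if p.posture == "leaving":
        return "say_goodbye"
    greeting = f"Hello, {user.name}" if user.name else "Hello"
    if p.expression == "sad" or user.dominant_mood() == "sad":
        return f"{greeting}: comfort"
    if p.posture == "approaching":
        return f"{greeting}: greet"
    return f"{greeting}: chat"


def demo() -> list[str]:
    """Run three hypothetical hospital-lobby interactions through the loop."""
    user = UserModel()
    steps = [
        Percept("Hi, my name is Li.", "neutral", "approaching"),
        Percept("Where is the pharmacy?", "neutral", "standing"),
        Percept("Thanks, bye.", "happy", "leaving"),
    ]
    actions = []
    for p in steps:
        user.update(p)                            # cognition: update memory first
        actions.append(choose_behavior(user, p))  # then act on the fused state
    return actions


if __name__ == "__main__":
    for action in demo():
        print(action)
```

Rule-based fusion is used here only because it keeps the sketch short and deterministic; a learned fusion model could replace `choose_behavior` without changing the surrounding loop.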

Supported by the National Key Research and Development Program of China under Grant No. 2020YFB1313602.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, 230031, China
Lanfang Dong, PuZhao Hu, Xiao Xiao & YingChao Tang
AI Lab, China Merchants Bank, Shenzhen, 518040, China
Meng Mao & Guoming Li

Corresponding author

Correspondence to Xiao Xiao.

Editor information

Editors and Affiliations

University of Florence, Florence, Italy
Filippo Cavallo
Qatar University, Doha, Qatar
John-John Cabibihan
University of Florence, Florence, Italy
Laura Fiorini
University of Florence, Florence, Italy
Alessandra Sorrentino
Wichita State University, Wichita, KS, USA
Hongsheng He
Qingdao University, Qingdao, China
Xiaorui Liu
National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
Yoshio Matsumoto
National University of Singapore, Singapore, Singapore
Shuzhi Sam Ge

About this paper

Cite this paper

Dong, L., Hu, P., Xiao, X., Tang, Y., Mao, M., Li, G. (2022). A Multimodal Perception and Cognition Framework and Its Application for Social Robots. In: Cavallo, F., et al. Social Robotics. ICSR 2022. Lecture Notes in Computer Science, vol 13817. Springer, Cham. https://doi.org/10.1007/978-3-031-24667-8_42

DOI: https://doi.org/10.1007/978-3-031-24667-8_42
Published: 01 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24666-1
Online ISBN: 978-3-031-24667-8
eBook Packages: Computer Science, Computer Science (R0)