
Multimodal intent understanding and interaction system for elderly-assisted companionship

  • Regular Paper
  • Published in: CCF Transactions on Pervasive Computing and Interaction

Abstract

With the aging of society, research on elderly-assisted companion robots has grown rapidly. However, many existing approaches insufficiently consider the physiological characteristics of the elderly or rely on a single mode of interaction, leading to inaccurate understanding of elderly users’ intents. In this paper, we design a multimodal intent understanding and interaction system for elderly-assisted companionship. The system offers two main innovations: (1) a semantic-based multimodal fusion algorithm (MSFA) that integrates gesture and speech at the semantic level, addressing the heterogeneity and asynchrony between the two modalities; and (2) a human–computer cooperative interaction control algorithm (HCC) that assists elderly users in completing daily tasks. Experimental results demonstrate that the proposed fusion algorithm achieves effective intent recognition and combines natural human–machine interaction with intent understanding. The system not only accurately captures users’ interaction intents and assists in completing interactive tasks, but also reduces users’ mental and cognitive load, achieving a more desirable interaction effect. Users’ subjective evaluations further verify the effectiveness of the system.
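To illustrate what semantic-level (late) fusion of gesture and speech can look like, the sketch below fuses per-modality intent confidence scores and handles asynchrony with a simple alignment window. This is an illustrative assumption only, not the paper's MSFA: the intent labels, modality weights, and window size are all invented for the example.

```python
# Illustrative sketch of semantic-level (late) fusion of two modalities.
# NOT the paper's MSFA: labels, weights, and the alignment window are
# invented for illustration only.
from dataclasses import dataclass


@dataclass
class ModalityEvent:
    timestamp: float  # event time in seconds
    scores: dict      # intent label -> confidence in [0, 1]


def fuse_semantic(gesture, speech, window=2.0, w_gesture=0.4, w_speech=0.6):
    """Fuse gesture and speech events at the semantic (intent) level.

    If the two events fall outside the alignment window, fall back to
    the more recent modality alone -- one simple way to tolerate
    asynchrony between modalities.
    """
    if abs(gesture.timestamp - speech.timestamp) > window:
        latest = gesture if gesture.timestamp > speech.timestamp else speech
        return max(latest.scores, key=latest.scores.get)
    # Weighted combination over the union of candidate intent labels.
    labels = set(gesture.scores) | set(speech.scores)
    fused = {label: w_gesture * gesture.scores.get(label, 0.0)
             + w_speech * speech.scores.get(label, 0.0)
             for label in labels}
    return max(fused, key=fused.get)


g = ModalityEvent(10.2, {"fetch_water": 0.7, "point_at_tv": 0.3})
s = ModalityEvent(10.9, {"fetch_water": 0.6, "turn_on_tv": 0.4})
print(fuse_semantic(g, s))  # both modalities agree: "fetch_water"
```

Combining at the intent (semantic) layer rather than the raw-feature layer sidesteps the heterogeneity of the two input streams, since each modality is first reduced to a common vocabulary of intent labels before fusion.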



Author information


Corresponding author

Correspondence to Zhiquan Feng.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Y., Feng, Z. & Wang, H. Multimodal intent understanding and interaction system for elderly-assisted companionship. CCF Trans. Pervasive Comp. Interact. 6, 52–67 (2024). https://doi.org/10.1007/s42486-023-00137-6

