Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization


Humanoid robots are already among us and are beginning to assume more social and personal roles, such as guiding and assisting people. They should therefore interact in a human-friendly manner, using not only verbal cues but also synchronized non-verbal and para-verbal cues. However, currently available robots cannot communicate in this multimodal way; they can only perform predefined gesture sequences handcrafted to accompany specific utterances. In this paper, we propose a model based on three different approaches to extend humanoid robots' communication behaviour with upper-body gestures synchronized with speech for novel utterances, exploiting part-of-speech grammatical information, prosody cues, and a combination of both. User studies confirm that our methods are able to produce natural, appropriate and well-timed gesture sequences synchronized with speech, using both beat and emblematic gestures.

The second author has been funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE) under grant RYC-2015-17239 (AEI/FSE, UE). The authors would like to thank the anonymous reviewers, whose valuable comments helped to improve this paper.

  • Human-computer interaction
  • Multimodal interaction
  • Humanoid robots
  • Prosody
  • Speech
  • Gesture modelling
  • Arm gesture synthesis
  • Speech and gesture synchronization
  • Text-to-gesture