Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization

  • L. Pérez-Mayos
  • M. Farrús
  • J. Adell


Humanoid robots are already among us and are beginning to assume more social and personal roles, such as guiding and assisting people. They should therefore interact in a human-friendly manner, using not only verbal cues but also synchronized non-verbal and para-verbal cues. However, currently available robots cannot communicate in this multimodal way: they can only perform predefined gesture sequences, handcrafted to accompany specific utterances. In this paper, we propose a model based on three different approaches to extend humanoid robots' communication behaviour with upper-body gestures synchronized with speech for novel utterances, exploiting part-of-speech grammatical information, prosodic cues, and a combination of both. User studies confirm that our methods produce natural, appropriate and well-timed gesture sequences synchronized with speech, using both beat and emblematic gestures.
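To make the two cue types concrete, the following is a minimal illustrative sketch (not the paper's implementation): a rule-based planner that times beat gestures to content words (the part-of-speech cue) and to pitch-prominent words (the prosody cue). The tag set, the pitch threshold, and the token format are all assumptions made for the example.

```python
# Illustrative only: map POS and prosody cues to a gesture timeline.
# Tag set, threshold, and timestamp format are hypothetical.

CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}  # POS classes treated as gesture triggers
PITCH_PEAK_HZ = 220.0                   # assumed pitch-prominence threshold

def plan_gestures(tokens):
    """tokens: (word, pos, max_f0_hz, start_s, end_s) tuples, e.g. from a
    POS tagger plus word-level F0 from a pitch tracker and TTS timestamps."""
    plan = []
    for word, pos, f0, start, end in tokens:
        by_pos = pos in CONTENT_TAGS        # grammatical cue
        by_prosody = f0 >= PITCH_PEAK_HZ    # prosodic cue
        if by_pos or by_prosody:
            # Time the gesture stroke to the word onset.
            plan.append({"word": word, "stroke_at": start,
                         "cue": "both" if by_pos and by_prosody
                                else ("pos" if by_pos else "prosody")})
    return plan

demo = [("robots", "NOUN", 230.0, 0.00, 0.45),
        ("can", "AUX", 180.0, 0.45, 0.60),
        ("gesture", "VERB", 210.0, 0.60, 1.10),
        ("now", "ADV", 240.0, 1.10, 1.40)]
for g in plan_gestures(demo):
    print(g)
```

The combined approach described in the abstract would correspond to the "both" branch, where a word is grammatically salient and prosodically prominent at the same time.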


Human-computer interaction · Multimodal interaction · Humanoid robots · Prosody · Speech · Gesture modelling · Arm gesture synthesis · Speech and gesture synchronization · Text-to-gesture





The second author has been funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE) under grant RYC-2015-17239 (AEI/FSE, UE). The authors would like to thank the anonymous reviewers who helped to improve this paper through their valuable comments.



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Department of Information and Communication Technologies, UPF, Barcelona, Spain
  2. Verbio Technologies S.L., Barcelona, Spain
