Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization


Humanoid robots are already among us and are beginning to assume more social and personal roles, such as guiding and assisting people. They should therefore interact in a human-friendly manner, using not only verbal cues but also synchronized non-verbal and para-verbal cues. However, currently available robots cannot communicate in this multimodal way: they can only perform predefined gesture sequences, handcrafted to accompany specific utterances. In this paper, we propose a model based on three different approaches to extend the communication behaviour of humanoid robots with upper-body gestures synchronized with speech for novel utterances. The approaches exploit part-of-speech grammatical information, prosody cues, and a combination of both. User studies confirm that our methods are able to produce natural, appropriate and well-timed gesture sequences synchronized with speech, using both beat and emblematic gestures.
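
As a purely illustrative sketch, and not the model described in the paper, the idea of combining part-of-speech and prosody cues to place gestures could look roughly like the Python below. Everything in it is an assumption made for the example: the `Token` fields, the `prominence` score in [0, 1], the small `EMBLEMS` lexicon, the set of content-word tags in `BEAT_POS`, and both threshold values. Words found in the lexicon trigger an emblematic gesture; other content words trigger a beat gesture only when they are prosodically prominent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    pos: str           # coarse part-of-speech tag, e.g. "NOUN" (assumed tag set)
    start: float       # word onset in seconds
    prominence: float  # hypothetical prosodic prominence score in [0, 1]


@dataclass
class GestureEvent:
    time: float  # gesture is aligned with the word onset
    kind: str    # "emblem" or "beat"
    name: str
    word: str


# Hypothetical word-to-emblem lexicon; not taken from the paper.
EMBLEMS = {"hello": "wave", "no": "head_shake", "you": "point_forward"}
# Assumed set of content-word tags that may carry a beat gesture.
BEAT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def plan_gestures(tokens: List[Token],
                  prominence_threshold: float = 0.6,
                  min_gap: float = 0.4) -> List[GestureEvent]:
    """Return gesture events for an utterance, at most one per min_gap seconds."""
    events: List[GestureEvent] = []
    last_time = None
    for tok in tokens:
        emblem = EMBLEMS.get(tok.text.lower())
        if emblem is not None:
            # Lexicon match: emblematic gesture, regardless of prosody.
            candidate = GestureEvent(tok.start, "emblem", emblem, tok.text)
        elif tok.pos in BEAT_POS and tok.prominence >= prominence_threshold:
            # Prominent content word: beat gesture.
            candidate = GestureEvent(tok.start, "beat", "beat", tok.text)
        else:
            continue
        # Skip gestures that would start too soon after the previous one.
        if last_time is not None and tok.start - last_time < min_gap:
            continue
        events.append(candidate)
        last_time = tok.start
    return events
```

In a real system the tags and prominence values would come from a part-of-speech tagger and a prosody analysis of the synthesized speech; here they are simply supplied by hand.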





The second author has been funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE) under grant RYC-2015-17239 (AEI/FSE, UE). The authors would like to thank the anonymous reviewers, whose valuable comments helped to improve this paper.

Author information



Corresponding author

Correspondence to L. Pérez-Mayos.


About this article


Cite this article

Pérez-Mayos, L., Farrús, M. & Adell, J. Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization. J Intell Robot Syst 99, 277–287 (2020).

Keywords


  • Human-computer interaction
  • Multimodal interaction
  • Humanoid robots
  • Prosody
  • Speech
  • Gesture modelling
  • Arm gesture synthesis
  • Speech and gesture synchronization
  • Text-to-gesture