
Head Motion Generation

Reference work entry in Handbook of Human Motion

Abstract

Head movement is an important part of body language. Head motion plays a role in communicating lexical and syntactic information, conveys emotional and personality traits, and plays an important role in signaling active listening. Given these communicative functions, it is important to synthesize Conversational Agents (CAs) with meaningful, human-like head motion sequences that are tightly synchronized with speech. Several studies have focused on synthesizing head movements; most can be categorized as rule-based or data-driven frameworks. On the one hand, rule-based methods define rules that map semantic labels or communicative goals to specific head motion sequences that are appropriate for the underlying message (e.g., nodding for affirmation). However, the range of head motion sequences generated by these systems is usually limited, resulting in repetitive behaviors. On the other hand, data-driven methods rely on recorded head motion sequences, which are used either to concatenate existing sequences into new realizations of head movements or to build statistical frameworks that can synthesize novel realizations of head motion behaviors. Due to the strong correlation between head movements and speech prosody, these approaches usually rely on speech to drive the head movements. They can capture a broader range of the movements displayed during human interaction. However, even when the generated head movements are tightly synchronized with speech, they may not convey the underlying discourse function or intention in the message. The complementary advantages of rule-based and data-driven methods have inspired hybrid methods that overcome these limitations, generating the movements with parametric or nonparametric models constrained not only on speech but also on the semantic content. This chapter reviews the most influential frameworks for generating head motion and discusses open challenges that can move this research area forward.
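To make the distinction concrete, the sketch below (a minimal illustration, not code from this chapter or from any cited system) contrasts the two families of methods: a rule-based lookup that maps a discourse label to a fixed nod or shake keyframe sequence, and a toy data-driven regressor that predicts head Euler angles from per-frame prosodic features such as f0 and energy. All names, keyframe values, and features are hypothetical.

```python
import numpy as np

# --- Rule-based: map a semantic/discourse label to a canned head-rotation
# sequence. Keyframes are (pitch, yaw, roll) Euler angles in degrees.
# These rules and angle values are illustrative, not from the chapter.
DISCOURSE_RULES = {
    "affirmation": [(0, 0, 0), (-12, 0, 0), (0, 0, 0), (-10, 0, 0), (0, 0, 0)],  # nod
    "negation":    [(0, 0, 0), (0, 15, 0), (0, -15, 0), (0, 12, 0), (0, 0, 0)],  # shake
}

def rule_based_head_motion(discourse_function):
    """Return the fixed keyframe sequence for a label (repetitive by design)."""
    return np.array(DISCOURSE_RULES[discourse_function], dtype=float)

# --- Data-driven: regress head rotation from prosodic features, exploiting
# the speech/head-motion correlation noted above. A linear least-squares map
# is a toy stand-in for the statistical models used in the literature.
class LinearProsodyToHead:
    def fit(self, prosody, head_angles):
        # prosody: (n_frames, n_features); head_angles: (n_frames, 3)
        X = np.hstack([prosody, np.ones((len(prosody), 1))])  # add bias term
        self.W, *_ = np.linalg.lstsq(X, head_angles, rcond=None)
        return self

    def predict(self, prosody):
        X = np.hstack([prosody, np.ones((len(prosody), 1))])
        return X @ self.W  # per-frame (pitch, yaw, roll) estimates

if __name__ == "__main__":
    # Synthetic data only; a real system would fit on motion-capture recordings.
    rng = np.random.default_rng(0)
    prosody = rng.normal(size=(200, 2))  # e.g., [f0, energy] per frame
    head = prosody @ rng.normal(size=(2, 3)) + rng.normal(scale=0.1, size=(200, 3))
    model = LinearProsodyToHead().fit(prosody, head)
    print(rule_based_head_motion("affirmation").shape)  # (5, 3) keyframes
    print(model.predict(prosody[:5]).round(2))          # speech-driven angles
```

The lookup always returns the same keyframes for a given label, which is exactly the repetitiveness the abstract attributes to rule-based systems, while the regressor tracks prosody frame by frame but carries no notion of the discourse function; hybrid methods condition a learned model on both signals.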




Author information

Correspondence to Najmeh Sadoughi or Carlos Busso.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry


Cite this entry

Sadoughi, N., Busso, C. (2018). Head Motion Generation. In: Handbook of Human Motion. Springer, Cham. https://doi.org/10.1007/978-3-319-14418-4_4

