
Head Motion Generation

Reference work entry in Handbook of Human Motion

Abstract

Head movement is an important part of body language. Head motion plays a role in communicating lexical and syntactic information, conveys emotional and personality traits, and plays an important role in signaling active listening. Given these communicative functions, it is important to synthesize Conversational Agents (CAs) with meaningful, human-like head motion sequences that are tightly synchronized with speech. Several studies have focused on synthesizing head movements; most can be categorized as rule-based or data-driven frameworks. On the one hand, rule-based methods define rules that map semantic labels or communicative goals to specific head motion sequences that are appropriate for the underlying message (e.g., nodding for affirmation). However, the range of head motion sequences generated by these systems is usually limited, resulting in repetitive behaviors. On the other hand, data-driven methods rely on recorded head motion sequences, which are used either to concatenate existing sequences into new realizations of head movements or to build statistical frameworks that can synthesize novel realizations of head motion behaviors. Due to the strong correlation between head movements and speech prosody, these approaches usually rely on speech to drive the head movements. They can capture a broader range of the movements displayed during human interaction. However, even when the generated head movements are tightly synchronized with speech, they may not convey the underlying discourse function or intention in the message. The complementary advantages of rule-based and data-driven methods have inspired hybrid methods that overcome these limitations, generating the movements with parametric or nonparametric models constrained not only on speech but also on the semantic content. This chapter reviews the most influential frameworks for generating head motion and discusses open challenges that can move this research area forward.
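To make the distinction concrete, the sketch below (a minimal illustration, not code from this chapter or from any cited system) contrasts the two families of methods: a rule-based lookup that maps a discourse label to a fixed nod or shake keyframe sequence, and a toy data-driven regressor that predicts head Euler angles from per-frame prosodic features such as f0 and energy. All names, keyframe values, and features are hypothetical.

```python
import numpy as np

# --- Rule-based: map a semantic/discourse label to a canned head-rotation
# sequence. Keyframes are (pitch, yaw, roll) Euler angles in degrees.
# These rules and angle values are illustrative, not from the chapter.
DISCOURSE_RULES = {
    "affirmation": [(0, 0, 0), (-12, 0, 0), (0, 0, 0), (-10, 0, 0), (0, 0, 0)],  # nod
    "negation":    [(0, 0, 0), (0, 15, 0), (0, -15, 0), (0, 12, 0), (0, 0, 0)],  # shake
}

def rule_based_head_motion(discourse_function):
    """Return the fixed keyframe sequence for a label (repetitive by design)."""
    return np.array(DISCOURSE_RULES[discourse_function], dtype=float)

# --- Data-driven: regress head rotation from prosodic features, exploiting
# the speech/head-motion correlation noted above. A linear least-squares map
# is a toy stand-in for the statistical models used in the literature.
class LinearProsodyToHead:
    def fit(self, prosody, head_angles):
        # prosody: (n_frames, n_features); head_angles: (n_frames, 3)
        X = np.hstack([prosody, np.ones((len(prosody), 1))])  # add bias term
        self.W, *_ = np.linalg.lstsq(X, head_angles, rcond=None)
        return self

    def predict(self, prosody):
        X = np.hstack([prosody, np.ones((len(prosody), 1))])
        return X @ self.W  # per-frame (pitch, yaw, roll) estimates

if __name__ == "__main__":
    # Synthetic data only; a real system would fit on motion-capture recordings.
    rng = np.random.default_rng(0)
    prosody = rng.normal(size=(200, 2))  # e.g., [f0, energy] per frame
    head = prosody @ rng.normal(size=(2, 3)) + rng.normal(scale=0.1, size=(200, 3))
    model = LinearProsodyToHead().fit(prosody, head)
    print(rule_based_head_motion("affirmation").shape)  # (5, 3) keyframes
    print(model.predict(prosody[:5]).round(2))          # speech-driven angles
```

The lookup always returns the same keyframes for a given label, which is exactly the repetitiveness the abstract attributes to rule-based systems, while the regressor tracks prosody frame by frame but carries no notion of the discourse function; hybrid methods condition a learned model on both signals.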




Author information

Correspondence to Najmeh Sadoughi or Carlos Busso.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry


Cite this entry

Sadoughi, N., Busso, C. (2018). Head Motion Generation. In: Handbook of Human Motion. Springer, Cham. https://doi.org/10.1007/978-3-319-14418-4_4

