Abstract
Head movement is an important component of body language. Head motion communicates lexical and syntactic information, conveys emotion and personality traits, and plays an important role in acknowledging active listening. Given these communicative functions, it is important to synthesize Conversational Agents (CAs) with meaningful, human-like head motion sequences that are temporally synchronized with speech. Several studies have focused on synthesizing head movements, and most can be categorized as rule-based or data-driven frameworks. On the one hand, rule-based methods define rules that map semantic labels or communicative goals to specific head motion sequences appropriate for the underlying message (e.g., nodding for affirmation). However, the range of head motion sequences generated by these systems is usually limited, resulting in repetitive behaviors. On the other hand, data-driven methods rely on recorded head motion sequences, which are used either to concatenate existing sequences into new realizations of head movements or to build statistical frameworks that can synthesize novel realizations of head motion behaviors. Because of the strong correlation between head movements and speech prosody, these approaches usually rely on speech to drive the head movements, and they can capture a broader range of the movements displayed during human interaction. However, even when the generated head movements are tightly synchronized with speech, they may not convey the underlying discourse function or intention of the message. The complementary advantages of rule-based and data-driven methods have inspired hybrid approaches that overcome these limitations, generating the movements with parametric or nonparametric models constrained not only on speech but also on the semantic content. This chapter reviews the most influential frameworks for generating head motion and discusses open challenges that can move this research area forward.
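To make the rule-based idea concrete, the sketch below shows one minimal way such a system could be organized; it is an illustration under assumed conventions, not a published implementation. The rule table, the keyframe values, and the names (`RULES`, `synthesize`) are all hypothetical: each discourse function maps to a canonical head motion template expressed as keyframes of Euler angles, which is stretched and interpolated over the utterance duration.

```python
import numpy as np

# Hypothetical rule table mapping discourse functions to canonical head motion
# templates, expressed as keyframes of (pitch, yaw, roll) Euler angles in degrees.
RULES = {
    "affirmation": [(0, 0, 0), (-12, 0, 0), (4, 0, 0), (-8, 0, 0), (0, 0, 0)],   # nod
    "negation":    [(0, 0, 0), (0, -15, 0), (0, 12, 0), (0, -8, 0), (0, 0, 0)],  # shake
    "question":    [(0, 0, 0), (0, 0, 8), (0, 0, 6), (0, 0, 0)],                 # tilt
}

def synthesize(discourse_function, duration_s, fps=30):
    """Stretch the rule's keyframes over the utterance and sample one pose per frame."""
    keys = np.asarray(RULES[discourse_function], dtype=float)
    t_keys = np.linspace(0.0, duration_s, len(keys))
    t_out = np.arange(0.0, duration_s, 1.0 / fps)
    # Linear interpolation per axis; a production system would use splines or slerp.
    return np.stack([np.interp(t_out, t_keys, keys[:, axis]) for axis in range(3)], axis=1)

motion = synthesize("affirmation", duration_s=1.2)
print(motion.shape)  # (36, 3): 36 frames of (pitch, yaw, roll)
```

Because every utterance with the same label replays the same template, the repetitiveness noted above falls directly out of this design.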
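A data-driven pipeline instead learns the speech-to-motion mapping from a recorded corpus. The sketch below reduces the idea to a plain least-squares regression from per-frame prosodic features to head rotations; the synthetic data, the feature choice (f0 and energy), and the helper `drive_head` are placeholders, and actual systems in the literature use richer temporal models (e.g., HMMs, dynamic Bayesian networks, or recurrent networks) trained on audiovisual corpora such as IEMOCAP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder corpus: per-frame prosodic features (f0, energy; standardized)
# paired with motion-captured head rotations (pitch, yaw, roll) in degrees.
n_frames = 500
X = rng.normal(size=(n_frames, 2))                     # prosody features
W_true = np.array([[3.0, 0.5, -0.2],
                   [1.0, -0.8, 0.3]])                  # unknown "true" coupling
Y = X @ W_true + 0.1 * rng.normal(size=(n_frames, 3))  # head rotations

# Fit a linear map from prosody to rotation by ordinary least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def drive_head(prosody_frames):
    """Predict per-frame (pitch, yaw, roll) from standardized prosodic features."""
    return prosody_frames @ W

print(drive_head(rng.normal(size=(10, 2))).shape)  # (10, 3)
```

A hybrid variant in the spirit described above would append a semantic label (e.g., a one-hot code for the active discourse function) to the prosodic feature vector, so that the learned mapping is constrained by both speech and semantic content.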
Acknowledgments
This work was funded by the National Science Foundation under grant IIS-1352950.
Copyright information
© 2016 Springer International Publishing Switzerland
About this entry
Cite this entry
Sadoughi, N., Busso, C. (2016). Head Motion Generation. In: Müller, B., et al. Handbook of Human Motion. Springer, Cham. https://doi.org/10.1007/978-3-319-30808-1_4-1
DOI: https://doi.org/10.1007/978-3-319-30808-1_4-1
Publisher Name: Springer, Cham
Online ISBN: 978-3-319-30808-1