Abstract
In this work we present a novel image-based system for synthesizing video-realistic eye animations for arbitrary spoken output. Such animations give a face to multimedia applications such as virtual operators in dialog systems. Our eye animation system consists of two parts: an eye control unit and a rendering engine that synthesizes eye animations by combining 3D and image-based models. The eye control unit is based on eye movement physiology and a statistical analysis of recorded human subjects. As previous studies have shown, eye movements differ between listening and talking. We focus on the latter and are, to our knowledge, the first to design a model that fully automatically couples eye blinks and movements with phonetic and prosodic information extracted from spoken language. We extend the well-known simple gaze model by refining mutual gaze to better match human eye movements, and we further improve the eye movement models by considering head tilts, torsion, and eyelid movements. Owing mainly to the integrated blink and gaze model and to the speech-driven control of eye movements, subjective tests indicate that participants cannot distinguish between real eye motions and our animations, which had not been achieved before.
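The statistically driven control described in the abstract can be illustrated with a minimal sketch: a state machine that alternates between gaze states, with state durations drawn from log-normal distributions fitted to recorded subjects. This is only an illustrative toy model; the state names and all distribution parameters below are invented placeholders, not values from the paper, and the actual system additionally couples transitions to phonetic and prosodic events.

```python
import random

# Illustrative two-state gaze model (mutual gaze vs. gaze away) with
# log-normally distributed state durations, loosely in the spirit of a
# statistically driven state machine. All parameters are hypothetical.

# Hypothetical log-normal parameters (mu, sigma of the underlying normal),
# as one might estimate them from recordings of human subjects.
DURATION_PARAMS = {
    "mutual_gaze": (0.8, 0.5),
    "gaze_away": (-0.2, 0.6),
}

def simulate_gaze(total_time: float, seed: int = 0):
    """Generate a (state, duration) sequence covering at least total_time seconds."""
    rng = random.Random(seed)
    timeline, t = [], 0.0
    state = "mutual_gaze"
    while t < total_time:
        mu, sigma = DURATION_PARAMS[state]
        duration = rng.lognormvariate(mu, sigma)  # sample the state's dwell time
        timeline.append((state, duration))
        t += duration
        # Alternate between the two gaze states.
        state = "gaze_away" if state == "mutual_gaze" else "mutual_gaze"
    return timeline

if __name__ == "__main__":
    for state, dur in simulate_gaze(10.0):
        print(f"{state:12s} {dur:6.3f} s")
```

In the full system, such a gaze machine would run alongside a blink model, with transition probabilities and duration distributions conditioned on the spoken-language features rather than fixed constants.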
Cite this article
Weissenfeld, A., Liu, K. & Ostermann, J. Video-realistic image-based eye animation via statistically driven state machines. Vis Comput 26, 1201–1216 (2010). https://doi.org/10.1007/s00371-009-0401-x