The Visual Computer, Volume 26, Issue 9, pp 1201–1216

Video-realistic image-based eye animation via statistically driven state machines

  • Axel Weissenfeld
  • Kang Liu
  • Jörn Ostermann
Original Article


In this work we present a novel image-based system for creating video-realistic eye animations for arbitrary spoken output. These animations are useful for giving a face to multimedia applications such as virtual operators in dialog systems. Our eye animation system consists of two parts: an eye control unit and a rendering engine, which synthesizes eye animations by combining 3D and image-based models. The eye control unit is based on eye movement physiology and the statistical analysis of recorded human subjects. As previous publications have shown, eye movements differ between listening and talking. We focus on talking and are the first to design a model that fully automatically couples eye blinks and movements with phonetic and prosodic information extracted from spoken language. We extend the known simple gaze model by refining mutual gaze to better model human eye movements. Furthermore, we improve the eye movement models by considering head tilts, torsion, and eyelid movements. Subjective tests indicate that participants are not able to distinguish between real eye motions and our animations, which has not been achieved before; we attribute this mainly to our integrated blink and gaze model and to the control of eye movements based on spoken language.
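
To make the idea of a statistically driven state machine concrete, the following is a minimal sketch and not the authors' model. It assumes two hypothetical states (`mutual_gaze` and `gaze_away`), assumes dwell times drawn from a log-normal distribution, and uses made-up parameter values; the coupling to phonetic and prosodic information described above is deliberately left out.

```python
import random

# Hypothetical two-state gaze model. The state names, the parameter values,
# and the log-normal dwell-time assumption are illustrative only; none of
# them are taken from the paper.
DWELL_PARAMS = {
    "mutual_gaze": (0.5, 0.6),   # (mu, sigma) of the log dwell time, seconds
    "gaze_away": (-0.3, 0.5),
}


def simulate_gaze(total_seconds, seed=0):
    """Return a list of (state, start, end) segments covering total_seconds."""
    rng = random.Random(seed)
    state, t, segments = "mutual_gaze", 0.0, []
    while t < total_seconds:
        mu, sigma = DWELL_PARAMS[state]
        dwell = rng.lognormvariate(mu, sigma)  # sampled time spent in state
        end = min(t + dwell, total_seconds)    # clip the final segment
        segments.append((state, t, end))
        t = end
        # Deterministic alternation; a richer model could pick the next
        # state from transition probabilities or from speech features.
        state = "gaze_away" if state == "mutual_gaze" else "mutual_gaze"
    return segments


if __name__ == "__main__":
    for name, start, end in simulate_gaze(10.0):
        print(f"{start:6.2f} - {end:6.2f} s  {name}")
```

Each segment could then be handed to a renderer as a gaze target with a start and end time. In a speech-driven variant, the transition out of a state would additionally depend on events such as word boundaries or pitch accents.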

Keywords: Eye animation, Talking-heads, Sample-based image synthesis, Computer vision



Supplementary material

  • Video (MPG, 19,268 KB)
  • Video (MPG, 5,980 KB)
  • Video (MPG, 18,020 KB)


Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Institut für Informationsverarbeitung, Leibniz Universität Hannover, Hannover, Germany
