
Video-realistic image-based eye animation via statistically driven state machines

  • Original Article
  • Published in The Visual Computer

Abstract

In this work we present a novel image-based system for creating video-realistic eye animations for arbitrary spoken output. Such animations are useful for giving a face to multimedia applications such as virtual operators in dialog systems. Our eye animation system consists of two parts: an eye control unit and a rendering engine, which synthesizes eye animations by combining 3D and image-based models. The eye control unit is based on eye movement physiology and on a statistical analysis of recorded human subjects. As previous publications have shown, eye movements differ between listening and talking. We focus on the latter and are the first to design a model that fully automatically couples eye blinks and movements with phonetic and prosodic information extracted from spoken language. We extend the known simple gaze model by refining mutual gaze to better match human eye movements, and we further improve the eye movement models by considering head tilts, torsion, and eyelid movements. Mainly owing to the integrated blink and gaze model and to the speech-driven control of eye movements, subjective tests indicate that participants cannot distinguish between real eye motions and our animations, which has not been achieved before.
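To make the "statistically driven state machine" idea concrete, the following is a minimal sketch, not the authors' implementation: a two-state gaze machine that alternates between mutual gaze and looking away, with per-state dwell times drawn from log-normal distributions (a family commonly fitted to human gaze durations). All state names and parameter values here are illustrative placeholders, not the fitted statistics from the paper; the paper additionally conditions transitions on phonetic and prosodic cues and couples blinks to gaze shifts.

```python
import random

# Illustrative two-state gaze state machine (placeholder parameters).
GAZE_STATES = ("mutual_gaze", "gaze_away")

# Hypothetical log-normal parameters (mu, sigma) per state; dwell
# times are in seconds. Real values would come from recorded subjects.
DURATION_PARAMS = {"mutual_gaze": (0.0, 0.5), "gaze_away": (-0.5, 0.4)}

def simulate_gaze(total_time, rng=None):
    """Generate (state, duration) segments covering at least total_time."""
    rng = rng or random.Random(0)
    timeline, t, state = [], 0.0, "mutual_gaze"
    while t < total_time:
        mu, sigma = DURATION_PARAMS[state]
        d = rng.lognormvariate(mu, sigma)  # sample dwell time for state
        timeline.append((state, d))
        t += d
        # Each transition is where a blink could be triggered, mirroring
        # the paper's coupling of eye blinks with gaze shifts.
        state = "gaze_away" if state == "mutual_gaze" else "mutual_gaze"
    return timeline
```

A rendering engine would then consume such a timeline, mapping each segment to eye poses and inserting blink frames at the transitions.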



Author information

Correspondence to Axel Weissenfeld.

Electronic Supplementary Material

VideoObject. (MPG 19,268 KB)

VideoObject. (MPG 5,980 KB)

VideoObject. (MPG 18,020 KB)


About this article

Cite this article

Weissenfeld, A., Liu, K. & Ostermann, J. Video-realistic image-based eye animation via statistically driven state machines. Vis Comput 26, 1201–1216 (2010). https://doi.org/10.1007/s00371-009-0401-x
