
Animating speech: an automated approach using speech synthesised by rules

Published in The Visual Computer.
Abstract

This paper is concerned with the problem of animating computer-drawn images of speaking human characters, and particularly with reducing the cost of adequate lip synchronisation. Since the method is based on speech synthesis by rules, extended to manipulate facial parameters, and since generalised data about the facial expressions associated with speech must also be gathered, these problems are touched upon as well. Useful parallels can be drawn between the problems of speech synthesis and those of facial expression synthesis. The paper outlines the background to the work, the problems involved, and some approaches to their solution, and goes on to describe work in progress in the authors' laboratories that has resulted in one apparently successful approach to low-cost animated speaking faces. Outstanding problems are noted, the chief ones being: the difficulty of selecting and controlling appropriate facial expression categories; the lack of naturalness of the synthetic speech; and the need to consider the body movements and speech of all characters in an animated sequence during the animation process.
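The core idea of driving facial parameters from a rule-based synthesiser's segment stream can be illustrated with a minimal sketch. This is purely a hypothetical reconstruction, not the authors' implementation: the phoneme names, the `(jaw_open, lip_width)` parameter pair, and the target values in the rule table are all illustrative assumptions.

```python
# Hypothetical sketch: deriving per-frame mouth parameters from a
# rule-style phonetic segment list, in the spirit of extending
# speech synthesis by rules to drive facial parameters.

# Each segment: (phoneme, duration in seconds) -- illustrative values.
SEGMENTS = [("h", 0.06), ("eh", 0.12), ("l", 0.08), ("ou", 0.20)]

# Rule table: target (jaw_open, lip_width) per phoneme, each in [0, 1].
TARGETS = {
    "h":  (0.2, 0.5),
    "eh": (0.5, 0.6),
    "l":  (0.3, 0.5),
    "ou": (0.4, 0.2),   # rounded vowel: lips narrowed
}

def mouth_track(segments, fps=24):
    """Interpolate between per-phoneme targets; returns one tuple per frame."""
    # Place a keyframe at the midpoint of each segment.
    keys, t = [], 0.0
    for ph, dur in segments:
        keys.append((t + dur / 2.0, TARGETS[ph]))
        t += dur
    frames = []
    for i in range(int(t * fps)):
        ft = (i + 0.5) / fps          # frame-centre time
        if ft <= keys[0][0]:          # hold first target before first key
            frames.append(keys[0][1]); continue
        if ft >= keys[-1][0]:         # hold last target after last key
            frames.append(keys[-1][1]); continue
        for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
            if t0 <= ft <= t1:        # linear blend between adjacent keys
                a = (ft - t0) / (t1 - t0)
                frames.append(tuple(x0 + a * (x1 - x0)
                                    for x0, x1 in zip(p0, p1)))
                break
    return frames

track = mouth_track(SEGMENTS)
```

Because the same segment list times both the synthetic speech and the parameter track, lip movement stays synchronised with the audio by construction, which is the source of the cost saving over frame-by-frame manual lip sync.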




Cite this article

Hill, D.R., Pearce, A. & Wyvill, B. Animating speech: an automated approach using speech synthesised by rules. The Visual Computer 3, 277–289 (1988). https://doi.org/10.1007/BF01914863
