Abstract
This paper is concerned with the problem of animating computer-drawn images of speaking human characters, and particularly with reducing the cost of adequate lip synchronisation. Since the method is based on speech synthesis by rules, extended to manipulate facial parameters, and there is also a need to gather generalised data about facial expressions associated with speech, these problems are touched upon as well. Useful parallels can be drawn between the problems of speech synthesis and those of facial expression synthesis. The paper outlines the background to the work, as well as the problems and some approaches to solution, and goes on to describe work in progress in the authors' laboratories that has resulted in one apparently successful approach to low-cost animated speaking faces. Outstanding problems are noted, the chief ones being the difficulty of selecting and controlling appropriate facial expression categories; the lack of naturalness of the synthetic speech; and the need to consider the body movements and speech of all characters in an animated sequence during the animation process.
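The core idea described above — letting the timed phoneme events produced by a rule-based speech synthesiser also drive facial parameters — can be sketched as follows. This is an illustrative reconstruction, not the authors' actual system: the parameter names (`jaw_open`, `lip_width`), the target values, and the linear interpolation between successive phoneme targets are all assumptions made for the example.

```python
# Hypothetical sketch: a rule synthesiser emits (phoneme, duration) events;
# each phoneme is given target facial parameters, and per-frame values are
# obtained by interpolating toward the next event's target. All names and
# numbers here are invented for illustration.

PHONEME_TARGETS = {  # assumed articulation targets, normalised to [0, 1]
    "a": {"jaw_open": 0.9, "lip_width": 0.5},
    "m": {"jaw_open": 0.0, "lip_width": 0.4},
    "u": {"jaw_open": 0.3, "lip_width": 0.1},
    "#": {"jaw_open": 0.0, "lip_width": 0.4},  # silence / rest posture
}

def lip_sync_frames(events, fps=25):
    """Turn (phoneme, duration-in-seconds) events into one facial-parameter
    dict per video frame, blending linearly toward the next target."""
    frames = []
    for i, (ph, dur) in enumerate(events):
        start = PHONEME_TARGETS[ph]
        # next event's target, or the rest posture at the end of the utterance
        if i + 1 < len(events):
            end = PHONEME_TARGETS[events[i + 1][0]]
        else:
            end = PHONEME_TARGETS["#"]
        n = max(1, round(dur * fps))  # frames allotted to this event
        for f in range(n):
            w = f / n  # 0 at the event's onset, approaching 1 at its end
            frames.append({k: (1 - w) * start[k] + w * end[k] for k in start})
    return frames

frames = lip_sync_frames([("#", 0.04), ("m", 0.08), ("a", 0.20), ("#", 0.04)])
```

At 25 frames per second this short "ma" utterance yields nine frames whose jaw and lip values open toward the vowel and close again, which is the essence of deriving lip synchronisation for free from the synthesiser's own timing data rather than animating the mouth by hand.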
Cite this article
Hill, D.R., Pearce, A. & Wyvill, B. Animating speech: an automated approach using speech synthesised by rules. The Visual Computer 3, 277–289 (1988). https://doi.org/10.1007/BF01914863