
Generating and manipulating emotional synthetic speech on a personal computer

Multimedia Tools and Applications

Abstract

Against the background of incorporating a talking head into a role-playing simulator, two enhancements are proposed for users of the simulator and of text-to-speech systems in general. The first is the ability to generate vocal emotion in synthetic speech using a limited number of prosodic parameters with a concatenative speech synthesizer. The second allows vocal emotions to be included during the authoring of text for output by the text-to-speech system: vocal emotions can be represented visually and can be manipulated directly by the user. Applications that use synthetic speech, such as training simulators, can be made more ‘human’ by the addition of emotions, and a graphical editor for specifying and directly manipulating the speech improves the authoring environment of these applications.
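
The idea of driving vocal emotion from a limited number of prosodic parameters can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the choice of four parameters, the names `Prosody`, `EMOTION_SCALES`, `apply_emotion`, and `annotate`, the numeric multipliers, and the bracketed annotation format are all hypothetical, and the synthesizer's real control interface is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """A small, hypothetical set of prosodic parameters."""
    pitch_hz: float     # mean fundamental frequency
    pitch_range: float  # multiplier on pitch excursions
    rate_wpm: float     # speaking rate in words per minute
    volume: float       # 0.0 (silent) to 1.0 (loudest)


# Illustrative neutral baseline; the values are placeholders.
NEUTRAL = Prosody(pitch_hz=120.0, pitch_range=1.0, rate_wpm=180.0, volume=0.7)

# Hypothetical multipliers (pitch, range, rate, volume) per emotion.
# These numbers are invented for the sketch, not measured or published.
EMOTION_SCALES = {
    "neutral": (1.0, 1.0, 1.0, 1.0),
    "angry": (1.1, 1.5, 1.2, 1.3),
    "sad": (0.9, 0.6, 0.8, 0.8),
}


def apply_emotion(base: Prosody, emotion: str) -> Prosody:
    """Scale each baseline parameter by the emotion's multiplier."""
    pitch, span, rate, vol = EMOTION_SCALES[emotion]
    return Prosody(
        pitch_hz=base.pitch_hz * pitch,
        pitch_range=base.pitch_range * span,
        rate_wpm=base.rate_wpm * rate,
        volume=min(1.0, base.volume * vol),  # clamp to the valid range
    )


def annotate(text: str, emotion: str, base: Prosody = NEUTRAL) -> str:
    """Prefix text with a made-up annotation carrying the prosody settings."""
    p = apply_emotion(base, emotion)
    return (
        f"[pitch={p.pitch_hz:.0f} range={p.pitch_range:.2f} "
        f"rate={p.rate_wpm:.0f} vol={p.volume:.2f}] {text}"
    )


if __name__ == "__main__":
    print(annotate("I cannot help you with that.", "sad"))
```

Keeping the emotion definitions in a single table of multipliers is one way a graphical editor could expose them for direct manipulation: each slider or handle would map to one entry, and the authored text would carry only the resulting annotation.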




Cite this article

Henton, C., Edelman, B. Generating and manipulating emotional synthetic speech on a personal computer. Multimed Tools Appl 3, 105–125 (1996). https://doi.org/10.1007/BF00429747


  • DOI: https://doi.org/10.1007/BF00429747
