Abstract
The task of assigning appropriate intonation to synthetic speech is one that requires knowledge of linguistic structure as well as computational possibilities. This paper surveys the basic challenges facing the designer of a text-to-speech system, and reviews some of the perspectives on these problems that have been developed in the linguistic literature.
Cite this article
Goldsmith, J. Dealing with Prosody in a Text-to-Speech System. International Journal of Speech Technology 3, 51–63 (1999). https://doi.org/10.1023/A:1009678810697
DOI: https://doi.org/10.1023/A:1009678810697