Dealing with Prosody in a Text-to-Speech System

Goldsmith, John

doi:10.1023/A:1009678810697

Published: November 1999

Volume 3, pages 51–63 (1999)

International Journal of Speech Technology

John Goldsmith¹

Abstract

The task of assigning appropriate intonation to synthetic speech is one that requires knowledge of linguistic structure as well as computational possibilities. This paper surveys the basic challenges facing the designer of a text-to-speech system, and reviews some of the perspectives on these problems that have been developed in the linguistic literature.

Author information

Authors and Affiliations

Department of Linguistics, University of Chicago, Chicago, IL, 60637
John Goldsmith

About this article

Cite this article

Goldsmith, J. Dealing with Prosody in a Text-to-Speech System. International Journal of Speech Technology 3, 51–63 (1999). https://doi.org/10.1023/A:1009678810697

Issue Date: November 1999
DOI: https://doi.org/10.1023/A:1009678810697
