Homograph Disambiguation in Text-to-Speech Synthesis

Yarowsky, David

doi:10.1007/978-1-4612-1894-4_12

Abstract

This chapter presents a statistical decision procedure for lexical ambiguity resolution in text-to-speech synthesis. Based on decision lists, the algorithm incorporates both local syntactic patterns and more distant collocational evidence, combining the strengths of decision trees, N-gram taggers and Bayesian classifiers. The algorithm is applied to seven major types of ambiguity in which context can be used to choose the pronunciation of a word.
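
As an illustration only, a decision list of this general kind can be sketched as follows. The abstract does not give the feature set or the ranking score, so everything specific here is an assumption: the feature templates (adjacent word plus a wider window), the smoothed log-likelihood ratio used to rank rules, the smoothing constant `alpha`, the toy sentences, and the pronunciation labels `"beys"` and "`bas`" for the homograph "bass" are all hypothetical.

```python
import math
from collections import defaultdict

def features(tokens, i, window=3):
    """Evidence around position i: adjacent words (local patterns)
    and any word within +/- window (more distant collocations)."""
    feats = set()
    if i > 0:
        feats.add(("prev", tokens[i - 1]))
    if i + 1 < len(tokens):
        feats.add(("next", tokens[i + 1]))
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            feats.add(("near", tokens[j]))
    return feats

def train(examples, labels, alpha=0.1):
    """Build a list of (strength, feature, label) rules, strongest first.
    Strength is |log((count_a + alpha) / (count_b + alpha))| (assumed score)."""
    a, b = labels
    counts = defaultdict(lambda: defaultdict(float))
    for tokens, i, label in examples:
        for f in features(tokens, i):
            counts[f][label] += 1
    rules = []
    for f, c in counts.items():
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        if llr == 0.0:
            continue  # feature favours neither pronunciation
        rules.append((abs(llr), f, a if llr > 0 else b))
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules

def classify(rules, tokens, i, default):
    """Return the label of the single strongest rule that matches."""
    feats = features(tokens, i)
    for _, f, label in rules:
        if f in feats:
            return label
    return default

# Hypothetical toy training data: (tokens, index of "bass", label).
TRAIN = [
    ("he caught a bass while fishing".split(), 3, "bas"),
    ("the bass swam in the lake".split(), 1, "bas"),
    ("she plays bass guitar".split(), 2, "beys"),
    ("turn up the bass on the speaker".split(), 3, "beys"),
]
RULES = train(TRAIN, labels=("beys", "bas"))

print(classify(RULES, "the loud bass guitar".split(), 2, default="bas"))
print(classify(RULES, "he went fishing for bass in the lake".split(), 4, default="beys"))
```

The design point this sketch tries to show is that classification relies on the one strongest matching piece of evidence rather than summing over all features, which is what distinguishes a decision list from a Bayesian classifier over the same features.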

Authors

David Yarowsky

Editor information

Editors and Affiliations

Jan P. H. van Santen, Bell Laboratories Room 2D-452, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA
Joseph P. Olive, Bell Laboratories Room 2D-447, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA
Richard W. Sproat, Bell Laboratories Room 2D-451, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA
Julia Hirschberg, AT&T Research Room 2C-409, 600 Mountain Avenue, Murray Hill, NJ, 07974-0636, USA

About this chapter

Cite this chapter

Yarowsky, D. (1997). Homograph Disambiguation in Text-to-Speech Synthesis. In: van Santen, J.P.H., Olive, J.P., Sproat, R.W., Hirschberg, J. (eds) Progress in Speech Synthesis. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1894-4_12

DOI: https://doi.org/10.1007/978-1-4612-1894-4_12
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7328-8
Online ISBN: 978-1-4612-1894-4
eBook Packages: Springer Book Archive
