Abstract
This paper describes a 15-year research effort to improve the automatic pronunciation of proper names and details the issues involved in applying those pronunciations to speech synthesis and speech recognition. Our approach consists primarily of a large hand-tuned rule component, supplemented by a comparatively small pronunciation dictionary, both guided by extensive survey and polling data. We rely on language-class identification to a smaller degree than other state-of-the-art programs do. We use alternate pronunciations, obtained from the polling data, for both synthesis and recognition. While our approach yields comparatively high accuracies, a comprehensive database of names and their pronunciations, verified and authenticated through customer interactions (such as auto-attendants and automated directory assistance), will likely be the best future resource for achieving the highest possible accuracy.
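The combination of a rule component with a small dictionary that can return alternate pronunciations might be sketched as follows. This is only an illustration under stated assumptions, not the system described in the paper: the names `EXCEPTIONS`, `RULES`, `apply_rules`, and `pronounce` are hypothetical, the entries and phoneme notation are invented for the example, and consulting the dictionary before the rules is an assumed ordering.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: name -> alternate pronunciations, most preferred first.
# Entries and phoneme symbols are illustrative only, not taken from the paper.
EXCEPTIONS: Dict[str, List[str]] = {
    "koch": ["k ow k", "k aa ch"],
}

# Hypothetical letter-to-sound rules as (grapheme, phoneme) pairs.
# Multi-letter graphemes are listed first so they win over single letters.
RULES: List[Tuple[str, str]] = [
    ("th", "th"),
    ("s", "s"),
    ("m", "m"),
    ("i", "ih"),
]


def apply_rules(name: str) -> str:
    """Scan left to right, applying the first rule that matches at each position."""
    phones: List[str] = []
    i = 0
    while i < len(name):
        for grapheme, phoneme in RULES:
            if name.startswith(grapheme, i):
                phones.append(phoneme)
                i += len(grapheme)
                break
        else:
            i += 1  # no rule covers this letter in the toy rule set; skip it
    return " ".join(phones)


def pronounce(name: str) -> List[str]:
    """Return all known alternates from the dictionary, else one rule-based guess."""
    key = name.lower()
    if key in EXCEPTIONS:
        return list(EXCEPTIONS[key])
    return [apply_rules(key)]


print(pronounce("Koch"))
print(pronounce("Smith"))
```

Returning a list in both cases lets a synthesizer speak the first entry while a recognizer accepts any of the alternates, which is one plausible way to serve both uses from the same data.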
Spiegel, M.F. Proper Name Pronunciations for Speech Technology Applications. International Journal of Speech Technology 6, 419–427 (2003). https://doi.org/10.1023/A:1025721319650
DOI: https://doi.org/10.1023/A:1025721319650