Abstract
In this work we evaluate a set of lexicon adaptation methods for improving the recognition of foreign names and acronyms in automatic speech recognition (ASR). The most likely foreign names and acronyms are selected from the language model (LM) training corpus based on typographic information and letter n-gram perplexity. Adapted pronunciation rules are generated for the selected foreign name candidates using a statistical grapheme-to-phoneme (G2P) model, while a rule-based method is used for pronunciation adaptation of acronym candidates. In addition to unsupervised lexicon adaptation, we also evaluate an adaptation method based on speech data and user-corrected ASR transcripts. Pronunciation variants for foreign name candidates are retrieved using forced alignment and second-pass decoding over partial audio segments. The optimal pronunciation variants are collected and reused in subsequent pronunciation adaptation of foreign names.
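The candidate selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram order, the add-one smoothing, the toy native word list, and the names `LetterBigramModel` and `rank_foreign_candidates` are all assumptions made for illustration. The idea is that a letter n-gram model trained on native words assigns a higher perplexity to words with unfamiliar letter sequences, and capitalisation serves as a simple typographic cue.

```python
import math
from collections import Counter

BOUNDARY = "#"  # word-boundary symbol, assumed not to occur inside words


class LetterBigramModel:
    """Letter-bigram model with add-one smoothing (illustrative only)."""

    def __init__(self, words):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.alphabet = {BOUNDARY}
        for word in words:
            symbols = [BOUNDARY] + list(word.lower()) + [BOUNDARY]
            self.alphabet.update(symbols)
            for prev, cur in zip(symbols, symbols[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev, cur):
        # The extra +1 in the vocabulary size reserves mass for unseen letters.
        vocab = len(self.alphabet) + 1
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + vocab)

    def perplexity(self, word):
        # Per-letter perplexity, including the two word-boundary transitions.
        symbols = [BOUNDARY] + list(word.lower()) + [BOUNDARY]
        log_prob = sum(
            math.log(self.prob(prev, cur))
            for prev, cur in zip(symbols, symbols[1:])
        )
        return math.exp(-log_prob / (len(symbols) - 1))


def rank_foreign_candidates(model, words):
    """Return capitalised words, most foreign-looking (highest perplexity) first."""
    capitalised = [w for w in words if w[:1].isupper()]
    return sorted(capitalised, key=model.perplexity, reverse=True)


# Hypothetical toy data: a handful of Finnish words stands in for the native lexicon.
native_words = ["talo", "katu", "koulu", "kala", "kukka",
                "lintu", "aamu", "ilta", "sana", "kieli"]
model = LetterBigramModel(native_words)
ranked = rank_foreign_candidates(model, ["Kalakatu", "talo", "Schwarzkopf"])
print(ranked)
```

In practice the model would be trained on a full native lexicon with a higher n-gram order, and a perplexity threshold or top-N cutoff would decide which candidates receive adapted pronunciations.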
Notes
1. Snowball: http://snowball.tartarus.org/
Acknowledgements
This work has been funded by the Academy of Finland under the Finnish Centre of Excellence in Computational Inference programme. The experiments were performed using computational resources provided by the Aalto Science-IT project.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mansikkaniemi, A., Kurimo, M. (2015). Unsupervised and User Feedback Based Lexicon Adaptation for Foreign Names and Acronyms. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_19
DOI: https://doi.org/10.1007/978-3-319-25789-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)