
Unsupervised and User Feedback Based Lexicon Adaptation for Foreign Names and Acronyms

  • André Mansikkaniemi
  • Mikko Kurimo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9449)

Abstract

In this work we evaluate a set of lexicon adaptation methods for improving the recognition of foreign names and acronyms in automatic speech recognition (ASR). The most likely foreign names and acronyms are selected from the language model (LM) training corpus based on typographic information and letter n-gram perplexity. Adapted pronunciation rules are generated for the selected foreign name candidates using a statistical grapheme-to-phoneme (G2P) model. A rule-based method is used for pronunciation adaptation of acronym candidates. In addition to unsupervised lexicon adaptation, we also evaluate an adaptation method based on speech data and user-corrected ASR transcripts. Pronunciation variants for foreign name candidates are retrieved using forced alignment and second-pass decoding over partial audio segments. Optimal pronunciation variants are collected and used for future pronunciation adaptation of foreign names.
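
The candidate selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the n-gram order, the smoothing, the typographic cues, or any threshold, so the sketch assumes a letter bigram model with add-alpha smoothing, uses capitalization as a stand-in for typographic information, and all function names and the threshold value are hypothetical. The idea it shows is that words whose letter sequences are unlikely under a model trained on native-language words receive a high perplexity and can be flagged as foreign name candidates.

```python
import math
from collections import Counter


def train_letter_bigrams(words):
    """Count letter bigrams, with ^ and $ as word boundary markers."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        padded = "^" + word.lower() + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, alphabet


def perplexity(word, model, alpha=1.0):
    """Per-letter perplexity of a word under the add-alpha smoothed bigram model."""
    bigrams, contexts, alphabet = model
    padded = "^" + word.lower() + "$"
    vocab = len(alphabet) + 1  # one extra slot reserved for unseen letters
    log_prob = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(padded) - 1))


def select_candidates(tokens, model, threshold):
    """Keep capitalized tokens whose perplexity exceeds the (hypothetical) threshold."""
    return [t for t in tokens
            if t[:1].isupper() and perplexity(t, model) > threshold]


# Toy native-language word list, for illustration only.
native_words = ["talo", "kala", "sana", "maa", "kana", "tala"]
model = train_letter_bigrams(native_words)
candidates = select_candidates(["talo", "Kala", "Bxqz"], model, threshold=9.0)
```

In this toy setup the capitalized token with unfamiliar letter sequences is selected, while the capitalized native-looking word and the lowercase word are not. A practical system would train on a large native vocabulary and tune the threshold on held-out data.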

Keywords

Speech recognition · Lexicon adaptation · Unsupervised pronunciation adaptation · Forced alignment · User feedback

Notes

Acknowledgements

This work has been funded by the Academy of Finland under the Finnish Centre of Excellence in Computational Inference programme. The experiments were performed using computational resources provided by the Aalto Science-IT project.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, Aalto, Finland
