Lexical Modeling for Proper Name Recognition in Autonomata Too

  • Bert Réveil
  • Jean-Pierre Martens
  • Henk van den Heuvel
  • Gerrit Bloothooft
  • Marijn Schraagen
Part of the Theory and Applications of Natural Language Processing book series (NLP)


The research in Autonomata Too aimed to develop new pronunciation modeling techniques that can bring the speech recognition component of a Dutch/Flemish POI (Point of Interest) information-providing business service to the required level of accuracy. The automatic recognition of spoken POI names is extremely difficult for two reasons: several pronunciations are often in frequent use for the same POI, and important cross-lingual effects must be accounted for. In fact, the ASR (Automatic Speech Recognition) engine must be able to cope both with (partly) foreign POI names spoken by native speakers and with native POI names spoken by non-native speakers. To deal adequately with such pronunciations, one must model them at the level of the acoustic models as well as at the level of the recognition lexicon. This paper describes a novel lexical modeling approach that was developed and tested in the Autonomata Too project. The new method employs a G2P-P2P (grapheme-to-phoneme, phoneme-to-phoneme) tandem to generate suitable lexical pronunciation variants: the G2P converter produces an initial transcription of the name, and the P2P converter transforms that transcription into alternative pronunciations. The method was shown to yield a significant improvement over a baseline system that already embeds state-of-the-art acoustic and lexical models.
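The general idea of a G2P-P2P tandem can be illustrated with a minimal sketch. It is not the system developed in the project: the lookup table `TOY_G2P`, the substitution table `TOY_P2P_RULES`, the phoneme symbols, and all probabilities below are hypothetical values chosen for illustration, and the P2P step is simplified to context-independent phoneme substitutions.

```python
from itertools import product

# Hypothetical toy G2P: a lookup table standing in for a real
# grapheme-to-phoneme converter (assumption, not project data).
TOY_G2P = {
    "gent": ["G", "E", "n", "t"],
}

# Hypothetical P2P rules: each source phoneme maps to candidate target
# phonemes with made-up probabilities. Real P2P converters are typically
# learned from data and context-dependent; this table is neither.
TOY_P2P_RULES = {
    "G": [("G", 0.7), ("g", 0.3)],
    "E": [("E", 0.8), ("e", 0.2)],
}


def g2p(name):
    """Return the initial phonemic transcription of a name (toy lookup)."""
    return list(TOY_G2P[name.lower()])


def p2p_variants(phonemes, rules, max_variants=3):
    """Expand one transcription into its most probable variants.

    Phonemes without a rule are kept unchanged with probability 1.0.
    The score of a variant is the product of the chosen rule probabilities.
    """
    options = [rules.get(p, [(p, 1.0)]) for p in phonemes]
    scored = []
    for combo in product(*options):
        prob = 1.0
        for _, rule_prob in combo:
            prob *= rule_prob
        scored.append((" ".join(ph for ph, _ in combo), prob))
    scored.sort(key=lambda item: -item[1])
    return scored[:max_variants]


if __name__ == "__main__":
    for transcription, prob in p2p_variants(g2p("Gent"), TOY_P2P_RULES):
        print(f"{transcription}\t{prob:.2f}")
```

In this sketch the retained variants would be added to the recognition lexicon as alternative entries for the same name; the cut-off `max_variants` is an arbitrary illustrative choice that limits how many extra pronunciations each name receives.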



Copyright information

© The Author(s) 2013

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Bert Réveil (1)
  • Jean-Pierre Martens (1)
  • Henk van den Heuvel (2)
  • Gerrit Bloothooft (3)
  • Marijn Schraagen (3)

  1. Ghent University, ELIS-DSSP, Gent, Belgium
  2. Radboud University, CLST, Nijmegen, The Netherlands
  3. Utrecht Institute of Linguistics, OTS, Utrecht, The Netherlands
