
Grapheme-to-Phoneme Transduction for Cross-Language ASR

  • Conference paper
  • In: Statistical Language and Speech Processing (SLSP 2020)

Abstract

Automatic speech recognition (ASR) can be deployed in a previously unknown language, in less than 24 h, given just three resources: an acoustic model trained on other languages, a set of language-model training data, and a grapheme-to-phoneme (G2P) transducer to connect them. The LanguageNet G2Ps were created with the goal of being small, fast, and easy to port to a previously unseen language. Data come from pronunciation lexicons if available, but if there are no pronunciation lexicons in the target language, then data are generated from minimal resources: from a Wikipedia description of the target language, or from a one-hour interview with a native speaker of the language. Using such methods, the LanguageNet G2Ps now include simple models in nearly 150 languages, with trained finite state transducers in 122 languages, 59 of which are sufficiently well-resourced to permit measurement of their phone error rates. This paper proposes a measure of the distance between the G2Ps in different languages, and demonstrates that agglomerative clustering of the LanguageNet languages bears some resemblance to a phylogeographic language family tree. The LanguageNet G2Ps proposed in this paper have already been applied in three cross-language ASRs, using both hybrid and end-to-end neural architectures, and further experiments are ongoing.

This research was supported by the DARPA LORELEI program. Conclusions and findings are those of the authors, and are not endorsed by DARPA.
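The abstract's clustering claim can be illustrated in outline with standard tools. The sketch below is a minimal illustration, not the authors' implementation: it assumes a precomputed symmetric matrix of pairwise G2P distances (the measure proposed in the paper), and it applies average-linkage agglomerative clustering with SciPy; the language codes, distance values, and the choice of linkage criterion are placeholders and assumptions, since the paper's own toolkit and settings are not given here.

```python
# Minimal sketch: agglomerative clustering of languages from pairwise G2P distances.
# The distance values and language codes below are placeholders, not results from the paper.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

languages = ["eng", "deu", "nld", "tur", "uzb"]  # placeholder ISO 639-3 codes

# Placeholder symmetric matrix of pairwise G2P distances (zero diagonal).
dist_matrix = np.array([
    [0.0, 0.4, 0.5, 0.9, 0.9],
    [0.4, 0.0, 0.3, 0.9, 0.9],
    [0.5, 0.3, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.2],
    [0.9, 0.9, 0.9, 0.2, 0.0],
])

# linkage() expects a condensed (upper-triangular) distance vector, not a square matrix.
condensed = squareform(dist_matrix, checks=False)
tree = linkage(condensed, method="average")  # average linkage is an assumption


def name(i: float) -> str:
    """Return a language code for a leaf node, or a synthetic id for a merged cluster."""
    return languages[int(i)] if int(i) < len(languages) else f"cluster{int(i)}"


# Each row of the linkage matrix joins two clusters at a given distance.
for left, right, d, size in tree:
    print(f"merge {name(left)} + {name(right)} at distance {d:.2f} (size {int(size)})")
```

The complete tree produced by the authors' own clustering is linked in the Notes below.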


Notes

  1. The complete tree is at github.com/uiuc-sst/g2ps/blob/master/g2ppy/cluster/agglomerative_cluster_output_2020-07-18.txt.


Author information

Correspondence to Mark Hasegawa-Johnson.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Hasegawa-Johnson, M., Rolston, L., Goudeseune, C., Levow, GA., Kirchhoff, K. (2020). Grapheme-to-Phoneme Transduction for Cross-Language ASR. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science(), vol 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_1


  • DOI: https://doi.org/10.1007/978-3-030-59430-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59429-9

  • Online ISBN: 978-3-030-59430-5

  • eBook Packages: Computer Science, Computer Science (R0)
