Pronunciation Extraction from Phoneme Sequences through Cross-Lingual Word-to-Phoneme Alignment

  • Felix Stahlberg
  • Tim Schlippe
  • Stephan Vogel
  • Tanja Schultz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7978)


Using written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units with our new alignment model, Model 3P [17]. From this segmentation, we deduce phonetic transcriptions of target-language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant for bootstrapping dictionaries from audio data for Automatic Speech Recognition and for bypassing the written form in Speech-to-Speech Translation, particularly for under-resourced languages and those that are not written at all.

An analysis of 14 translations in 9 languages, used to build a dictionary for English, shows that the quality of the resulting dictionary is better when the vocabulary sizes of the source and target languages are close, sentences are shorter, words are repeated more often, and translations are formally equivalent.
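The step from an aligned phoneme sequence to a dictionary can be illustrated with a minimal sketch. It does not implement Model 3P; it assumes the word-to-phoneme alignment is already given as one source-word index per target phoneme, and it simply treats every distinct phoneme segment as its own word. The function names, the `w1`, `w2`, ... ID scheme, and the toy ARPAbet-style input are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def segment_by_alignment(phonemes, alignment):
    """Split a phoneme sequence wherever the aligned source-word index changes.

    `alignment[i]` is the (assumed, precomputed) index of the source word
    that phoneme `i` is aligned to.
    """
    if len(phonemes) != len(alignment):
        raise ValueError("one alignment entry per phoneme is required")
    segments = []
    current = []
    previous = None
    for phoneme, source_index in zip(phonemes, alignment):
        # A change of source word closes the current target word unit.
        if current and source_index != previous:
            segments.append(tuple(current))
            current = []
        current.append(phoneme)
        previous = source_index
    if current:
        segments.append(tuple(current))
    return segments


def extract_dictionary(aligned_utterances):
    """Collect all segments and map hypothetical word IDs to pronunciations."""
    counts = Counter()
    for phonemes, alignment in aligned_utterances:
        counts.update(segment_by_alignment(phonemes, alignment))
    # IDs are assigned by decreasing frequency; ties are broken by pronunciation.
    ordered = sorted(counts, key=lambda seg: (-counts[seg], seg))
    return {f"w{i}": " ".join(seg) for i, seg in enumerate(ordered, start=1)}


# Toy input: two utterances with made-up alignments.
utterances = [
    (["HH", "AH", "L", "OW", "W", "ER", "L", "D"], [0, 0, 0, 0, 1, 1, 1, 1]),
    (["HH", "AH", "L", "OW"], [0, 0, 0, 0]),
]
dictionary = extract_dictionary(utterances)
```

In this simplification, identical phoneme segments share one word ID, so a single alignment error would produce a spurious dictionary entry; handling such noise is outside the scope of the sketch.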


pronunciation dictionary, under-resourced languages, speech-to-speech translation, word segmentation




  1. Achtert, E., Goldhofer, S., Kriegel, H.P., Schubert, E., Zimek, A.: Evaluation of Clusterings–Metrics and Visual Support. In: ICDE (2012)
  2. Besacier, L., Zhou, B., Gao, Y.: Towards Speech Translation of Non-Written Languages. In: SLT (2006)
  3. Borland, J.A.: The English Standard Version-A Review Article. Faculty Publications and Presentations, 162 (2003)
  4. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics 19(2), 263–311 (1993)
  5. Crossway: The Holy Bible: English Standard Version (2001)
  6. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases With Noise. In: KDD (1996)
  7. Gollan, C., Bisani, M., Kanthak, S., Schlüter, R., Ney, H.: Cross Domain Automatic Transcription on the TC-STAR EPPS Corpus. In: ICASSP (2005)
  8. Gordon, R.G., Grimes, B.F.: Ethnologue: Languages of the World, 15th edn. SIL International (2005)
  9. Johnson, M., Goldwater, S.: Improving Non-Parameteric Bayesian Inference: Experiments on Unsupervised Word Segmentation with Adaptor Grammars. In: HLT-NAACL (2009)
  10. Kikui, G., Sumita, E., Takezawa, T., Yamamoto, S.: Creating Corpora for Speech-to-Speech Translation. In: Eurospeech (2003)
  11. Lockman: La Biblia de las Américas (1986), (accessed on February 28, 2013)
  12. Martirosian, O., Davel, M.: Error Analysis of a Public Domain Pronunciation Dictionary. In: PRASA (2007)
  13. Nettle, D., Romaine, S.: Vanishing Voices: The Extinction of the World’s Languages. Oxford University Press (2000)
  14. Och, F.J., Ney, H.: A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1), 19–51 (2003)
  15. Rodgers, J.L., Nicewander, W.A.: Thirteen Ways to Look at the Correlation Coefficient. The American Statistician 42(1), 59–66 (1988)
  16. Schultz, T., Kirchhoff, K. (eds.): Multilingual Speech Processing. Academic Press, Amsterdam (2006)
  17. Stahlberg, F., Schlippe, T., Vogel, S., Schultz, T.: Word Segmentation Through Cross-Lingual Word-to-Phoneme Alignment. In: SLT (2012)
  18. Stolcke, A., Konig, Y., Weintraub, M.: Explicit Word Error Minimization in N-best List Rescoring. In: Eurospeech (1997)
  19. Stüker, S., Waibel, A.: Towards Human Translations Guided Language Discovery for ASR Systems. In: SLTU (2008)
  20. Stüker, S., Besacier, L., Waibel, A.: Human Translations Guided Language Discovery for ASR Systems. In: Interspeech (2009)
  21. Thomas, R.L.: Bible Translations: The Link Between Exegesis and Expository Preaching. The Masters Seminary Journal 1, 53–74 (1990)
  22. VIM: International Vocabulary of Basic and General Terms in Metrology. International Organization, pp. 09–14 (2004)
  23. Vu, N.T., Kraus, F., Schultz, T.: Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training. In: Interspeech (2011)
  24. Weide, R.: The Carnegie Mellon Pronouncing Dictionary 0.6 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Felix Stahlberg (1)
  • Tim Schlippe (1)
  • Stephan Vogel (2)
  • Tanja Schultz (1)

  1. Cognitive Systems Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar
