Grapheme to Phoneme Translation Using Conditional Random Fields with Re-Ranking

  • Stephen Ash
  • David Lin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)


Grapheme-to-phoneme (G2P) translation is an important part of many applications, including text-to-speech, automatic speech recognition, and phonetic similarity matching. G2P models have been studied thoroughly in the literature; we propose a G2P system that is optimized for producing a high-quality top-k list of candidate pronunciations for an input grapheme string. Our pipeline uses Conditional Random Fields (CRFs) to predict phonemes from graphemes, followed by a discriminative re-ranker that combines information from earlier pipeline stages with a graphone language model to construct a high-quality ranked list of results. We evaluate our system on the widely used CMUDict dataset and demonstrate performance competitive with state-of-the-art G2P methods. Additionally, using entries with multiple valid pronunciations, we show that our re-ranking approach outperforms ranking using only a smoothed graphone language model, a technique employed by many recent publications. Lastly, we released our system as an open-source G2P toolkit available at


Grapheme-to-phoneme conversion · Conditional random fields · G2P



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Memphis, Memphis, USA
  2. Baylor University, Waco, USA
