Grapheme to Phoneme Translation Using Conditional Random Fields with Re-Ranking

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9924)

Abstract

Grapheme to phoneme (G2P) translation is an important part of many applications, including text-to-speech, automatic speech recognition, and phonetic similarity matching. While G2P models have been studied thoroughly in the literature, we propose a G2P system that is specifically optimized for producing a high-quality top-k list of candidate pronunciations for an input grapheme string. Our pipeline approach uses Conditional Random Fields (CRF) to predict phonemes from graphemes, followed by a discriminative re-ranker that combines information from the earlier pipeline stages with a graphone language model to construct a high-quality ranked list of results. We evaluate our system on the widely used CMUDict dataset and demonstrate performance competitive with state-of-the-art G2P methods. Additionally, using entries with multiple valid pronunciations, we show that our re-ranking approach outperforms ranking with only a smoothed graphone language model, a technique employed by many recent publications. Lastly, we have released our system as an open-source G2P toolkit, available at http://bit.ly/83yysKL.
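The abstract does not specify the re-ranker's features or learning algorithm, nor the order and smoothing of the graphone language model. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation (their toolkit is at the URL above). It assumes each CRF candidate arrives as an aligned graphone sequence together with a CRF log-probability. It scores candidates with a hypothetical interpolated bigram graphone LM (`GraphoneBigramLM`) and trains a linear re-ranker over three assumed features (CRF log-probability, graphone LM log-probability, reciprocal CRF rank) using a plain perceptron, which is a swapped-in choice of discriminative learner. Each gold set may contain several valid pronunciations, mirroring the multi-pronunciation entries used in the evaluation. All names (`features`, `rerank`, `train_perceptron`) are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# A graphone pairs a grapheme chunk with a phoneme chunk, e.g. ("ph", "F").
Graphone = Tuple[str, str]
# Hypothetical CRF output: (pronunciation, aligned graphones, CRF log-probability).
Candidate = Tuple[str, List[Graphone], float]
BOS: Graphone = ("<s>", "<s>")
EOS: Graphone = ("</s>", "</s>")


class GraphoneBigramLM:
    """Bigram graphone LM, linearly interpolated with an add-one unigram."""

    def __init__(self, corpus: Sequence[Sequence[Graphone]], lam: float = 0.7):
        self.lam = lam
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for seq in corpus:
            padded = [BOS, *seq, EOS]
            self.unigrams.update(padded[1:])
            self.bigrams.update(zip(padded, padded[1:]))
        # How often each graphone occurs as the history of a bigram.
        self.history: Counter = Counter()
        for (prev, _), count in self.bigrams.items():
            self.history[prev] += count
        self.total = sum(self.unigrams.values())
        self.vocab = len(self.unigrams) + 1  # one extra slot for unseen graphones

    def logprob(self, seq: Sequence[Graphone]) -> float:
        padded = [BOS, *seq, EOS]
        lp = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p_uni = (self.unigrams[cur] + 1) / (self.total + self.vocab)  # never zero
            h = self.history[prev]
            p_bi = self.bigrams[(prev, cur)] / h if h else 0.0
            lp += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
        return lp


def features(lm: GraphoneBigramLM, cand: Candidate, rank: int) -> List[float]:
    """Assumed feature set: CRF score, graphone LM score, reciprocal CRF rank."""
    _, graphones, crf_logprob = cand
    return [crf_logprob, lm.logprob(graphones), 1.0 / (rank + 1)]


def score(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def rerank(w: Sequence[float], lm: GraphoneBigramLM, kbest: List[Candidate]) -> List[str]:
    """Reorder a CRF k-best list (given in CRF order) by the linear score."""
    scored = [(score(w, features(lm, c, r)), c[0]) for r, c in enumerate(kbest)]
    # sorted() is stable, so ties keep their original CRF order.
    return [pron for _, pron in sorted(scored, key=lambda x: -x[0])]


def train_perceptron(lm: GraphoneBigramLM,
                     data: List[Tuple[List[Candidate], Set[str]]],
                     epochs: int = 5) -> List[float]:
    """Perceptron re-ranker; each gold set may hold several valid pronunciations."""
    w = [1.0, 0.0, 0.0]  # start from the CRF's own ranking
    for _ in range(epochs):
        for kbest, gold in data:
            feats = [features(lm, c, r) for r, c in enumerate(kbest)]
            gold_idx = [i for i, c in enumerate(kbest) if c[0] in gold]
            if not gold_idx:
                continue  # no correct candidate in the list: nothing to learn
            top = max(range(len(kbest)), key=lambda i: score(w, feats[i]))
            if kbest[top][0] in gold:
                continue
            # Move weights toward the best-scoring gold candidate, away from the wrong top.
            best_gold = max(gold_idx, key=lambda i: score(w, feats[i]))
            w = [wi + g - p for wi, g, p in zip(w, feats[best_gold], feats[top])]
    return w


# Toy usage: the CRF prefers "S AE T", but the graphone LM has only seen "K AE T".
lm = GraphoneBigramLM([[("c", "K"), ("a", "AE"), ("t", "T")]])
kbest = [("K AE T", [("c", "K"), ("a", "AE"), ("t", "T")], -1.2),
         ("S AE T", [("c", "S"), ("a", "AE"), ("t", "T")], -1.0)]
w = train_perceptron(lm, [(kbest, {"K AE T"})])
print(rerank(w, lm, kbest))  # ['K AE T', 'S AE T']
```

Setting `w = [0.0, 1.0, 0.0]` turns the same `rerank` into an LM-only ranking, which is the kind of smoothed-graphone-LM baseline the abstract compares against, so both rankings can be computed on identical k-best lists.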

Keywords

  • Grapheme-to-phoneme conversion
  • Conditional random fields
  • G2P

Fig. 1.
Fig. 2.
Fig. 3.

Author information


Correspondence to Stephen Ash.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ash, S., Lin, D. (2016). Grapheme to Phoneme Translation Using Conditional Random Fields with Re-Ranking. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_36

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_36

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)