Volume 19, Issue 2, pp 127-134
Date: 12 Sep 2012

Generating a pronunciation dictionary for European Portuguese using a joint-sequence model with embedded stress assignment


This paper addresses the problem of grapheme-to-phoneme conversion in order to create a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. It describes a system based on a mixed approach: a stochastic model with embedded rules for stressed-vowel assignment. The implemented model can generate pronunciations for unrestricted words; in addition, a dictionary of the 40k most frequent words was constructed and corrected interactively. The dictionary includes homographs with multiple pronunciations. The vocabulary was defined using the CETEMPúblico corpus. The model and dictionary are publicly available.

This is a revised and extended version of a previous paper that appeared at STIL 2011, the 8th Brazilian Symposium in Information and Human Language Technology (http://www.ufmt.br/stil2011/).