Journal of the Brazilian Computer Society, Volume 17, Issue 1, pp 53–68

Free tools and resources for Brazilian Portuguese speech recognition

  • Nelson Neto
  • Carlos Patrick
  • Aldebaro Klautau
  • Isabel Trancoso
Open Access
Original Paper

Abstract

An automatic speech recognition system has language-dependent modules and, while many public resources exist for some languages (e.g., English and Japanese), the resources for Brazilian Portuguese (BP) are still limited. This work describes the development of free tools and resources for BP speech recognition: text and audio corpora, a phonetic dictionary, a grapheme-to-phone converter, and language and acoustic models. All of them are publicly available and, together with a proposed application programming interface, have been used to develop several new applications, including a speech module for the OpenOffice suite. Performance tests are presented that compare the developed BP system with a commercial software product. The paper also describes an application that combines speech synthesis and speech recognition with a natural language processing module dedicated to statistical machine translation. This application allows the translation of spoken conversations from BP to English and vice versa. The resources make it easier for other academic groups and industry to adopt BP speech technologies.

Keywords

Speech recognition; Brazilian Portuguese; Grapheme-to-phone conversion; Application programming interface; Speech-based applications

Copyright information

© The Brazilian Computer Society 2010

Authors and Affiliations

  • Nelson Neto (1)
  • Carlos Patrick (1)
  • Aldebaro Klautau (1)
  • Isabel Trancoso (2)
  1. Federal University of Pará, Belém, Brazil
  2. IST/INESC-ID, Lisbon, Portugal