A Baseline System for Continuous Speech Recognition of Brazilian Portuguese Using the West Point Brazilian Portuguese Speech Corpus

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2010)

Abstract

Although several speech corpora are available for building automatic speech recognition systems, only a few exist for Brazilian Portuguese (BP). This scarcity has limited extensive, in-depth research on continuous speech recognition for BP. In this work, we present a baseline continuous speech recognition system for BP and report its results on the West Point Brazilian Portuguese Speech Corpus. In addition to the results, the resources developed to build the system are made available to support further research on such systems for BP.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

dos Santos, F.W., Barone, D.A.C., Adami, A.G. (2010). A Baseline System for Continuous Speech Recognition of Brazilian Portuguese Using the West Point Brazilian Portuguese Speech Corpus. In: Pardo, T.A.S., Branco, A., Klautau, A., Vieira, R., de Lima, V.L.S. (eds) Computational Processing of the Portuguese Language. PROPOR 2010. Lecture Notes in Computer Science, vol 6001. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12320-7_18

  • DOI: https://doi.org/10.1007/978-3-642-12320-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12319-1

  • Online ISBN: 978-3-642-12320-7

  • eBook Packages: Computer Science, Computer Science (R0)
