Building Language Models for Continuous Speech Recognition Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2389)

Abstract

This paper describes the creation of language models for a continuous speech recognition system for the Portuguese language. First, we discuss the process used to create and update a text corpus built from newspaper editions collected from the Web, from which we generate N-gram language models. We also present the procedure used to improve those models for a Broadcast News (BN) recognition task by interpolating them with a language model based on BN transcriptions. Finally, the paper details a method for generating morpheme-based language models.
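The abstract does not say how the interpolation is carried out. As a minimal sketch, assuming plain linear interpolation with a single weight chosen on held-out BN text, the idea could look like the following. The names `table_model`, `interpolate`, `perplexity` and `best_lambda`, the toy probability tables, and the floor value are all hypothetical and are not taken from the paper.

```python
import math

def table_model(table, floor=1e-6):
    """Toy bigram 'model': looks up P(word | history) in a dict.

    Unseen pairs get a small floor probability. This is an illustrative
    stand-in, not a properly smoothed or normalized N-gram model.
    """
    def p(word, history):
        return table.get((history, word), floor)
    return p

def interpolate(p_news, p_bn, lam):
    """Linear interpolation: lam * P_bn + (1 - lam) * P_news."""
    def p_mix(word, history):
        return lam * p_bn(word, history) + (1.0 - lam) * p_news(word, history)
    return p_mix

def perplexity(model, bigrams):
    """Perplexity of a model over a list of (history, word) pairs."""
    log_sum = sum(math.log(model(word, history)) for history, word in bigrams)
    return math.exp(-log_sum / len(bigrams))

def best_lambda(p_news, p_bn, heldout, steps=10):
    """Grid search for the weight that minimizes held-out perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_news, p_bn, lam), heldout),
    )

# Hypothetical toy probabilities, for illustration only.
news = table_model({("o", "governo"): 0.2})
bn = table_model({("o", "governo"): 0.4})
mixed = interpolate(news, bn, 0.5)
```

Here `mixed("governo", "o")` averages the two toy probabilities with equal weight. In practice the weight is usually tuned on held-out data from the target domain, which is what `best_lambda` approximates with a coarse grid.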

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Souto, N., Meinedo, H., Neto, J.P. (2002). Building Language Models for Continuous Speech Recognition Systems. In: Ranchhod, E., Mamede, N.J. (eds) Advances in Natural Language Processing. PorTAL 2002. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45433-0_16

  • DOI: https://doi.org/10.1007/3-540-45433-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43829-8

  • Online ISBN: 978-3-540-45433-5

  • eBook Packages: Springer Book Archive
