Abstract
This paper describes the development of language models for a continuous speech recognition system for Portuguese. First, we discuss the process used to create and update a text corpus built from newspaper editions collected from the Web, from which we generate N-gram language models. We also present the procedure used to improve those models for a Broadcast News (BN) recognition task by interpolating them with a language model based on BN transcriptions. Finally, the paper details a method for generating morpheme-based language models.
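To make the interpolation step concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which interpolation scheme is used; linear interpolation of conditional probabilities is assumed here because it is a common choice. The function names (`train_bigram`, `interpolate`), the toy sentences, the add-one smoothing, and the weight 0.7 are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def train_bigram(sentences, vocab):
    """Return P(word | prev) estimated with add-one smoothing (an assumption)."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    size = len(vocab)

    def prob(word, prev):
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + size)

    return prob


def interpolate(p_a, p_b, lam):
    """Linear interpolation: lam * P_a + (1 - lam) * P_b."""
    def prob(word, prev):
        return lam * p_a(word, prev) + (1.0 - lam) * p_b(word, prev)

    return prob


# Toy stand-ins for the newspaper corpus and the BN transcriptions.
newspaper = ["o governo aprovou a lei", "a lei foi aprovada"]
broadcast = ["boa noite", "o governo disse boa noite"]

# Shared vocabulary of predictable tokens (words plus the end marker).
vocab = {w for s in newspaper + broadcast for w in s.split()} | {"</s>"}

p_news = train_bigram(newspaper, vocab)
p_bn = train_bigram(broadcast, vocab)
p_mix = interpolate(p_news, p_bn, lam=0.7)  # 0.7 is an arbitrary weight

print(p_mix("noite", "boa"))
```

Because both component models are normalized over the same vocabulary, the interpolated model is also a valid probability distribution for any weight between 0 and 1. In practice the weight is usually tuned on held-out data from the target domain rather than fixed by hand.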
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Souto, N., Meinedo, H., Neto, J.P. (2002). Building Language Models for Continuous Speech Recognition Systems. In: Ranchhod, E., Mamede, N.J. (eds) Advances in Natural Language Processing. PorTAL 2002. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45433-0_16
DOI: https://doi.org/10.1007/3-540-45433-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43829-8
Online ISBN: 978-3-540-45433-5
eBook Packages: Springer Book Archive