Abstract
It is well known that directly integrating acoustic and language models (LMs) into a Continuous Speech Recognition (CSR) system leads to poor performance. This work analyzes the problem as a practical numerical one. Optimum system performance can be reached in two ways: by scaling the acoustic probabilities or by scaling the language model probabilities. Both approaches are analyzed from a numerical point of view and tested experimentally on a CSR system over two Spanish databases. The experiments show similar reductions in word recognition rates but very different computational cost behaviors. They also show that the scaling factor values required for optimum CSR performance are closely related to other heuristic parameters in the system, such as the beam search width.
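The two scaling options can be illustrated with a small numerical sketch. This is not the authors' system: the hypothesis scores, the scaling factor `alpha`, the beam value, and the helper names `combined_score` and `survivors` are all assumptions chosen for illustration. The sketch relies only on the fact that, in the log domain, scaling the acoustic term by `1/alpha` gives the same total as scaling the LM term by `alpha` and then dividing by `alpha`. The ranking of hypotheses is therefore unchanged, but the score differences shrink, so a fixed beam width prunes differently.

```python
def combined_score(log_acoustic, log_lm, lm_scale=1.0, acoustic_scale=1.0):
    """Log-domain combination of acoustic and LM scores with optional scaling."""
    return acoustic_scale * log_acoustic + lm_scale * log_lm


def survivors(scores, beam):
    """Hypotheses whose score lies within `beam` of the best score."""
    best = max(scores.values())
    return {h for h, s in scores.items() if s >= best - beam}


# Hypothetical toy hypotheses: (log acoustic probability, log LM probability).
hyps = {
    "a": (-120.0, -4.0),
    "b": (-118.0, -9.0),
    "c": (-135.0, -2.0),
}

alpha = 6.0  # assumed scaling factor, for illustration only

# Option 1: scale the language model log-probabilities by alpha.
lm_scaled = {h: combined_score(a, l, lm_scale=alpha) for h, (a, l) in hyps.items()}

# Option 2: scale the acoustic log-probabilities by 1/alpha.
ac_scaled = {h: combined_score(a, l, acoustic_scale=1.0 / alpha) for h, (a, l) in hyps.items()}

# Both options pick the same best hypothesis...
best_lm = max(lm_scaled, key=lm_scaled.get)
best_ac = max(ac_scaled, key=ac_scaled.get)

# ...but the same beam width keeps different sets of hypotheses alive.
kept_lm = survivors(lm_scaled, beam=10.0)
kept_ac = survivors(ac_scaled, beam=10.0)

# Dividing the beam by alpha makes option 2 prune like option 1.
kept_ac_rescaled = survivors(ac_scaled, beam=10.0 / alpha)
```

In this toy setting, the number of hypotheses that survive pruning depends on which term is scaled unless the beam is rescaled too. That is one simple way to see why a scaling factor and the beam search width cannot be tuned independently.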
This work has been partially supported by the Spanish CICYT under grant TIC2002-04103-C03-02 and by the Basque Country University (00224.310-13566/2001).
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Varona, A., Torres, M.I. (2004). Scaling Acoustic and Language Model Probabilities in a CSR System. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2004. Lecture Notes in Computer Science, vol 3287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30463-0_49
DOI: https://doi.org/10.1007/978-3-540-30463-0_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23527-9
Online ISBN: 978-3-540-30463-0
eBook Packages: Springer Book Archive