Scaling Acoustic and Language Model Probabilities in a CSR System
It is well known that directly integrating acoustic and language models (LMs) into a Continuous Speech Recognition (CSR) system leads to poor performance. This work analyzes the problem as a practical numerical one. There are two ways to reach optimum system performance: scaling the acoustic probabilities or scaling the language model probabilities. Both approaches are analyzed from a numerical point of view and tested experimentally on a CSR system over two Spanish databases. The experiments show similar reductions in word error rate but very different computational cost behavior. They also show that the scaling factor values required for optimum CSR performance are closely related to other heuristic parameters in the system, such as the beam search width.
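The two scaling options can be illustrated with a small numerical sketch. It is not the system or the data used in the paper: the hypothesis scores, the scaling factor `alpha`, the beam width, and all function names are assumptions chosen for illustration. The sketch assumes log-domain scores and a simple beam that keeps every hypothesis within a fixed distance of the best score. Under those assumptions, raising the LM probability to the power `alpha` and raising the acoustic probability to the power `1/alpha` give the same ranking of hypotheses, but the score gaps differ by a factor of `alpha`, so a fixed beam width prunes differently.

```python
def score_lm_scaled(log_p_acoustic, log_p_lm, alpha):
    """Scale the language model: log P(A|W) + alpha * log P(W)."""
    return log_p_acoustic + alpha * log_p_lm


def score_acoustic_scaled(log_p_acoustic, log_p_lm, alpha):
    """Scale the acoustic model: (1/alpha) * log P(A|W) + log P(W)."""
    return log_p_acoustic / alpha + log_p_lm


def ranking(scores):
    """Hypothesis labels sorted from best (highest) to worst score."""
    return sorted(scores, key=scores.get, reverse=True)


def survivors(scores, beam_width):
    """Hypotheses whose score lies within beam_width of the best one."""
    best = max(scores.values())
    return {w for w, s in scores.items() if s >= best - beam_width}


# Hypothetical partial paths: (acoustic log-prob, LM log-prob).
hyps = {"a": (-100.0, -2.0), "b": (-98.0, -6.0), "c": (-110.0, -1.0)}
alpha = 4.0   # assumed scaling factor
beam = 5.0    # assumed beam width, in log-score units

lm_scaled = {w: score_lm_scaled(a, l, alpha) for w, (a, l) in hyps.items()}
ac_scaled = {w: score_acoustic_scaled(a, l, alpha) for w, (a, l) in hyps.items()}

print(ranking(lm_scaled), survivors(lm_scaled, beam))
print(ranking(ac_scaled), survivors(ac_scaled, beam))
```

In this toy case both variants rank the hypotheses identically, yet the same beam width keeps a single path under LM scaling and all three under acoustic scaling. This is one way to see why the scaling factor and the beam width would need to be tuned together and why the two approaches can differ in computational cost.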
Keywords: Language Model; Spontaneous Speech; Probable Word; Word Error Rate; Partial Path