Scaling Acoustic and Language Model Probabilities in a CSR System

  • Amparo Varona
  • M. Inés Torres
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)

Abstract

It is well known that directly integrating acoustic and language models (LMs) into a Continuous Speech Recognition (CSR) system leads to poor performance. This work analyzes the problem as a practical numerical one. There are two ways to obtain optimum system performance: scaling the acoustic probabilities or scaling the language model probabilities. Both approaches are analyzed from a numerical point of view and tested experimentally on a CSR system over two Spanish databases. The experiments show similar reductions in word error rates but very different computational cost behaviors. They also show that the scaling factor values required for optimum CSR system performance are closely related to other heuristic parameters in the system, such as the beam search width.
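
The two scaling options named in the abstract can be illustrated with a small numerical sketch. It assumes one common log-domain formulation, in which a hypothesis score is a weighted sum of its acoustic and LM log-probabilities; the paper's exact formulation is not given in the abstract. The function names, the toy hypotheses, the scale factor `alpha`, and the beam width are all hypothetical values chosen for illustration. The sketch shows that scaling the LM by `alpha` and scaling the acoustic model by `1/alpha` rank the hypotheses identically, yet a fixed beam width prunes them differently because the score range changes.

```python
def combined_log_score(acoustic_logprob, lm_logprob,
                       acoustic_scale=1.0, lm_scale=1.0):
    """Weighted log-domain combination (assumed formulation, not the paper's)."""
    return acoustic_scale * acoustic_logprob + lm_scale * lm_logprob


def beam_prune(scores, beam_width):
    """Keep hypotheses whose score is within beam_width of the best score."""
    best = max(scores.values())
    return {h: s for h, s in scores.items() if s >= best - beam_width}


# Hypothetical hypotheses: (acoustic log-probability, LM log-probability).
hyps = {
    "A": (-100.0, -4.0),
    "B": (-98.0, -9.0),
    "C": (-120.0, -2.0),
}
alpha = 4.0        # illustrative scale factor
beam_width = 10.0  # illustrative beam width in the log domain

# Option 1: scale the LM log-probabilities by alpha.
lm_scaled = {h: combined_log_score(a, l, lm_scale=alpha)
             for h, (a, l) in hyps.items()}

# Option 2: scale the acoustic log-probabilities by 1/alpha.
ac_scaled = {h: combined_log_score(a, l, acoustic_scale=1.0 / alpha)
             for h, (a, l) in hyps.items()}

# Both options order the hypotheses the same way, because the second
# set of scores is the first divided by alpha.
rank_lm = sorted(lm_scaled, key=lm_scaled.get, reverse=True)
rank_ac = sorted(ac_scaled, key=ac_scaled.get, reverse=True)

# The same beam width keeps a different number of hypotheses.
kept_lm = beam_prune(lm_scaled, beam_width)
kept_ac = beam_prune(ac_scaled, beam_width)

# Dividing the beam width by alpha restores the same pruning decision.
kept_ac_matched = beam_prune(ac_scaled, beam_width / alpha)

print(rank_lm, rank_ac)
print(len(kept_lm), len(kept_ac), len(kept_ac_matched))
```

In this toy setting the two options are equivalent only if the beam width is rescaled along with the scores, which is one way to read the abstract's remark that the optimum scaling factors are tied to the beam search width.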

Keywords

Language Model, Spontaneous Speech, Probable Word, Word Error Rate, Partial Path
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Amparo Varona (1)
  • M. Inés Torres (1)
  1. Departamento de Electricidad y Electrónica, Fac. de Ciencia y Tecnología, UPV/EHU, Bilbao, Spain
