International Journal of Speech Technology

, Volume 8, Issue 4, pp 341–361 | Cite as

Scaling Smoothed Language Models

  • A. Varona
  • I. Torres
Article

Abstract

In continuous speech recognition (CSR) systems, a language model (LM) is required to represent the syntactic constraints of the language, and a smoothing technique must then be applied to avoid null LM probabilities. Each smoothing technique leads to a different LM probability distribution. Smoothing techniques are usually evaluated by test-set perplexity, but this measure ignores their interaction with the acoustic models. In fact, it is well known that optimum CSR performance requires raising the LM probabilities in Bayes' rule to a scaling exponent, and this scaling factor redistributes the smoothed LM probabilities once more. The shape of the final probability distribution is therefore determined both by the smoothing technique used when designing the language model and by the scaling factor required to obtain the best system performance when the LM is integrated into the CSR system. The main objective of this work is to study the relationship between these two factors, whose effects turn out to be interdependent. An experimental evaluation is carried out on two Spanish speech application tasks. Classical smoothing techniques representing very different degrees of smoothing are compared, and a new proposal, Delimited discounting, is also considered. The experiments show a strong dependence between the amount of smoothing applied by the smoothing technique and the way the LM probabilities need to be scaled to obtain the best system performance, which in many cases is independent of perplexity. This relationship also depends on the task and on the amount of available training data.
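The scaled integration of the LM into Bayes' rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hypotheses, scores, and the scale values are hypothetical, and decoding is reduced to picking the best of two precomputed hypotheses. It shows how, in the log domain, the recognizer maximizes log P(X|W) + α·log P(W), so the exponent α reweights the smoothed LM probabilities and can change which word string wins.

```python
import math

def combined_score(acoustic_logprob, lm_prob, lm_scale):
    """Log-domain decision score with the LM scaling exponent alpha:
    log P(X|W) + alpha * log P(W)."""
    return acoustic_logprob + lm_scale * math.log(lm_prob)

# Two hypothetical competing hypotheses:
#   (acoustic log-probability, smoothed LM probability)
hyps = {
    "hyp_a": (-100.0, 1e-3),  # better acoustic fit, less likely word string
    "hyp_b": (-103.0, 1e-2),  # worse acoustic fit, more likely word string
}

def recognize(lm_scale):
    """Pick the hypothesis maximizing the scaled combined score."""
    return max(hyps, key=lambda w: combined_score(*hyps[w], lm_scale))

print(recognize(1.0))  # unscaled Bayes' rule favors the acoustic evidence
print(recognize(5.0))  # a larger LM scale flips the decision to hyp_b
```

Note how the optimal α depends on how peaked or flat the smoothed LM distribution is, which is precisely the interdependence between smoothing and scaling that the paper studies.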

Keywords

continuous speech recognition, language models, smoothing techniques, scaling factors

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Dpto. Electricidad y Electrónica, Fac. Ciencia y Tecnología, Basque Country University, Vizcaya, Spain
