Phrase classes in two-level language models for ASR

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In this work, we propose and compare two approaches to a two-level language model. Both are based on phrase classes, but they differ in how phrases are handled within the classes. We provide a complete formulation that is consistent with both approaches. The proposed language models were integrated into an Automatic Speech Recognition (ASR) system and evaluated in terms of Word Error Rate. Several series of experiments were carried out on a corpus of spontaneous human–machine dialogues in Spanish, in which users asked for information about long-distance trains by telephone. The results show that integrating phrases into classes with the proposed language models improves the performance of the ASR system. Moreover, the results suggest that the history length at which the best performance is achieved depends on the features of the model itself: not all models perform best with the same history length.
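
For readers unfamiliar with the evaluation metric named above, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognized sentence and the reference transcription, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring tool; the function name `word_error_rate`, the whitespace tokenization and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()   # assumption: words are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one reference word ("un") is missing in the hypothesis.
wer = word_error_rate("quiero un billete a Madrid", "quiero billete a Madrid")
```

Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.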

Notes

  1. Ametzagaiña R&D group, member of the Basque Technologic Network, http://www.ametza.com.

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments and suggestions. We are also very grateful to Professor J. M. Benedí for his helpful comments in the first stage of this work. Finally, we would like to thank the Ametzagaiña group and, in particular, Josu Landa, for providing us with the linguistic classes and segmentation of the DIHANA corpus. This work has been partially supported by the University of the Basque Country under grant GIU07/57 and by CICYT under grant TIN2005-08660-C04-03.

Author information

Corresponding author

Correspondence to Raquel Justo.

About this article

Cite this article

Justo, R., Torres, M.I. Phrase classes in two-level language models for ASR. Pattern Anal Applic 12, 427–437 (2009). https://doi.org/10.1007/s10044-009-0165-y

  • DOI: https://doi.org/10.1007/s10044-009-0165-y
