Phrase classes in two-level language models for ASR

Justo, Raquel; Torres, M. Inés

doi:10.1007/s10044-009-0165-y

Theoretical Advances
Published: 19 August 2009

Pattern Analysis and Applications, volume 12, pages 427–437 (2009)

Abstract

In this work, we propose and compare two approaches to a two-level language model. Both are based on phrase classes, but they differ in how phrases are handled within the classes. We provide a complete formulation that is consistent with both approaches. The proposed language models were integrated into an Automatic Speech Recognition (ASR) system and evaluated in terms of Word Error Rate. Several series of experiments were carried out on a spontaneous human–machine dialogue corpus in Spanish, in which users asked for information about long-distance trains by telephone. The results show that integrating phrases into classes with the proposed language models improves the performance of the ASR system. Moreover, the results seem to indicate that the history length that yields the best performance depends on the features of the model itself: not all models achieve their best results with the same history length.
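
For readers unfamiliar with the evaluation metric named in the abstract, Word Error Rate is conventionally defined as the word-level edit distance between the recognized sentence and the reference transcription, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors (which the abstract does not specify). The function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only; the scoring setup of the paper is not known.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance in the style of a train-information dialogue.
    reference = "quiero un billete a Bilbao"
    hypothesis = "quiero billete a Bilbao"
    print(word_error_rate(reference, hypothesis))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.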

Notes

Ametzagaiña R&D group, member of the Basque Technologic Network, http://www.ametza.com.

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments and suggestions. We are also very grateful to Professor J. M. Benedí for his helpful comments in the first stage of this work. Finally, we would like to thank the Ametzagaiña group and, in particular, Josu Landa, for providing us with the linguistic classes and segmentation of the DIHANA corpus. This work has been partially supported by the University of the Basque Country under grant GIU07/57 and by CICYT under grant TIN2005-08660-C04-03.

Author information

Authors and Affiliations

Department of Electricity and Electronics, University of the Basque Country, 48940, Leioa, Spain
Raquel Justo & M. Inés Torres

Corresponding author

Correspondence to Raquel Justo.

About this article

Cite this article

Justo, R., Torres, M.I. Phrase classes in two-level language models for ASR. Pattern Anal Applic 12, 427–437 (2009). https://doi.org/10.1007/s10044-009-0165-y

Received: 06 February 2008
Accepted: 19 June 2008
Published: 19 August 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/s10044-009-0165-y
