Iberoamerican Congress on Pattern Recognition

CIARP 2005: Progress in Pattern Recognition, Image Analysis and Applications, pp 556–565

Statistical and Linguistic Clustering for Language Modeling in ASR

  • R. Justo &
  • I. Torres

Part of the Lecture Notes in Computer Science book series (LNIP, volume 3773)

Abstract

In this work, several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The proposed language models were first evaluated, as usual, in terms of perplexity on the text corpus. They were then integrated into an ASR system and evaluated in terms of system performance. The results show that category-based language models can perform better, also in terms of word error rate (WER), when the categories are obtained through statistical clustering rather than linguistic techniques. They also show that better system performance is obtained when the language model interpolates category-based and word-based models.
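The abstract does not give the model equations, so the following is only a minimal sketch of the ideas it names, under stated assumptions: a common class-based bigram factorisation, P(w | w_prev) ≈ P(c(w) | c(w_prev)) · P(w | c(w)); a simple linear interpolation with a word-based model; and perplexity computed as the exponential of the negative mean log-probability. All function names, the weight `lam`, and the toy probability tables are hypothetical and are not taken from the paper.

```python
import math


def class_bigram_prob(prev_word, word, word2cat, p_cat_given_cat, p_word_given_cat):
    """Category-based bigram estimate (assumed factorisation, not the paper's):
    P(w | w_prev) ~= P(c(w) | c(w_prev)) * P(w | c(w)).
    Unseen events get probability 0.0 here; a real model would smooth them."""
    c_prev = word2cat[prev_word]
    c = word2cat[word]
    return p_cat_given_cat.get((c_prev, c), 0.0) * p_word_given_cat.get((word, c), 0.0)


def interpolated_prob(p_word_based, p_category_based, lam):
    """Linear interpolation of a word-based and a category-based probability.
    lam is a hypothetical mixing weight in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word_based + (1.0 - lam) * p_category_based


def perplexity(probs):
    """Perplexity of a sequence given the model probability of each token:
    exp(-(1/N) * sum(log p_i)). Every probability must be strictly positive."""
    if not probs:
        raise ValueError("probs must be non-empty")
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


# Toy tables, invented purely for illustration.
word2cat = {"to": "PREP", "madrid": "CITY"}
p_cat_given_cat = {("PREP", "CITY"): 0.5}
p_word_given_cat = {("madrid", "CITY"): 0.2}

p_class = class_bigram_prob("to", "madrid", word2cat, p_cat_given_cat, p_word_given_cat)
p_mixed = interpolated_prob(0.3, p_class, lam=0.5)
```

In a sketch like this, the category-based term shares statistics across all words in a category, which helps with sparse training data, while the word-based term keeps word-specific detail; the weight `lam` would normally be tuned on held-out data.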

Keywords

  • Language Model
  • Statistical Cluster
  • Training Corpus
  • Text Corpus
  • Speech Recognition System

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This work has been partially supported by the CICYT project TIC2002-04103-C03-02 and by the Universidad del País Vasco under grant 9/UPV 00224.310-13566/2001.

Author information

Authors and Affiliations

  1. Departamento de Electricidad y Electrónica, Facultad de Ciencia y Tecnología, Universidad del País Vasco,  

    R. Justo & I. Torres

Editor information

Editors and Affiliations

  1. Dept. System Engineering and Automation, Universitat Politècnica de Catalunya (UPC) Barcelona, Spain

    Alberto Sanfeliu

  2. Pattern Recognition Group, ICIMAF, Havana, Cuba

    Manuel Lazo Cortés

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Justo, R., Torres, I. (2005). Statistical and Linguistic Clustering for Language Modeling in ASR. In: Sanfeliu, A., Cortés, M.L. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2005. Lecture Notes in Computer Science, vol 3773. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11578079_58

  • DOI: https://doi.org/10.1007/11578079_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29850-2

  • Online ISBN: 978-3-540-32242-9

  • eBook Packages: Computer Science, Computer Science (R0)

Published in cooperation with the International Association for Pattern Recognition (http://www.iapr.org/)
