Forming Word Classes by Statistical Clustering for Statistical Language Modelling

Chapter in: Contributions to Quantitative Linguistics

Abstract

Statistical language modelling suffers from sparse training data. One way to reduce the problem is to group words into equivalence classes. In this paper we present a clustering algorithm that builds abstract word equivalence classes. The algorithm finds a local optimum of a maximum-likelihood criterion. Experiments were carried out on an English corpus of 1.1 million words and a German corpus of 100,000 words. Compared with a word bigram model, a bigram class model built on the clustered equivalence classes gives a significant improvement in perplexity. Depending on the size of the training material, the automatically clustered word classes can even outperform manually determined categories.
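
The idea in the abstract can be illustrated with a small sketch. It assumes the usual class bigram factorisation p(w|v) = p(w|g(w)) * p(g(w)|g(v)), where g maps a word to its class, and it uses a naive greedy exchange loop that moves one word at a time to whichever class raises the training log-likelihood. The function names, the round-robin initialisation, and the full recomputation of the likelihood after every trial move are illustrative assumptions, not the procedure described in the chapter.

```python
import math
from collections import Counter


def class_bigram_log_likelihood(bigrams, word_class):
    """Training log-likelihood under p(w|v) = p(w|g(w)) * p(g(w)|g(v)).

    `bigrams` maps (v, w) pairs to counts; `word_class` maps words to class ids.
    All probabilities are maximum-likelihood estimates from the same counts.
    """
    class_pair = Counter()  # count of (class of v, class of w)
    class_pred = Counter()  # count of class in predecessor position
    class_succ = Counter()  # count of class in successor position
    word_succ = Counter()   # count of word in successor position
    for (v, w), n in bigrams.items():
        gv, gw = word_class[v], word_class[w]
        class_pair[gv, gw] += n
        class_pred[gv] += n
        class_succ[gw] += n
        word_succ[w] += n

    ll = 0.0
    for (v, w), n in bigrams.items():
        gv, gw = word_class[v], word_class[w]
        ll += n * math.log(class_pair[gv, gw] / class_pred[gv])  # p(g(w)|g(v))
        ll += n * math.log(word_succ[w] / class_succ[gw])        # p(w|g(w))
    return ll


def exchange_clustering(bigrams, num_classes, max_sweeps=20):
    """Greedy exchange (hypothetical sketch): stops at a local optimum."""
    words = sorted({x for pair in bigrams for x in pair})
    # Assumed initialisation: round-robin assignment of words to classes.
    word_class = {w: i % num_classes for i, w in enumerate(words)}
    best = class_bigram_log_likelihood(bigrams, word_class)

    for _ in range(max_sweeps):
        improved = False
        for w in words:
            original = word_class[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                word_class[w] = c  # trial move
                ll = class_bigram_log_likelihood(bigrams, word_class)
                if ll > best + 1e-12:  # accept only strict improvements
                    best, best_class = ll, c
            word_class[w] = best_class
            if best_class != original:
                improved = True
        if not improved:
            break
    return word_class, best


def perplexity(log_likelihood, num_bigrams):
    """Perplexity per predicted word, from a natural-log likelihood."""
    return math.exp(-log_likelihood / num_bigrams)


if __name__ == "__main__":
    tokens = "the cat sleeps the dog sleeps a cat runs a dog runs".split()
    bigrams = Counter(zip(tokens, tokens[1:]))
    classes, ll = exchange_clustering(bigrams, num_classes=3)
    print(classes)
    print("training perplexity:", perplexity(ll, sum(bigrams.values())))
```

Because each accepted move strictly increases the likelihood, the loop terminates, but only at a local optimum that depends on the initial assignment. Recomputing the likelihood from scratch for every trial move is far too slow for corpora of the size mentioned in the abstract; a practical implementation would update the affected counts incrementally.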

Copyright information

© 1993 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Kneser, R., Ney, H. (1993). Forming Word Classes by Statistical Clustering for Statistical Language Modelling. In: Köhler, R., Rieger, B.B. (eds) Contributions to Quantitative Linguistics. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-1769-2_15

  • DOI: https://doi.org/10.1007/978-94-011-1769-2_15

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-4777-7

  • Online ISBN: 978-94-011-1769-2

  • eBook Packages: Springer Book Archive
