Estimation Procedures for Language Context: Poor Estimates are Worse than None

  • Conference paper
Compstat

Abstract

It is difficult to estimate the probability of a word’s context because of sparse data problems. If appropriate care is taken, we find that it is possible to make useful estimates of contextual probabilities that improve performance in a spelling correction application. In contrast, less careful estimates are found to be useless. Specifically, we will show that the Good-Turing method makes the use of contextual information practical for a spelling corrector, while attempts to use the maximum likelihood estimator (MLE) or expected likelihood estimator (ELE) fail. Spelling correction was selected as an application domain because it is analogous to many important recognition applications based on a noisy channel model (such as speech recognition), though somewhat simpler and therefore possibly more amenable to detailed statistical analysis.
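
The contrast between the three estimators named above can be made concrete on toy data. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy context counts, and the number of possible context types are all hypothetical. The ELE is taken here to be add-0.5 smoothing, and Good-Turing is the basic unsmoothed adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of types seen exactly r times. The fallback to the raw count when N_{r+1} is zero is a simplification for this sketch, and no renormalization is attempted.

```python
from collections import Counter


def mle(r, n_tokens):
    """Maximum likelihood estimate: relative frequency."""
    return r / n_tokens


def ele(r, n_tokens, n_types):
    """Expected likelihood estimate, assumed here to be add-0.5 smoothing."""
    return (r + 0.5) / (n_tokens + 0.5 * n_types)


def frequency_of_frequencies(counts, n_types):
    """Map r -> N_r, the number of types observed exactly r times.

    N_0 is the number of possible types that were never observed.
    """
    fof = Counter(counts.values())
    fof[0] = n_types - len(counts)
    return fof


def good_turing_adjusted_count(r, fof):
    """Basic Good-Turing adjusted count r* = (r + 1) * N_{r+1} / N_r.

    Falls back to the raw count when N_r or N_{r+1} is zero
    (a simplification for this sketch).
    """
    n_r = fof.get(r, 0)
    n_r_plus_1 = fof.get(r + 1, 0)
    if n_r == 0 or n_r_plus_1 == 0:
        return float(r)
    return (r + 1) * n_r_plus_1 / n_r


# Hypothetical toy data: observed counts of word contexts (bigrams).
counts = {
    "of the": 3,
    "in the": 2,
    "to be": 2,
    "a cat": 1,
    "a dog": 1,
    "the end": 1,
}
n_types = 16                      # assumed number of possible contexts
n_tokens = sum(counts.values())   # total observations
fof = frequency_of_frequencies(counts, n_types)

for r in range(0, 4):
    gt_prob = good_turing_adjusted_count(r, fof) / n_tokens
    print(r, mle(r, n_tokens), ele(r, n_tokens, n_types), gt_prob)
```

With these toy counts, three types were seen once and ten were never seen, so an unseen context receives a Good-Turing adjusted count of N_1 / N_0 = 3/10, whereas the MLE assigns it zero and the ELE assigns it a fixed half count regardless of how the data are distributed.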

Copyright information

© 1990 Physica-Verlag Heidelberg

About this paper

Cite this paper

Gale, W.A., Church, K.W. (1990). Estimation Procedures for Language Context: Poor Estimates are Worse than None. In: Momirović, K., Mildner, V. (eds) Compstat. Physica-Verlag HD. https://doi.org/10.1007/978-3-642-50096-1_11

  • DOI: https://doi.org/10.1007/978-3-642-50096-1_11

  • Publisher Name: Physica-Verlag HD

  • Print ISBN: 978-3-7908-0475-1

  • Online ISBN: 978-3-642-50096-1

  • eBook Packages: Springer Book Archive
