Estimation Procedures for Language Context: Poor Estimates are Worse than None

Gale, W. A.; Church, K. W.

doi:10.1007/978-3-642-50096-1_11

Abstract

It is difficult to estimate the probability of a word’s context because of sparse data problems. If appropriate care is taken, we find that it is possible to make useful estimates of contextual probabilities that improve performance in a spelling correction application. In contrast, less careful estimates are found to be useless. Specifically, we will show that the Good-Turing method makes the use of contextual information practical for a spelling corrector, while attempts to use the maximum likelihood estimator (MLE) or expected likelihood estimator (ELE) fail. Spelling correction was selected as an application domain because it is analogous to many important recognition applications based on a noisy channel model (such as speech recognition), though somewhat simpler and therefore possibly more amenable to detailed statistical analysis.
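
As a rough illustration of the three estimators named in the abstract, the sketch below applies their textbook forms to a handful of made-up bigram counts. The counts, the assumed number of possible bigrams `V`, and all function names are hypothetical and are not taken from the chapter. The Good-Turing part is the basic unsmoothed formula, not necessarily the variant the authors use.

```python
from collections import Counter

# Toy bigram counts (made up for illustration, not data from the chapter).
counts = Counter({"of the": 3, "in the": 2, "on the": 2,
                  "the cat": 1, "the dog": 1, "a cat": 1})
N = sum(counts.values())   # total number of observations
V = 20                     # assumed number of possible bigrams (hypothetical)

def p_mle(bigram):
    """MLE: r / N. Unseen bigrams get probability exactly zero."""
    return counts[bigram] / N

def p_ele(bigram):
    """ELE: (r + 0.5) / (N + 0.5 * V), i.e. add one half to every count."""
    return (counts[bigram] + 0.5) / (N + 0.5 * V)

# Frequencies of frequencies: n_r[r] = number of bigrams seen exactly r times.
n_r = Counter(counts.values())
n_r[0] = V - len(counts)   # bigrams never seen, under the assumed V

def gt_adjusted_count(r):
    """Basic Good-Turing adjusted count: r* = (r + 1) * N_{r+1} / N_r.

    Falls back to the raw count when N_r or N_{r+1} is zero. That fallback
    is a simplification of this sketch; practical use smooths the N_r first.
    """
    if n_r[r] == 0 or n_r[r + 1] == 0:
        return float(r)
    return (r + 1) * n_r[r + 1] / n_r[r]

# Good-Turing estimate of the total probability of all unseen bigrams: N_1 / N.
gt_unseen_mass = n_r[1] / N

print("unseen bigram:", p_mle("the bird"), p_ele("the bird"),
      gt_adjusted_count(0) / N)
print("adjusted counts:", {r: gt_adjusted_count(r) for r in (0, 1, 2, 3)})
```

In this toy setting the MLE gives an unseen context zero probability, which would veto any candidate correction that happens to appear in a new context. The ELE and Good-Turing estimates both reserve some mass for unseen events, but they divide it differently. Because of the fallback at the highest count, the adjusted counts here are not renormalized to sum to one.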

Author information

Authors and Affiliations

Murray Hill, USA
W. A. Gale & K. W. Church

Editor information

Editors and Affiliations

University of Zagreb, Engelsova bb, 41000, Zagreb, Yugoslavia
Konstantin Momirović & Vesna Mildner

About this paper

Cite this paper

Gale, W.A., Church, K.W. (1990). Estimation Procedures for Language Context: Poor Estimates are Worse than None. In: Momirović, K., Mildner, V. (eds) Compstat. Physica-Verlag HD. https://doi.org/10.1007/978-3-642-50096-1_11

DOI: https://doi.org/10.1007/978-3-642-50096-1_11
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-0475-1
Online ISBN: 978-3-642-50096-1
eBook Packages: Springer Book Archive
