
Probability scoring for spelling correction

Statistics and Computing

Abstract

This paper describes a new program, CORRECT, which takes words rejected by the Unix® SPELL program, proposes a list of candidate corrections, and sorts them by probability score. The probability scores are the novel contribution of this work. They are based on a noisy channel model. It is assumed that the typist knows what words he or she wants to type but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in recognition applications, especially speech recognition (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (insertions, deletions, substitutions and reversals). Both sets of probabilities were estimated using data collected from the Associated Press (AP) newswire over 1988 and 1989 as a training set. The AP generates about 1 million words and 500 typos per week.
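The Bayesian scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the CORRECT program itself: the word counts and channel probabilities below are invented for the example (the paper estimates both from the AP newswire), and only a fixed candidate list is scored rather than generating candidates by edit.

```python
# Toy word counts standing in for the newswire prior Pr(c).
# (These numbers are invented for illustration.)
PRIOR_COUNTS = {"actress": 1343, "across": 3316, "acres": 2879}
TOTAL = sum(PRIOR_COUNTS.values())

def prior(c):
    """Prior probability Pr(c) from relative frequency."""
    return PRIOR_COUNTS.get(c, 0) / TOTAL

# Hypothetical channel probabilities Pr(t|c): the chance that the intended
# word c came out as the observed typo t via a single insertion, deletion,
# substitution or reversal.
CHANNEL = {
    ("acress", "actress"): 1.17e-4,  # deletion of 't'
    ("acress", "across"):  0.93e-4,  # substitution 'o' -> 'e'
    ("acress", "acres"):   0.321e-4, # insertion of 's'
}

def score(t, c):
    """Unnormalized posterior Pr(c) * Pr(t|c)."""
    return prior(c) * CHANNEL.get((t, c), 0.0)

def best_correction(t, candidates):
    """Return the candidate c maximizing Pr(c) * Pr(t|c)."""
    return max(candidates, key=lambda c: score(t, c))
```

With these made-up numbers, the more frequent word "across" wins on prior and channel alone; the second half of the paper shows how context can overturn such a decision.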

In evaluating the program, we found that human judges were extremely reluctant to cast a vote given only the information available to the program, and that they were much more comfortable when they could see a concordance line or two. The second half of this paper discusses some very simple methods of modeling the context using n-gram statistics. Although n-gram methods are much too simple (compared with much more sophisticated methods used in artificial intelligence and natural language processing), we have found that even these very simple methods illustrate some very interesting estimation problems that will almost certainly come up when we consider more sophisticated models of contexts. The problem is how to estimate the probability of a context that we have not seen. We compare several estimation techniques and find that some are useless. Fortunately, we have found that the Good-Turing method provides an estimate of contextual probabilities that produces a significant improvement in program performance. Context is helpful in this application, but only if it is estimated very carefully.
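The core Good-Turing idea can be sketched briefly. This is the plain, unsmoothed form, assuming a simple dictionary of observed counts; the paper's "enhanced" variant (Church and Gale, 1991) smooths the frequency-of-frequency counts first, which this sketch does not do.

```python
from collections import Counter

def good_turing(counts):
    """Unsmoothed Good-Turing estimates.

    Adjusted count r* = (r + 1) * N_{r+1} / N_r, where N_r is the number
    of types observed exactly r times.  The mass N_1 / N is reserved for
    events (e.g. contexts) never seen in training.
    """
    n = sum(counts.values())
    n_r = Counter(counts.values())   # N_r: how many types occur r times
    p_unseen = n_r.get(1, 0) / n     # total probability of unseen events
    # Note: for the largest observed r, N_{r+1} = 0, so r* = 0 -- the
    # reason the frequency-of-frequency counts are smoothed in practice.
    probs = {w: (r + 1) * n_r.get(r + 1, 0) / n_r[r] / n
             for w, r in counts.items()}
    return probs, p_unseen
```

For example, with counts {"cold": 1, "beer": 1, "the": 2}, half the probability mass (N_1 / N = 2/4) is set aside for unseen contexts.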

At this point, we have a number of different knowledge sources—the prior, the channel and the context—and there will certainly be more in the future. In general, performance will be improved as more and more knowledge sources are added to the system, as long as each additional knowledge source provides some new (independent) information. As we shall see, it is important to think more carefully about combination rules, especially when there are a large number of different knowledge sources.
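Under the independence assumption above, the combination rule is simply a product of the per-source probabilities, conveniently computed as a sum of logs. The numbers below are invented for illustration; they show how a strong context factor can overturn a decision made on prior and channel alone.

```python
import math

# Hypothetical per-source probabilities for two candidate corrections of a
# single typo.  Each knowledge source (prior, channel, context) contributes
# one factor; assuming independence, the factors multiply.
SOURCES = {
    "actress": {"prior": 0.18, "channel": 1.2e-4, "context": 3.0e-3},
    "across":  {"prior": 0.44, "channel": 0.9e-4, "context": 1.0e-4},
}

def combined_log_score(factors):
    # Summing logs is equivalent to multiplying probabilities, but avoids
    # floating-point underflow as more knowledge sources are added.
    return sum(math.log(p) for p in factors.values())

best = max(SOURCES, key=lambda c: combined_log_score(SOURCES[c]))
```

Here "across" has the larger prior, but the context factor tips the combined score toward "actress". When sources are not truly independent, a plain product over-counts shared evidence, which is why combination rules deserve the careful treatment the paper calls for.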


References

  • Box, G. E. P. and G. C. Tiao (1973) Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass.

  • Chapman, R. (1977) Roget's International Thesaurus (4th edn), Harper & Row, New York.

  • Church, K. W. (1989) A stochastic parts program and noun phrase parser for unrestricted text, in Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 1989, Glasgow.

  • Church, K. W. and W. A. Gale (1991) A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language, 5, 19–54.

  • Elbeze, M. and A-M. Derouault (1990) A morphological model for large vocabulary speech recognition, in Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 1990, Albuquerque.

  • Gale, W. A. and K. W. Church (1989) What's wrong with adding one? AT&T Bell Laboratories Statistical Research Reports, No. 90. November 1989.

  • Good, I. J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–264.

  • Hanks, P., T. Long and L. Urdang (eds) (1979) Collins Dictionary of the English Language, Collins, London.

  • Hindle, D. (1983) User manual for Fidditch, a deterministic parser. Naval Research Laboratory Technical Memorandum 7590-142.

  • Jelinek, F. (1985) Self-organized language modeling for speech recognition. IBM report.

  • Kucera, H. (1988) Automated word substitution using numerical rankings of structural disparity between misspelled words & candidate substitution words. US Patent no. 4 783 758.

  • Mays, M., F. Damerau and R. Mercer (1990) Context based spelling correction. IBM internal memo, RC 15803 (#730266).

  • McIlroy, M. (1982) Development of a spelling list. IEEE Transactions on Communications, 30(1).

  • Salton, G. (1989) Automatic Text Processing, Addison-Wesley, Reading, Mass.

  • Seneff, S. (1989) Probabilistic parsing for spoken language applications. Paper presented at the International Workshop on Parsing Technologies, Carnegie Mellon University.

  • Sinclair, J., P. Hanks, G. Fox et al (eds) (1987) Collins Cobuild English Language Dictionary, Collins, London.


Cite this article

Church, K.W., Gale, W.A. Probability scoring for spelling correction. Stat Comput 1, 93–103 (1991). https://doi.org/10.1007/BF01889984
