This paper describes a new program, CORRECT, which takes words rejected by the Unix® SPELL program, proposes a list of candidate corrections, and sorts them by probability score. The probability scores are the novel contribution of this work. They are based on a noisy channel model: it is assumed that the typist knows what words he or she wants to type, but some noise is added on the way to the keyboard (in the form of typos and spelling errors). Using a classic Bayesian argument of the kind that is popular in recognition applications, especially speech recognition (Jelinek, 1985), one can often recover the intended correction, c, from a typo, t, by finding the correction c that maximizes Pr(c) Pr(t|c). The first factor, Pr(c), is a prior model of word probabilities; the second factor, Pr(t|c), is a model of the noisy channel that accounts for spelling transformations on letter sequences (insertions, deletions, substitutions and reversals). Both sets of probabilities were estimated using data collected from the Associated Press (AP) newswire over 1988 and 1989 as a training set. The AP generates about 1 million words and 500 typos per week.
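The ranking rule described above can be illustrated with a minimal sketch. It is not the CORRECT program itself: the function name `rank_corrections`, the table layout, and every number in `TOY_PRIOR` and `TOY_CHANNEL` are assumptions invented for illustration, whereas the paper estimates both tables from AP newswire text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables; the values are made up for illustration only.
TOY_PRIOR: Dict[str, float] = {          # Pr(c): prior word probability
    "actress": 0.00003,
    "across": 0.0003,
    "acres": 0.00006,
}
TOY_CHANNEL: Dict[Tuple[str, str], float] = {   # Pr(t|c), keyed by (typo, correction)
    ("acress", "actress"): 0.00012,
    ("acress", "across"): 0.000009,
    ("acress", "acres"): 0.00003,
}


def rank_corrections(
    typo: str,
    prior: Dict[str, float],
    channel: Dict[Tuple[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank candidate corrections c for a typo t by Pr(c) * Pr(t|c).

    Scores are normalized over the candidate list so they sum to 1.
    """
    raw: Dict[str, float] = {}
    for candidate, p_c in prior.items():
        p_t_given_c = channel.get((typo, candidate), 0.0)
        if p_t_given_c > 0.0:
            raw[candidate] = p_c * p_t_given_c
    total = sum(raw.values())
    if total == 0.0:
        return []  # no candidate can explain this typo
    scored = [(c, score / total) for c, score in raw.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_corrections("acress", TOY_PRIOR, TOY_CHANNEL):
        print(f"{word}\t{score:.3f}")
```

Normalizing over the candidate list is a presentation choice in this sketch; the ordering depends only on the product Pr(c) Pr(t|c).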
In evaluating the program, we found that human judges were extremely reluctant to cast a vote given only the information available to the program, and that they were much more comfortable when they could see a concordance line or two. The second half of this paper discusses some very simple methods of modeling the context using n-gram statistics. Although n-gram methods are much too simple (compared with much more sophisticated methods used in artificial intelligence and natural language processing), we have found that even these very simple methods illustrate some very interesting estimation problems that will almost certainly come up when we consider more sophisticated models of contexts. The problem is how to estimate the probability of a context that we have not seen. We compare several estimation techniques and find that some are useless. Fortunately, we have found that the Good-Turing method provides an estimate of contextual probabilities that produces a significant improvement in program performance. Context is helpful in this application, but only if it is estimated very carefully.
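To make the estimation problem concrete, the following is a minimal sketch of the basic Good-Turing adjustment (Good, 1953), in which a raw count r is replaced by r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times, and the total probability reserved for unseen items is N_1 / N. This is the plain, unsmoothed form and is not claimed to be the estimator used in the paper; the function name `good_turing`, the fallback for missing N_{r+1}, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def good_turing(observations: Iterable[Hashable]) -> Tuple[Dict[int, float], float]:
    """Return (adjusted counts r* keyed by raw count r, total mass for unseen items)."""
    counts = Counter(observations)            # raw count r for each distinct item
    n_total = sum(counts.values())            # N: total number of observations
    freq_of_freq = Counter(counts.values())   # N_r: number of items seen exactly r times

    adjusted: Dict[int, float] = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        if n_next > 0:
            adjusted[r] = (r + 1) * n_next / n_r
        else:
            # Assumption: fall back to the raw count when N_{r+1} is zero.
            # Practical estimators smooth the N_r values instead.
            adjusted[r] = float(r)

    p_unseen = freq_of_freq.get(1, 0) / n_total if n_total else 0.0
    return adjusted, p_unseen


if __name__ == "__main__":
    # Toy data: six items seen once, two seen twice, one seen three times.
    toy = [f"s{i}" for i in range(6)] + [f"d{i}" for i in range(2)] * 2 + ["t0"] * 3
    adjusted_counts, unseen_mass = good_turing(toy)
    print(adjusted_counts, unseen_mass)
```

In the toy data, items seen once are discounted to 2/3 of a count, and 6/13 of the probability mass is set aside for contexts that never occurred, which is the quantity a raw relative-frequency estimate would set to zero.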
At this point, we have a number of different knowledge sources—the prior, the channel and the context—and there will certainly be more in the future. In general, performance will be improved as more and more knowledge sources are added to the system, as long as each additional knowledge source provides some new (independent) information. As we shall see, it is important to think more carefully about combination rules, especially when there are a large number of different knowledge sources.
Keywords: automated learning, spelling correction, n-gram language model, Good-Turing estimates
- Box, G. E. P. and G. C. Tiao (1973) Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass.
- Chapman, R. (1977) Roget's International Thesaurus (4th edn), Harper & Row, New York.
- Church, K. W. (1989) A stochastic parts program and noun phrase parser for unrestricted text, in Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 1989, Glasgow.
- Church, K. W. and W. A. Gale (1991) A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language, 5, 19–54.
- Elbeze, M. and A-M. Deroualt (1990) A morphological model for large vocabulary speech recognition. Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, 1990, Albuquerque.
- Gale, W. A. and K. W. Church (1989) What's wrong with adding one? AT&T Bell Laboratories Statistical Research Reports, No. 90. November 1989.
- Good, I. J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–264.
- Hanks, P., T. Long and L. Urdang (eds) (1979) Collins Dictionary of the English Language, Collins, London.
- Hindle, D. (1983) User manual for Fidditch, a deterministic parser. Naval Research Laboratory Technical Memorandum 7590-142.
- Jelinek, F. (1985) Self-organized language modeling for speech recognition. IBM report.
- Kucera, H. (1988) Automated word substitution using numerical rankings of structural disparity between misspelled words & candidate substitution words. US Patent no. 4 783 758.
- Mays, M., F. Damerau and R. Mercer (1990) Context based spelling correction. IBM internal memo, RC 15803 (#730266).
- McIlroy, M. (1982) Development of a spelling list. IEEE Transactions on Communications, 30(1).
- Salton, G. (1989) Automatic Text Processing, Addison-Wesley, Reading, Mass.
- Seneff, S. (1989) Probabilistic parsing for spoken language applications. Paper presented at the International Workshop on Parsing Technologies, Carnegie Mellon University.
- Sinclair, J., P. Hanks, G. Fox et al. (eds) (1987) Collins Cobuild English Language Dictionary, Collins, London.