Real-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then show that optimizing over sentences gives better results than variants of the algorithm that optimize over fixed-length windows.
Keywords: Content Word, Correction Recall, Spelling Correction, Original Sentence, Spelling Variation
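To make the idea in the abstract concrete, the following is a minimal sketch of trigram-based noisy-channel correction that optimizes over a whole sentence. It is not the authors' implementation. The toy corpus, the confusion sets in `VARIANTS`, the value of `ALPHA`, the add-one smoothing, and the restriction to at most one substitution per sentence are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical; a real system would use a large corpus).
CORPUS = [
    "i would like a piece of cake",
    "he cut a piece of cake",
    "she ate a piece of pie",
    "we want peace on earth",
]

# Hypothetical confusion sets: real-word spelling variants of each word.
VARIANTS = {"piece": {"peace"}, "peace": {"piece"}}

# Assumed probability that an observed word is the word the writer intended.
ALPHA = 0.9


def pad(tokens):
    """Add sentence-boundary markers so every word has a two-word history."""
    return ["<s>", "<s>"] + tokens + ["</s>"]


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


TRIGRAMS, CONTEXTS, VOCAB = Counter(), Counter(), set()
for line in CORPUS:
    toks = pad(line.split())
    VOCAB.update(toks)
    TRIGRAMS.update(ngrams(toks, 3))
    CONTEXTS.update(ngrams(toks[:-1], 2))  # two-word histories of each trigram


def sentence_logprob(tokens):
    """Trigram log-probability of a sentence with add-one smoothing (assumed)."""
    v = len(VOCAB)
    return sum(
        math.log((TRIGRAMS[g] + 1) / (CONTEXTS[g[:2]] + v))
        for g in ngrams(pad(tokens), 3)
    )


def channel_logprob(observed, candidate):
    """Log P(observed | candidate): ALPHA if unchanged, else the remaining
    mass shared equally among the spelling variants of the observed word."""
    n_variants = len(VARIANTS.get(observed, ()))
    if n_variants == 0:
        return 0.0  # no alternatives, so the word contributes a constant
    if candidate == observed:
        return math.log(ALPHA)
    return math.log((1 - ALPHA) / n_variants)


def correct(sentence):
    """Return the highest-scoring sentence among the original and all
    single-word substitutions (a simplification of a full search)."""
    observed = sentence.split()
    candidates = [observed]
    for i, word in enumerate(observed):
        for variant in sorted(VARIANTS.get(word, ())):
            candidates.append(observed[:i] + [variant] + observed[i + 1:])

    def score(cand):
        channel = sum(channel_logprob(o, c) for o, c in zip(observed, cand))
        return sentence_logprob(cand) + channel

    return " ".join(max(candidates, key=score))


print(correct("i would like a peace of cake"))
print(correct("we want peace on earth"))
```

In this sketch a larger `ALPHA` makes the corrector more conservative, since a substitution must then raise the sentence's trigram probability by a larger factor to be accepted.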