Abstract
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then show that optimizing over sentences gives better results than variants of the algorithm that optimize over fixed-length windows.
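As a rough illustration of the kind of model under discussion, the following minimal sketch scores whole-sentence candidates with a trigram language model combined with a simple channel model. It is not the authors' implementation. The add-one smoothing, the restriction to a single substitution per sentence, the parameter name `alpha` (the assumed probability that a typed word is the intended one), and the hand-supplied `variants` table are all assumptions made for this example.

```python
import math
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word contexts over padded sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>", "<s>"] + list(s) + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            ctx[tuple(toks[i - 2:i])] += 1
    return tri, ctx, len(vocab)

def sentence_logprob(sentence, tri, ctx, v):
    """Trigram log-probability with add-one smoothing (a stand-in for a
    real smoothing scheme; v is the training vocabulary size)."""
    toks = ["<s>", "<s>"] + list(sentence) + ["</s>"]
    lp = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + 1
        den = ctx[tuple(toks[i - 2:i])] + v
        lp += math.log(num / den)
    return lp

def correct(sentence, variants, tri, ctx, v, alpha=0.99):
    """Return the highest-scoring candidate among the observed sentence
    and every sentence that differs from it by one spelling variant."""
    sentence = list(sentence)
    n = len(sentence)
    best = sentence
    # Channel term for the unchanged sentence: every word typed as intended.
    best_score = sentence_logprob(sentence, tri, ctx, v) + n * math.log(alpha)
    for i, w in enumerate(sentence):
        alts = variants.get(w, [])
        for alt in alts:
            cand = sentence[:i] + [alt] + sentence[i + 1:]
            # One word was mistyped; the error mass (1 - alpha) is shared
            # equally among the spelling variants of the intended word.
            channel = (n - 1) * math.log(alpha) + math.log((1 - alpha) / len(alts))
            score = sentence_logprob(cand, tri, ctx, v) + channel
            if score > best_score:
                best, best_score = cand, score
    return best

# Toy usage: a tiny corpus and a hypothetical one-pair variant table.
corpus = [["a", "piece", "of", "cake"]] * 20
tri, ctx, v = train_trigrams(corpus)
variants = {"peace": ["piece"], "piece": ["peace"]}
fixed = correct(["a", "peace", "of", "cake"], variants, tri, ctx, v)
kept = correct(["a", "piece", "of", "cake"], variants, tri, ctx, v)
```

In this sketch a high `alpha` makes the corrector conservative: a substitution is accepted only when the language model prefers the candidate sentence strongly enough to outweigh the channel penalty.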
References
Bahl, L.R., et al.: Recognition of a continuously read natural corpus. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1978), Tulsa, vol. 3, pp. 422–424 (1978)
Bahl, L.R., Jelinek, F., Mercer, R.L.: A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 5(2), 179–190 (1983)
Brill, E., Moore, R.C.: An improved error model for noisy channel spelling correction. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 286–293 (2000)
Church, K.W., Gale, W.A.: Probability scoring for spelling correction. Statistics and Computing 1, 93–103 (1991)
Clarkson, P., Rosenfeld, R.: Statistical language modeling using the CMU–Cambridge Toolkit. In: Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech), Rhodes, pp. 2707–2710 (1997)
Golding, A.R., Roth, D.: A Winnow-based approach to context-sensitive spelling correction. Machine Learning 34(1–3), 107–130 (1999)
Hirst, G., Budanitsky, A.: Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering 11(1), 87–111 (2005)
Kukich, K.: Techniques for automatically correcting words in text. Computing Surveys 24(4), 377–439 (1992)
Mays, E., Damerau, F.J., Mercer, R.L.: Context based spelling correction. Information Processing and Management 27(5), 517–522 (1991)
Toutanova, K., Moore, R.C.: Pronunciation modeling for improved spelling correction. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, pp. 144–151 (2002)
Verberne, S.: Context-sensitive spell [sic] checking based on trigram probabilities. Master’s thesis, University of Nijmegen (2002)
Wilcox-O’Hearn, L.A.: Applying trigram models to real-word spelling correction. MSc thesis, Department of Computer Science, University of Toronto (forthcoming, 2008)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wilcox-O’Hearn, A., Hirst, G., Budanitsky, A. (2008). Real-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_52
DOI: https://doi.org/10.1007/978-3-540-78135-6_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)