Real-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model

  • Amber Wilcox-O’Hearn
  • Graeme Hirst
  • Alexander Budanitsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then show that optimizing over sentences gives better results than variants of the algorithm that optimize over fixed-length windows.
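The abstract does not spell out the model, so the following is only a minimal sketch of the general idea behind a trigram noisy-channel corrector, not the authors' implementation. It assumes a fixed probability `ALPHA` that a typed word is the intended one, splits the remaining probability evenly over a word's real-word spelling variants, allows at most one substitution per sentence, and scores whole sentences with an add-one smoothed trigram model. The names `train_trigrams`, `log_prob`, `correct`, the toy corpus, and the `variants` table are all hypothetical.

```python
import math
from collections import Counter

ALPHA = 0.99  # assumed probability that a typed word is the intended word


def train_trigrams(sentences):
    """Count trigrams and their two-word contexts over padded sentences."""
    tri, bi, vocab = Counter(), Counter(), set()
    for sent in sentences:
        words = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(words)
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            bi[tuple(words[i - 2:i])] += 1
    return tri, bi, len(vocab)


def log_prob(sent, model):
    """Add-one smoothed trigram log-probability of a whole sentence."""
    tri, bi, vocab_size = model
    words = ["<s>", "<s>"] + sent + ["</s>"]
    total = 0.0
    for i in range(2, len(words)):
        num = tri[tuple(words[i - 2:i + 1])] + 1
        den = bi[tuple(words[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total


def correct(observed, model, variants, alpha=ALPHA):
    """Pick the sentence S' maximizing log P(S') + log P(observed | S').

    Candidates differ from the observed sentence in at most one word.
    The alpha factors for the other, unchanged words are common to every
    candidate and are therefore left out of the comparison.
    """
    best = observed
    best_score = log_prob(observed, model) + math.log(alpha)
    for i, word in enumerate(observed):
        for cand in variants.get(word, []):
            candidate = observed[:i] + [cand] + observed[i + 1:]
            # Channel term: the intended word `cand` was mistyped as `word`.
            k = len(variants.get(cand, [])) or 1
            score = log_prob(candidate, model) + math.log((1 - alpha) / k)
            if score > best_score:
                best, best_score = candidate, score
    return best


# Toy data, invented purely for illustration.
corpus = [["a", "piece", "of", "cake"]] * 10 + [["peace", "and", "quiet"]] * 10
variants = {"piece": ["peace"], "peace": ["piece"]}
model = train_trigrams(corpus)
print(correct(["a", "peace", "of", "cake"], model, variants))
```

Because `correct` compares scores of complete sentences, it corresponds to the sentence-level optimization mentioned in the abstract rather than to a fixed-length window variant. A high `ALPHA` makes the sketch conservative: a substitution is accepted only when the language model prefers it strongly enough to outweigh the channel penalty.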

References

  1. Bahl, L.R., et al.: Recognition of a continuously read natural corpus. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1978), Tulsa, vol. 3, pp. 422–424 (1978)
  2. Bahl, L.R., Jelinek, F., Mercer, R.L.: A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 5(2), 179–190 (1983)
  3. Brill, E., Moore, R.C.: An improved error model for noisy channel spelling correction. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 286–293 (2000)
  4. Church, K.W., Gale, W.A.: Probability scoring for spelling correction. Statistics and Computing 1, 93–103 (1991)
  5. Clarkson, P., Rosenfeld, R.: Statistical language modeling using the CMU–Cambridge Toolkit. In: Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech), Rhodes, pp. 2707–2710 (1997)
  6. Golding, A.R., Roth, D.: A Winnow-based approach to context-sensitive spelling correction. Machine Learning 34(1–3), 107–130 (1999)
  7. Hirst, G., Budanitsky, A.: Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering 11(1), 87–111 (2005)
  8. Kukich, K.: Techniques for automatically correcting words in text. Computing Surveys 24(4), 377–439 (1992)
  9. Mays, E., Damerau, F.J., Mercer, R.L.: Context based spelling correction. Information Processing and Management 27(5), 517–522 (1991)
  10. Toutanova, K., Moore, R.C.: Pronunciation modeling for improved spelling correction. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, pp. 144–151 (2002)
  11. Verberne, S.: Context-sensitive spell [sic] checking based on trigram probabilities. Master’s thesis, University of Nijmegen (2002)
  12. Wilcox-O’Hearn, L.A.: Applying trigram models to real-word spelling correction. MSc thesis, Department of Computer Science, University of Toronto (forthcoming, 2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Amber Wilcox-O’Hearn (1)
  • Graeme Hirst (1)
  • Alexander Budanitsky (1)

  1. Department of Computer Science, University of Toronto, Toronto, Canada
