Real-word error correction with trigrams: correcting multiple errors in a sentence

  • Original Paper
Language Resources and Evaluation

Abstract

Spelling correction is a fundamental task in text mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several of its drawbacks. We propose a new variation that detects and corrects multiple real-word errors in a sentence, using a probabilistic context-free grammar to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky’s WordNet-based method and Wilcox-O’Hearn, Hirst, and Budanitsky’s fixed-window-size method.
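For readers unfamiliar with trigram-based real-word correction, the following is a minimal sketch of the general noisy-channel idea usually associated with the Mays, Damerau and Mercer model. It is not the variation proposed in this article and does not use a probabilistic context-free grammar. Everything in it is an assumption made for illustration: the toy corpus, the hand-written `VARIANTS` confusion sets, the value of `ALPHA` (the prior probability that a typed word is the intended one), add-k smoothing with `K`, and the restriction to candidates that differ from the input by at most one word. The test sentence is adapted from the title of Fossati and Di Eugenio (2008) in the reference list.

```python
import math
from collections import Counter

ALPHA = 0.99  # assumed probability that a typed word is the intended word
K = 0.1       # assumed add-k smoothing constant for the toy trigram model

# Toy training text (illustrative only; a real system needs a large corpus).
CORPUS = [
    "i saw three trees in the park",
    "we saw three birds in the park",
    "the tree in the park is old",
]

# Hypothetical, symmetric confusion sets of real words that are close in spelling.
VARIANTS = {
    "tree": ["three", "trees"],
    "three": ["tree"],
    "trees": ["tree"],
}

def pad(words):
    """Add sentence-boundary markers so every word has two words of left context."""
    return ["<s>", "<s>"] + list(words) + ["</s>"]

# Collect trigram counts and the counts of their two-word contexts.
trigram_counts, context_counts, vocab = Counter(), Counter(), set()
for line in CORPUS:
    toks = pad(line.split())
    vocab.update(toks)
    for i in range(2, len(toks)):
        trigram_counts[tuple(toks[i - 2:i + 1])] += 1
        context_counts[tuple(toks[i - 2:i])] += 1

def log_p_sentence(words):
    """Log probability of a sentence under the add-k smoothed trigram model."""
    toks = pad(words)
    total = 0.0
    for i in range(2, len(toks)):
        num = trigram_counts[tuple(toks[i - 2:i + 1])] + K
        den = context_counts[tuple(toks[i - 2:i])] + K * len(vocab)
        total += math.log(num / den)
    return total

def log_channel(observed, intended):
    """Log P(observed | intended): ALPHA if unchanged, else (1 - ALPHA) split
    equally over the spelling variants of the intended word."""
    if observed == intended:
        return math.log(ALPHA)
    variants = VARIANTS.get(intended, [])
    if observed not in variants:
        return float("-inf")
    return math.log((1.0 - ALPHA) / len(variants))

def correct(sentence):
    """Return the best-scoring sentence among the input and all candidates
    that replace exactly one word with one of its spelling variants."""
    words = sentence.split()
    n = len(words)
    best = words
    best_score = log_p_sentence(words) + n * math.log(ALPHA)
    for i, w in enumerate(words):
        for v in VARIANTS.get(w, []):
            cand = words[:i] + [v] + words[i + 1:]
            score = (log_p_sentence(cand)
                     + (n - 1) * math.log(ALPHA)
                     + log_channel(w, v))
            if score > best_score:  # keep the input on ties
                best, best_score = cand, score
    return " ".join(best)

if __name__ == "__main__":
    print(correct("i saw tree trees in the park"))
```

Because this sketch changes at most one word per sentence, it cannot repair a sentence containing several real-word errors, which is the case the article addresses.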

References

  • Bahl, L. R., Jelinek, F., & Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2), 179–190.

  • Clarkson, P., & Rosenfeld, R. (1997). Statistical language modeling using the CMU–Cambridge Toolkit. In Proceedings of the 5th European conference on speech communication and technology (Eurospeech) (pp. 2707–2710). Rhodes.

  • Flexner, S. B. (Ed.). (1983). Random house unabridged dictionary (2nd ed.). New York: Random House.

  • Fossati, D., & Di Eugenio, B. (2007, February). A mixed trigrams approach for context sensitive spell checking. In International conference on intelligent text processing and computational linguistics (pp. 623–633). Berlin: Springer.

  • Fossati, D., & Di Eugenio, B. (2008, May). I saw TREE trees in the park: How to correct real-word spelling mistakes. In LREC.

  • Golding, A. R. (1995). A Bayesian hybrid method for context-sensitive spelling correction. In Proceedings 3rd workshop on very large corpora (pp. 39–53). Boston, MA.

  • Golding, A. R., & Roth, D. (1996). Applying winnow to context-sensitive spelling correction. In L. Saitta (Ed.), Machine learning: Proceedings 13th international conference (pp. 182–190). Bari.

  • Golding, A. R., & Roth, D. (1999). A winnow-based approach to context-sensitive spelling correction. Machine Learning, 34(1–3), 107–130.

  • Golding, A. R., & Schabes, Y. (1996). Combining trigram-based and feature-based methods for context-sensitive spelling correction. In Proceedings 34th annual meeting of the association for computational linguistics (pp. 71–78). Santa Cruz, CA.

  • Hirst, G., & Budanitsky, A. (2005). Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering, 11(1), 87–111.

  • Klein, D., & Manning, C. D. (2003, July). Accurate unlexicalized parsing. In Proceedings of the 41st annual meeting on association for computational linguistics-volume 1 (pp. 423–430). Association for Computational Linguistics.

  • Kuenning, G., Willisson, P., Buehring, W., & Stevens, K. (2004). International ispell. Version, 3(00), 1–33.

  • Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR), 24(4), 377–439.

  • Mays, E., Damerau, F. J., & Mercer, R. L. (1991). Context based spelling correction. Information Processing and Management, 27(5), 517–522.

  • Pedler, J. (2007). Computer correction of real-word spelling errors in dyslexic text. Unpublished Ph.D. thesis. Birkbeck: University of London.

  • Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006, July). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics (pp. 433–440). Association for Computational Linguistics.

  • Verberne, S. (2002). Context-sensitive spell checking based on word trigram probabilities. Unpublished master’s thesis, University of Nijmegen.

  • Voutilainen, A., & Heikkilä, J. (1993). An English Constraint Grammar (ENGCG): A surface-syntactic parser of English.

  • Wilcox-O’Hearn, A., Hirst, G., & Budanitsky, A. (2008). Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model. In International conference on intelligent text processing and computational linguistics (pp. 605–616). Berlin: Springer.

Author information

Corresponding author

Correspondence to Seyed MohammadSadegh Dashti.

About this article

Cite this article

Dashti, S.M. Real-word error correction with trigrams: correcting multiple errors in a sentence. Lang Resources & Evaluation 52, 485–502 (2018). https://doi.org/10.1007/s10579-017-9397-4

  • DOI: https://doi.org/10.1007/s10579-017-9397-4
