Abstract
Spelling correction is a fundamental task in text mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several of its drawbacks. We propose a new variation that focuses on detecting and correcting multiple real-word errors in a sentence by manipulating a probabilistic context-free grammar to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky’s WordNet-based method and Wilcox-O’Hearn, Hirst, and Budanitsky’s fixed window-size method.
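The abstract evaluates the Mays, Damerau and Mercer (MDM) trigram model without restating it. As described by Mays et al. (1991) and Wilcox-O’Hearn et al. (2008), the model treats each observed word as intended with probability α and spreads the remaining 1 − α evenly over that word's real-word spelling variations. It then picks the candidate sentence S that maximizes P(S) · P(X | S) under a trigram language model, changing at most one word per sentence. The Python sketch below illustrates that baseline under stated assumptions: the confusion pairs, the trigram probabilities, the flat floor for unseen trigrams, and the names VARIANTS, TRIGRAM_PROB, UNSEEN_PROB and correct are all hypothetical toy choices. It is not the paper's data, smoothing, or PCFG-based variation.

```python
import math
from collections import defaultdict

# Hypothetical toy confusion pairs: real words one edit apart.
CONFUSABLE_PAIRS = [("peace", "piece"), ("of", "off"), ("of", "or")]

# SV(w): spelling variations of w, built symmetrically from the pairs.
VARIANTS: dict[str, set[str]] = defaultdict(set)
for a, b in CONFUSABLE_PAIRS:
    VARIANTS[a].add(b)
    VARIANTS[b].add(a)

# Hypothetical toy trigram probabilities; a real system would use a smoothed LM.
TRIGRAM_PROB = {
    ("ate", "a", "piece"): 0.05,
    ("a", "piece", "of"): 0.2,
    ("piece", "of", "cake"): 0.05,
}
UNSEEN_PROB = 1e-4  # crude flat floor standing in for smoothing/backoff


def lm_log_prob(words: list[str]) -> float:
    """log P(S) under the toy trigram model, with sentence-boundary padding."""
    padded = ["<s>", "<s>", *words, "</s>"]
    return sum(
        math.log(TRIGRAM_PROB.get(tuple(padded[i - 2:i + 1]), UNSEEN_PROB))
        for i in range(2, len(padded))
    )


def channel_log_prob(observed: str, intended: str, alpha: float) -> float:
    """log P(x | w): alpha if unchanged, else (1 - alpha) split evenly over SV(w)."""
    if observed == intended:
        return math.log(alpha)
    variants = VARIANTS.get(intended, set())
    if observed in variants:
        return math.log((1 - alpha) / len(variants))
    return -math.inf


def candidates(words: list[str]):
    """The observed sentence plus every sentence with exactly one word replaced."""
    yield list(words)
    for i, word in enumerate(words):
        for variant in sorted(VARIANTS.get(word, set())):
            yield words[:i] + [variant] + words[i + 1:]


def correct(words: list[str], alpha: float = 0.99) -> list[str]:
    """Return the candidate S maximizing log P(S) + log P(X | S)."""
    def score(cand: list[str]) -> float:
        channel = sum(channel_log_prob(x, w, alpha) for x, w in zip(words, cand))
        return lm_log_prob(cand) + channel

    return max(candidates(words), key=score)


print(" ".join(correct("he ate a peace of cake".split())))   # he ate a piece of cake
print(" ".join(correct("he ate a peace off cake".split())))  # he ate a piece off cake
```

The second call shows the single-error assumption at work: with two real-word errors in the sentence, only "peace" is repaired and "off" survives, which is the kind of case the multiple-error variation described above is meant to handle.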
References
Bahl, L. R., Jelinek, F., & Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2), 179–190.
Clarkson, P., & Rosenfeld, R. (1997). Statistical language modeling using the CMU–Cambridge Toolkit. In Proceedings of the 5th European conference on speech communication and technology (Eurospeech) (pp. 2707–2710). Rhodes.
Flexner, S. B. (Ed.). (1983). Random House unabridged dictionary (2nd ed.). New York: Random House.
Fossati, D., & Di Eugenio, B. (2007, February). A mixed trigrams approach for context sensitive spell checking. In International conference on intelligent text processing and computational linguistics (pp. 623–633). Berlin: Springer.
Fossati, D., & Di Eugenio, B. (2008, May). I saw TREE trees in the park: How to correct real-word spelling mistakes. In LREC.
Golding, A. R. (1995). A Bayesian hybrid method for context-sensitive spelling correction. In Proceedings 3rd workshop on very large corpora (pp. 39–53). Boston, MA.
Golding, A. R., & Roth, D. (1996). Applying winnow to context-sensitive spelling correction. In L. Saitta (Ed.), Machine learning: Proceedings 13th international conference (pp. 182–190). Bari.
Golding, A. R., & Roth, D. (1999). A winnow-based approach to context-sensitive spelling correction. Machine Learning, 34(1–3), 107–130.
Golding, A. R., & Schabes, Y. (1996). Combining trigram-based and feature-based methods for context-sensitive spelling correction. In Proceedings 34th annual meeting of the association for computational linguistics (pp. 71–78). Santa Cruz, CA.
Hirst, G., & Budanitsky, A. (2005). Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering, 11(1), 87–111.
Klein, D., & Manning, C. D. (2003, July). Accurate unlexicalized parsing. In Proceedings of the 41st annual meeting on association for computational linguistics-volume 1 (pp. 423–430). Association for Computational Linguistics.
Kuenning, G., Willisson, P., Buehring, W., & Stevens, K. (2004). International ispell. Version, 3(00), 1–33.
Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR), 24(4), 377–439.
Mays, E., Damerau, F. J., & Mercer, R. L. (1991). Context based spelling correction. Information Processing and Management, 27(5), 517–522.
Pedler, J. (2007). Computer correction of real-word spelling errors in dyslexic text. Unpublished Ph.D. thesis, Birkbeck, University of London.
Petrov, S., Barrett, L., Thibaux, R., & Klein, D. (2006, July). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics (pp. 433–440). Association for Computational Linguistics.
Verberne, S. (2002). Context-sensitive spell checking based on word trigram probabilities. Unpublished master’s thesis, University of Nijmegen.
Voutilainen, A., & Heikkilä, J. (1993). An English Constraint Grammar (ENGCG): A surface-syntactic parser of English.
Wilcox-O’Hearn, A., Hirst, G., & Budanitsky, A. (2008). Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model. In International conference on intelligent text processing and computational linguistics (pp. 605–616). Berlin: Springer.
About this article
Cite this article
Dashti, S.M. Real-word error correction with trigrams: correcting multiple errors in a sentence. Lang Resources & Evaluation 52, 485–502 (2018). https://doi.org/10.1007/s10579-017-9397-4
DOI: https://doi.org/10.1007/s10579-017-9397-4