Abstract
Spelling correction, the task of automatically correcting misspellings in a text, is a critical problem in natural language processing (NLP). Although it can be treated as a standalone task, it is most often a preprocessing step within other NLP tasks, since a dataset containing typos can lead to erroneous results. Many earlier automatic spelling correctors are dictionary-based: they look up each word in isolation in a predefined word list and recommend the most similar entry without considering the context. The output of such models may be a correctly spelled word that is nevertheless semantically wrong in its sentence. Some correctors therefore use language models to take the context into account when correcting typos. However, a language model alone is insufficient, because the corrected word should also be similar to the misspelled word. In our approach, we select a candidate for each typo based on masked language model output, character-level similarities, and edit distance. Combining these three signals lets us recommend candidates that both fit the context and resemble the misspelled word. We use recall (correction rate) as our evaluation metric, and the results demonstrate a considerable improvement over previous studies.
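The candidate selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear weighting, the `max_distance` cutoff, and the use of `difflib.SequenceMatcher` as the character-level similarity are all assumptions made for illustration, and the masked language model probabilities are made-up numbers supplied by hand rather than produced by BERT.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not the paper's)."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(typo, mlm_scores, max_distance=2, weights=(0.4, 0.3, 0.3)):
    """Rank masked-LM candidates by context fit and closeness to the typo.

    mlm_scores maps candidate word -> probability from a masked language
    model for the masked typo position. The weights and the cutoff are
    hypothetical values chosen for this example.
    """
    w_lm, w_sim, w_ed = weights
    ranked = []
    for cand, prob in mlm_scores.items():
        dist = edit_distance(typo, cand)
        if dist > max_distance:
            continue  # fits the context but looks nothing like the typo
        ed_score = 1.0 - dist / max(len(typo), len(cand), 1)
        score = w_lm * prob + w_sim * char_similarity(typo, cand) + w_ed * ed_score
        ranked.append((cand, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical masked-LM output for "I did not [MASK] the package",
# where the original token was the typo "recieve".
mlm_scores = {"get": 0.45, "receive": 0.30, "open": 0.15, "relieve": 0.01}
ranked = rank_candidates("recieve", mlm_scores)
print(ranked[0][0])  # top candidate under these made-up scores: receive
```

In this toy example the language model alone would prefer "get", and edit distance alone would prefer "relieve"; only the combined score selects "receive", which is the motivation the abstract gives for using the signals together.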
F. Tohidian and A. Kashiri—These authors contributed equally to this work.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tohidian, F., Kashiri, A., Lotfi, F. (2023). BEDSpell: Spelling Error Correction Using BERT-Based Masked Language Model and Edit Distance. In: Troya, J., et al. Service-Oriented Computing – ICSOC 2022 Workshops. ICSOC 2022. Lecture Notes in Computer Science, vol 13821. Springer, Cham. https://doi.org/10.1007/978-3-031-26507-5_1
DOI: https://doi.org/10.1007/978-3-031-26507-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26506-8
Online ISBN: 978-3-031-26507-5
eBook Packages: Computer Science, Computer Science (R0)