BEDSpell: Spelling Error Correction Using BERT-Based Masked Language Model and Edit Distance

  • Conference paper
Service-Oriented Computing – ICSOC 2022 Workshops (ICSOC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13821)

Abstract

The spelling correction problem, the task of automatically correcting misspellings in a text, is critical in natural language processing (NLP). Although it can be treated as a standalone task, it is more often an integral preprocessing step in other NLP tasks, since a dataset with typos can lead to erroneous results. Many previous automatic spelling correctors rely on a dictionary: they look up each word independently in a predefined word list and recommend the most similar entry without considering the context. Even though the output of such models may be a correctly spelled word, it can be semantically wrong. Therefore, some correctors use language models to take the context into account when correcting typos. However, relying on a language model alone is insufficient, because the corrected word should also be similar to the misspelled word. In our approach, we select a candidate for the typo based on the masked language model output, character-level similarities, and edit distance. Combining these three signals lets us recommend candidates that are both similar to the typo and appropriate to its context. We use recall (correction rate) as our evaluation metric, and the results show a considerable improvement compared with previous studies.
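The abstract names the three signals but not the exact rule that combines them, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. Assumptions (all hypothetical): a masked language model such as BERT has already proposed candidate words for the masked typo position, and their probabilities are passed in as a plain dict; `difflib.SequenceMatcher` stands in for a learned character-level similarity; the edit-distance cut-off `max_edits` and the weights `w_mlm` and `w_char` are illustrative values, not values from the paper.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a learned character-level similarity; returns a value in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(typo, mlm_scores, max_edits=2, w_mlm=0.5, w_char=0.5):
    """Rank masked-LM candidates for a misspelled word (illustrative scoring only).

    mlm_scores maps each candidate word to the probability the masked language
    model assigned it at the typo's position. Candidates more than max_edits
    edits away from the typo are discarded; the rest are ordered by a weighted
    sum of MLM probability and character similarity.
    """
    ranked = []
    for word, prob in mlm_scores.items():
        if levenshtein(typo, word) > max_edits:
            continue  # context fits, but the word is too far from what was typed
        score = w_mlm * prob + w_char * char_similarity(typo, word)
        ranked.append((score, word))
    ranked.sort(reverse=True)
    return [word for _, word in ranked]


# Made-up MLM probabilities for "I drank a cup of [MASK] this morning." with typo "cofee"
scores = {"coffee": 0.40, "tea": 0.35, "water": 0.10, "milk": 0.05, "toffee": 0.02}
print(rank_candidates("cofee", scores))  # → ['coffee', 'toffee']
```

In this toy example, "tea" fits the context almost as well as "coffee" but is four edits away from "cofee", so the edit-distance filter removes it, while "toffee" survives the filter but ranks lower because the language model gives it little probability. This illustrates why the abstract argues that neither the language model nor string similarity is sufficient on its own.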

F. Tohidian and A. Kashiri—These authors contributed equally to this work.

Notes

  1. https://catalog.ldc.upenn.edu/LDC2006T13.

  2. https://github.com/IntuitionEngineeringTeam/chars2vec.


Author information

Corresponding author

Correspondence to Fatemeh Tohidian.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tohidian, F., Kashiri, A., Lotfi, F. (2023). BEDSpell: Spelling Error Correction Using BERT-Based Masked Language Model and Edit Distance. In: Troya, J., et al. Service-Oriented Computing – ICSOC 2022 Workshops. ICSOC 2022. Lecture Notes in Computer Science, vol 13821. Springer, Cham. https://doi.org/10.1007/978-3-031-26507-5_1

  • DOI: https://doi.org/10.1007/978-3-031-26507-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26506-8

  • Online ISBN: 978-3-031-26507-5

  • eBook Packages: Computer Science; Computer Science (R0)
