Vietnamese Spelling Error Detection and Correction Using BERT and N-gram Language Model

Tien, Dong Nguyen; Minh, Tuoi Tran Thi; Vu, Loi Le; Minh, Tuan Dang

doi:10.1007/978-981-19-3394-3_49

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 471)

Abstract

Vietnamese spelling error detection and correction is a crucial task in natural language processing and plays an important role in many real-world applications. Although the problem has been widely studied, handling the diverse types of errors found in Vietnamese text remains a challenge. In this paper, we propose a model that detects and corrects specific types of Vietnamese spelling errors by combining a pre-trained neural network-based Vietnamese language model with an N-gram language model. We also provide a clear definition of the error types the model can handle and of the error generation rules used to build the training set, and we evaluate the proposed model at the word level on a Vietnamese benchmark dataset. The experimental results show that our model achieves an F1-score 1% to 14% higher than other neural network-based pre-trained language models in detection. For correction, we compare bigram, trigram and 4-gram language models to choose the best one.
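To make the two-stage idea in the abstract concrete, the following is a minimal sketch of the correction stage only, not the authors' implementation. It assumes the detection stage (a BERT-style model in the paper) has already flagged the suspicious word positions, and it ranks replacement candidates with a bigram model. The toy corpus, the add-one smoothing, the candidate dictionary, and the function names `train_bigram`, `bigram_logprob` and `correct` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Toy corpus (assumption): a real system would train on a large Vietnamese corpus.
CORPUS = [
    "tôi đi học",
    "em đi học",
    "tôi đi làm",
    "em học bài",
]


def train_bigram(sentences):
    """Count unigrams and bigrams, adding <s> and </s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev); smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def correct(tokens, flagged, candidates, unigrams, bigrams):
    """Replace each flagged position with the candidate that best fits its neighbours.

    `flagged` stands in for the output of the detection model (assumed given).
    `candidates` maps a misspelled word to hypothetical replacement options.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in flagged:
        j = i + 1  # position of tokens[i] inside the padded sequence

        def score(candidate):
            # Sum of log-probabilities with the left and right neighbour.
            return (bigram_logprob(padded[j - 1], candidate, unigrams, bigrams)
                    + bigram_logprob(candidate, padded[j + 1], unigrams, bigrams))

        options = candidates.get(tokens[i], [tokens[i]])
        padded[j] = max(options, key=score)
    return padded[1:-1]


if __name__ == "__main__":
    unigrams, bigrams = train_bigram(CORPUS)
    sentence = ["tôi", "đi", "hoc"]          # "hoc" is missing its diacritics
    options = {"hoc": ["học", "hóc", "hộc"]}  # hypothetical candidate set
    print(correct(sentence, [2], options, unigrams, bigrams))
```

Scoring each candidate against both its left and right neighbour is one simple way to use context. Moving to trigram or 4-gram models, as compared in the paper, would widen that context window at the cost of sparser counts.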

Acknowledgment

We gratefully acknowledge the support from the CMC Institute of Science Technology for funding the research project.

Author information

Authors and Affiliations

CMC Institute of Science Technology, No. 11 Duy Tan Street, Hanoi, Vietnam
Dong Nguyen Tien, Tuoi Tran Thi Minh, Loi Le Vu & Tuan Dang Minh
Posts and Telecommunications Institute of Technology, Km10, Nguyen Trai Street, Hanoi, Vietnam
Tuan Dang Minh
Hanoi University of Science and Technology, No. 1 Dai Co Viet Street, Hanoi, Vietnam
Loi Le Vu

Corresponding author

Correspondence to Tuan Dang Minh.

Editor information

Editors and Affiliations

Swinburne University Vietnam, Hanoi, Vietnam
Ngoc Le Anh
School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea (Republic of)
Seok-Joo Koh
Hanoi University of Industry, Hanoi, Vietnam
Thi Dieu Linh Nguyen
Integrated Management Coastal Research Institute, Universitat Politecnica de Valencia, Gandia, Valencia, Spain
Jaime Lloret
Vietnam National University, Hanoi, Vietnam
Thanh Tung Nguyen

About this paper

Cite this paper

Tien, D.N., Minh, T.T.T., Vu, L.L., Minh, T.D. (2022). Vietnamese Spelling Error Detection and Correction Using BERT and N-gram Language Model. In: Anh, N.L., Koh, SJ., Nguyen, T.D.L., Lloret, J., Nguyen, T.T. (eds) Intelligent Systems and Networks. Lecture Notes in Networks and Systems, vol 471. Springer, Singapore. https://doi.org/10.1007/978-981-19-3394-3_49

DOI: https://doi.org/10.1007/978-981-19-3394-3_49
Published: 05 July 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-3393-6
Online ISBN: 978-981-19-3394-3
eBook Packages: Intelligent Technologies and Robotics (R0)
