Abstract
Deep learning has shown remarkable effectiveness in a variety of language tasks. This paper presents the work of Huawei Translation Services Center (HW-TSC), named HWCGEC, which achieved the best performance among the seven submitted results in NLPCC 2023 shared task 1, Chinese grammatical error correction (CGEC). CGEC aims to automatically correct grammatical errors that violate language rules, converting noisy input texts into clean output texts. Through experiments, this paper finds that after fine-tuning, BART, a sequence-to-sequence (seq2seq) model, outperforms ChatGLM, a large language model (LLM), when the training data is large and LoRA fine-tuning updates only a small number of parameters. In addition, the BART model achieves good results on the CGEC task through data augmentation and curriculum learning. Although the LLM performs poorly in our experiments, LLMs possess excellent logical abilities. As training sets become more diverse and data augmentation methods become more refined, LLMs trained in the supervised fine-tuning (SFT) mode are expected to achieve significant improvements on CGEC tasks in the future.
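The parameter-efficiency argument behind LoRA is simple to quantify: instead of updating a full weight matrix, LoRA learns a low-rank update ΔW = BA. The sketch below is illustrative only (the matrix size and rank are hypothetical, not the paper's actual ChatGLM configuration) and shows why the number of trainable parameters shrinks dramatically:

```python
# LoRA replaces a full (d_out x d_in) weight update with two low-rank
# factors: B with shape (d_out, r) and A with shape (r, d_in),
# where the rank r is much smaller than min(d_out, d_in).

def full_trainable_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_out * d_in

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted matrix: B plus A."""
    return d_out * r + r * d_in

# Hypothetical example: a 4096 x 4096 projection with LoRA rank r = 8.
full = full_trainable_params(4096, 4096)      # 16,777,216
lora = lora_trainable_params(4096, 4096, 8)   # 65,536
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4%}")
```

With these shapes, LoRA trains under 0.4% of the parameters of full fine-tuning for this single matrix, which is consistent with the observation above that the LoRA mode fine-tunes comparatively few parameters.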
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Su, C. et al. (2023). HWCGEC: HW-TSC's 2023 Submission for the NLPCC 2023 Chinese Grammatical Error Correction Task. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14304. Springer, Cham. https://doi.org/10.1007/978-3-031-44699-3_6
DOI: https://doi.org/10.1007/978-3-031-44699-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44698-6
Online ISBN: 978-3-031-44699-3
eBook Packages: Computer Science (R0)