
HWCGEC: HW-TSC’s 2023 Submission for the NLPCC2023’s Chinese Grammatical Error Correction Task

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14304)


Abstract

Deep learning has shown remarkable effectiveness across a wide range of language tasks. This paper presents the Huawei Translation Services Center’s (HW-TSC’s) system, HWCGEC, which achieved the best performance among the seven submissions to NLPCC2023 shared task 1, Chinese grammatical error correction (CGEC). CGEC aims to automatically correct grammatical errors that violate language rules, converting noisy input texts into clean output texts. Our experiments show that, after fine-tuning, BART, a sequence-to-sequence (seq2seq) model, outperforms ChatGLM, a large language model (LLM), when the training data is large and the LLM is fine-tuned with LoRA, which updates only a small number of parameters. In addition, the BART model achieves good results on the CGEC task through data augmentation and curriculum learning. Although the LLM performs poorly in our experiments, it possesses excellent logical abilities. As training sets become more diverse and data augmentation methods more refined, LLMs trained in the supervised fine-tuning (SFT) mode are expected to achieve significant improvements on CGEC tasks in the future.
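The LoRA fine-tuning mode mentioned above freezes the pretrained weights and trains only a low-rank update, which is why far fewer parameters change than in a full fine-tune. A minimal numpy sketch of that core idea (not the authors’ actual setup; the dimensions, rank, and scaling here are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=4):
    # Effective weight = frozen W plus the scaled low-rank update (alpha/r) * B @ A.
    return x @ (W + (alpha / r) * (B @ A)).T

d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialised,
                                        # so the update starts as a no-op

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, r=r)

full_params = W.size                    # what a full fine-tune would update
lora_params = A.size + B.size           # what LoRA actually trains (here 512 vs 4096)
```

With rank r much smaller than the hidden size, the trainable parameter count grows linearly in r rather than quadratically in the hidden size, which is the trade-off the abstract alludes to when comparing LoRA-tuned ChatGLM against a fully fine-tuned BART.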


Notes

  1. https://github.com/HillZhang1999/NaSGEC.
  2. https://tiku.baidu.com.
  3. https://github.com/masr2000/NaCGEC.


Author information


Corresponding author

Correspondence to Chang Su.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Su, C., et al. (2023). HWCGEC: HW-TSC’s 2023 Submission for the NLPCC2023’s Chinese Grammatical Error Correction Task. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14304. Springer, Cham. https://doi.org/10.1007/978-3-031-44699-3_6


  • DOI: https://doi.org/10.1007/978-3-031-44699-3_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44698-6

  • Online ISBN: 978-3-031-44699-3

  • eBook Packages: Computer Science; Computer Science (R0)
