Resources and Evaluations of Automated Chinese Error Diagnosis for Language Learners

  • Lung-Hao Lee
  • Yuen-Hsien Tseng
  • Li-Ping Chang
Chapter
Part of the Chinese Language Learning Sciences book series (CLLS)

Abstract

Learners of Chinese as a foreign language (CFL) may produce inappropriate linguistic usages, including character-level confusions (commonly known as spelling errors) and grammatical errors at the word, sentence, and discourse levels. Chinese spelling errors frequently arise from confusions among multiple-character words that are phonologically and visually similar but semantically distinct. At the coarse-grained level, Chinese grammatical errors are classified by surface difference: missing components, redundant components, incorrect selection, and word-ordering errors. Fine-grained error types additionally capture morphology and syntax, covering categories such as verb, noun, preposition, conjunction, and adverb. Annotated learner corpora are important language resources for understanding these error patterns and for supporting the development of error diagnosis systems. In this chapter, we describe two representative Chinese learner corpora: the HSK Dynamic Composition Corpus constructed by Beijing Language and Culture University and the TOCFL Learner Corpus built by National Taiwan Normal University. In addition, we introduce several evaluations based on both learner corpora and designed for computer-assisted Chinese learning. One is the series of SIGHAN bakeoffs for Chinese spelling checkers. The other is the series of NLPTEA workshop shared tasks for Chinese grammatical error identification. The purpose of this chapter is to summarize these resources and evaluations to give a better understanding of current research developments and challenges in automated Chinese error diagnosis for CFL learners.

Notes

Acknowledgements

This study was partially supported by the Ministry of Science and Technology under grants MOST 103-2221-E-003-013-MY3, MOST 105-2221-E-003-020-MY2, and MOST 106-2221-E-003-030-MY2, and by the “Aim for the Top University Project” and “Center of Language Technology for Chinese” of National Taiwan Normal University, sponsored by the Ministry of Education, Taiwan.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. National Central University, Taoyuan, Taiwan
  2. National Taiwan Normal University, Taipei, Taiwan
  3. National Taiwan University, Taipei, Taiwan