A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors

  • Nicolas Ballier
  • Thomas Gaillat
  • Andrew Simpkin
  • Bernardo Stearns
  • Manon Bouyé
  • Manel Zarrouk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11722)


This paper focuses on the use of technology in language learning. Language training requires grouping learners homogeneously and giving them instant feedback on their productions, such as errors [8, 15, 17] or proficiency levels. One possible approach is to assess students' writing and assign each text a level. This paper analyses the possibility of automatically predicting Common European Framework of Reference (CEFR) language levels on the basis of manually annotated errors in a written learner corpus [9, 11]. The research question is twofold: how well do errors predict levels, and which error types appear to be criterial features in determining interlanguage stages? Results show that specific errors, such as punctuation, spelling and verb tense, are significant at specific CEFR levels.
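As a rough illustration of the general idea of predicting a level from annotated error counts, the sketch below turns error tags into rates per 100 tokens and assigns the level whose average profile is closest. Everything in it is an assumption made for illustration: the tag names, the toy texts and the nearest-centroid rule are hypothetical, and the rule is a deliberately simple stand-in rather than the regression model evaluated in the paper.

```python
from collections import Counter

# Hypothetical error tags; the paper's actual tagset is not reproduced here.
ERROR_TYPES = ["punctuation", "spelling", "verb_tense"]


def error_rates(error_tags, n_tokens):
    """Return errors per 100 tokens for each type in ERROR_TYPES."""
    counts = Counter(error_tags)
    return [100.0 * counts[t] / n_tokens for t in ERROR_TYPES]


def fit_centroids(samples):
    """samples: list of (feature_vector, level). Returns {level: mean vector}."""
    grouped = {}
    for features, level in samples:
        grouped.setdefault(level, []).append(features)
    return {
        level: [sum(col) / len(rows) for col in zip(*rows)]
        for level, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the level whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda level: sq_dist(centroids[level]))


# Invented toy texts: (list of error tags, token count, CEFR level).
train = [
    (["spelling"] * 8 + ["verb_tense"] * 6 + ["punctuation"] * 4, 100, "A2"),
    (["spelling"] * 7 + ["verb_tense"] * 5 + ["punctuation"] * 5, 100, "A2"),
    (["spelling"] * 4 + ["verb_tense"] * 3 + ["punctuation"] * 3, 100, "B1"),
    (["spelling"] * 3 + ["verb_tense"] * 2 + ["punctuation"] * 3, 100, "B1"),
    (["spelling"] * 1 + ["punctuation"] * 2, 100, "C1"),
    (["verb_tense"] * 1 + ["punctuation"] * 1, 100, "C1"),
]
samples = [(error_rates(tags, n), level) for tags, n, level in train]
centroids = fit_centroids(samples)

# A new 200-token text with 6 spelling, 4 verb tense and 3 punctuation errors.
new_text = error_rates(
    ["spelling"] * 6 + ["verb_tense"] * 4 + ["punctuation"] * 3, 200
)
predicted = predict(centroids, new_text)
print(predicted)
```

Normalising by text length keeps long and short texts comparable. A regression model fitted on the same rates would additionally indicate which error types carry significant weight at each level, which the centroid rule does not.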


Keywords: CEFR level prediction · Error tagset · Regression · Unsupervised clustering · Proficiency levels


  1. Arnold, T., Ballier, N., Gaillat, T., Lissón, P.: Predicting CEFRL levels in learner English on the basis of metrics and full texts. arXiv:1806.11099 [cs] (2018)
  2. Attali, Y., Burstein, J.: Automated essay scoring with e-rater V.2. J. Technol. Learn. Assess. 4(3), 3–30 (2006)
  3. Barker, F., Salamoura, A., Saville, N.: Learner corpora and language testing. In: Granger, S., Gilquin, G., Meunier, F. (eds.) The Cambridge Handbook of Learner Corpus Research, pp. 511–534. Cambridge Handbooks in Language and Linguistics, Cambridge University Press (2015)
  4. Baur, C., et al.: Overview of the 2018 Spoken CALL Shared Task. In: Interspeech 2018, pp. 2354–2358. ISCA (2018)
  5. Council of Europe, Council for Cultural Co-operation. Education Committee. Modern Languages Division: Common European Framework of Reference for Languages: learning, teaching, assessment. Cambridge University Press, Cambridge (2001)
  6. Crossley, S.A., Kyle, K., Allen, L.K., Guo, L., McNamara, D.S.: Linguistic Microfeatures to Predict L2 Writing Proficiency: A Case Study in Automated Writing Evaluation (2014)
  7. Crossley, S.A., Salsbury, T., McNamara, D.S., Jarvis, S.: Predicting lexical proficiency in language learner texts using computational indices. Lang. Test. 28(4), 561–580 (2011)
  8. Dale, R., Anisimoff, I., Narroway, G.: HOO 2012: a report on the preposition and determiner error correction shared task. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL HLT 2012, pp. 54–62. Association for Computational Linguistics, Stroudsburg (2012). event-place: Montreal, Canada
  9. Díaz-Negrillo, A., Fernández-Domínguez, J.: Error tagging systems for learner corpora. Spanish J. Appl. Linguist. (RESLA) 19, 83–102 (2006)
  10. Geertzen, J., Alexopoulou, T., Korhonen, A.: Automatic linguistic annotation of large scale L2 databases: the EF-Cambridge Open Language Database (EFCamDat). In: Miller, R.T., et al. (eds.) Proceedings of the 31st Second Language Research Forum. Cascadilla Press, Carnegie Mellon (2013)
  11. Granger, S., Gilquin, G., Meunier, F. (eds.): The Cambridge Handbook of Learner Corpus Research. Cambridge University Press, Cambridge (2015)
  12. Hawkins, J.A., Buttery, P.: Criterial features in learner corpora: theory and illustrations. English Profile J. 1(01), 1–23 (2010)
  13. Higgins, D., Xi, X., Zechner, K., Williamson, D.: A three-stage approach to the automated scoring of spontaneous spoken responses. Comput. Speech Lang. 25(2), 282–306 (2011)
  14. Huang, Y., Murakami, A., Alexopoulou, T., Korhonen, A.L.: Dependency parsing of learner English (2018)
  15. Leacock, C.: Automated Grammatical Error Detection for Language Learners. Morgan & Claypool Publishers, California (2010)
  16. Nedungadi, P., Raj, H.: Unsupervised word sense disambiguation for automatic essay scoring. In: Kumar Kundu, M., Mohapatra, D.P., Konar, A., Chakraborty, A. (eds.) Advanced Computing, Networking and Informatics - Volume 1. SIST, vol. 27, pp. 437–443. Springer, Cham (2014)
  17. Ng, H.T., Wu, S.M., Briscoe, T., Hadiwinoto, C., Susanto, R.H., Bryant, C.: The CoNLL-2014 shared task on grammatical error correction. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–14. Association for Computational Linguistics (2014). event-place: Baltimore, Maryland
  18. Page, E.B.: The use of the computer in analyzing student essays. Int. Rev. Educ. 14(2), 210–225 (1968)
  19. Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8(1), 205–233 (2016)
  20. Selinker, L.: Interlanguage. Int. Rev. Appl. Linguist. Lang. Teach. 10(3), 209 (1972)
  21. Shermis, M.D., Burstein, J., Higgins, D., Zechner, K.: Automated essay scoring: writing assessment and instruction. Int. Encycl. Educ. 4(1), 20–26 (2010)
  22. Shute, V.J.: Focus on formative feedback. Rev. Educ. Res. 78(1), 153–189 (2008)
  23. Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. (2017). arXiv:1612.00729
  24. Vajjala, S., Loo, K.: Automatic CEFR level prediction for Estonian learner text. NEALT Proc. Ser. 22, 113–128 (2014)
  25. Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Papadima-Sophocleous, S., Bradley, L., Thouësny, S. (eds.) CALL communities and culture - short papers from EUROCALL 2016, pp. 456–461 (2016)
  26. Weigle, S.C.: English language learners and automated scoring of essays: critical considerations. Assessing Writ. 18(1), 85–99 (2013)
  27. Yan, H., Jeroen, G., Rachel, B., Anna, K., Theodora, A.: The EF Cambridge Open Language Database (EFCAMDAT) information for users (2017)
  28. Yannakoudakis, H., Briscoe, T., Medlock, B.: A new dataset and method for automatically grading ESOL texts. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol. 1, pp. 180–189. Association for Computational Linguistics, Stroudsburg (2011)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Université de Paris-Diderot, CLILLAC-ARP, Paris, France
  2. University of Rennes, LIDILE, Rennes, France
  3. Insight Centre for Data Analytics, NUI Galway, Galway, Ireland
