A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors

  • Conference paper
Transforming Learning with Meaningful Technologies (EC-TEL 2019)

Abstract

This paper focuses on the use of technology in language learning. Language training requires grouping learners homogeneously and providing them with instant feedback on their productions, such as errors [8, 15, 17] or proficiency levels. One possible approach is to assess students' written texts and assign each of them a level. This paper analyses the possibility of automatically predicting Common European Framework of Reference (CEFR) language levels on the basis of manually annotated errors in a written learner corpus [9, 11]. The research question is twofold: to evaluate the predictive power of errors with respect to levels, and to identify which error types appear to be criterial features in determining interlanguage stages. Results show that specific error types, such as punctuation, spelling and verb tense errors, are significant at specific CEFR levels.
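The abstract frames level assessment as supervised classification over error types. The Python sketch below illustrates that framing under explicit assumptions: the error-tag names, the per-100-word normalisation, the toy data and the choice of a nearest-centroid classifier are all hypothetical stand-ins, not the authors' model or the EFCAMDAT annotation scheme. It turns each text's annotated errors into a rate vector, learns one centroid per CEFR level, predicts the level of a new text, and reports how far apart the level centroids are on each error type as a crude hint of which errors might be criterial.

```python
from collections import Counter
from math import sqrt

# Hypothetical error-type inventory (an assumption for illustration,
# not the EFCAMDAT error-annotation scheme).
ERROR_TYPES = ("punctuation", "spelling", "verb_tense", "article")


def error_rates(error_tags, word_count, per=100):
    """Convert one text's error tags into a rate vector (errors per `per` words)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    counts = Counter(error_tags)  # missing error types count as 0
    return [per * counts[t] / word_count for t in ERROR_TYPES]


def fit_centroids(vectors, levels):
    """Nearest-centroid training: average the feature vectors of each CEFR level."""
    sums, sizes = {}, Counter(levels)
    for vec, level in zip(vectors, levels):
        acc = sums.setdefault(level, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {lvl: [x / sizes[lvl] for x in acc] for lvl, acc in sums.items()}


def predict(centroids, vec):
    """Return the level whose centroid is closest to `vec` (Euclidean distance)."""
    def dist(centroid):
        return sqrt(sum((a - b) ** 2 for a, b in zip(centroid, vec)))
    return min(centroids, key=lambda lvl: dist(centroids[lvl]))


def spread_by_feature(centroids):
    """Max minus min centroid value per error type: a crude proxy for how
    strongly each error type separates levels (a candidate criterial feature)."""
    return {
        t: max(c[i] for c in centroids.values()) - min(c[i] for c in centroids.values())
        for i, t in enumerate(ERROR_TYPES)
    }


# Invented toy data for illustration only: (error tags, word count, CEFR level).
toy = [
    (["spelling"] * 6 + ["verb_tense"] * 3 + ["punctuation"], 100, "A1"),
    (["spelling"] * 5 + ["verb_tense"] * 4, 100, "A1"),
    (["spelling"] * 2 + ["punctuation"] * 3 + ["article"], 100, "B1"),
    (["spelling"] + ["punctuation"] * 4 + ["article"] * 2, 100, "B1"),
]
X = [error_rates(tags, n) for tags, n, _ in toy]
y = [lvl for _, _, lvl in toy]
model = fit_centroids(X, y)
print(predict(model, error_rates(["spelling"] * 5 + ["verb_tense"] * 2, 100)))  # A1
print(spread_by_feature(model))
```

Nearest-centroid is used here only because it fits in a few lines without third-party libraries; the centroid spread is a descriptive heuristic, not a significance test of the kind the abstract reports.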

This paper benefited from the support of Partenariat Hubert Curien (PHC) Ulysse 2019 funding for the project "Investigating criterial features of learner English and AI-driven automatic language level assessment" (ref 43121RJ).

Notes

  1.

    The EFCAMDAT is hosted by the University of Cambridge, and the data are accessible for academic and non-commercial purposes. Our scripts will be made available on our GitHub repository. Data were selected and processed independently of the Cambridge and Education First research teams.

  2.

    See, for instance, the Intelligent Essay Assessor™ developed at Pearson Knowledge Technologies and the IntelliMetric™ essay scoring system developed by Vantage Learning.

  3.

    http://languagetool.org.

  4.

    See https://englishlive.ef.com.

References

  1. Arnold, T., Ballier, N., Gaillat, T., Lissón, P.: Predicting CEFRL levels in learner English on the basis of metrics and full texts. arXiv:1806.11099 [cs] (2018)

  2. Attali, Y., Burstein, J.: Automated essay scoring with e-rater V.2. J. Technol. Learn. Assess. 4(3), 3–30 (2006)

  3. Barker, F., Salamoura, A., Saville, N.: Learner corpora and language testing. In: Granger, S., Gilquin, G., Meunier, F. (eds.) The Cambridge Handbook of Learner Corpus Research, pp. 511–534. Cambridge Handbooks in Language and Linguistics, Cambridge University Press (2015)

  4. Baur, C., et al.: Overview of the 2018 Spoken CALL Shared Task. In: Interspeech 2018, pp. 2354–2358. ISCA (2018)

  5. Council of Europe, Council for Cultural Co-operation. Education Committee. Modern Languages Division: Common European Framework of Reference for Languages: learning, teaching, assessment. Cambridge University Press, Cambridge (2001)

  6. Crossley, S.A., Kyle, K., Allen, L.K., Guo, L., McNamara, D.S.: Linguistic Microfeatures to Predict L2 Writing Proficiency: A Case Study in Automated Writing Evaluation (2014)

  7. Crossley, S.A., Salsbury, T., McNamara, D.S., Jarvis, S.: Predicting lexical proficiency in language learner texts using computational indices. Lang. Test. 28(4), 561–580 (2011)

  8. Dale, R., Anisimoff, I., Narroway, G.: HOO 2012: a report on the preposition and determiner error correction shared task. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL HLT 2012, Montreal, Canada, pp. 54–62. Association for Computational Linguistics, Stroudsburg (2012)

  9. Díaz-Negrillo, A., Fernández-Domínguez, J.: Error tagging systems for learner corpora. Spanish J. Appl. Linguist. (RESLA) 19, 83–102 (2006)

  10. Geertzen, J., Alexopoulou, T., Korhonen, A.: Automatic linguistic annotation of large scale L2 databases: the EF-Cambridge Open Language Database (EFCamDat). In: Miller, R.T., et al. (eds.) Proceedings of the 31st Second Language Research Forum. Cascadilla Press, Carnegie Mellon (2013)

  11. Granger, S., Gilquin, G., Meunier, F. (eds.): The Cambridge Handbook of Learner Corpus Research. Cambridge University Press, Cambridge (2015)

  12. Hawkins, J.A., Buttery, P.: Criterial features in learner corpora: theory and illustrations. English Profile J. 1(01), 1–23 (2010)

  13. Higgins, D., Xi, X., Zechner, K., Williamson, D.: A three-stage approach to the automated scoring of spontaneous spoken responses. Comput. Speech Lang. 25(2), 282–306 (2011)

  14. Huang, Y., Murakami, A., Alexopoulou, T., Korhonen, A.L.: Dependency parsing of learner English (2018)

  15. Leacock, C.: Automated Grammatical Error Detection for Language Learners. Morgan & Claypool Publishers, California (2010)

  16. Nedungadi, P., Raj, H.: Unsupervised word sense disambiguation for automatic essay scoring. In: Kumar Kundu, M., Mohapatra, D.P., Konar, A., Chakraborty, A. (eds.) Advanced Computing, Networking and Informatics - Volume 1. SIST, vol. 27, pp. 437–443. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07353-8_51

  17. Ng, H.T., Wu, S.M., Briscoe, T., Hadiwinoto, C., Susanto, R.H., Bryant, C.: The CoNLL-2014 shared task on grammatical error correction. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, Baltimore, Maryland, pp. 1–14. Association for Computational Linguistics (2014)

  18. Page, E.B.: The use of the computer in analyzing student essays. Int. Rev. Educ. 14(2), 210–225 (1968)

  19. Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8(1), 205–233 (2016)

  20. Selinker, L.: Interlanguage. Int. Rev. Appl. Linguist. Lang. Teach. 10(3), 209 (1972)

  21. Shermis, M.D., Burstein, J., Higgins, D., Zechner, K.: Automated essay scoring: writing assessment and instruction. Int. Encycl. Educ. 4(1), 20–26 (2010)

  22. Shute, V.J.: Focus on formative feedback. Rev. Educ. Res. 78(1), 153–189 (2008)

  23. Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. (2017). arXiv: 1612.00729

  24. Vajjala, S., Loo, K.: Automatic CEFR level prediction for Estonian learner text. NEALT Proc. Ser. 22, 113–128 (2014)

  25. Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Papadima-Sophocleous, S., Bradley, L., Thouësny, S. (eds.) CALL communities and culture - short papers from EUROCALL 2016, pp. 456–461. Research-publishing.net (2016)

  26. Weigle, S.C.: English language learners and automated scoring of essays: critical considerations. Assessing Writ. 18(1), 85–99 (2013)

  27. Huang, Y., Geertzen, J., Baker, R., Korhonen, A., Alexopoulou, T.: The EF Cambridge Open Language Database (EFCAMDAT) information for users (2017)

  28. Yannakoudakis, H., Briscoe, T., Medlock, B.: A new dataset and method for automatically grading ESOL texts. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol. 1, pp. 180–189. Association for Computational Linguistics, Stroudsburg (2011)

Author information

Corresponding author

Correspondence to Nicolas Ballier.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ballier, N., Gaillat, T., Simpkin, A., Stearns, B., Bouyé, M., Zarrouk, M. (2019). A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors. In: Scheffel, M., Broisin, J., Pammer-Schindler, V., Ioannou, A., Schneider, J. (eds) Transforming Learning with Meaningful Technologies. EC-TEL 2019. Lecture Notes in Computer Science, vol 11722. Springer, Cham. https://doi.org/10.1007/978-3-030-29736-7_23

  • DOI: https://doi.org/10.1007/978-3-030-29736-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29735-0

  • Online ISBN: 978-3-030-29736-7

  • eBook Packages: Computer Science, Computer Science (R0)