A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors

Ballier, Nicolas; Gaillat, Thomas; Simpkin, Andrew; Stearns, Bernardo; Bouyé, Manon; Zarrouk, Manel

doi:10.1007/978-3-030-29736-7_23

Nicolas Ballier, ORCID: orcid.org/0000-0003-2179-1043

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11722)

Included in the following conference series:

European Conference on Technology Enhanced Learning

Abstract

This paper focuses on the use of technology in language learning. Language training requires grouping learners homogeneously and giving them instant feedback on their productions, such as errors [8, 15, 17] or proficiency levels. One possible approach is to assess students' writing and assign it a level. This paper analyses the possibility of automatically predicting Common European Framework of Reference (CEFR) language levels on the basis of manually annotated errors in a written learner corpus [9, 11]. The aim is to evaluate how well errors predict levels and to identify which error types appear to be criterial features in determining interlanguage stages. Results show that specific errors such as punctuation, spelling and verb tense are significant at specific CEFR levels.
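
As a rough illustration of the idea of predicting a level from annotated errors, the sketch below turns error tags into per-100-token rates and assigns the level whose average profile is closest. This is only a minimal sketch under stated assumptions: the abstract does not say which supervised model the authors used, so a nearest-centroid classifier is swapped in as a stand-in; the tag names, the function names and the tiny training set are all invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical error tags (assumption): the abstract names punctuation,
# spelling and verb tense, but the actual tagset is not given here.
ERROR_TYPES = ["punctuation", "spelling", "verb_tense"]


def error_rates(error_tags, n_tokens):
    """Convert a list of annotated error tags into errors per 100 tokens."""
    counts = Counter(error_tags)
    return [100.0 * counts[t] / n_tokens for t in ERROR_TYPES]


def fit_centroids(samples):
    """samples: list of (feature_vector, level). Returns level -> mean vector."""
    sums, sizes = {}, Counter()
    for features, level in samples:
        acc = sums.setdefault(level, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        sizes[level] += 1
    return {level: [v / sizes[level] for v in acc] for level, acc in sums.items()}


def predict(centroids, features):
    """Return the level whose centroid is nearest in Euclidean distance."""
    def distance(level):
        return sqrt(sum((a - b) ** 2 for a, b in zip(centroids[level], features)))
    return min(centroids, key=distance)


# Invented toy training data (NOT from the paper): rates in ERROR_TYPES order.
train = [
    ([4.0, 6.0, 5.0], "A1"),
    ([5.0, 5.0, 4.0], "A1"),
    ([2.0, 2.0, 1.0], "B1"),
    ([3.0, 1.0, 2.0], "B1"),
]
centroids = fit_centroids(train)

# A new text of 100 tokens with 3 punctuation, 5 spelling, 4 verb-tense errors.
new_text = error_rates(
    ["punctuation"] * 3 + ["spelling"] * 5 + ["verb_tense"] * 4, 100
)
print(predict(centroids, new_text))
```

Normalising counts by text length keeps long and short texts comparable; a study on real data would also need held-out evaluation and a model that can report which error types matter at which level.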

This paper benefited from the support of the Partenariat Hubert Curien Ulysse 2019 funding for the project “Investigating criterial features of learner English and AI-driven automatic language level assessment” (ref 43121RJ).

Notes

1. The EFCAMDAT is hosted by the University of Cambridge and the data is accessible for academic and non-commercial purposes. Our scripts will be available on our GitHub. Data was selected and manipulated independently of the participation of the Cambridge and Education First research teams.
2. For instance, see the Intelligent-Essay-Assessor™ developed at Pearson Knowledge Technologies and the IntelliMetric™-Essay-Scoring-System developed by Vantage Learning.
3. http://languagetool.org.
4. See https://englishlive.ef.com.

References

1. Arnold, T., Ballier, N., Gaillat, T., Lissón, P.: Predicting CEFRL levels in learner English on the basis of metrics and full texts. arXiv:1806.11099 [cs] (2018)
2. Attali, Y., Burstein, J.: Automated essay scoring with e-rater V.2. J. Technol. Learn. Assess. 4(3), 3–30 (2006)
3. Barker, F., Salamoura, A., Saville, N.: Learner corpora and language testing. In: Granger, S., Gilquin, G., Meunier, F. (eds.) The Cambridge Handbook of Learner Corpus Research, pp. 511–534. Cambridge Handbooks in Language and Linguistics, Cambridge University Press (2015)
4. Baur, C., et al.: Overview of the 2018 Spoken CALL Shared Task. In: Interspeech 2018, pp. 2354–2358. ISCA (2018)
5. Council of Europe, Council for Cultural Co-operation. Education Committee. Modern Languages Division: Common European Framework of Reference for Languages: learning, teaching, assessment. Cambridge University Press, Cambridge (2001)
6. Crossley, S.A., Kyle, K., Allen, L.K., Guo, L., McNamara, D.S.: Linguistic Microfeatures to Predict L2 Writing Proficiency: A Case Study in Automated Writing Evaluation (2014)
7. Crossley, S.A., Salsbury, T., McNamara, D.S., Jarvis, S.: Predicting lexical proficiency in language learner texts using computational indices. Lang. Test. 28(4), 561–580 (2011)
8. Dale, R., Anisimoff, I., Narroway, G.: HOO 2012: a report on the preposition and determiner error correction shared task. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL HLT 2012, pp. 54–62. Association for Computational Linguistics, Stroudsburg (2012). event-place: Montreal, Canada
9. Díaz-Negrillo, A., Fernandez-Domingez, J.: Error tagging systems for learner corpora. Spanish J. Appl. Linguist. (RESLA) 19, 83–102 (2006)
10. Geertzen, J., Alexopoulou, T., Korhonen, A.: Automatic linguistic annotation of large scale L2 databases: the EF-Cambridge Open Language Database (EFCamDat). In: Miller, R.T., et al. (eds.) Proceedings of the 31st Second Language Research Forum. Cascadilla Press, Carnegie Mellon (2013)
11. Granger, S., Gilquin, G., Meunier, F. (eds.): The Cambridge Handbook of Learner Corpus Research. Cambridge University Press, Cambridge (2015)
12. Hawkins, J.A., Buttery, P.: Criterial features in learner corpora: theory and illustrations. English Profile J. 1(01), 1–23 (2010)
13. Higgins, D., Xi, X., Zechner, K., Williamson, D.: A three-stage approach to the automated scoring of spontaneous spoken responses. Comput. Speech Lang. 25(2), 282–306 (2011)
14. Huang, Y., Murakami, A., Alexopoulou, T., Korhonen, A.L.: Dependency parsing of learner English (2018)
15. Leacock, C.: Automated Grammatical Error Detection for Language Learners. Morgan & Claypool Publishers, California (2010)
16. Nedungadi, P., Raj, H.: Unsupervised word sense disambiguation for automatic essay scoring. In: Kumar Kundu, M., Mohapatra, D.P., Konar, A., Chakraborty, A. (eds.) Advanced Computing, Networking and Informatics - Volume 1. SIST, vol. 27, pp. 437–443. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07353-8_51
17. Ng, H.T., Wu, S.M., Briscoe, T., Hadiwinoto, C., Susanto, R.H., Bryant, C.: The CoNLL-2014 shared task on grammatical error correction. In: Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–14. Association for Computational Linguistics (2014), event-place: Baltimore, Maryland
18. Page, E.B.: The use of the computer in analyzing student essays. Int. Rev. Educ. 14(2), 210–225 (1968)
19. Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8(1), 205–233 (2016)
20. Selinker, L.: Interlanguage. Int. Rev. Appl. Linguist. Lang. Teach. 10(3), 209 (1972)
21. Shermis, M.D., Burstein, J., Higgins, D., Zechner, K.: Automated essay scoring: writing assessment and instruction. Int. Encycl. Educ. 4(1), 20–26 (2010)
22. Shute, V.J.: Focus on formative feedback. Rev. Educ. Res. 78(1), 153–189 (2008)
23. Vajjala, S.: Automated assessment of non-native learner essays: investigating the role of linguistic features. Int. J. Artif. Intell. Educ. (2017). arXiv:1612.00729
24. Vajjala, S., Loo, K.: Automatic CEFR level prediction for Estonian learner text. NEALT Proc. Ser. 22, 113–128 (2014)
25. Volodina, E., Pilán, I., Alfter, D.: Classification of Swedish learner essays by CEFR levels. In: Papadima-Sophocleous, S., Bradley, L., Thouësny, S. (eds.) CALL communities and culture - short papers from EUROCALL 2016, pp. 456–461. Research-publishing.net (2016)
26. Weigle, S.C.: English language learners and automated scoring of essays: critical considerations. Assessing Writ. 18(1), 85–99 (2013)
27. Yan, H., Jeroen, G., Rachel, B., Anna, K., Theodora, A.: The EF Cambridge Open Language Database (EFCAMDAT) information for users (2017)
28. Yannakoudakis, H., Briscoe, T., Medlock, B.: A new dataset and method for automatically grading ESOL texts. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol. 1, pp. 180–189. Association for Computational Linguistics, Stroudsburg (2011)

Author information

Authors and Affiliations

Université de Paris-Diderot, CLILLAC-ARP, 75013, Paris, France
Nicolas Ballier
University of Rennes LIDILE, Rennes, France
Thomas Gaillat & Manon Bouyé
Insight Centre for Data analytics, NUI Galway, Galway, Ireland
Andrew Simpkin, Bernardo Stearns & Manel Zarrouk

Corresponding author

Correspondence to Nicolas Ballier.

Editor information

Editors and Affiliations

Open University Netherlands, Heerlen, The Netherlands
Maren Scheffel
Paul Sabatier University, Toulouse, France
Julien Broisin
Know-Center GmbH, Graz, Austria
Viktoria Pammer-Schindler
Cyprus University of Technology, Limassol, Cyprus
Andri Ioannou
DIPF, Frankfurt/Main, Germany
Jan Schneider

About this paper

Cite this paper

Ballier, N., Gaillat, T., Simpkin, A., Stearns, B., Bouyé, M., Zarrouk, M. (2019). A Supervised Learning Model for the Automatic Assessment of Language Levels Based on Learner Errors. In: Scheffel, M., Broisin, J., Pammer-Schindler, V., Ioannou, A., Schneider, J. (eds) Transforming Learning with Meaningful Technologies. EC-TEL 2019. Lecture Notes in Computer Science, vol 11722. Springer, Cham. https://doi.org/10.1007/978-3-030-29736-7_23

DOI: https://doi.org/10.1007/978-3-030-29736-7_23
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29735-0
Online ISBN: 978-3-030-29736-7
eBook Packages: Computer Science, Computer Science (R0)