Abstract
This paper addresses the needs of language learners and teachers by combining keyword-based search and language-level information in an algorithm that ranks documents by their relevance to the requested topic (keywords) and their adequacy to the user's language level. We conducted several experiments on the EF-CAMDAT corpus (annotated for topic and level) and observed that the best ranking results came from averaging BM25 scores with linguistic information. We also found that the grammar of level C1 is the best indicator of level. Finally, we propose a customization that prioritizes beginner or intermediate levels at the top of the ranking.
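The abstract reports that averaging BM25 with linguistic information gave the best ranking. The block below is a minimal sketch of that kind of combination. It is not the authors' implementation. It assumes that documents are already tokenized, that BM25 uses the common defaults k1 = 1.2 and b = 0.75 with a Lucene-style smoothed IDF, and that BM25 scores are min-max normalized before averaging. It also assumes that each document comes with a level-adequacy score in [0, 1] computed elsewhere. The function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for the query terms."""
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed IDF (the "+1" keeps it positive for very common terms).
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values):
    """Rescale values to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(query_terms, docs, level_scores):
    """Rank document indices by the mean of normalized BM25 and level adequacy.

    level_scores: hypothetical per-document adequacy scores in [0, 1],
    higher meaning a better fit for the learner's level.
    """
    if not docs:
        return []
    if len(level_scores) != len(docs):
        raise ValueError("need one level score per document")
    topic = min_max(bm25_scores(query_terms, docs))
    combined = [(t + l) / 2 for t, l in zip(topic, level_scores)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

Take three toy documents `[["cat", "sat", "mat"], ["dog", "ran"], ["cat", "cat", "dog"]]`. The call `rank(["cat"], docs, [0.9, 0.5, 0.1])` returns `[0, 2, 1]`. The third document matches the query term more often, but its low level score drops it below the first. An equal-weight average is the simplest combination. A weighted mean would be one way to shift the balance between topic and level.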
Notes
1. For the grammar and vocabulary of level C1, we used the inverse score, because we expect an A-level document to present them to a lesser degree.
2. In this work, we did not address advanced-level learners, because they are expected to use traditional search engines.
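Note 1 describes inverting the C1 grammar and vocabulary scores when judging whether a document suits an A-level reader. The block below is a minimal sketch of that inversion. It assumes that every feature score has already been normalized to [0, 1], where higher means the document shows more of that feature, and that adequacy is a plain average. The function `level_adequacy` and the feature names are hypothetical and are not taken from the paper.

```python
def level_adequacy(features, inverted=("c1_grammar", "c1_vocabulary")):
    """Average feature scores in [0, 1], inverting the listed features.

    features: hypothetical mapping from feature name to a score in [0, 1].
    Inverted features (here, C1 grammar and vocabulary) count as 1 - score,
    so a document that uses less C1 material scores as more adequate
    for a beginner (A-level) reader.
    """
    if not features:
        raise ValueError("no features given")
    adjusted = []
    for name, score in features.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} outside [0, 1]")
        adjusted.append(1.0 - score if name in inverted else score)
    return sum(adjusted) / len(adjusted)
```

A document with little C1 grammar and vocabulary, such as `{"a_vocabulary": 0.8, "c1_grammar": 0.2, "c1_vocabulary": 0.4}`, scores about 0.73. A C1-heavy document, such as `{"a_vocabulary": 0.3, "c1_grammar": 0.9, "c1_vocabulary": 0.8}`, scores about 0.2.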
References
Purcell, K., Rainie, L., Heaps, A., Buchanan, J., Friedrich, L., Jacklin, A., Chen, C., Zickuhr, K.: How teens do research in the digital world. Pew Internet & American Life Project (2012)
Chinkina, M., Kannan, M., Meurers, D.: Online information retrieval for language learning. In: 2016 ACL, p. 7 (2016)
Vajjala, S., Meurers, D.: On the applicability of readability models to web texts. In: 2013 ACL, p. 59 (2013)
Passonneau, R., Hemat, L., Plante, J., Sheehan, K.M.: Electronic sources as input to GRE® reading comprehension item development: SourceFinder prototype evaluation. ETS Res. Rep. Ser. 2002(1) (2002). 66 pages
Collins-Thompson, K., Callan, J.: Information retrieval for language tutoring: an overview of the REAP project. In: 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 544–545. ACM (2004)
Miltsakaki, E., Troutt, A.: Read-X: Automatic evaluation of reading difficulty of web text. In: E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Association for the Advancement of Computing in Education (AACE), pp. 7280–7286 (2007)
Verhelst, N., Van Avermaet, P., Takala, S., Figueras, N., North, B.: Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge University Press, Cambridge (2009)
Geertzen, J., Alexopoulou, T., Korhonen, A.: Automatic linguistic annotation of large scale L2 databases: the EF-Cambridge Open Language Database (EFCAMDAT). In: Cascadilla Proceedings Project on 31st Second Language Research Forum. Somerville, MA (2013)
Rayson, P., Garside, R.: Comparing corpora using frequency profiling. In: Workshop on Comparing Corpora, pp. 1–6. ACL (2000)
Robertson, S.E., Walker, S.: Okapi/Keenbow at TREC-8. In: TREC, pp. 151–162 (1999)
Pérez-Iglesias, J., Pérez-Agüera, J.R., Fresno, V., Feinstein, Y.Z.: Integrating the probabilistic models BM25/BM25F into Lucene. arXiv preprint arXiv:0911.5046 (2009)
Dale, E., Chall, J.S.: A formula for predicting readability: instructions. Educ. Res. Bull. 27, 37–54 (1948)
Kincaid, J.P., Fishburne Jr., R.P., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)
Gunning, R.: The Technique of Clear Writing. McGraw-Hill, NY (1968)
Heilman, M., Collins-Thompson, K., Callan, J., Eskenazi, M.: Combining lexical and grammatical features to improve readability measures for first and second language texts. In: HLT 2007: The Conference of the NAACL, pp. 460–467 (2007)
Acknowledgements
We would like to thank the Walloon Region (BEWARE projects no. 1510637 and 1610378) for its support, and Altissia International for research collaboration.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wilkens, R., Zilio, L., Fairon, C. (2018). Document Ranking Applied to Second Language Learning. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_53
DOI: https://doi.org/10.1007/978-3-319-76941-7_53
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)