Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts

Isaeva, Ulyana; Sorokin, Alexey

doi:10.1007/978-3-030-71214-3_6

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1357))

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

Abstract

Recent papers on Russian readability propose several formulas for evaluating text reading difficulty for learners of different ages. However, little is known about individual formulas for specific school subjects and how they perform compared to existing universal readability formulas. Our goal is to study the impact of the subject on both model quality and the importance of individual features. We trained four linear regression models: an individual formula for each of three school subjects (Biology, Literature, and Social Studies) and a universal formula for all three subjects. The dataset was built from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences each, and split into train and test sets in a 75:25 ratio. Previous papers on Russian readability do not perform systematic feature selection, so we propose a set of 32 features that are potentially relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the one-subject formulas outperform both the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. The one-subject formulas perform better because feature importances vary significantly among the subjects. The proposed readability models may be useful in school education for assessing whether texts suit learners and for adjusting texts to target difficulty levels.
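The pipeline described in the abstract (sampling pseudo-texts of a fixed number of sentences, a 75:25 train/test split, a linear regression predicting the school grade, and an RMSE score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy corpus is synthetic, the two features (mean sentence length and mean word length) merely stand in for the paper's 32 features, and helper names such as `sample_pseudo_texts` and `make_toy_corpus` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def make_toy_corpus(grade, n_sentences, rng):
    """Synthetic sentences whose length grows with grade (toy data, not the real corpus)."""
    sents = []
    for _ in range(n_sentences):
        n_words = int(rng.integers(3, 6)) + grade
        words = ["a" * (int(rng.integers(2, 5)) + grade // 2) for _ in range(n_words)]
        sents.append(" ".join(words))
    return sents

def sample_pseudo_texts(sentences, size, n_samples, rng):
    """Draw n_samples pseudo-texts, each a random sample of `size` distinct sentences."""
    return [[sentences[i] for i in rng.choice(len(sentences), size=size, replace=False)]
            for _ in range(n_samples)]

def features(pseudo_text):
    """Two illustrative features: mean words per sentence, mean characters per word."""
    words = [w for s in pseudo_text for w in s.split()]
    return [len(words) / len(pseudo_text), sum(map(len, words)) / len(words)]

def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted grades."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
X, y = [], []
for grade in range(5, 12):                      # assumed grade range, for illustration
    corpus = make_toy_corpus(grade, 2000, rng)
    for pt in sample_pseudo_texts(corpus, size=500, n_samples=20, rng=rng):
        X.append(features(pt))
        y.append(grade)
X, y = np.array(X), np.array(y, dtype=float)

# 75:25 train/test split over shuffled pseudo-texts
perm = rng.permutation(len(y))
cut = int(0.75 * len(y))
train, test = perm[:cut], perm[cut:]

coef = fit_linear(X[train], y[train])
test_rmse = rmse(y[test], predict(coef, X[test]))
print(f"test RMSE on toy data: {test_rmse:.3f}")
```

Fitting the same code once per subject and once on the pooled data would mirror the comparison between one-subject and universal formulas; changing `size` to 200 or 900 would mirror the sample-size experiments.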

Notes

1. We replicated the RMSE calculation for the model of [12] on our dataset and obtained a much larger error. Since the exact test set and model predictions are not available, we cannot explain this difference.
2. We do not evaluate the adjusted FRE formula (1) [10], as it predicts an abstract readability score rather than a school grade.

References

1. Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pp. 193–200 (2004)
2. Dell’Orletta, F., Wieling, M., Venturi, G., Cimino, A., Montemagni, S.: Assessing the readability of sentences: which corpora and features? In: Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 163–173. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/W14-1820. http://aclweb.org/anthology/W14-1820
3. Flesch, R., Gould, A.J.: The Art of Readable Writing, vol. 8. Harper, New York (1949)
4. Ivanov, V., Solnyshkina, M., Solovyev, V.: Efficiency of text readability features in Russian academic texts. In: Komp’juternaja Lingvistika I Intellektual’nye Tehnologii, pp. 267–283 (2018)
5. Juilland, A., Chang-Rodríguez, E.: Frequency dictionary of Spanish words. Technical report (1964)
6. Kincaid, J.P., Fishburne Jr., R.P., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)
7. Kuhn, M., Johnson, K.: Applied Predictive Modeling. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6849-3
8. Matskovsky, M.: Problems of readability of printed material. In: Semantic Perception of Speech Messages in Conditions of Mass Communication, Nauka, pp. 126–142 (1976)
9. Mikk, Y.: On factors of comprehensibility of educational texts. Ph.D. thesis, Tartu University (1970)
10. Oborneva, I.: Automatic assessment of the complexity of educational texts on the basis of statistical parameters. Ph.D. thesis, Moscow State Pedagogical University (2006)
11. Solnyshkina, M.I., Kiselnikov, A.S.: Text complexity: study phases in Russian linguistics. Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya 6(38), 86–99 (2015). https://doi.org/10.17223/19986645/38/7
12. Solovyev, V., Ivanov, V., Solnyshkina, M.: Assessment of reading difficulty levels in Russian academic texts: approaches and metrics. J. Intell. Fuzzy Syst. 34(5), 3049–3058 (2018)
13. Solovyev, V., Solnyshkina, M., Ivanov, V., Batyrshin, I.: Prediction of reading difficulty in Russian academic texts. J. Intell. Fuzzy Syst. 36(5), 4553–4563 (2019)
14. Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99 (2017)
15. Zakaluk, B., Samuels, S.: Issues related to text comprehensibility: the future of readability. Revue québécoise de linguistique 25(1), 41–59 (1996)

Acknowledgements

We thank V. Solovyev, M. Solnyshkina, V. Ivanov et al. for publishing the database of schoolbook texts used in our study, which contributed greatly to our findings. We are also very grateful to the anonymous AIST reviewers, whose thorough comments helped improve the paper.

Author information

Authors and Affiliations

Moscow State University, Moscow, Russia
Ulyana Isaeva & Alexey Sorokin
Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Alexey Sorokin

Authors

Ulyana Isaeva
Alexey Sorokin

Corresponding author

Correspondence to Alexey Sorokin.

Editor information

Editors and Affiliations

RWTH Aachen University, Aachen, Germany
Wil M. P. van der Aalst
University of Ljubljana, Ljubljana, Slovenia
Vladimir Batagelj
National Research University Higher School of Economics, Perm, Russia
Alexey Buzmakov
National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
University of Melbourne, Melbourne, VIC, Australia
Anna Kalenkova
Krasovskii Institute of Mathematics and Mechanics of RAS, Ekaterinburg, Russia
Michael Khachay
National Research University Higher School of Economics, Saint-Petersburg, Russia
Olessia Koltsova
University of Oslo, Oslo, Norway
Andrey Kutuzov
National Research University Higher School of Economics, Moscow, Russia
Sergei O. Kuznetsov
National Research University Higher School of Economics, Moscow, Russia
Irina A. Lomazova
Lomonosov Moscow State University, Moscow, Russia
Natalia Loukachevitch
National Research University Higher School of Economics, Moscow, Russia
Ilya Makarov
LORIA, Vandœuvre-lès-Nancy, France
Amedeo Napoli
Skolkovo Institute of Science and Technology, Moscow, Russia
Alexander Panchenko
University of Florida, Gainesville, FL, USA
Panos M. Pardalos
Università Ca’ Foscari Venezia, Venezia, Italy
Marcello Pelillo
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Kazan Federal University, Kazan, Russia
Elena Tutubalina

About this paper

Cite this paper

Isaeva, U., Sorokin, A. (2021). Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts. In: van der Aalst, W.M.P., et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2020. Communications in Computer and Information Science, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-71214-3_6

DOI: https://doi.org/10.1007/978-3-030-71214-3_6
Published: 25 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71213-6
Online ISBN: 978-3-030-71214-3
eBook Packages: Computer Science, Computer Science (R0)
