Abstract
Recent papers on Russian readability propose several formulas for evaluating how difficult a text is for learners of different ages. However, little is known about formulas fitted to individual school subjects and how they perform compared with existing universal readability formulas. Our goal is to study the impact of the subject on both model quality and the importance of individual features. We trained four linear regression models: an individual formula for each of three school subjects (Biology, Literature, and Social Studies) and a universal formula for all three subjects. The dataset was built from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences each, and split into train and test sets in a 75:25 ratio. Previous papers on Russian readability do not perform proper feature selection, so we propose a set of 32 features that are potentially relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the one-subject formulas outperform both the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. The difference arises because feature importances vary significantly among the subjects. The proposed readability models might benefit school education by helping to evaluate whether texts are suitable for learners and to adjust those texts to target difficulty levels.
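The data preparation described in the abstract (random sampling of sentences into fixed-size pseudo-texts, followed by a 75:25 train/test split) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether sentences are drawn with or without replacement, how many pseudo-texts are drawn, or how sentences are pooled, so the function names `make_pseudo_texts` and `split_train_test`, the without-replacement sampling, and the toy sentence pool are all hypothetical.

```python
import random

def make_pseudo_texts(sentences, size, n_samples, seed=0):
    """Draw n_samples pseudo-texts, each a random sample of `size` sentences.

    Assumption: sentences are drawn without replacement within one
    pseudo-text; the abstract does not specify the sampling scheme.
    """
    if size > len(sentences):
        raise ValueError("sentence pool is smaller than the requested size")
    rng = random.Random(seed)
    return [rng.sample(sentences, size) for _ in range(n_samples)]

def split_train_test(items, train_ratio=0.75, seed=0):
    """Shuffle a copy of items and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy pool standing in for the sentences of one subject and grade level.
pool = [f"sentence {i}" for i in range(2000)]
pseudo_texts = make_pseudo_texts(pool, size=500, n_samples=40)
train, test = split_train_test(pseudo_texts, train_ratio=0.75)
```

Changing `size` to 200 or 900 would give the alternative sample sizes mentioned in the abstract; the regression models themselves would then be fitted on features extracted from each pseudo-text.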
Notes
- 1.
We replicated the RMSE calculation for the model of [12] on our dataset and obtained a much larger error. Since the exact test set and model predictions are not available, we cannot explain this difference.
- 2.
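For reference, the RMSE discussed in note 1 is the square root of the mean squared difference between predicted and true difficulty levels. The sketch below uses made-up grade values purely for illustration; it is not the evaluation code of the paper or of [12], and the function name `rmse` is an assumption.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy example: every prediction is off by exactly one grade level.
true_grades = [5, 7, 9]
predicted_grades = [6, 8, 10]
error = rmse(true_grades, predicted_grades)  # 1.0
```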
References
Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pp. 193–200 (2004)
Dell’Orletta, F., Wieling, M., Venturi, G., Cimino, A., Montemagni, S.: Assessing the readability of sentences: which corpora and features? In: Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 163–173. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/W14-1820. http://aclweb.org/anthology/W14-1820
Flesch, R., Gould, A.J.: The Art of Readable Writing, vol. 8. Harper, New York (1949)
Ivanov, V., Solnyshkina, M., Solovyev, V.: Efficiency of text readability features in Russian academic texts. In: Komp’juternaja Lingvistika I Intellektual’nye Tehnologii, pp. 267–283 (2018)
Juilland, A., Chang-Rodríguez, E.: Frequency dictionary of Spanish words. Technical report (1964)
Kincaid, J.P., Fishburne Jr., R.P., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)
Kuhn, M., Johnson, K.: Applied Predictive Modeling. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6849-3
Matskovsky, M.: Problems of readability of printed material. In: Semantic Perception of Speech Messages in Conditions of Mass Communication, Nauka, pp. 126–142 (1976)
Mikk, Y.: On factors of comprehensibility of educational texts. Ph.D. thesis, Tartu University (1970)
Oborneva, I.: Automatic assessment of the complexity of educational texts on the basis of statistical parameters. Ph.D. thesis, Moscow State Pedagogical University (2006)
Solnyshkina, M.I., Kiselnikov, A.S.: Text complexity: study phases in Russian linguistics. Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya 6(38), 86–99 (2015). https://doi.org/10.17223/19986645/38/7
Solovyev, V., Ivanov, V., Solnyshkina, M.: Assessment of reading difficulty levels in Russian academic texts: approaches and metrics. J. Intell. Fuzzy Syst. 34(5), 3049–3058 (2018)
Solovyev, V., Solnyshkina, M., Ivanov, V., Batyrshin, I.: Prediction of reading difficulty in Russian academic texts. J. Intell. Fuzzy Syst. 36(5), 4553–4563 (2019)
Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99 (2017)
Zakaluk, B., Samuels, S.: Issues related to text comprehensibility: the future of readability. Revue québécoise de linguistique 25(1), 41–59 (1996)
Acknowledgements
We thank V. Solovyev, M. Solnyshkina, V. Ivanov et al. for publishing their database of schoolbook texts that we used in our study. This contributed greatly to the findings obtained in our work. We are also very grateful to anonymous AIST reviewers whose thorough comments helped to improve the paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Isaeva, U., Sorokin, A. (2021). Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts. In: van der Aalst, W.M.P., et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2020. Communications in Computer and Information Science, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-71214-3_6
DOI: https://doi.org/10.1007/978-3-030-71214-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71213-6
Online ISBN: 978-3-030-71214-3
eBook Packages: Computer Science, Computer Science (R0)