Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts

  • Conference paper
Recent Trends in Analysis of Images, Social Networks and Texts (AIST 2020)

Abstract

Recent papers on Russian readability propose several formulas for evaluating text reading difficulty for learners of different ages. However, little is known about individual formulas for school subjects and how they perform compared to existing universal readability formulas. Our goal is to study the impact of the subject on both model quality and the importance of individual features. We trained 4 linear regression models: an individual formula for each of 3 school subjects (Biology, Literature, and Social Studies) and a universal formula for all 3 subjects. The dataset was built from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences each. It was split into train and test sets in a ratio of 75 to 25. Previous papers on Russian readability do not provide proper feature selection, so we proposed a set of 32 features that are possibly relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the one-subject formulas outperform both the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. The one-subject formulas perform better because feature importances vary significantly across subjects. The proposed readability models may benefit school education by helping to evaluate whether texts suit learners and to adjust those texts to target difficulty levels.
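The sampling, splitting, and fitting steps described in the abstract can be illustrated with a small sketch. This is not the authors' code: the corpus is synthetic, the single feature (mean sentence length in tokens) stands in for the 32 features of the paper, the grade range 5 to 9 is invented, and sampling without replacement within a pseudo-text is an assumption. Only the pseudo-text size of 500 sentences and the 75/25 split come from the abstract; RMSE is used as the error measure because the notes mention it.

```python
import math
import random

PSEUDO_TEXT_SIZE = 500  # sentences per pseudo-text, as stated in the abstract
TRAIN_RATIO = 0.75      # 75/25 train/test split, as stated in the abstract


def sample_pseudo_texts(sentences, size, n_samples, rng):
    """Draw n_samples pseudo-texts of `size` sentences each.

    Assumption: sentences are drawn without replacement within one pseudo-text.
    """
    return [rng.sample(sentences, size) for _ in range(n_samples)]


def mean_sentence_length(pseudo_text):
    """Hypothetical feature: average number of whitespace tokens per sentence."""
    return sum(len(s.split()) for s in pseudo_text) / len(pseudo_text)


def split(items, train_ratio, rng):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted grades."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


rng = random.Random(0)

# Synthetic stand-in for a schoolbook corpus: sentences get longer with grade.
corpus = {
    grade: [" ".join(["w"] * rng.randint(grade, grade + 6)) for _ in range(1000)]
    for grade in range(5, 10)
}

# One (feature value, grade) pair per pseudo-text, 20 pseudo-texts per grade.
samples = []
for grade, sentences in corpus.items():
    for text in sample_pseudo_texts(sentences, PSEUDO_TEXT_SIZE, 20, rng):
        samples.append((mean_sentence_length(text), grade))

train, test = split(samples, TRAIN_RATIO, rng)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [slope * x + intercept for x, _ in test]
test_rmse = rmse([y for _, y in test], predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={test_rmse:.3f}")
```

A one-subject formula in this setting would correspond to running the same steps on the texts of a single subject only, while a universal formula would pool all subjects before fitting.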

Notes

  1. We replicated the RMSE calculation for the model from [12] on our dataset and obtained a much larger error. Since the exact test set and model predictions are not available, we cannot explain this difference.

  2. We do not evaluate the adjusted FRE formula (1) from [10], as it predicts an abstract readability score rather than a school grade.

References

  1. Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pp. 193–200 (2004)

  2. Dell’Orletta, F., Wieling, M., Venturi, G., Cimino, A., Montemagni, S.: Assessing the readability of sentences: which corpora and features? In: Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 163–173. Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/W14-1820. http://aclweb.org/anthology/W14-1820

  3. Flesch, R., Gould, A.J.: The Art of Readable Writing, vol. 8. Harper, New York (1949)

  4. Ivanov, V., Solnyshkina, M., Solovyev, V.: Efficiency of text readability features in Russian academic texts. In: Komp’juternaja Lingvistika I Intellektual’nye Tehnologii, pp. 267–283 (2018)

  5. Juilland, A., Chang-Rodríguez, E.: Frequency dictionary of Spanish words. Technical report (1964)

  6. Kincaid, J.P., Fishburne Jr., R.P., Rogers, R.L., Chissom, B.S.: Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch (1975)

  7. Kuhn, M., Johnson, K.: Applied Predictive Modeling. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6849-3

  8. Matskovsky, M.: Problems of readability of printed material. In: Semantic Perception of Speech Messages in Conditions of Mass Communication, Nauka, pp. 126–142 (1976)

  9. Mikk, Y.: On factors of comprehensibility of educational texts. Ph.D. thesis, Tartu University (1970)

  10. Oborneva, I.: Automatic assessment of the complexity of educational texts on the basis of statistical parameters. Ph.D. thesis, Moscow State Pedagogical University (2006)

  11. Solnyshkina, M.I., Kiselnikov, A.S.: Text complexity: study phases in Russian linguistics. Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya 6(38), 86–99 (2015). https://doi.org/10.17223/19986645/38/7

  12. Solovyev, V., Ivanov, V., Solnyshkina, M.: Assessment of reading difficulty levels in Russian academic texts: approaches and metrics. J. Intell. Fuzzy Syst. 34(5), 3049–3058 (2018)

  13. Solovyev, V., Solnyshkina, M., Ivanov, V., Batyrshin, I.: Prediction of reading difficulty in Russian academic texts. J. Intell. Fuzzy Syst. 36(5), 4553–4563 (2019)

  14. Straka, M., Straková, J.: Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88–99 (2017)

  15. Zakaluk, B., Samuels, S.: Issues related to text comprehensibility: the future of readability. Revue québécoise de linguistique 25(1), 41–59 (1996)

Acknowledgements

We thank V. Solovyev, M. Solnyshkina, V. Ivanov et al. for publishing their database of schoolbook texts that we used in our study. This contributed greatly to the findings obtained in our work. We are also very grateful to anonymous AIST reviewers whose thorough comments helped to improve the paper.

Author information

Corresponding author

Correspondence to Alexey Sorokin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Isaeva, U., Sorokin, A. (2021). Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts. In: van der Aalst, W.M.P., et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2020. Communications in Computer and Information Science, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-71214-3_6

  • DOI: https://doi.org/10.1007/978-3-030-71214-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71213-6

  • Online ISBN: 978-3-030-71214-3

  • eBook Packages: Computer Science (R0)
