Thesaurus-Based Methods for Assessment of Text Complexity in Russian

  • Conference paper
Advances in Computational Intelligence (MICAI 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12469)

Abstract

This study explores the problem of assessing the complexity of Russian educational texts. We focus on measuring conceptual complexity, which is rarely addressed as a research question, and propose using a thesaurus (or linguistic ontology) for this purpose. We also compiled an original corpus of school textbooks on Social Studies and History used in high school, as well as textbooks for elementary school, specifically for this set of text complexity experiments. At the first stage of the research, the RuThes-Lite thesaurus, a linguistic knowledge base with a total size of 100,000 concepts, was used to elicit concepts in the textbook texts and represent them as graphs. We propose what is, to the best of our knowledge, a new method for text complexity assessment using RuThes-Lite graphs, and we identify graph-based semantic characteristics of texts that affect complexity. The most significant findings include statistically significant correlations between the selected features, such as node degree, and the complexity of educational texts.
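To make the graph-based features mentioned in the abstract more concrete, the following is a minimal sketch of how a node-degree feature could be computed for one text. It is not the authors' implementation: the function names, the toy relation list, and the choice to count degree only within the subgraph induced by the concepts found in the text are all assumptions made for illustration, and the toy relations do not come from RuThes-Lite.

```python
from collections import defaultdict


def induced_degrees(relations, text_concepts):
    """Return the degree of every text concept in the induced subgraph.

    relations     -- iterable of (concept_a, concept_b) thesaurus links
                     (hypothetical toy data here, not RuThes-Lite)
    text_concepts -- concepts elicited from one text
    """
    present = set(text_concepts)
    neighbours = defaultdict(set)
    for a, b in relations:
        # Keep an edge only if both endpoints occur in the text.
        if a in present and b in present and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Concepts with no retained edge get degree 0.
    return {c: len(neighbours[c]) for c in present}


def mean_degree(degrees):
    """Average node degree, one possible text-level feature."""
    return sum(degrees.values()) / len(degrees) if degrees else 0.0


# Hypothetical thesaurus fragment and concepts found in a text.
toy_relations = [
    ("state", "government"),
    ("state", "law"),
    ("government", "parliament"),
    ("law", "constitution"),
    ("river", "water"),
]
text_concepts = ["state", "government", "law", "river"]

degrees = induced_degrees(toy_relations, text_concepts)
print(degrees, mean_degree(degrees))
```

A feature of this kind could then be computed per text and correlated with grade level; which degree variant and which correlation statistic the paper uses is not stated in the abstract.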

Notes

  1. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.

  2. The last sentence is not truncated; hence the size of a sample in the experiments is at least S tokens and at most (S+k) tokens, where the k extra tokens are those needed to keep the last sentence whole in the sample.
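The sampling rule in note 2 can be sketched as follows. This is an illustrative reading of the note rather than the authors' code: the function name `take_sample`, the representation of a text as a list of token lists, and the behaviour for texts shorter than S tokens (the whole text is returned) are assumptions.

```python
def take_sample(sentences, s):
    """Collect whole sentences until the sample holds at least s tokens.

    sentences -- list of sentences, each a list of tokens
    s         -- target sample size S in tokens

    The last sentence is never truncated, so the result has between
    s and s + k tokens, where k is the overshoot caused by keeping
    that sentence whole. If the text has fewer than s tokens, the
    whole text is returned (an assumption; the note does not cover
    this case).
    """
    sample = []
    for sentence in sentences:
        if len(sample) >= s:
            break
        sample.extend(sentence)
    return sample


# Toy tokenised text: four sentences of 3, 2, 4 and 1 tokens.
toy_text = [["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"], ["j"]]
sample = take_sample(toy_text, 6)
print(len(sample))
```

With S = 6 the third sentence is kept whole, so the toy sample overshoots the target by k = 3 tokens.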

Acknowledgements

The study was supported by the Russian Science Foundation, grant 18-18-00436.

Author information

Corresponding author

Correspondence to Valery Solovyev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Solovyev, V., Ivanov, V., Solnyshkina, M. (2020). Thesaurus-Based Methods for Assessment of Text Complexity in Russian. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_14

  • DOI: https://doi.org/10.1007/978-3-030-60887-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60886-6

  • Online ISBN: 978-3-030-60887-3

  • eBook Packages: Computer Science, Computer Science (R0)
