Abstract
The study explores the problem of assessing the complexity of Russian educational texts. In this paper, we focus on measuring conceptual complexity, which is rarely selected as a research question, and propose to use a thesaurus (or a linguistic ontology) to this end. We also compiled an original corpus of Social Studies and History textbooks used in high school, together with textbooks for elementary school, specifically for this set of text complexity experiments. At the first stage of the research, the RuThes-Lite thesaurus, a linguistic knowledge base with a total size of 100,000 concepts, was used to identify concepts in the texts of schoolbooks and represent them as graphs. We propose a method for text complexity assessment using RuThes-Lite graphs that is, to the best of our knowledge, new, and identify graph-based semantic characteristics of texts that affect complexity. The most significant findings of the research include statistically significant correlations between the selected features, such as node degree, and the complexity of educational texts.
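As a rough illustration of a graph-based feature such as node degree, the sketch below computes the mean degree of the concepts found in a text, counting only links between concepts that occur in that text. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the concept names and relations are hypothetical toy data rather than RuThes-Lite entries, and the helpers `build_graph` and `mean_degree` are introduced here only for illustration.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def mean_degree(graph, concepts):
    """Mean node degree within the subgraph induced by the given concepts.

    Assumption: only links between concepts that both occur in the text
    are counted; the paper may define the feature differently.
    """
    concepts = set(concepts)
    if not concepts:
        return 0.0
    degrees = [len(graph.get(c, set()) & concepts) for c in concepts]
    return sum(degrees) / len(degrees)


# Hypothetical thesaurus relations (toy data, not taken from RuThes-Lite).
relations = [
    ("state", "government"),
    ("state", "law"),
    ("law", "court"),
    ("government", "parliament"),
]
graph = build_graph(relations)

# Hypothetical concepts detected in one text sample.
text_concepts = ["state", "law", "court", "tax"]
feature = mean_degree(graph, text_concepts)
```

A value like `feature` could then be computed per text sample and compared with the grade level of the source textbook, which is the kind of correlation the abstract reports.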
Notes
- 1.
- 2. The last sentence is not truncated, hence the size of a sample in experiments is at least S tokens and at most (S+k) tokens, where k tokens are used to keep the last sentence in the sample.
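The sampling rule in this note can be sketched as follows. This is a minimal sketch, assuming the text is already split into sentences and each sentence into tokens; the function name `take_sample` and the input layout are hypothetical and are not taken from the paper.

```python
def take_sample(sentences, S):
    """Collect whole sentences until the sample holds at least S tokens.

    `sentences` is a list of sentences, each a list of tokens.
    The last sentence is never cut, so the sample may exceed S by the
    k tokens needed to finish that sentence.
    """
    sample = []
    count = 0
    for sentence in sentences:
        if count >= S:
            break
        sample.append(sentence)
        count += len(sentence)
    return sample


# Toy input: three already tokenized sentences.
sentences = [
    ["a", "b", "c"],
    ["d", "e", "f", "g"],
    ["h"],
]
sample = take_sample(sentences, S=5)
sample_size = sum(len(s) for s in sample)
```

With S = 5 the sample keeps the first two sentences, so its size is 7 tokens and k = 2. If the text holds fewer than S tokens, this sketch simply returns all of it.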
Acknowledgements
The study was supported by the Russian Science Foundation, grant 18-18-00436.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Solovyev, V., Ivanov, V., Solnyshkina, M. (2020). Thesaurus-Based Methods for Assessment of Text Complexity in Russian. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_14
DOI: https://doi.org/10.1007/978-3-030-60887-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60886-6
Online ISBN: 978-3-030-60887-3
eBook Packages: Computer Science, Computer Science (R0)