Thesaurus-Based Methods for Assessment of Text Complexity in Russian

Solovyev, Valery; Ivanov, Vladimir; Solnyshkina, Marina

doi:10.1007/978-3-030-60887-3_14


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12469)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence


Abstract

The study explores the problem of assessing the complexity of Russian educational texts. We focus on measuring conceptual complexity, which is rarely chosen as a research question, and propose using a thesaurus (or a linguistic ontology) to this end. For this set of text complexity experiments we compiled an original corpus of textbooks on Social Studies and History used in high school, together with textbooks for elementary school. At the first stage of the research, the RuThes-Lite thesaurus, a linguistic knowledge base with a total size of 100,000 concepts, was used to elicit concepts in the textbook texts and represent them as graphs. We propose a method for text complexity assessment using RuThes-Lite graphs that is, to the best of our knowledge, new, and we identify graph-based semantic characteristics of texts that affect complexity. The most significant findings include statistically significant correlations between the selected features, such as node degree, and the complexity of educational texts.

Notes

1. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.
2. The last sentence is not truncated, hence the size of a sample in experiments is at least S tokens and at most (S+k) tokens, where k tokens are used to keep the last sentence in the sample.
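
The sampling rule in note 2 can be illustrated with a minimal sketch. It assumes the text is already split into sentences and tokens; the function name `sample_tokens`, the parameter `s`, and the toy sentences are hypothetical and are not taken from the paper.

```python
from typing import List


def sample_tokens(sentences: List[List[str]], s: int) -> List[str]:
    """Collect whole sentences until the sample holds at least `s` tokens.

    The sentence that crosses the threshold is kept in full, so the result
    has between s and s + k tokens, where k is the number of extra tokens
    needed to finish that last sentence. If the text is shorter than s
    tokens, the whole text is returned.
    """
    sample: List[str] = []
    for sentence in sentences:
        if len(sample) >= s:
            break
        sample.extend(sentence)  # never cut a sentence in the middle
    return sample


# Toy, already tokenized input (hypothetical data).
text = [
    ["The", "state", "protects", "citizens", "."],
    ["Laws", "define", "rights", "and", "duties", "."],
    ["Courts", "apply", "them", "."],
]
sample = sample_tokens(text, s=7)
k = len(sample) - 7  # extra tokens kept to complete the last sentence
```

With `s=7` the first sentence alone is too short, so the second sentence is added in full and the sample ends on a sentence boundary.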

Acknowledgements

The study was supported by the Russian Science Foundation, grant 18-18-00436.

Author information

Authors and Affiliations

Kazan Federal University, 2, Tatarstan Street, Room 467, Kazan, The Republic of Tatarstan, 420021, Russian Federation
Valery Solovyev & Marina Solnyshkina
Innopolis University, st. Universitetskaya, 1, Innopolis, Republic of Tatarstan, 420500, Russian Federation
Vladimir Ivanov

Corresponding author

Correspondence to Valery Solovyev.

Editor information

Editors and Affiliations

Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Lourdes Martínez-Villaseñor
Universidad Autónoma Metropolitana, Mexico City, Mexico
Oscar Herrera-Alcántara
Facultad de Ingeniería, Universidad Panamericana, Mexico City, Mexico
Hiram Ponce
Universidad Autónoma del Estado de Hidalgo, Hidalgo, Mexico
Félix A. Castro-Espinoza

About this paper

Cite this paper

Solovyev, V., Ivanov, V., Solnyshkina, M. (2020). Thesaurus-Based Methods for Assessment of Text Complexity in Russian. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_14

DOI: https://doi.org/10.1007/978-3-030-60887-3_14
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60886-6
Online ISBN: 978-3-030-60887-3
eBook Packages: Computer Science, Computer Science (R0)
