Intellectualization of Knowledge Acquisition of Academic Texts as an Answer to Challenges of Modern Information Society

  • Aleksandra Vatian
  • Sergey Dudorov
  • Natalia Dobrenko
  • Andrey Mairovich
  • Mikhail Osipov
  • Artem Lobantsev
  • Anatoly Shalyto
  • Natalia Gusarova
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 947)


Extracting knowledge from an ever-growing information flow is one of the main challenges of the modern information society. This paper considers the possibilities and means of intellectualizing this process for one important information source: academic texts. Here the user must find fragments relevant to the subject of interest within extensive documents that are often written in a foreign language. We experimentally investigated the comparative effectiveness of topic segmentation (TS) algorithms on extended coherent academic texts and substantiated a procedure for the instrumental evaluation of their effectiveness. We considered the influence of the most significant characteristics of the text, including the original language, structural organization (heading levels), and subject area (engineering, information technologies, and medicine). We show that, to intellectualize knowledge acquisition from academic texts, the reader should be presented with the TS results produced by different algorithms in combination. We propose a system for the combined visualization of TS results and have developed a corresponding software solution. For extended coherent texts, the visualization system explicitly shows the semantic structure of the text, so the user can locate and analyze only the fragments that match their current information needs rather than the whole text, and thus obtain a complete picture of the subject of interest.


Topic segmentation, Knowledge acquisition, Text structure, Information retrieval



This work was financially supported by the Government of the Russian Federation, Grant 08-08.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ITMO University, Saint Petersburg, Russia
