Abstract
The aim of this work was to develop a processing pipeline for scientific texts, including articles and abstracts, that supports their categorization, the identification of patterns, and the generation of recommendations for users of scientific systems. The authors propose several methods for text preprocessing and for the cluster and classification analysis of texts, and they developed a software system that recommends scientific publications to users. For data preprocessing, a parametric approach is proposed that extracts a new semantic feature from publication texts: the type of scientific result. Extraction of the scientific result type is built on the user's need for content with a specific property. For clustering user profiles, an ensemble method with a varying distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the “Technologies in Education” International Congress of Conferences. The authors acknowledge support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).
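The abstract does not say how entropy enters the classification ensemble. As a minimal sketch of one common interpretation, entropy-weighted soft voting, the example below gives each base classifier a weight that shrinks as the entropy of its predicted class distribution grows. The function names and the result-type labels `method` and `survey` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List


def normalized_entropy(probs: List[float]) -> float:
    """Shannon entropy of a distribution, scaled to the range [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def entropy_weighted_vote(predictions: List[Dict[str, float]]) -> str:
    """Combine class distributions from several classifiers.

    Each classifier is weighted by 1 - normalized entropy, so a confident
    (low-entropy) prediction counts more than a near-uniform one.
    Illustrative assumption only, not the authors' algorithm.
    """
    scores: Dict[str, float] = {}
    for dist in predictions:
        weight = 1.0 - normalized_entropy(list(dist.values()))
        for label, p in dist.items():
            scores[label] = scores.get(label, 0.0) + weight * p
    return max(scores, key=scores.get)


# Hypothetical outputs of three base classifiers for one abstract.
predictions = [
    {"method": 0.90, "survey": 0.10},  # confident
    {"method": 0.40, "survey": 0.60},  # nearly undecided
    {"method": 0.45, "survey": 0.55},  # nearly undecided
]
winner = entropy_weighted_vote(predictions)
```

In this example a plain majority vote would pick `survey` (two classifiers out of three), while the entropy weighting lets the single confident classifier dominate. Whether that behaviour is desirable depends on how well calibrated the base classifiers are.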
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Guseva, A.I., Kuznetsov, I.A., Bochkaryov, P.V., Filippov, S.A., Kireev, V.S. (2020). Development of Text Data Processing Pipeline for Scientific Systems. In: Samsonovich, A. (eds) Biologically Inspired Cognitive Architectures 2019. BICA 2019. Advances in Intelligent Systems and Computing, vol 948. Springer, Cham. https://doi.org/10.1007/978-3-030-25719-4_17
DOI: https://doi.org/10.1007/978-3-030-25719-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25718-7
Online ISBN: 978-3-030-25719-4
eBook Packages: Intelligent Technologies and Robotics (R0)