Development of Text Data Processing Pipeline for Scientific Systems

  • Conference paper
Biologically Inspired Cognitive Architectures 2019 (BICA 2019)

Abstract

The aim of this work was to develop a pipeline for processing scientific texts, including articles and abstracts, in order to categorize them, identify patterns, and build recommendations for users of scientific systems. The authors proposed a number of text preprocessing methods and a method for cluster and classification analysis of texts, and developed a software system that recommends scientific publications to users. For data preprocessing, a parametric approach is proposed to extract a new, semantic feature from textual publications: the type of scientific result. The extraction of the scientific result type is driven by users' need for content with a specific property. For clustering user profiles, an ensemble method with a changing distance metric is proposed. For classification, an entropy-based ensemble method is used. The efficiency of the proposed methods and algorithms was evaluated on the search module of the information system of the "Technologies in Education" International Congress of Conferences. The author acknowledges support from the MEPhI Academic Excellence Project (Contract No. 02.a03.21.0005).
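The abstract does not describe how the clustering ensemble with a changing distance metric works. The sketch below shows one common way to realise the general idea, evidence accumulation over a co-association matrix, and is an assumption rather than the authors' algorithm. Everything in it is hypothetical: user profiles as numeric feature vectors, a deterministic k-medoids as the base clusterer, the three metrics (Euclidean, Manhattan, cosine), the 0.5 majority threshold, and all function names.

```python
import math


# Hypothetical metric set: the abstract does not say which metrics the ensemble varies over.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an all-zero profile as maximally distant
    return 1.0 - dot / (na * nb)


def k_medoids(points, k, dist, max_iter=20):
    """Hypothetical base clusterer: farthest-first seeding, then alternate assign/update."""
    medoids = [0]
    while len(medoids) < k:
        # Seed with the point farthest from its nearest already-chosen medoid.
        medoids.append(max(range(len(points)),
                           key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest medoid under this base clusterer's metric.
        labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]])) for p in points]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid of an empty cluster
                continue
            # The new medoid minimises the total distance to the other cluster members.
            new_medoids.append(min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def co_association(labelings):
    """Fraction of base clusterings that put items i and j in the same cluster."""
    n = len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / len(labelings) for j in range(n)]
            for i in range(n)]


def consensus(points, k, metrics, threshold=0.5):
    """Run one base clustering per metric, then join items co-clustered by a majority."""
    labelings = [k_medoids(points, k, d) for d in metrics]
    co = co_association(labelings)
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if co[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = {}
    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


if __name__ == "__main__":
    profiles = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.2), (0.0, 1.0), (0.1, 0.9), (0.2, 1.0)]
    print(consensus(profiles, 2, [euclidean, manhattan, cosine]))  # [0, 0, 0, 1, 1, 1]
```

With three metrics and a 0.5 threshold, two profiles end up in the same consensus cluster when at least two of the three base clusterings agree. The number of consensus clusters is therefore not forced to equal `k`, which is one reason co-association methods are often used to combine clusterings that disagree.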

Author information

Correspondence to Anna I. Guseva.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Guseva, A.I., Kuznetsov, I.A., Bochkaryov, P.V., Filippov, S.A., Kireev, V.S. (2020). Development of Text Data Processing Pipeline for Scientific Systems. In: Samsonovich, A. (eds) Biologically Inspired Cognitive Architectures 2019. BICA 2019. Advances in Intelligent Systems and Computing, vol 948. Springer, Cham. https://doi.org/10.1007/978-3-030-25719-4_17
