Experimental Investigation of Significant Keywords Search in Ukrainian Content

  • Conference paper
Advances in Intelligent Systems and Computing V (CSIT 2020)

Abstract

This article presents a comparative experimental study of methods for finding significant keywords in Ukrainian-language content. The approach to automatic keyword identification is based on Porter stemming of Ukrainian words and the Levenshtein distance, with the option of using a thematic dictionary and removing blocked (stop) words. In an experiment on 100 scientific publications in technical fields, the automatically extracted keywords were compared with the authors' own keyword lists, yielding a range of statistical characteristics of the accuracy of the search results.
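The pipeline named in the abstract (stop-word removal, stemming, Levenshtein-based matching, comparison with author keywords) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the suffix list, the `max_dist` threshold, and all function names are hypothetical, the stemmer is a crude suffix stripper standing in for a real Porter-style Ukrainian stemmer, and the thematic dictionary is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists for illustration only; a real
# Porter-style stemmer for Ukrainian uses a much larger rule set.
STOP_WORDS = {"і", "та", "в", "на", "для", "що", "це", "з", "до"}
SUFFIXES = sorted(
    ["ами", "ів", "ах", "ою", "ий", "ої", "а", "и", "у", "і", "я", "е"],
    key=len,
    reverse=True,
)


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def keywords(text: str, top_n: int = 5, max_dist: int = 1) -> list[str]:
    """Return the top_n most frequent stems, merging near-identical stems."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    counts: Counter[str] = Counter()
    for s in stems:
        # Reuse an already seen stem if it lies within max_dist edits.
        match = next((k for k in counts if levenshtein(k, s) <= max_dist), s)
        counts[match] += 1
    return [w for w, _ in counts.most_common(top_n)]


def precision_recall(found: list[str], reference: list[str]) -> tuple[float, float]:
    """Compare extracted stems with stemmed author-supplied keywords."""
    found_set = set(found)
    ref_set = {stem(r.lower()) for r in reference}
    hits = len(found_set & ref_set)
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(ref_set) if ref_set else 0.0
    return precision, recall
```

In this sketch, `keywords(text)` returns candidate stems for one document and `precision_recall(found, author_keywords)` scores them against the author's list. A fixed edit-distance threshold over-merges short stems, so a real system would likely scale it with word length.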

Author information

Corresponding authors

Correspondence to Oleg Bisikalo, Victoria Vysotska or Svitlana Vyshemyrska.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bisikalo, O., Vysotska, V., Lytvyn, V., Brodyak, O., Vyshemyrska, S., Rozov, Y. (2021). Experimental Investigation of Significant Keywords Search in Ukrainian Content. In: Shakhovska, N., Medykovskyy, M.O. (eds) Advances in Intelligent Systems and Computing V. CSIT 2020. Advances in Intelligent Systems and Computing, vol 1293. Springer, Cham. https://doi.org/10.1007/978-3-030-63270-0_1
