Modern Approaches to the Language Data Analysis. Using Language Analysis Methods for Management and Planning Tasks

  • Andrei N. Vinogradov
  • Natalia Vlasova
  • Evgeny P. Kurshev
  • Alexey Podobryaev
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 95)


The article discusses promising directions for applying modern automatic methods of natural language data analysis to a wide range of practical problems. The technology of creating electronic corpora (collections) of texts is considered as a tool for the transition from model-based linguistics to linguistics based on tagged data. The principles of creating annotated text corpora, and the possibilities and limitations of their use, are considered. The article describes the creation of an annotated corpus in which language data downloaded from the Internet is processed sequentially before the results are delivered to users. The pipeline consists of the following steps: downloading data from the Internet; identifying the language in which the text is written; extracting metadata; splitting the texts into paragraphs and sentences; deduplication; tokenization; automatic linguistic annotation; uploading the cleaned and annotated data to the network. The prospects for the development of language data analysis systems are presented. Requirements are developed for corpora intended for solving problems of public administration and strategic planning. The properties that such corpora should have are considered, including corpus format, corpus volume, depth of linguistic analysis, and corpus-manager structure. The article describes the annotated text corpora developed at the Artificial Intelligence Research Center (AIReC) of the Ailamazyan Program Systems Institute of the Russian Academy of Sciences, with reference to the tasks of extracting information about persons, events, and situations from news reports. A retrospective review of the development of systems for automatic natural language text processing in machine translation and human-machine interaction is given.
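The middle stages of the processing pipeline listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name here is hypothetical, language identification is reduced to a toy Cyrillic-versus-Latin letter count, sentence splitting and tokenization use naive regular expressions, and the download, metadata extraction, linguistic annotation, and upload stages are left out.

```python
import hashlib
import re


def detect_language(text):
    """Toy heuristic (an assumption): 'ru' if Cyrillic letters outnumber Latin ones, else 'en'."""
    cyrillic = len(re.findall(r"[\u0400-\u04FF]", text))
    latin = len(re.findall(r"[A-Za-z]", text))
    return "ru" if cyrillic > latin else "en"


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def deduplicate(items):
    """Keep the first occurrence of each item, comparing case- and whitespace-normalized text."""
    seen = set()
    unique = []
    for item in items:
        normalized = " ".join(item.lower().split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def tokenize(sentence):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def process_document(raw_text):
    """Run the sketched stages in order on one already-downloaded text."""
    language = detect_language(raw_text)
    paragraphs = deduplicate(split_paragraphs(raw_text))
    processed = []
    for paragraph in paragraphs:
        sentences = [
            {"text": s, "tokens": tokenize(s)} for s in split_sentences(paragraph)
        ]
        processed.append(sentences)
    return {"language": language, "paragraphs": processed}
```

Calling `process_document(raw_text)` on one downloaded text returns its detected language and a list of paragraphs, each a list of sentences with their tokens. A production pipeline would replace each stage with a dedicated tool, and would typically deduplicate across documents rather than only within one.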


Artificial intelligence; Machine translation; Natural language processing; Strategic management; Digital economy



The publication was prepared with the support of the state program AAAA-A19-119020690042-2 «Research and development of data mining methods».



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Andrei N. Vinogradov (1)
  • Natalia Vlasova (2)
  • Evgeny P. Kurshev (2)
  • Alexey Podobryaev (2)

  1. Peoples’ Friendship University of Russia (RUDN University), Moscow, Russia
  2. Ailamazyan Program Systems Institute of RAS (PSI RAS), Pereslavl-Zalessky, Yaroslavl Region, Russia
