Abstract
This paper presents a study of methods for analyzing unstructured text. It proposes a new method for determining the similarity of scientific and technical documents based on a thematic significance feature. The results obtained with several modifications of existing methods are compared experimentally.
Additional information
Original Russian Text © R.E. Suvorov, I.V. Sochenkov, 2013, published in Iskusstvennyi Intellekt i Prinyatie Reshenii, 2013, No. 1, pp. 33–40.
About this article
Cite this article
Suvorov, R.E., Sochenkov, I.V. Establishing the similarity of scientific and technical documents based on thematic significance. Sci. Tech. Inf. Proc. 42, 321–327 (2015). https://doi.org/10.3103/S0147688215050081
DOI: https://doi.org/10.3103/S0147688215050081