Establishing the similarity of scientific and technical documents based on thematic significance

Scientific and Technical Information Processing

Abstract

This paper presents the results of a study of methods for analyzing unstructured text. A new method is proposed for determining the similarity of scientific and technical documents based on the thematic significance feature. The results obtained with several modifications of existing methods are compared experimentally.

Author information

Corresponding author

Correspondence to R. E. Suvorov.

Additional information

Original Russian Text © R.E. Suvorov, I.V. Sochenkov, 2013, published in Iskusstvennyi Intellekt i Prinyatie Reshenii, 2013, No. 1, pp. 33–40.

About this article

Cite this article

Suvorov, R.E., Sochenkov, I.V. Establishing the similarity of scientific and technical documents based on thematic significance. Sci. Tech. Inf. Proc. 42, 321–327 (2015). https://doi.org/10.3103/S0147688215050081

  • DOI: https://doi.org/10.3103/S0147688215050081
