
Theoretical Foundations, Methods, and Algorithms for Lossless-in-Sense Text Compression

  • SCIENTIFIC SCHOOL OF THE YAROSLAV-THE-WISE NOVGOROD STATE UNIVERSITY, VELIKY NOVGOROD, THE RUSSIAN FEDERATION
  • G.M. Emelyanov’s Scientific School
Pattern Recognition and Image Analysis

Abstract

The article is devoted to the scientific school developed by the first author in 1995–2012 at Yaroslav-the-Wise Novgorod State University (Veliky Novgorod, Russia). The ultimate practical goal of the school's research can be stated as identifying the most rational way of conveying meaning within a knowledge unit defined by a set of semantically equivalent natural-language phrases. Here, each phrase corresponds to a simple extended sentence (in the terminology of the “Meaning–Text” theory). The resulting knowledge about synonymy, and about the linguistic forms that express relationships between the concepts of a subject area, is needed in tasks that require establishing full or partial equivalence of meaning, whether between complete natural-language sentences, their combinations, or individual phrase fragments. The results are both theoretical and practical. The proposed methods and their software implementations can be used to solve a wide range of problems in recognizing and analyzing the semantics of complex information objects (primarily texts and images), as well as for lossless-in-sense information compression.



Funding

The work was carried out with partial support from the Russian Foundation for Basic Research (project no. 19-01-00006-a).

Author information


Correspondence to D. V. Mikhaylov.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Gennady Martinovich Emelyanov. Born 1943. Graduated from the Ul’yanov (Lenin) Leningrad Institute of Electrical Engineering in 1966. Received his Cand. Sci. and Dr. Sci. degrees in 1971 and 1990, respectively. From 1993 to 2003, he was Dean of the Faculty of Mathematics and Computer Science at Yaroslav-the-Wise Novgorod State University. He is currently a Professor in the Department of Information Technologies and Systems at the same university. Scientific interests: design of problem-oriented computing systems for image processing and analysis. He is the author of 103 publications in the field of pattern recognition and image analysis.

Dmitry Vladimirovich Mikhaylov. Born 1974. Graduated from Yaroslav-the-Wise Novgorod State University, Novgorod, in 1997. Received his Cand. Sci. and Dr. Sci. degrees in Physics and Mathematics in 2003 and 2013, respectively. From 2000 to 2007, he worked at the Department of Computer Software of Novgorod State University. He is currently a Professor in the Department of Information Technologies and Systems at the same university. Since 2002, he has been a member of the Russian Association for Pattern Recognition and Image Analysis. Scientific interests: computational linguistics and artificial intelligence. He has authored 48 papers in the field of pattern recognition and image analysis.

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Emelyanov, G.M., Mikhaylov, D.V. Theoretical Foundations, Methods, and Algorithms for Lossless-in-Sense Text Compression. Pattern Recognit. Image Anal. 33, 1657–1663 (2023). https://doi.org/10.1134/S1054661823040144


  • DOI: https://doi.org/10.1134/S1054661823040144
