Twenty Similarity Functions for Two Finite Sequences

Programming and Computer Software

Abstract

This paper considers various numerical functions that determine the degree of similarity between two finite sequences. These similarity measures are based on the concept of embedding for sequences, which we define here. A special case of this embedding is a subsequence. Other cases additionally require equal distances between adjacent symbols of a subsequence in both sequences. This is a generalization of the concept of the substring with unit distances. Moreover, equality of distances from the beginning of the sequences to the first embedded symbol or from the last embedded symbol to the end of the sequences may be required. In addition to the last two cases, an embedding can occur in the sequence more than once. In the literature, functions such as the number of common embeddings or the number of pairs of occurrences of embeddings in a sequence are used. We introduce three additional functions: the sum of lengths of common embeddings, the sum of the minimum numbers of occurrences of a common embedding in both sequences, and the similarity function based on the longest common embedding. In total, we consider 20 numerical functions; for 17 of these functions, algorithms (including new ones) of polynomial complexity are proposed; for two functions, algorithms of exponential complexity with a reduced exponent are proposed. In Conclusions, we briefly compare these embeddings and functions.
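For the simplest embedding mentioned above, the ordinary subsequence, several of the listed functions can be illustrated on very short sequences. The following is a minimal brute-force sketch, not the polynomial algorithms proposed in the paper: it enumerates every subsequence explicitly, so it is exponential in the sequence length. The function names are hypothetical, and counting the empty subsequence as a common embedding is an assumption that the abstract does not settle.

```python
from collections import Counter
from itertools import combinations


def occurrence_counts(seq):
    """Map each distinct subsequence of seq (as a tuple) to its number of
    occurrences, i.e. the number of index sets that spell it.
    Brute force: enumerates all 2**len(seq) index sets."""
    items = tuple(seq)
    return Counter(
        tuple(items[i] for i in idx)
        for r in range(len(items) + 1)
        for idx in combinations(range(len(items)), r)
    )


def lcs_length(x, y):
    """Length of the longest common subsequence (classic quadratic DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def subsequence_measures(x, y):
    """Illustrative values for the subsequence case only.
    Assumption: the empty subsequence is counted as a common embedding."""
    cx, cy = occurrence_counts(x), occurrence_counts(y)
    common = cx.keys() & cy.keys()
    return {
        "common_count": len(common),
        "sum_of_lengths": sum(len(s) for s in common),
        "sum_of_min_occurrences": sum(min(cx[s], cy[s]) for s in common),
        "occurrence_pairs": sum(cx[s] * cy[s] for s in common),
        "longest_common_length": lcs_length(x, y),
    }


print(subsequence_measures("aab", "ab"))
```

For the sequences `aab` and `ab`, the common subsequences are the empty one, `a`, `b`, and `ab`; `ab` occurs twice in `aab` and once in `ab`, which is why the number of occurrence pairs differs from the number of common subsequences. The embeddings with distance constraints described above would need a different enumeration and are not covered by this sketch.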

Author information

Corresponding authors

Correspondence to I. Burdonov or A. Maksimov.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Burdonov, I., Maksimov, A. Twenty Similarity Functions for Two Finite Sequences. Program Comput Soft 49, 373–387 (2023). https://doi.org/10.1134/S0361768823050031

  • DOI: https://doi.org/10.1134/S0361768823050031
