
A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Algorithmica

Abstract

A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires $O(mn \log mn)$ time, where $m$ and $n$ denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first “fully compressed” algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into $m$ and $n$ runs, $m \le n$, we present an $O(mn^2)$-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of $m$ runs in a text of $n$ runs in $O(mn^2)$ time.
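To make the distinction between compressed and uncompressed lengths concrete, the following minimal Python sketch run-length encodes two strings and computes their unit-cost edit distance with the classic dynamic program of Wagner and Fischer on the decoded strings. It is only an uncompressed baseline, not the fully compressed algorithm of the paper: its running time grows with the string lengths rather than with the numbers of runs. The function names `rle_encode`, `rle_decode`, and `edit_distance` are illustrative assumptions and do not come from the article.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (char, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (char, run_length) pairs back into a plain string."""
    return "".join(ch * k for ch, k in runs)


def edit_distance(a, b):
    """Unit-cost edit distance via the classic row-by-row dynamic program.

    Baseline only: time is proportional to len(a) * len(b), i.e. it depends
    on the UNCOMPRESSED lengths, which is the dependence the paper removes.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    x, y = "aaaabbbcc", "aabbbbbc"
    runs_x, runs_y = rle_encode(x), rle_encode(y)
    # m and n in the abstract are the numbers of runs, not the string lengths.
    print("runs:", len(runs_x), len(runs_y), "lengths:", len(x), len(y))
    print("edit distance:", edit_distance(rle_decode(runs_x), rle_decode(runs_y)))
```

In this sketch each input has three runs regardless of how long the runs are, so stretching every run by a large factor leaves the run counts unchanged while the cost of the baseline grows with the decoded lengths.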



Acknowledgements

Kuan-Yu Chen and Kun-Mao Chao were supported in part by NSC grants 97-2221-E-002-097-MY3 and 98-2221-E-002-081-MY3 from the National Science Council, Taiwan.

Author information

Corresponding author

Correspondence to Kun-Mao Chao.

Additional information

A preliminary version of this work appeared in the 18th Annual European Symposium on Algorithms, United Kingdom, 2010.

About this article

Cite this article

Chen, KY., Chao, KM. A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings. Algorithmica 65, 354–370 (2013). https://doi.org/10.1007/s00453-011-9592-4

  • DOI: https://doi.org/10.1007/s00453-011-9592-4
