Indexed Dynamic Programming to Boost Edit Distance and LCSS Computation

  • Conference paper
String Processing and Information Retrieval (SPIRE 2018)

Abstract

Efficient dynamic programming solutions compute the Edit Distance from \(S\in [1..\sigma ]^n\) to \(T\in [1..\sigma ]^m\) for many natural subsets of edit operations, typically in time within \(O(nm)\) in the worst case over strings of respective lengths n and m (which is likely to be optimal), and in time within \(O(n+m)\) in some special cases (e.g., disjoint alphabets). We describe how indexing the strings (in linear time), and using such an index to refine the recurrence formulas underlying the dynamic programs, yields faster algorithms in a variety of models, on a continuum of classes of instances of intermediate difficulty between the worst and the best case, thus refining the analysis beyond the worst case. As a side result, we describe similar properties for the computation of the Longest Common Sub Sequence \(\mathtt {LCSS}(S,T)\) between S and T, since it is a particular case of Edit Distance, and we discuss the application of similar algorithmic and analysis techniques to other dynamic programming solutions. More formally, we propose a parameterized analysis of the computational complexity of the Edit Distance, for various sets of operators, and of the Longest Common Sub Sequence, as a function of the area of the dynamic program matrix relevant to the computation.
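For orientation, the classical quadratic baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook dynamic programs with unit costs, not the indexed algorithm proposed in the paper; the function names `edit_distance` and `lcss_length` and the `allow_replace` flag are hypothetical choices made for this sketch. It also illustrates why LCSS is a particular case: when only insertions and deletions are allowed, the distance equals n + m - 2 LCSS(S,T).

```python
def edit_distance(s, t, allow_replace=True):
    """Textbook O(nm) dynamic program with unit costs (illustrative baseline).

    With allow_replace=False only insertions and deletions are permitted.
    """
    n, m = len(s), len(t)
    # prev[j] = distance from the current prefix of s to t[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1]  # matching symbols cost nothing
            else:
                best = min(prev[j], curr[j - 1])  # delete from s, or insert into s
                if allow_replace:
                    best = min(best, prev[j - 1])  # replace s[i-1] by t[j-1]
                curr[j] = best + 1
        prev = curr
    return prev[m]


def lcss_length(s, t):
    """Length of a longest common subsequence, also in O(nm) time."""
    n, m = len(s), len(t)
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[m]
```

Both functions fill every cell of the (n+1) by (m+1) matrix, including on easy inputs such as strings over disjoint alphabets, where the insert/delete distance is simply n + m. The abstract's parameterized analysis concerns how much of that matrix is actually relevant to the computation.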

A longer version is available at the URLs https://arxiv.org/abs/1806.04277 (pdf) and https://gitlab.com/FineGrainedAnalysis/EditDistances (pdf and sources).

J. Barbay: Supported by project Fondecyt Regular no. 1170366 from Conicyt.



Acknowledgement

The author would like to thank Pablo Pérez-Lantero for introducing the problem of computing the Edit Distance between strings; Felipe Lizama for a semester of very interesting discussions about this approach; and an anonymous referee from the journal Transactions on Algorithms for their positive feedback and encouragement.

Funding. Jérémy Barbay is partially funded by the project Fondecyt Regular no. 1170366 from Conicyt.

Data and Material Availability. The source of this article, along with the code and data used for the experiments described within, will be made publicly available upon publication at the URL https://gitlab.com/FineGrainedAnalysis/EditDistances.

Author information

Correspondence to Jérémy Barbay.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Barbay, J., Olivares, A. (2018). Indexed Dynamic Programming to Boost Edit Distance and LCSS Computation. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds) String Processing and Information Retrieval. SPIRE 2018. Lecture Notes in Computer Science, vol 11147. Springer, Cham. https://doi.org/10.1007/978-3-030-00479-8_6

  • DOI: https://doi.org/10.1007/978-3-030-00479-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00478-1

  • Online ISBN: 978-3-030-00479-8

  • eBook Packages: Computer Science, Computer Science (R0)
