Abstract
There are efficient dynamic programming solutions for computing the Edit Distance from \(S\in [1..\sigma ]^n\) to \(T\in [1..\sigma ]^m\) for many natural subsets of edit operations. They typically run in time within \(O(nm)\) in the worst case over strings of respective lengths n and m (which is likely to be optimal), and in time within \(O(n+m)\) in some special cases (e.g., disjoint alphabets). We describe how indexing the strings (in linear time), and using such an index to refine the recurrence formulas underlying the dynamic programs, yield faster algorithms in a variety of models on a continuum of classes of instances of intermediate difficulty between the worst and the best case, thus refining the analysis beyond the worst case. As a side result, we describe similar properties for the computation of the Longest Common Subsequence \(\mathtt {LCSS}(S,T)\) between S and T, since it is a particular case of Edit Distance, and we discuss the application of similar algorithmic and analysis techniques to other dynamic programming solutions. More formally, we propose a parameterized analysis of the computational complexity of the Edit Distance for various sets of edit operations, and of the Longest Common Subsequence, as a function of the area of the dynamic program matrix relevant to the computation.
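The abstract does not spell out the authors' index or refined recurrences, so the sketch below is only a hedged illustration of the general idea, not the algorithm from the paper. It contrasts the textbook \(O(nm)\) table for the LCSS length with a swapped-in, well-known indexed technique (Hunt–Szymanski style), which indexes T by symbol and only visits the matching pairs (i, j). Its cost grows with the number r of such pairs: with disjoint alphabets r = 0 and the work is linear (with hashing), while in the worst case r = nm. It also uses the standard identity that the Edit Distance restricted to insertions and deletions equals n + m − 2·LCSS(S,T). The function names `lcss_dp`, `lcss_indexed` and `indel_distance` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def lcss_dp(s, t):
    """Textbook O(nm) dynamic program for the LCSS length (two rows kept)."""
    n, m = len(s), len(t)
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip a symbol of s or of t
        prev = cur
    return prev[m]


def lcss_indexed(s, t):
    """Hunt-Szymanski-style LCSS length: touches only matching pairs (i, j).

    Illustrative stand-in for an indexed dynamic program, not the paper's method.
    """
    # Index of t: for each symbol, the increasing list of its positions in t.
    positions = defaultdict(list)
    for j, c in enumerate(t):
        positions[c].append(j)
    # tails[k] = smallest position in t ending a common subsequence of length k + 1.
    tails = []
    for c in s:
        # Decreasing j, so that one symbol of s is never matched twice.
        for j in reversed(positions.get(c, [])):
            k = bisect_left(tails, j)  # first tail >= j keeps positions strictly increasing
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)


def indel_distance(s, t):
    """Edit Distance using only insertions and deletions."""
    return len(s) + len(t) - 2 * lcss_indexed(s, t)


if __name__ == "__main__":
    print(lcss_indexed("ABCBDAB", "BDCABA"))    # 4
    print(indel_distance("ABCBDAB", "BDCABA"))  # 5
    print(indel_distance("abc", "xyz"))         # 6 (disjoint alphabets: no match visited)
```

On the disjoint-alphabet example, the inner loop of `lcss_indexed` never runs, which mirrors the \(O(n+m)\) best case mentioned above; on highly repetitive strings it degrades towards the quadratic table.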
A longer version is available at https://arxiv.org/abs/1806.04277 (pdf) and https://gitlab.com/FineGrainedAnalysis/EditDistances (pdf and sources).
J. Barbay: supported by the project Fondecyt Regular no. 1170366 from Conicyt.
Acknowledgement
The author would like to thank Pablo Pérez-Lantero for introducing the problem of computing the Edit Distance between strings; Felipe Lizama for a semester of very interesting discussions about this approach; and an anonymous referee from the journal ACM Transactions on Algorithms for his positive feedback and encouragement.
Funding. Jérémy Barbay is partially funded by the project Fondecyt Regular no. 1170366 from Conicyt.
Data and Material Availability. The source of this article, along with the code and data used for the experiments described herein, will be made publicly available upon publication at https://gitlab.com/FineGrainedAnalysis/EditDistances.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Barbay, J., Olivares, A. (2018). Indexed Dynamic Programming to Boost Edit Distance and LCSS Computation. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds) String Processing and Information Retrieval. SPIRE 2018. Lecture Notes in Computer Science, vol 11147. Springer, Cham. https://doi.org/10.1007/978-3-030-00479-8_6
DOI: https://doi.org/10.1007/978-3-030-00479-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00478-1
Online ISBN: 978-3-030-00479-8
eBook Packages: Computer Science, Computer Science (R0)