Abstract
We propose three algorithms for string edit distance with duplication and contraction operations, each improving on the time complexity of previous algorithms for this problem: a faster algorithm for the general case, and two further improvements that apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices and runs in \(O\left(\frac{|\Sigma|n^3 \log^3\log n}{\log^2n}\right)\) time, where \(n\) is the length of the input strings and \(|\Sigma|\) is the alphabet size. Under some assumption on the cost function, this algorithm is further accelerated to \(O\left(|\Sigma|\left(n^2+\frac{nn'^2\log^3\log n'}{\log^2 n'}\right)\right)\) time, where \(n'\) is the length of the run-length encoding of the input. The other improvement is based on a new fast min-plus matrix-vector multiplication under a certain discreteness assumption, and yields an \(O\left( |\Sigma| \frac{n^3}{\log^2n}\right)\) time algorithm. This algorithm is also online, in the sense that one of the strings may be given letter by letter. As part of it, we present what is currently the fastest online algorithm for weighted CFG parsing for discrete weighted grammars, a result of independent interest.
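For readers unfamiliar with the two primitives named in the abstract, the following minimal Python sketch may help. It shows the textbook cubic-time min-plus product of square matrices and a run-length encoding whose length plays the role of \(n'\). It is not the paper's algorithm: the fast min-plus multiplication that the stated bounds rely on is not reproduced here, and the function names `min_plus_product` and `run_length_encode` are hypothetical, chosen only for illustration.

```python
import math
from itertools import groupby

INF = math.inf  # stands for "no finite cost"


def min_plus_product(a, b):
    """Naive O(n^3) min-plus product of two n x n matrices.

    c[i][j] = min over k of (a[i][k] + b[k][j]).
    Illustrative baseline only; the paper relies on a faster method.
    """
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def run_length_encode(s):
    """Return the run-length encoding of s as (letter, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    a = [[0, 1], [INF, 0]]
    b = [[0, 5], [2, 0]]
    print(min_plus_product(a, b))

    runs = run_length_encode("aaabccdd")
    print(runs, "n' =", len(runs))
```

For a string with long runs of repeated letters, \(n'\) can be much smaller than \(n\), which is when the run-length-based bound in the abstract is the more favorable one.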
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pinhas, T., Tsur, D., Zakov, S., Ziv-Ukelson, M. (2011). Edit Distance with Duplications and Contractions Revisited. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_37
DOI: https://doi.org/10.1007/978-3-642-21458-5_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer Science, Computer Science (R0)