Edit Distance with Duplications and Contractions Revisited

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Abstract

In this paper, we propose three algorithms for the problem of string edit distance with duplication and contraction operations, which improve the time complexity of previous algorithms for this problem. These include a faster algorithm for the general case of the problem, and two improvements which apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices, and obtains the running time of \(O\left(\frac{|\Sigma|n^3 \log^3\log n}{\log^2n}\right)\), where n is the length of the input strings and |Σ| is the alphabet size. This algorithm is further accelerated, under some assumption on the cost function, to \(O\left(|\Sigma|\left(n^2+\frac{nn'^2\log^3\log n'}{\log^2 n'}\right)\right)\) time, where n′ is the length of the run-length encoding of the input. Another improvement is based on a new fast matrix-vector min-plus multiplication under a certain discreteness assumption, and yields an \(O\left( |\Sigma| \frac{n^3}{\log^2n}\right)\) time algorithm. Furthermore, this algorithm is online, in the sense that one of the strings may be given letter by letter. As part of this algorithm we present the currently fastest online algorithm for weighted CFG parsing for discrete weighted grammars. This result is useful on its own.
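
To make the core primitive concrete, the following is a minimal sketch of the min-plus (tropical) matrix product and matrix-vector product that the abstract refers to. It is the textbook cubic and quadratic versions, not the accelerated algorithms of the paper; the function names `min_plus_product` and `min_plus_matvec` are illustrative assumptions and do not come from the paper.

```python
import math

INF = math.inf  # stands for "no derivation / unreachable"


def min_plus_product(A, B):
    """Naive min-plus product: C[i][j] = min over k of A[i][k] + B[k][j].

    Runs in O(n^3) time for n x n inputs. The paper relies on faster
    methods; this baseline only illustrates the operation itself.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[INF] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            if a == INF:
                continue  # adding to infinity can never improve a minimum
            for j in range(p):
                cand = a + B[k][j]
                if cand < C[i][j]:
                    C[i][j] = cand
    return C


def min_plus_matvec(A, v):
    """Naive min-plus matrix-vector product: u[i] = min over k of A[i][k] + v[k]."""
    return [min(a + x for a, x in zip(row, v)) for row in A]


A = [[0, 2], [INF, 1]]
B = [[1, INF], [0, 3]]
C = min_plus_product(A, B)
u = min_plus_matvec(A, [1, 0])
```

In dynamic programs of the CYK type, an entry of such a product corresponds to the cheapest way to combine two adjacent sub-solutions, which is why speeding up this product speeds up the whole recurrence.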

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pinhas, T., Tsur, D., Zakov, S., Ziv-Ukelson, M. (2011). Edit Distance with Duplications and Contractions Revisited. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_37

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer Science, Computer Science (R0)