CPM 2002: Combinatorial Pattern Matching pp 85-98

# Edit Distance with Move Operations

• Dana Shapira
• James A. Storer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2373)

## Abstract

The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change-character) operations required to transform one string into another. Here we consider the more general problem in which strings are represented by a singly linked list (one character per node) and the operations may be applied to the pointer associated with a node as well as to the character stored at that node. That is, in O(1) time, not only can characters be inserted or deleted, but substrings can also be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that O(1) time substring move operations imply O(1) substring exchange operations as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, show that a “recursive” sequence of moves can be simulated by a non-recursive sequence with at most a constant-factor increase in the number of moves, and present a polynomial-time greedy algorithm for non-recursive moves that achieves a worst-case logarithmic-factor approximation to the optimum. The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves of characters into inserts and deletes of characters only.
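Why a substring move costs O(1) on a singly linked list, and why an exchange follows from moves, can be seen in a minimal sketch. It assumes a sentinel head node and assumes the pointers to the relevant nodes are already in hand; locating them by position, as the hypothetical `node_at` helper does here, takes O(n). All names (`Node`, `move_block`, `exchange_blocks`, `node_at`) are illustrative and are not taken from the paper.

```python
from typing import Optional


class Node:
    """A singly linked list node holding one character."""

    __slots__ = ("ch", "next")

    def __init__(self, ch: str, nxt: Optional["Node"] = None) -> None:
        self.ch = ch
        self.next = nxt


def from_string(s: str) -> Node:
    """Build a list with a sentinel head so every real node has a predecessor."""
    head = Node("")
    tail = head
    for c in s:
        tail.next = Node(c)
        tail = tail.next
    return head


def to_string(head: Node) -> str:
    """Read the characters back out, skipping the sentinel."""
    out = []
    node = head.next
    while node is not None:
        out.append(node.ch)
        node = node.next
    return "".join(out)


def node_at(head: Node, i: int) -> Node:
    """Hypothetical helper: node i (0 = sentinel, 1 = first char), O(i) walk."""
    node = head
    for _ in range(i):
        node = node.next
    return node


def move_block(before: Node, last: Node, dest: Node) -> None:
    """Move the block before.next .. last so that it follows dest.

    Three pointer assignments, so O(1) once the three nodes are known.
    Assumes dest is not inside the block (dest == before leaves it in place).
    """
    first = before.next
    before.next = last.next   # unlink the block
    last.next = dest.next     # block's tail points to dest's old successor
    dest.next = first         # dest points to the block's head


def exchange_blocks(px: Node, lx: Node, py: Node, ly: Node) -> None:
    """Swap blocks X = px.next..lx and Y = py.next..ly, where X precedes Y.

    Uses at most two block moves: A X B Y C -> A B Y X C -> A Y B X C.
    """
    adjacent = py is lx            # B is empty: Y starts right after X
    move_block(px, lx, ly)         # A X B Y C -> A B Y X C
    if not adjacent:
        move_block(py, ly, px)     # A B Y X C -> A Y B X C


if __name__ == "__main__":
    h = from_string("abcdefg")
    # X = "bc" (positions 2..3), Y = "ef" (positions 5..6)
    exchange_blocks(node_at(h, 1), node_at(h, 3), node_at(h, 4), node_at(h, 6))
    print(to_string(h))  # aefdbcg
```

Exchanging `bc` and `ef` in `abcdefg` takes two constant-time moves and yields `aefdbcg`. When the two blocks are adjacent (the middle part B is empty), the first move alone already produces A Y X C, so the sketch skips the second move.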

## Keywords

Greedy Algorithm, Edit Distance, Optimal Block, Move Operation, Primary Block
