An O(ND) difference algorithm and its variations
- Eugene W. Myers
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D²) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N log N + D²) time variation.
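As a rough illustration of the greedy idea summarized in the abstract, the sketch below computes only the size D of a shortest edit script by tracking, for each diagonal k = x - y of the edit graph, the furthest x reachable with d insertions and deletions. It is a minimal sketch of the basic O(ND) variant, not the paper's linear-space refinement or the suffix-tree variation. The names `shortest_edit_distance`, `lcs_length`, and `furthest` are illustrative assumptions and do not come from the paper.

```python
def shortest_edit_distance(a, b):
    """Return D, the number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k (k = x - y).
    # The seed entry for diagonal 1 lets the d = 0 round start at x = 0.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Extend from whichever neighbouring diagonal got further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # vertical move: insert from b
            else:
                x = furthest[k - 1] + 1    # horizontal move: delete from a
            y = x - k
            # Follow the "snake" of matching symbols along the diagonal.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


def lcs_length(a, b):
    """Duality from the abstract: with N = len(a) + len(b), L = (N - D) / 2."""
    return (len(a) + len(b) - shortest_edit_distance(a, b)) // 2
```

The outer loop runs at most D + 1 times and each round touches at most 2D + 1 diagonals, which is why the running time depends on how different the inputs are and not only on their length.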
Volume 1, Issue 1-4, pp 251-266
- Longest common subsequence
- Shortest edit script
- Edit graph
- File comparison
- Eugene W. Myers (1)
- Author Affiliations
- 1. Department of Computer Science, University of Arizona, Tucson, AZ 85721, USA