Algorithmica, Volume 1, Issue 1–4, pp 251–266

An O(ND) difference algorithm and its variations

  • Eugene W. Myers
Article

Abstract

The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B, and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D²) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N log N + D²) time variation.
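
The abstract does not spell out the algorithm itself. As a rough illustration only, the following Python sketch shows the greedy "furthest-reaching point per diagonal" search that is commonly presented as the basic O(ND) method. The function names `shortest_edit_distance` and `lcs_length` are hypothetical, edits are assumed to be single-element insertions and deletions, and the sketch returns only the size D of a shortest edit script, not the script.

```python
def shortest_edit_distance(a, b):
    """Return D, the number of single-element insertions and deletions
    in a shortest edit script turning sequence a into sequence b.

    Illustrative sketch of a greedy diagonal search; names are assumptions.
    """
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the largest x reached so far on diagonal k, where k = x - y.
    # The entry v[1] = 0 seeds the first iteration (d = 0, k = 0).
    v = {1: 0}
    for d in range(max_d + 1):
        # With exactly d edits, only diagonals -d, -d+2, ..., d are reachable.
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # vertical step: insert an element of b
            else:
                x = v[k - 1] + 1    # horizontal step: delete an element of a
            y = x - k
            # Follow the diagonal run of matching elements at no edit cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


def lcs_length(a, b):
    """Longest common subsequence length via the duality N = 2L + D,
    where N = len(a) + len(b)."""
    return (len(a) + len(b) - shortest_edit_distance(a, b)) // 2


print(shortest_edit_distance("ABCABBA", "CBABAC"))  # 5
print(lcs_length("ABCABBA", "CBABAC"))              # 4
```

The outer loop runs at most D + 1 times and each round touches at most D + 1 diagonals, so the work outside the matching runs is proportional to D². This sketch keeps only the latest value per diagonal; recovering the edit script itself would additionally require saving the state of `v` for each value of `d`.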

Key words

Longest common subsequence, Shortest edit script, Edit graph, File comparison

Copyright information

© Springer-Verlag New York Inc. 1986

Authors and Affiliations

  • Eugene W. Myers, Department of Computer Science, University of Arizona, Tucson, USA
