Skip to main content
Log in

The longest common subsequence problem revisited

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Letl be the length of an LCS between two strings of lengthm andnm, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previousO(ln) time algorithm by Hirschberg. The new version can be implemented in timeO(lm · min logs, logm, log(2n/m)), which is profitable when the input strings differ considerably in size (a looser bound for both versions isO(mn)). The second strategy improves on the Hunt-Szymanski algorithm. This latter takes timeO((r +n) logn), wherermn is the total number of matches between the two input strings. Such a performance is quite good (O(n logn)) whenrn, but it degrades to Θ(mn logn) in the worst case. On the other hand the variation presented here is never worse than linear-time in the productmn. The exact time bound derived for this second algorithm isO(m logn +d log(2mn/d)), wheredr is the number ofdominant matches (elsewhere referred to asminimal candidates) between the two strings. Both algorithms require anO(n logs) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. A. V. Aho, D. S. Hirschberg, and J. D. Ullman, Bounds on the complexity of the maximal common subsequence problem,J. Assoc. Comput. Mach.,23 (1976), 1–12.

    MATH  MathSciNet  Google Scholar 

  2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman,The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1976

    Google Scholar 

  3. A. Apostolico, Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings,Inform. Process. Lett.,23 (1986), 63–69.

    Article  MATH  MathSciNet  Google Scholar 

  4. A. Apostolico and C. Guerra, A fast linear space algorithm for computing longest common subsequences,Proceedings of the 23rd Allerton Conference, Monticello, IL, 1985, pp. 76–84.

  5. J. L. Bentley and A. C.-C. Yao, An almost optimal algorithm for unbounded searching,Inform. Process. Lett.,5 (1976), 82–87.

    Article  MATH  MathSciNet  Google Scholar 

  6. M. R. Brown, and R. E. Tarjan, A representation of linear lists with movable fingers,Proceedings of the Wth Annual ACM Symposium on Theory of Computing, San Diego, CA, 1978, pp. 19–29.

  7. M. R. Brown and R. E. Tarjan, A fast merging algorithm,J. Assoc. Comput. Mach. 26 (1979), 211–226.

    MATH  MathSciNet  Google Scholar 

  8. D. S. Hirschberg, A linear space algorithm for computing maximal common subsequences,Comm. ACM,18 (1975), 341–343.

    Article  MATH  MathSciNet  Google Scholar 

  9. D. S. Hirschberg, Algorithms for the longest common subsequence problemJ Assoc. Comput. Mach.,24 (1977), 664–675.

    MATH  MathSciNet  Google Scholar 

  10. W. J. Hsu and M. W. Du, New algorithms for the LCS problem,J. Comput. System Sci.,29 (1984), 133–152.

    Article  MATH  MathSciNet  Google Scholar 

  11. J. W. Hunt and T. G. Szymanski, A fast algorithm for computing longest common subsequences,Comm. ACM,20 (1977), 350–353.

    Article  MATH  MathSciNet  Google Scholar 

  12. H. M. Martinez (ed.), Mathematical and computational problems in the analysis of molecular sequences,Bull. Math. Biol,46, 4 (1984).

    Google Scholar 

  13. W. J. Masek and M. S. Paterson, A faster algorithm for computing string editing distances,J. Comput. System Sci.,20 (1980), 18–31.

    Article  MATH  MathSciNet  Google Scholar 

  14. K. Mehlhorn,Data Structures and Algorithms 1: Sorting and Searching, EATCS Monographs on TCS, Springer-Verlag, Berlin, 1984.

    Google Scholar 

  15. D. Sankoff and J. B. Kruskal (eds.),Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparisons, Addison-Wesley, Reading, MA, 1983.

    Google Scholar 

  16. P. van Emde Boas, Preserving order in a forest in less than logarithmic time and linear space,Inform. Process. Lett.,6 (1977), 80–82.

    Article  MATH  Google Scholar 

  17. R. A. Wagner and M. J. Fischer, The string to string correction problem,J. Assoc. Comput. Mach.,21 (1974), 168–173.

    MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Communicated by David P. Dobkin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Apostolico, A., Guerra, C. The longest common subsequence problem revisited. Algorithmica 2, 315–336 (1987). https://doi.org/10.1007/BF01840365

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01840365

Key words

Navigation