Two algorithms for the longest common subsequence of three (or more) strings
Various algorithms have been proposed, over the years, for the longest common subsequence problem on 2 strings (2-LCS), many of these improving, at least for some cases, on the classical dynamic programming approach. However, relatively little attention has been paid in the literature to the k-LCS problem for k > 2, a problem that has interesting applications in areas such as the multiple alignment of sequences in molecular biology.
In this paper, we describe and analyse two algorithms with particular reference to the 3-LCS problem, though each algorithm can be extended to solve the k-LCS problem for general k. The first algorithm, which can be viewed as a “lazy” version of dynamic programming, has time and space complexity that is $O(n(n-l)^2)$ for 3 strings, and $O(kn(n-l)^{k-1})$ for k strings, where n is the common length of the strings and l is the length of an LCS. The second algorithm, which involves evaluating entries in a “threshold” table in diagonal order, has time and space complexity that is $O(l(n-l)^2 + sn)$ for 3 strings, and $O(kl(n-l)^{k-1} + ksn)$ for k strings, where s is the alphabet size. For simplicity, the algorithms are presented for equal-length strings, though extension to unequal-length strings is straightforward.
Empirical evidence is presented to show that both algorithms improve significantly on the basic dynamic programming approach, and on an earlier algorithm proposed by Hsu and Du. As would be expected, the improvement is most marked when l is relatively large, and the balance of evidence is heavily in favour of the threshold approach.
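For reference, the basic dynamic programming approach that serves as the baseline can be sketched for three strings as follows. This is a minimal illustration of the standard recurrence only, not either of the two algorithms described above; the function name `lcs3_length` is hypothetical, and keeping just two layers of the table is a choice made for this sketch. It runs in time cubic in the string length regardless of l, which is the cost the lazy and threshold methods aim to reduce when l is large.

```python
def lcs3_length(a: str, b: str, c: str) -> int:
    """Length of a longest common subsequence of three strings.

    Classical dynamic programming: L[i][j][k] is the LCS length of the
    prefixes a[:i], b[:j], c[:k]. Only layers i-1 and i are stored.
    """
    rows, cols = len(b) + 1, len(c) + 1
    prev = [[0] * cols for _ in range(rows)]  # layer i-1 (all zeros for i = 0)
    for i in range(1, len(a) + 1):
        curr = [[0] * cols for _ in range(rows)]  # layer i
        for j in range(1, rows):
            for k in range(1, cols):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    # All three symbols match: extend the LCS of the shorter prefixes.
                    curr[j][k] = prev[j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last symbol of one of the three prefixes.
                    curr[j][k] = max(prev[j][k], curr[j - 1][k], curr[j][k - 1])
        prev = curr
    return prev[len(b)][len(c)]


if __name__ == "__main__":
    print(lcs3_length("abcde", "ace", "aXcYe"))
```

The sketch also accepts strings of unequal length, since the recurrence does not depend on a common length n.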
Key words: string algorithms, longest common subsequence
- 1. A. Apostolico. Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Information Processing Letters, 23:63–69, 1986.
- 2. A. Apostolico, S. Browne, and C. Guerra. Fast linear-space computations of longest common subsequences. Theoretical Computer Science, 92:3–17, 1992.
- 3. A. Apostolico and C. Guerra. The longest common subsequence problem revisited. Algorithmica, 2:315–336, 1987.
- 4. D.S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the A.C.M., 18:341–343, 1975.
- 5. D.S. Hirschberg. Algorithms for the longest common subsequence problem. Journal of the A.C.M., 24:664–675, 1977.
- 6. W.J. Hsu and M.W. Du. Computing a longest common subsequence for a set of strings. BIT, 24:45–59, 1984.
- 7. J.W. Hunt and T.G. Szymanski. A fast algorithm for computing longest common subsequences. Communications of the A.C.M., 20:350–353, 1977.
- 8. S.Y. Itoga. The string merging problem. BIT, 21:20–30, 1981.
- 9. W.J. Masek and M.S. Paterson. A faster algorithm for computing string editing distances. J. Comput. System Sci., 20:18–31, 1980.
- 10. E.W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1:251–266, 1986.
- 11. N. Nakatsu, Y. Kambayashi, and S. Yajima. A longest common subsequence algorithm suitable for similar text strings. Acta Informatica, 18:171–179, 1982.
- 12. D. Sankoff. Matching sequences under deletion insertion constraints. Proc. Nat. Acad. Sci. U.S.A., 69:4–6, 1972.
- 13. E. Ukkonen. Algorithms for approximate string matching. Information and Control, 64:100–118, 1985.
- 14. R.A. Wagner and M.J. Fischer. The string-to-string correction problem. Journal of the A.C.M., 21:168–173, 1974.
- 15. S. Wu, U. Manber, G. Myers, and W. Miller. An O(NP) sequence comparison algorithm. Information Processing Letters, 35:317–323, 1990.