Abstract
The “Common Substring Alignment” problem is defined as follows. The input consists of a set of strings S_1, S_2, ..., S_c, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute the similarity of every string S_i with T without recomputing the part contributed by the common substring for each appearance. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, L_y be the length of the LCS of T and Y, denoted |LCS[T, Y]|, and L be max{|LCS[T, S_i]|}. Our algorithm consists of an O(nL_y) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.
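To make the problem setting and the notion of sparsity concrete, the following is a minimal sketch, not the algorithm of this paper. It computes |LCS[T, S]| by visiting only the match points (pairs of positions where the two strings agree), in the style of the classical Hunt-Szymanski approach. The function name `lcs_length_sparse` and the example strings are illustrative assumptions. The loop at the end recomputes the contribution of the common substring Y for every source string, which is exactly the repeated work that the encoding and alignment stages described above are meant to avoid.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_sparse(t: str, s: str) -> int:
    """Return |LCS[t, s]| by processing match points only.

    Illustrative sketch in the Hunt-Szymanski style; this is NOT the
    encoding/alignment scheme of the paper.
    """
    # For every character, the sorted positions where it occurs in s.
    positions = defaultdict(list)
    for j, ch in enumerate(s):
        positions[ch].append(j)

    # thresholds[k] = smallest index in s at which a common subsequence
    # of length k + 1 can end, given the prefix of t processed so far.
    thresholds = []
    for ch in t:
        # Visit the match columns of this row from right to left so that
        # two matches in the same row are never chained together.
        for j in reversed(positions.get(ch, ())):
            k = bisect_left(thresholds, j)
            if k == len(thresholds):
                thresholds.append(j)
            else:
                thresholds[k] = j
    return len(thresholds)


if __name__ == "__main__":
    # Hypothetical instance: every source string contains Y at least once.
    Y = "GATTACA"
    sources = ["CC" + Y + "T", "A" + Y + "GG" + Y]
    T = "GCATTAGCA"
    for s_i in sources:
        # Naive baseline: Y is re-aligned against T for every appearance.
        print(s_i, lcs_length_sparse(T, s_i))
```

The work done by this sketch is governed by the number of match points rather than by the full size of the dynamic programming table, which is the kind of sparsity that LCS-specific algorithms exploit.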
Partially supported by NSF grant CCR-0104307, by the Israel Science Foundation grant 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, and by an IBM Faculty Partnership Award.
On Education Leave from the IBM T.J. Watson Research Center; michal@cs.haifa.ac.il; partially supported by the Israel Science Foundation grant 282/01, and by the FIRST Foundation of the Israel Academy of Science and Humanities.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Landau, G.M., Schieber, B., Ziv-Ukelson, M. (2003). Sparse LCS Common Substring Alignment. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_17
DOI: https://doi.org/10.1007/3-540-44888-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive