Sparse LCS Common Substring Alignment

Landau, Gad M.; Schieber, Baruch; Ziv-Ukelson, Michal

doi:10.1007/3-540-44888-8_17

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

The “Common Substring Alignment” problem is defined as follows. The input consists of a set of strings S_1, S_2, ..., S_c, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute similarity of all strings S_i with T, without computing the part of the common substring over and over again. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, L_y be the length of the LCS of T and Y, denoted |LCS[T, Y]|, and L be max{|LCS[T, S_i]|}. Our algorithm consists of an O(nL_y) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.
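To make the notion of sparsity in the LCS problem concrete, the following is a minimal sketch in Python. It is not the algorithm of this paper: it uses the classic match-point technique in the style of Hunt and Szymanski, which visits only the position pairs where the two strings agree instead of filling the whole dynamic programming table. The function names, the example strings, and the choice of Y are assumptions made purely for illustration. The loop at the end is the naive baseline, in which the work involving the common substring Y is repeated for every source string.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_sparse(a: str, b: str) -> int:
    """Length of an LCS of a and b, touching only matching position pairs."""
    # For each character, record the positions where it occurs in b.
    positions = defaultdict(list)
    for j, ch in enumerate(b):
        positions[ch].append(j)

    # thresholds[k] is the smallest index in b at which a common
    # subsequence of length k + 1 can end, given the prefix of a seen so far.
    thresholds = []
    for ch in a:
        # Visit matches in decreasing j so that two matches in the same
        # row of the (implicit) table cannot extend one another.
        for j in reversed(positions.get(ch, [])):
            k = bisect_left(thresholds, j)
            if k == len(thresholds):
                thresholds.append(j)
            else:
                thresholds[k] = j
    return len(thresholds)


def lcs_length_dense(a: str, b: str) -> int:
    """Textbook O(|a| * |b|) dynamic program, used here only as a cross-check."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Hypothetical input: a target T and source strings that share the substring Y.
T = "ACGTTGCA"
Y = "GTTG"
sources = ["AC" + Y + "A", "T" + Y + "CC", Y + Y]

# Naive baseline: every call re-examines the matches between Y and T.
scores = [lcs_length_sparse(s, T) for s in sources]
```

The running time of the sparse routine depends on the number of matching position pairs rather than on the product of the string lengths, which is the kind of saving the abstract refers to. Avoiding the repeated work on Y across the source strings is the part that the paper's encoding and alignment stages address and that this sketch does not attempt.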

Partially supported by NSF grant CCR-0104307, by the Israel Science Foundation grant 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, and by IBM Faculty Partnership Award.

On Education Leave from the IBM T.J. Watson Research Center; michal@cs.haifa.ac.il; partially supported by the Israel Science Foundation grant 282/01, and by the FIRST Foundation of the Israel Academy of Science and Humanities.

Author information

Authors and Affiliations

Department of Computer and Information Science, Polytechnic University, Six MetroTech Center, Brooklyn, NY, 11201-3840
Gad M. Landau
Department of Computer Science, Haifa University, Haifa, 31905, Israel
Gad M. Landau & Michal Ziv-Ukelson
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY, 10598
Baruch Schieber & Michal Ziv-Ukelson

Editor information

Editors and Affiliations

Depto. de Ciencias de la Computación, Universidad de Chile, Blanco Encalada 2120, Santiago, 6511224, Chile
Ricardo Baeza-Yates
Escuela de Ciencias Físico-Matemáticas, Universidad Michoacana, Edificio “B”, ciudad universitaria, Morelia Michoacán, Mexico
Edgar Chávez
Université de Marne-la-Vallée, 77454, Marne-la-Vallée Cedex 2, France
Maxime Crochemore

About this paper

Cite this paper

Landau, G.M., Schieber, B., Ziv-Ukelson, M. (2003). Sparse LCS Common Substring Alignment. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_17

DOI: https://doi.org/10.1007/3-540-44888-8_17
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive
