Regular Language Constrained Sequence Alignment Revisited

Kucherov, Gregory; Pinhas, Tamar; Ziv-Ukelson, Michal

doi:10.1007/978-3-642-19222-7_39

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6460)

Included in the following conference series:

International Workshop on Combinatorial Algorithms

Abstract

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, Arslan [1] introduced the Regular Language Constrained Sequence Alignment Problem and proposed an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the non-deterministic automaton, which is given as input. Chung et al. [2] proposed a faster O(n²t³) time algorithm for the same problem. In this paper, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n²t³/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the run time complexity in the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
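
The abstract does not spell out the Straight-Line Program construction, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes the maxima subproblem has this shape: given a fixed t × t 0/1 matrix `adj` (for example, derived from the automaton) and a vector `values` of t scores, compute for every row the maximum of the scores in the columns that row selects. The sketch uses a generic column-blocking idea in the spirit of Savage's algorithm for linear forms (listed in the references): maxima of all subsets of a small block of columns are tabulated once and shared by all rows. All function names (`naive_maxima`, `build_row_masks`, `blocked_maxima`, `default_block`) are hypothetical.

```python
from math import log2

NEG_INF = float("-inf")


def naive_maxima(adj, values):
    """Reference version: one pass over every selected column of every row."""
    return [max((v for a, v in zip(row, values) if a), default=NEG_INF)
            for row in adj]


def default_block(t):
    """Block width of roughly log2(t) - 1 (an assumed, untuned choice)."""
    return max(1, int(log2(t)) - 1) if t > 0 else 1


def build_row_masks(adj, block):
    """Encode each row of adj as one bitmask per block of columns.

    This depends only on adj, so it can be done once and reused
    for every values vector.
    """
    t = len(adj[0]) if adj else 0
    masks = []
    for row in adj:
        row_masks = []
        for start in range(0, t, block):
            mask = 0
            for b, a in enumerate(row[start:start + block]):
                if a:
                    mask |= 1 << b
            row_masks.append(mask)
        masks.append(row_masks)
    return masks


def blocked_maxima(masks, values, block):
    """Row maxima using shared per-block subset tables."""
    result = [NEG_INF] * len(masks)
    for c, start in enumerate(range(0, len(values), block)):
        chunk = values[start:start + block]
        # table[mask] = max of chunk[b] over the bits b set in mask;
        # each entry costs one comparison, reusing a smaller subset.
        table = [NEG_INF] * (1 << len(chunk))
        for mask in range(1, len(table)):
            low = (mask & -mask).bit_length() - 1  # index of lowest set bit
            table[mask] = max(table[mask & (mask - 1)], chunk[low])
        # Every row needs a single lookup per block.
        for i, row_masks in enumerate(masks):
            result[i] = max(result[i], table[row_masks[c]])
    return result


if __name__ == "__main__":
    adj = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
    values = [3, 7, 5]
    k = default_block(len(values))
    print(blocked_maxima(build_row_masks(adj, k), values, k))
    print(naive_maxima(adj, values))
```

With block width k, each call performs about (t/k)(2^k + t) comparisons, which is O(t²/log t) when k is close to log₂ t, compared with up to t² for the naive version on a dense matrix. Under the assumption that such a computation is repeated for each of the n² cells of the dynamic programming table with t different score vectors, this would be consistent with the O(n²t³/log t) bound quoted above, though the paper's actual construction may differ.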

References

[1] Arslan, A.: Regular expression constrained sequence alignment. Journal of Discrete Algorithms 5(4), 647–661 (2007)
[2] Chung, Y., Lu, C., Tang, C.: Efficient algorithms for regular expression constrained sequence alignment. Information Processing Letters 103(6), 240–246 (2007)
[3] Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
[4] Arslan, A., Egecioglu, O.: Algorithms for the constrained longest common subsequence problems. International Journal of Foundations of Computer Science 16(6), 1099–1110 (2005)
[5] Chen, Y., Chao, K.: On the generalized constrained longest common subsequence problems. Journal of Combinatorial Optimization, 1–10 (2009)
[6] Iliopoulos, C., Rahman, M.: New efficient algorithms for the LCS and constrained LCS problems. Information Processing Letters 106(1), 13–18 (2008)
[7] Peng, Z., Ting, H.: Time and space efficient algorithms for constrained sequence alignment. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 237–246. Springer, Heidelberg (2005)
[8] Tsai, Y.: The constrained longest common subsequence problem. Information Processing Letters 88(4), 173–176 (2003)
[9] Bairoch, A.: The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research 21(13), 3097 (1993)
[10] Tang, C., Lu, C., Chang, M., Tsai, Y., Sun, Y., Chao, K., Chang, J., Chiou, Y., Wu, C., Chang, H., et al.: Constrained multiple sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology 1(2), 267–287 (2003)
[11] Bern, M., Plassmann, P.: The Steiner problem with edge lengths 1 and 2. Information Processing Letters 32(4), 171–176 (1989)
[12] Shi, W., Su, C.: The rectilinear Steiner arborescence problem is NP-complete. SIAM Journal on Computing 35(3), 729–740 (2006)
[13] Foulds, L., Graham, R.: The Steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics 3(43-49), 299 (1982)
[14] Jia, W., Han, B., Au, P., He, Y., Zhou, W.: Optimal multicast tree routing for cluster computing in hypercube interconnection networks. IEICE Transactions on Information and Systems E87-D, 1625–1632 (2004)
[15] Lin, X., Ni, L.: Multicast communication in multicomputer networks. IEEE Transactions on Parallel and Distributed Systems 4(10), 1105–1117 (1993)
[16] Sheu, S., Yang, C.: Multicast algorithms for hypercube multiprocessors. Journal of Parallel and Distributed Computing 61(1), 137–149 (2001)
[17] Dinur, I., Safra, S.: On the hardness of approximating minimum vertex cover. Annals of Mathematics 162(1), 439–486 (2005)
[18] Sylvester, J.: Thoughts on inverse orthogonal matrices simultaneous sign successions, and tessellated pavements in two or more colors, with applications to Newton’s rule, ornamental tile-work and the theory of numbers. Phil. Mag. 34(2), 461–475 (1867)
[19] Seberry, J., Yamada, M.: Hadamard matrices, sequences, and block designs. Contemporary Design Theory: A Collection of Surveys, 431–560 (1992)
[20] Savage, J.: An algorithm for the computation of linear forms. SIAM J. Comput. 3(2), 150–158 (1974)
[21] Hromkovič, J., Seibert, S., Wilke, T.: Translating regular expressions into small ε-free nondeterministic finite automata. Journal of Computer and System Sciences 62(4), 565–588 (2001)
[22] Schnitger, G.: Regular expressions and NFAs without epsilon-transitions. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, p. 432. Springer, Heidelberg (2006)
[23] Geffert, V.: Translation of binary regular expressions into nondeterministic ε-free automata with O(n log n) transitions. Journal of Computer and System Sciences 66(3), 451–472 (2003)

Author information

Authors and Affiliations

LIFL/CNRS and INRIA Lille Nord-Europe, Villeneuve d’Ascq, France
Gregory Kucherov
Department of Computer Science, Ben-Gurion University of the Negev, Be’er Sheva, Israel
Tamar Pinhas & Michal Ziv-Ukelson

Editor information

Editors and Affiliations

Department of Computer Science, University of London, King’s College, The Strand, WC2R 2LS, London, UK
Costas S. Iliopoulos
Department of Computing and Software, McMaster University, 1280 Main Street West, L8S 4K1, Hamilton, ON, Canada
William F. Smyth

Cite this paper

Kucherov, G., Pinhas, T., Ziv-Ukelson, M. (2011). Regular Language Constrained Sequence Alignment Revisited. In: Iliopoulos, C.S., Smyth, W.F. (eds) Combinatorial Algorithms. IWOCA 2010. Lecture Notes in Computer Science, vol 6460. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19222-7_39

DOI: https://doi.org/10.1007/978-3-642-19222-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19221-0
Online ISBN: 978-3-642-19222-7
eBook Packages: Computer Science, Computer Science (R0)