Abstract
Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, Arslan [1] introduced the Regular Language Constrained Sequence Alignment Problem and proposed an $O(n^2 t^4)$ time and $O(n^2 t^2)$ space algorithm for solving it, where $n$ is the length of the input strings and $t$ is the number of states in the non-deterministic automaton, which is given as input. Chung et al. [2] proposed a faster $O(n^2 t^3)$ time algorithm for the same problem. In this paper, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to $O(n^2 t^3 / \log t)$. This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the run time complexity in the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
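The abstract does not spell out the maxima computation subproblem, so the following is only a minimal sketch under an assumed formulation: given a fixed t × t 0/1 matrix `adj` (for example, an automaton's transition relation) and a vector `v` of scores, compute for every row q the maximum of `v[p]` over the columns p where `adj[q][p]` is 1. The blocked variant below uses a generic table-lookup idea (in the spirit of the Four Russians technique) to share work between rows. It is not claimed to be the Straight-Line Program construction of the paper. The names `maxima_naive`, `precompute_masks`, `maxima_blocked` and the block width `k` are hypothetical.

```python
NEG_INF = float("-inf")


def maxima_naive(adj, v):
    """out[q] = max of v[p] over columns p with adj[q][p] set (-inf if none)."""
    return [
        max((v[p] for p in range(len(v)) if row[p]), default=NEG_INF)
        for row in adj
    ]


def precompute_masks(adj, k):
    """Split the columns into blocks of width k and encode, for every row,
    the set of selected columns inside each block as a bitmask.
    This depends only on the matrix, so it is done once."""
    t = len(adj[0]) if adj else 0
    blocks = []
    for start in range(0, t, k):
        width = min(k, t - start)
        masks = [
            sum(1 << b for b in range(width) if row[start + b]) for row in adj
        ]
        blocks.append((start, width, masks))
    return blocks


def maxima_blocked(blocks, v, n_rows):
    """Same result as maxima_naive, computed block by block.
    For each block, the maximum of every column subset is tabulated with one
    max operation per subset; each row then needs one lookup per block."""
    out = [NEG_INF] * n_rows
    for start, width, masks in blocks:
        table = [NEG_INF] * (1 << width)
        for mask in range(1, 1 << width):
            low = (mask & -mask).bit_length() - 1  # index of lowest set bit
            rest = mask & (mask - 1)               # mask without that bit
            table[mask] = max(table[rest], v[start + low])
        for q, m in enumerate(masks):
            if table[m] > out[q]:
                out[q] = table[m]
    return out


# Small usage example (assumed data, not taken from the paper).
adj = [[1, 0, 1],
       [0, 0, 0],
       [0, 1, 1]]
v = [4, 9, 2]
result = maxima_blocked(precompute_masks(adj, 2), v, len(adj))
```

With k chosen close to log2(t), each table has at most about t entries, so one vector costs on the order of t^2 / log t max operations instead of up to t^2 for the naive loop. This is the kind of logarithmic saving that the bound above reflects, under the assumptions stated here.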
References
Arslan, A.: Regular expression constrained sequence alignment. Journal of Discrete Algorithms 5(4), 647–661 (2007)
Chung, Y., Lu, C., Tang, C.: Efficient algorithms for regular expression constrained sequence alignment. Information Processing Letters 103(6), 240–246 (2007)
Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Arslan, A., Egecioglu, O.: Algorithms for the constrained longest common subsequence problems. International Journal of Foundations of Computer Science 16(6), 1099–1110 (2005)
Chen, Y., Chao, K.: On the generalized constrained longest common subsequence problems. Journal of Combinatorial Optimization, 1–10 (2009)
Iliopoulos, C., Rahman, M.: New efficient algorithms for the LCS and constrained LCS problems. Information Processing Letters 106(1), 13–18 (2008)
Peng, Z., Ting, H.: Time and space efficient algorithms for constrained sequence alignment. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 237–246. Springer, Heidelberg (2005)
Tsai, Y.: The constrained longest common subsequence problem. Information Processing Letters 88(4), 173–176 (2003)
Bairoch, A.: The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research 21(13), 3097 (1993)
Tang, C., Lu, C., Chang, M., Tsai, Y., Sun, Y., Chao, K., Chang, J., Chiou, Y., Wu, C., Chang, H., et al.: Constrained multiple sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology 1(2), 267–287 (2003)
Bern, M., Plassmann, P.: The Steiner problem with edge lengths 1 and 2. Information Processing Letters 32(4), 171–176 (1989)
Shi, W., Su, C.: The rectilinear Steiner arborescence problem is NP-complete. SIAM Journal on Computing 35(3), 729–740 (2006)
Foulds, L., Graham, R.: The Steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics 3, 43–49 (1982)
Jia, W., Han, B., Au, P., He, Y., Zhou, W.: Optimal multicast tree routing for cluster computing in hypercube interconnection networks. IEICE Transactions on Information and Systems E87-D, 1625–1632 (2004)
Lin, X., Ni, L.: Multicast communication in multicomputer networks. IEEE Transactions on Parallel and Distributed Systems 4(10), 1105–1117 (1993)
Sheu, S., Yang, C.: Multicast algorithms for hypercube multiprocessors. Journal of Parallel and Distributed Computing 61(1), 137–149 (2001)
Dinur, I., Safra, S.: On the hardness of approximating minimum vertex cover. Annals of Mathematics 162(1), 439–486 (2005)
Sylvester, J.: Thoughts on inverse orthogonal matrices, simultaneous sign successions, and tessellated pavements in two or more colors, with applications to Newton’s rule, ornamental tile-work and the theory of numbers. Phil. Mag. 34(2), 461–475 (1867)
Seberry, J., Yamada, M.: Hadamard matrices, sequences, and block designs. Contemporary Design Theory: A Collection of Surveys, 431–560 (1992)
Savage, J.: An algorithm for the computation of linear forms. SIAM J. Comput. 3(2), 150–158 (1974)
Hromkovič, J., Seibert, S., Wilke, T.: Translating regular expressions into small ε-free nondeterministic finite automata. Journal of Computer and System Sciences 62(4), 565–588 (2001)
Schnitger, G.: Regular expressions and NFAs without epsilon-transitions. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, p. 432. Springer, Heidelberg (2006)
Geffert, V.: Translation of binary regular expressions into nondeterministic ε-free automata with O(n log n) transitions. Journal of Computer and System Sciences 66(3), 451–472 (2003)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kucherov, G., Pinhas, T., Ziv-Ukelson, M. (2011). Regular Language Constrained Sequence Alignment Revisited. In: Iliopoulos, C.S., Smyth, W.F. (eds) Combinatorial Algorithms. IWOCA 2010. Lecture Notes in Computer Science, vol 6460. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19222-7_39
DOI: https://doi.org/10.1007/978-3-642-19222-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19221-0
Online ISBN: 978-3-642-19222-7
eBook Packages: Computer Science, Computer Science (R0)