RNA multiple structural alignment with longest common subsequences

First Online: 02 November 2006
DOI: 10.1007/s10878-006-9020-x

Cite this article as: Bereg, S., Kubica, M., Waleń, T. et al. J Comb Optim (2007) 13: 179. doi:10.1007/s10878-006-9020-x
Abstract: In this paper, we present a new model for RNA multiple sequence structural alignment based on the longest common subsequence. We consider both the off-line and on-line cases. For the off-line case, i.e., when the longest common subsequence is given as a linear graph with n vertices, we first present a polynomial O(n^2) time algorithm to compute its maximum nested loop. We then consider a slightly different problem, the Maximum Loop Chain problem, and present an algorithm which runs in O(n^5) time. For the on-line case, i.e., given m RNA sequences, each of length n, compute a longest common subsequence of them such that this subsequence induces either a maximum nested loop or the maximum number of matches, we present efficient algorithms using dynamic programming when m is small.
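To make the role of "m is small" concrete, the following is a minimal sketch of the textbook dynamic program for the plain longest common subsequence of m sequences. It is not the algorithm of the paper: it ignores the nested-loop and match-count objectives entirely, and the function name `lcs_length` is a hypothetical name chosen for illustration. The number of states is the product of the sequence lengths, roughly n^m, which is why this style of dynamic programming is only practical for small m.

```python
from functools import lru_cache
from typing import Sequence


def lcs_length(seqs: Sequence[str]) -> int:
    """Length of a longest common subsequence of all strings in `seqs`.

    Illustrative baseline only (hypothetical helper, not from the paper):
    plain LCS without any structural constraint.
    """
    if not seqs:
        return 0
    m = len(seqs)

    @lru_cache(maxsize=None)
    def solve(pos: tuple) -> int:
        # pos[k] is the length of the prefix of seqs[k] still considered.
        if any(p == 0 for p in pos):
            return 0
        last_chars = {seqs[k][pos[k] - 1] for k in range(m)}
        if len(last_chars) == 1:
            # All prefixes end with the same character: match it everywhere.
            return 1 + solve(tuple(p - 1 for p in pos))
        # Otherwise at least one final character is unused: try dropping each.
        return max(
            solve(pos[:k] + (pos[k] - 1,) + pos[k + 1:]) for k in range(m)
        )

    return solve(tuple(len(s) for s in seqs))


if __name__ == "__main__":
    # Three short RNA-like strings; the state space is 7 * 5 * 6 index tuples.
    print(lcs_length(["GCAUGC", "GAUC", "CAUGG"]))
```

The recursion is memoized on tuples of prefix lengths, so each state is solved once. For long sequences an iterative table would avoid Python's recursion limit, at the cost of a less compact sketch.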

Keywords: RNA multiple structure alignment; Longest common subsequence; Dynamic programming

This research is partially supported by EPSCOR Visiting Scholar's Program and MSU Short-term Professional Development Program.

© Springer Science+Business Media, LLC 2006

Authors and Affiliations
1. Department of Computer Science, University of Texas at Dallas, Richardson, USA
2. Institute of Informatics, Warsaw University, Banacha 2, Poland
3. Department of Computer Science, Montana State University, Bozeman, USA