RNA multiple structural alignment with longest common subsequences

Authors
Sergey Bereg, Department of Computer Science, University of Texas at Dallas
Marcin Kubica, Institute of Informatics, Warsaw University
Tomasz Waleń, Institute of Informatics, Warsaw University
Binhai Zhu, Department of Computer Science, Montana State University

First Online: 02 November 2006
DOI: 10.1007/s10878-006-9020-x

Cite this article as: Bereg, S., Kubica, M., Waleń, T. et al. J Comb Optim (2007) 13: 179. doi:10.1007/s10878-006-9020-x
Abstract

In this paper, we present a new model for RNA multiple sequence structural alignment based on the longest common subsequence. We consider both the off-line and on-line cases. For the off-line case, i.e., when the longest common subsequence is given as a linear graph with n vertices, we first present a polynomial O(n^2) time algorithm to compute its maximum nested loop. We then consider a slightly different problem, the Maximum Loop Chain problem, and present an algorithm which runs in O(n^5) time. For the on-line case, i.e., given m RNA sequences of lengths n, compute the longest common subsequence of them such that this subsequence either induces a maximum nested loop or the maximum number of matches, we present efficient algorithms using dynamic programming when m is small.
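As a rough illustration of the two ingredients the abstract names, the sketch below shows (1) the textbook dynamic program for the longest common subsequence of two sequences and (2) a quadratic-time dynamic program that picks a largest set of pairwise non-crossing arcs in a linear graph. Neither is the paper's algorithm. The abstract does not define "nested loop", so the second function assumes one plausible reading: vertices are numbered 0..n-1 along a line, each vertex is an endpoint of at most one arc, and two arcs are compatible when they are disjoint or one lies inside the other. The function names `lcs` and `max_noncrossing_arcs` are hypothetical.

```python
def lcs(a: str, b: str) -> str:
    """Textbook LCS of two sequences in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def max_noncrossing_arcs(n: int, arcs: list[tuple[int, int]]) -> int:
    """Size of a largest set of pairwise non-crossing arcs.

    Assumption (not from the paper): every vertex is an endpoint of at
    most one arc, which is what keeps the running time at O(n^2).
    """
    partner = [-1] * n
    for u, v in arcs:
        partner[u] = v
        partner[v] = u
    # best[i][j] = optimum using only arcs with both endpoints in i..j-1
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            value = best[i + 1][j]  # option 1: leave vertex i unused
            p = partner[i]
            if i < p < j:
                # option 2: keep arc (i, p); split into inside and right part
                value = max(value, 1 + best[i + 1][p] + best[p + 1][j])
            best[i][j] = value
    return best[0][n]
```

For example, `max_noncrossing_arcs(6, [(0, 5), (1, 3), (2, 4)])` keeps the outer arc and one of the two crossing inner arcs. Extending `lcs` to m sequences in the obvious way needs a table with one dimension per sequence, which is consistent with the abstract restricting the on-line algorithms to small m.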

Keywords

RNA multiple structure alignment; Longest common subsequence; Dynamic programming

This research is partially supported by EPSCOR Visiting Scholar's Program and MSU Short-term Professional Development Program.


© Springer Science+Business Media, LLC 2006