Journal of Combinatorial Optimization, Volume 13, Issue 2, pp 179–188

RNA multiple structural alignment with longest common subsequences

  • Sergey Bereg
  • Marcin Kubica
  • Tomasz Waleń
  • Binhai Zhu

Abstract

In this paper, we present a new model for RNA multiple sequence structural alignment based on the longest common subsequence. We consider both the off-line and on-line cases. For the off-line case, i.e., when the longest common subsequence is given as a linear graph with n vertices, we first present an $O(n^2)$-time algorithm to compute its maximum nested loop. We then consider a slightly different problem, the Maximum Loop Chain problem, and present an algorithm that runs in $O(n^5)$ time. For the on-line case, i.e., given m RNA sequences, each of length n, compute a longest common subsequence of them such that this subsequence induces either a maximum nested loop or the maximum number of matches, we present efficient dynamic programming algorithms for the case when m is small.
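For orientation, the on-line case builds on the longest common subsequence of m sequences. The sketch below is the standard textbook dynamic program over the grid of prefix lengths, not the authors' algorithm: it maximizes only the subsequence length and ignores the nested-loop and match-count objectives. The function name `multi_lcs` is hypothetical. Its table has (n+1)^m cells, which is why such dynamic programs are practical only when m is small.

```python
from itertools import product


def multi_lcs(seqs):
    """Return one longest common subsequence of all strings in `seqs` (m >= 1).

    Textbook DP over the (n1+1) x ... x (nm+1) grid of prefix lengths;
    time O(m * prod(n_k + 1)), so it is only practical for small m.
    """
    m = len(seqs)
    dims = [len(s) + 1 for s in seqs]
    table = {}
    # product() enumerates index tuples in lexicographic order, so every
    # predecessor (one or all coordinates decremented) is already filled in.
    for idx in product(*(range(d) for d in dims)):
        if 0 in idx:
            table[idx] = 0  # an empty prefix has an empty LCS
            continue
        last_chars = {s[i - 1] for s, i in zip(seqs, idx)}
        if len(last_chars) == 1:
            # all prefixes end in the same symbol: extend the shorter LCS
            table[idx] = table[tuple(i - 1 for i in idx)] + 1
        else:
            # otherwise drop the last symbol of one sequence, best choice wins
            table[idx] = max(
                table[idx[:k] + (idx[k] - 1,) + idx[k + 1:]] for k in range(m)
            )

    # Trace back from the full-length corner to recover one witness.
    idx = tuple(d - 1 for d in dims)
    out = []
    while 0 not in idx:
        last_chars = {s[i - 1] for s, i in zip(seqs, idx)}
        if len(last_chars) == 1:
            out.append(seqs[0][idx[0] - 1])
            idx = tuple(i - 1 for i in idx)
        else:
            idx = max(
                (idx[:k] + (idx[k] - 1,) + idx[k + 1:] for k in range(m)),
                key=table.__getitem__,
            )
    return "".join(reversed(out))


print(multi_lcs(["ACGU", "ACGU", "ACGU"]))
```

When several longest common subsequences exist, the traceback returns just one of them. A model like the one in the paper has to choose among these candidates by a structural criterion instead.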

Keywords

RNA multiple structure alignment; Longest common subsequence; Dynamic programming



Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  • Sergey Bereg (1)
  • Marcin Kubica (2)
  • Tomasz Waleń (2)
  • Binhai Zhu (3)
  1. Department of Computer Science, University of Texas at Dallas, Richardson, USA
  2. Institute of Informatics, Warsaw University, Banacha 2, Poland
  3. Department of Computer Science, Montana State University, Bozeman, USA
