Structural Alignment of Pseudoknotted RNA
In this paper, we address the problem of discovering novel non-coding RNA (ncRNA) using primary sequence and secondary structure conservation, focusing on ncRNA families with pseudoknotted structures. Our main technical result is an efficient algorithm for computing an optimum structural alignment of an RNA sequence against a genomic substring. This algorithm has two applications. First, by scanning a genome, we can identify novel (homologous) pseudoknotted ncRNAs; second, we can infer the secondary structure of the aligned target sequence. We test an implementation of our algorithm (Pal) and show that it has near-perfect behavior for predicting the structure of many known pseudoknots. Additionally, it detects true homologs with high sensitivity and specificity in controlled tests. We also use Pal to search entire viral genomes and the mouse genome for novel homologs of some viral and eukaryotic pseudoknots, respectively. In each case, we have found strong support for novel homologs.
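To make the genome-scanning application concrete, the following is a minimal sketch and not the Pal algorithm itself. It assumes an ungapped placement of the query on each genomic window, a hypothetical scoring scheme (`match` and `pair_bonus` are illustrative parameters), and a query structure given as a list of base-pair index pairs that are allowed to cross, which is what makes the structure a pseudoknot. An optimum structural alignment as described in the abstract would also handle insertions and deletions, which this sketch omits.

```python
# Illustrative sketch only: ungapped window scan with a toy score.
# The scoring scheme and all names here are assumptions, not the paper's method.

# Canonical and wobble base pairs.
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def window_score(query, pairs, window, match=1, pair_bonus=2):
    """Score one ungapped placement of `query` on `window` (equal lengths).

    Sequence conservation: `match` for every identical position.
    Structure conservation: `pair_bonus` for every annotated pair (i, j)
    whose bases in the window can still pair. Pairs may cross (pseudoknot).
    """
    seq = sum(match for q, w in zip(query, window) if q == w)
    struct = sum(pair_bonus for i, j in pairs if (window[i], window[j]) in COMPLEMENT)
    return seq + struct


def scan(genome, query, pairs, threshold):
    """Slide the query along the genome; return (start, score) for windows at or above threshold."""
    n = len(query)
    hits = []
    for start in range(len(genome) - n + 1):
        score = window_score(query, pairs, genome[start:start + n])
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Because the structural term rewards any window whose annotated positions can still pair, a candidate with compensatory mutations (for example G-C replaced by C-G) keeps its structural score even though its sequence identity drops. That is the intuition behind combining sequence and structure conservation when searching for homologs.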