Abstract
The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common.
Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements that are separated by non-conserved intervals of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA, allowing for a bounded number of insertions and deletions in the patterns. The individual PWM matches are then assembled into significant combinations. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches.
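To make the notion of a fragmented pattern concrete, the following Python sketch scans a sequence for an ordered list of PWMs separated by gaps of bounded length. It is an illustration under simplifying assumptions, not the fragrep2 implementation: PWMs are matched without insertions or deletions (so the fractional programming step is omitted), no PWM may be dropped from a combination, and no p- or E-values are computed. All function names, the pseudocount, the uniform background, and the thresholds are hypothetical choices made for this example.

```python
import math

BASES = "ACGT"

def pwm_from_sites(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length example sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, seq, start):
    """Ungapped log-odds score of the PWM placed at position `start`."""
    return sum(col[seq[start + i]] for i, col in enumerate(pwm))

def find_hits(pwm, seq, threshold):
    """All (start, score) placements whose score reaches the threshold."""
    hits = []
    for start in range(len(seq) - len(pwm) + 1):
        score = score_window(pwm, seq, start)
        if score >= threshold:
            hits.append((start, score))
    return hits

def search_fragmented(seq, pwms, gaps, thresholds):
    """Chain hits of consecutive PWMs whose separation lies within gaps[k-1].

    gaps[k-1] = (min_gap, max_gap) is the allowed number of bases between
    the end of PWM k-1 and the start of PWM k. Returns a list of
    (total_score, [start positions]) sorted by decreasing score.
    Assumes `seq` contains only the characters A, C, G, T.
    """
    # prev maps the start of the last matched PWM to the best chain ending there
    prev = {s: (sc, [s]) for s, sc in find_hits(pwms[0], seq, thresholds[0])}
    for k in range(1, len(pwms)):
        min_gap, max_gap = gaps[k - 1]
        prev_len = len(pwms[k - 1])
        cur = {}
        for s, sc in find_hits(pwms[k], seq, thresholds[k]):
            best = None
            for gap in range(min_gap, max_gap + 1):
                p = s - gap - prev_len  # candidate start of the previous PWM
                if p in prev and (best is None or prev[p][0] > best[0]):
                    best = prev[p]
            if best is not None:
                cur[s] = (best[0] + sc, best[1] + [s])
        prev = cur
    return sorted(prev.values(), key=lambda chain: -chain[0])

# Toy usage: two 4-mer elements separated by 5 unconserved bases.
toy_seq = "TTTT" + "ACGT" + "TTTTT" + "GGCC" + "TT"
toy_pwms = [pwm_from_sites(["ACGT"]), pwm_from_sites(["GGCC"])]
toy_matches = search_fragmented(toy_seq, toy_pwms, gaps=[(2, 10)], thresholds=[2.0, 2.0])
```

In this toy run the only chain found places the first element at position 4 and the second at position 13; tightening the gap bound to at most 3 bases yields no match. Handling insertions and deletions inside an element, or allowing an element to be absent, would require extending the scoring and chaining steps.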
We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Mosig, A., Chen, J.J.L., Stadler, P.F. (2007). Homology Search with Fragmented Nucleic Acid Sequence Patterns. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_31
DOI: https://doi.org/10.1007/978-3-540-74126-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)