Abstract
The computational identification of non-coding RNA regions in genomes is currently receiving considerable attention. The problem is generally harder than detecting protein-coding genes, because non-coding RNA regions carry only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures that are characteristic of their molecular function in the cell; these are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that models sequence motifs with position specific scoring matrices (PSSMs) and extends a pairwise structural alignment method for RNA sequences to handle them, thereby incorporating motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially when biological knowledge such as sequence motifs is available.
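For readers unfamiliar with PSSMs, the sketch below shows the generic idea of how a sequence motif can be turned into a position specific scoring matrix and used to score windows of an RNA sequence. It is an illustrative sketch only, not the PSSMTS algorithm: it does not perform structural alignment on tree structures. The uniform background distribution, the pseudocount of 1, the log2 odds scale, and the function names `build_pssm`, `score_window`, and `scan` are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides


def build_pssm(motifs, pseudocount=1.0, background=None):
    """Build a log2-odds PSSM from equal-length, aligned motif instances.

    Assumption: a uniform background (0.25 per base) unless one is given,
    and an additive pseudocount applied to every base in every column.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motif instances must have the same length")
    pssm = []
    for i in range(length):
        counts = Counter(m[i].upper() for m in motifs)
        total = len(motifs) + pseudocount * len(ALPHABET)
        column = {}
        for b in ALPHABET:
            p = (counts[b] + pseudocount) / total
            # Positive score: base is enriched at this position relative to background.
            column[b] = math.log2(p / background[b])
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores of a window as long as the PSSM."""
    return sum(col[b] for col, b in zip(pssm, window.upper().replace("T", "U")))


def scan(pssm, seq):
    """Return (start, score) for every full-length window of seq."""
    w = len(pssm)
    return [(i, score_window(pssm, seq[i:i + w])) for i in range(len(seq) - w + 1)]
```

Usage: `pssm = build_pssm(["GAUC", "GAUC", "GAUC"])` followed by `max(scan(pssm, "AAGAUCAA"), key=lambda h: h[1])` picks the window starting at position 2, where the motif occurs. Additive log-odds scores like these can, in principle, be added to other alignment scores, which is the general property that makes PSSMs convenient to combine with dynamic-programming alignment.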
Additional information
K. Sato and K. Morita contributed equally to this work.
About this article
Cite this article
Sato, K., Morita, K. & Sakakibara, Y. PSSMTS: position specific scoring matrices on tree structures. J. Math. Biol. 56, 201–214 (2008). https://doi.org/10.1007/s00285-007-0108-4