Journal of Mathematical Biology, Volume 56, Issue 1–2, pp 201–214

PSSMTS: position specific scoring matrices on tree structures

Article

Abstract

Identifying non-coding RNA regions in genomes by computational methods is currently receiving considerable attention. In general, this is substantially more difficult than detecting protein-coding genes because non-coding RNA regions carry only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures that are characteristic of their molecular function in the cell; these are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that extends a pairwise structural alignment method for RNA sequences to handle position specific scoring matrices (PSSMs), which we use to model sequence motifs, and hence to incorporate motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially when biological knowledge such as sequence motifs is available.
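As a rough illustration of what a position specific scoring matrix is, the sketch below builds a log-odds PSSM from a few aligned motif instances and slides it along a sequence. It shows only the generic, sequence-level idea. It is not the tree-structured method proposed in the paper and does not use secondary structure. The function names, the pseudocount, the uniform background and the toy motif instances are all assumptions made for this example.

```python
import math

ALPHABET = "ACGU"  # RNA alphabet assumed for this sketch


def build_pssm(motif_instances, pseudocount=1.0, background=None):
    """Build a log-odds PSSM: one dict {base: score} per motif column.

    motif_instances: equal-length, gap-free strings over ALPHABET.
    pseudocount and the uniform background are illustrative choices.
    """
    if background is None:
        background = {base: 1.0 / len(ALPHABET) for base in ALPHABET}
    width = len(motif_instances[0])
    if any(len(seq) != width for seq in motif_instances):
        raise ValueError("all motif instances must have the same length")

    pssm = []
    for col in range(width):
        # Start every count at the pseudocount so no base gets probability 0.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in motif_instances:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return pssm


def score_window(pssm, window):
    """Sum the column scores of the bases in a window of motif width."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Toy motif instances, invented for illustration only.
    pssm = build_pssm(["GGUC", "GGUC", "GGAC"])
    score, start = best_hit(pssm, "AAAGGUCAAA")
    print(f"best window starts at {start} with score {score:.2f}")
```

A positive score means the window resembles the motif more than the background does. In a structural alignment setting, such per-position scores would replace a fixed substitution score at the aligned positions, which is the general role the abstract assigns to PSSMs.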

Keywords

Structural alignment, Position specific scoring matrix, Non-coding RNA

Mathematics Subject Classification (2000)

92B05 

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Kengo Sato (1)
  • Kensuke Morita (2)
  • Yasubumi Sakakibara (2)

  1. Japan Biological Informatics Consortium, Tokyo, Japan
  2. Department of Biosciences and Informatics, Keio University, Yokohama, Japan
