Pattern Discovery in RNA Secondary Structure Using Affix Trees

  • Conference paper
Combinatorial Pattern Matching (CPM 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2676))

Abstract

We present an algorithm for finding common secondary structure motifs in a set of unaligned RNA sequences. The basic version of the algorithm takes as input a set of strings representing the secondary structures of the sequences, enumerates candidate secondary structure patterns, and reports all patterns that appear, possibly with variations, in all or most of the sequences. Because it considers structural information only, the algorithm can be applied even when the input sequences show no significant sequence similarity; sequence information can nevertheless be added at different levels. Patterns describing RNA secondary structure elements have a characteristic symmetric layout, which makes affix trees a suitable indexing structure: they permit bidirectional search from the middle of a pattern outward, which significantly accelerates the search. When the secondary structure of the input sequences is not available, we show how the algorithm can deal with the uncertainty of structure prediction methods, or can predict the structure itself on the fly while searching for patterns, again exploiting the information contained in the affix tree built for the sequences. Finally, we present case studies in which the algorithm detected experimentally known RNA stem-loop motifs, either by using predicted structures or by folding the sequences itself.
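
The middle-to-outside idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it uses a plain scan instead of an affix tree, matches patterns exactly (no variations), and assumes the structures are given in dot-bracket notation, which the abstract does not specify. The function names `hairpins` and `common_patterns`, and the `(stem_len, loop_len)` pattern encoding, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def hairpins(structure):
    """Return the set of (stem_len, loop_len) hairpin patterns in a
    dot-bracket string, growing each stem outward from its loop."""
    found = set()
    # A hairpin loop is a run of unpaired '.' directly enclosed by '(' and ')'.
    for m in re.finditer(r"\((\.+)\)", structure):
        loop_len = len(m.group(1))
        left, right = m.start(), m.end() - 1
        stem_len = 0
        # Extend in both directions at once, as long as the flanking
        # characters still form a base pair.
        while (left >= 0 and right < len(structure)
               and structure[left] == "(" and structure[right] == ")"):
            stem_len += 1
            found.add((stem_len, loop_len))
            left -= 1
            right += 1
    return found


def common_patterns(structures, quorum):
    """Report patterns that occur in at least `quorum` of the structures."""
    support = defaultdict(int)
    for s in structures:
        for pattern in hairpins(s):
            support[pattern] += 1
    return sorted(p for p, n in support.items() if n >= quorum)


if __name__ == "__main__":
    structures = ["..((...))..", "(((...)))", ".(....)."]
    print(common_patterns(structures, quorum=2))
```

Each hairpin is recorded at every stem length reached during the outward extension, so a shorter stem shared by several structures is still reported even when the full stems differ. An affix tree would serve the same purpose as the repeated scan here, while indexing all sequences once and supporting extension in both directions.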

© 2003 Springer-Verlag Berlin Heidelberg

Cite this paper

Mauri, G., Pavesi, G. (2003). Pattern Discovery in RNA Secondary Structure Using Affix Trees. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_21

  • DOI: https://doi.org/10.1007/3-540-44888-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
