Structural and Functional Annotation of Long Noncoding RNAs

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) derived from genomic regions that are antisense to, intronic to, intergenic to, or overlapping with protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for identifying structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen that uses comparative genomics to detect putative functional lncRNA motifs.



Author information

Corresponding author

Correspondence to Martin A. Smith.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Smith, M.A., Mattick, J.S. (2017). Structural and Functional Annotation of Long Noncoding RNAs. In: Keith, J. (ed.) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_4

  • DOI: https://doi.org/10.1007/978-1-4939-6613-4_4

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6611-0

  • Online ISBN: 978-1-4939-6613-4

  • eBook Packages: Springer Protocols
