Bioinformatics Tools for Next-Generation RNA Sequencing Analysis

  • Marco Marconi
  • Julio Rodriguez-Romero
  • Ane Sesma
  • Mark D. Wilkinson


The purpose of this chapter is to introduce the reader to some of the most popular bioinformatics tools and resources available for RNA analysis. The introduction of next-generation RNA sequencing led to an explosion in the amount of quantitative transcript sequence data, which necessitated the development of adequate tools to process and make sense of these rich and complex datasets. A large number of programs, platforms, and databases dedicated to RNA analysis have been produced over the past 20 years or so; however, as with much other bioinformatics software, only a small portion of them are still available and in use. We therefore focus only on those tools and applications still in common use. This chapter is composed of three sections: a description of the general protocols for RNA sequencing (generically called RNA-Seq) analyses, an outline of the most common approaches to mapping polyadenylation sites, and a brief introduction to noncoding RNA (ncRNA) analysis. The first section describes the steps of a typical RNA-Seq study: experimental design, sequencing methods, data quality control, read mapping, and differential expression analysis. The second section introduces a few recent methods developed to map polyadenylation sites: the experimental protocols (which are variations of RNA-Seq), polyadenylation site databases and prediction programs, and the discovery of cis-regulatory elements. The third and final section presents several ncRNA databases and prediction tools.


Transcriptome Assembly; Polyadenylation Site; Efficiency Element; Burrows Wheeler Transform; ncRNA Family
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marco Marconi (1)
  • Julio Rodriguez-Romero (1)
  • Ane Sesma (1)
  • Mark D. Wilkinson (1)

  1. Centre for Plant Biotechnology and Genomics, Department of Biotechnology, Technical University of Madrid, Pozuelo de Alarcón, Spain