Abstract
Ribonucleic acids (RNAs) are fundamental molecules that control the regulation and expression of the genome and therefore the function of a cell. Robust analysis and quantification of RNA transcripts are critical for understanding cell function and altered phenotypes in different biological contexts, and for understanding and targeting diseases. The development of RNA sequencing (RNA-Seq) now provides opportunities to analyze the expression and function of RNA molecules at an unprecedented scale. However, the strategy for RNA-Seq experimental design and data analysis can differ substantially depending on the biological application. Design choices can also have a significant impact on downstream results and the interpretation of data. Here we describe key considerations for RNA-Seq experimental design and present a step-by-step bioinformatics workflow detailing the steps required for RNA-Seq data analysis. We believe this article will be a valuable guide for designing RNA-Seq experiments and analyzing the resulting data to address a wide range of biological questions.
Key words
- RNA-Seq
- Gene expression
- Transcript
- Sequencing
- Alignment
- Sequence reads
Acknowledgments
We would like to thank the Rutherford Discovery Fellowship Program (Royal Society of New Zealand) for supporting AC’s current position and the Dunedin School of Medicine for supporting our work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gimenez, G., Stockwell, P.A., Rodger, E.J., Chatterjee, A. (2023). Strategy for RNA-Seq Experimental Design and Data Analysis. In: Seymour, G.J., Cullinan, M.P., Heng, N.C., Cooper, P.R. (eds) Oral Biology. Methods in Molecular Biology, vol 2588. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2780-8_16
DOI: https://doi.org/10.1007/978-1-0716-2780-8_16
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2779-2
Online ISBN: 978-1-0716-2780-8
eBook Packages: Springer Protocols