It’s DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR
RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present a pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we focus on the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.
Key words: RNA-seq, Differential expression, Generalized linear models, Quasi-likelihood, Variability, Read alignment, Read counts
This work was funded by the University of Melbourne (Elizabeth and Vernon Puzey Scholarship to Aaron T.L. Lun), by the National Health and Medical Research Council (NHMRC) (Fellowship 1058892 and Program 1054618 to Gordon K. Smyth), by the NHMRC Independent Research Institutes Infrastructure Support (IRIIS) Scheme, and by a Victorian State Government Operational Infrastructure Support (OIS) Grant.