It’s DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR

  • Aaron T. L. Lun
  • Yunshun Chen
  • Gordon K. Smyth
Part of the Methods in Molecular Biology book series (MIMB, volume 1418)

Abstract

RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe in particular the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

Key words

RNA-seq; Differential expression; Generalized linear models; Quasi-likelihood; Variability; Read alignment; Read counts

Notes

Acknowledgements

This work was funded by the University of Melbourne (Elizabeth and Vernon Puzey Scholarship to Aaron T. L. Lun), by the National Health and Medical Research Council (NHMRC) (Fellowship 1058892 and Program 1054618 to Gordon K. Smyth), by the NHMRC Independent Research Institutes Infrastructure Support (IRIIS) Scheme, and by a Victorian State Government Operational Infrastructure Support (OIS) Grant.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Aaron T. L. Lun (1)
  • Yunshun Chen (1)
  • Gordon K. Smyth (1)

  1. Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
