edgeR for Differential RNA-seq and ChIP-seq Analysis: An Application to Stem Cell Biology

  • Olga Nikolayeva
  • Mark D. Robinson
Part of the Methods in Molecular Biology book series (MIMB, volume 1150)


The edgeR package, an R-based tool within the Bioconductor project, offers a flexible statistical framework for detecting changes in abundance from count data. In this chapter, we illustrate the use of edgeR on RNA-seq and ChIP-seq data from a human embryonic stem cell dataset. We focus on a step-by-step statistical analysis of differential expression, going from raw data to a list of putative differentially expressed genes, and give examples of integrative analysis using the ChIP-seq data. We emphasize data quality spot checks and the use of positive controls throughout the process and give practical recommendations for reproducible research.


Keywords: Differential count analysis, edgeR, RNA-seq, ChIP-seq, Reproducible research, Integrative analysis, Human embryonic stem cells



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  2. SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
