Analysis of ChIP-seq Data in R/Bioconductor

  • Ines de Santiago
  • Thomas Carroll
Part of the Methods in Molecular Biology book series (MIMB, volume 1689)


The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality control and appropriate data analysis techniques are critical for extracting the most meaningful results from the data. In recent years, an array of R/Bioconductor tools has been developed that allows researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data, based primarily on software packages from the open-source Bioconductor project. The protocols described in this chapter cover basic steps, including data alignment, peak calling, quality control, and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process are demonstrated on publicly available data sets and illustrate the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.

Key words

ChIP-seq; Bioconductor; Workflow; Sequencing; Computational analysis; Data analysis


Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  1. Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  2. Medical Research Council, London Institute of Medical Sciences, Imperial College London, London, UK
