Statistical Analysis of ChIP-seq Data with MOSAiCS

Deep Sequencing Data Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1038))

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is invaluable for identifying genome-wide binding of transcription factors and mapping of epigenomic profiles. We present a statistical protocol for analyzing ChIP-seq data. We describe guidelines for data preprocessing and quality control and provide detailed examples of identifying ChIP-enriched regions using the Bioconductor package “mosaics.”

References

  1. Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV, Ren B (2012) A map of cis-regulatory sequences in the mouse genome. Nature 488:116–120

  2. Fujiwara T, O’Geen H, Keles S, Blahnik K, Linnemann AK, Kang Y, Choi K, Farnham PJ, Bresnick EH (2009) Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy. Mol Cell 36(4):667–681

  3. Wilbanks EG, Facciotti MT (2010) Evaluation of algorithm performance in ChIP-Seq peak detection. PLoS One 5:e11471

  4. Chen Y, Negre N, Li Q, Mieczkowska JO, Slattery M, Liu T, Zhang T, Kim T-K, He HH, Zieba J, Ruan Y, Bickel PJ, Myers RM, Wold BJ, White KP, Lieb JD, Liu XS (2012) Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods 9(6):609–614

  5. Kuan PF, Chung D, Pan G, Thomson JA, Stewart R, Keles S (2011) A statistical framework for the analysis of ChIP-Seq data. J Am Stat Assoc 106(495):891–903

  6. Chung D, Kuan P-F, Li B, SanalKumar R, Liang K, Bresnick E, Dewey C, Keles S (2011) Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data. PLoS Comput Biol 7(7):e1002111

  7. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25

  8. Rozowsky J, Euskirchen G, Auerbach R, Zhang Z, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein M (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27:66–75

  9. Benjamini Y, Speed TP (2012) Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40(10):e72

  10. Liang K, Keles S (2012) Normalization of ChIP-seq data with control. BMC Bioinformatics 13:199

  11. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Met 57(1):289–300

  12. Liang K, Keles S (2012) Detecting differential binding of transcription factors with ChIP-seq. Bioinformatics 28(1):121–122

  13. Zeng X, Sanalkumar R, Bresnick EH, Li H, Chang Q, Keles S (2012) jMOSAiCS: joint analysis of multiple ChIP-seq datasets. Submitted. Technical report available at http://www.stat.wisc.edu/~keles/Papers/jmosaics.pdf. R package available at http://www.stat.wisc.edu/~keles/Software/

Acknowledgments

This work is supported by National Institutes of Health Grants (HG0067161, HG003747) to S.K. We thank Audrey Gasch and Jeff Lewis (yeast TFx), John Svaren and Rajini Srinivasan (Sox10 in rat), and Qiang Chang and Emily Cunningham (human ChIP-seq) for the datasets and useful discussions regarding the analysis.

Appendix: R Script for the Analysis of Yeast TFx ChIP-seq Datasets

library(mosaics)

library(hexbin)

# construct bin-level files for each replicate ChIP sample #

constructBins(infile = "TFx_EtOH_IP_1_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap0",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 0, PET = FALSE)

constructBins(infile = "TFx_EtOH_IP_2_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap0",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 0, PET = FALSE)

constructBins(infile = "TFx_EtOH_IP_3_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap0",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 0, PET = FALSE)

# construct bin-level files for pooled ChIP (uncapped and capped by 3) and pooled input samples #

# pooled input, uncapped #

constructBins(infile = "TFx_EtOH_input_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap0",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 0, PET = FALSE)

# pooled ChIP, capped by 3 (the file read from /bin_cap3 later in the script) #

constructBins(infile = "TFx_EtOH_IP_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap3",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 3, PET = FALSE)

# pooled ChIP, uncapped #

constructBins(infile = "TFx_EtOH_IP_2mis_bowtie_uni.txt", outfileLoc = "/bin_cap0",
    fileFormat = "bowtie", byChr = FALSE, fragLen = 200, binSize = 200, capping = 0, PET = FALSE)
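The effect of the capping argument can be illustrated with a toy example in base R. This is a minimal sketch and not the mosaics implementation: it assumes that capping limits the number of reads allowed to start at the same nucleotide position and that a non-positive value disables capping. It also ignores the extension of reads to the fragment length (fragLen). The function name cap_and_bin and the read positions are hypothetical.

```r
# Toy sketch (not the mosaics implementation): cap the number of reads per
# start position, then count the remaining reads in fixed-width bins.
cap_and_bin <- function(starts, bin_size = 200, capping = 0) {
  # Number of reads starting at each distinct position
  per_pos <- table(starts)
  counts <- as.integer(per_pos)
  pos <- as.integer(names(per_pos))
  # Assumption: a non-positive capping value means no capping is applied
  if (capping > 0) counts <- pmin(counts, capping)
  # Left coordinate of the bin that contains each start position
  bin_start <- (pos %/% bin_size) * bin_size
  # Sum the (possibly capped) read counts within each bin
  tapply(counts, bin_start, sum)
}

# Hypothetical read start positions; ten reads pile up at position 150
starts <- c(rep(150, 10), 20, 30, 210, 210, 390)
uncapped <- cap_and_bin(starts, capping = 0)
capped <- cap_and_bin(starts, capping = 3)
print(rbind(uncapped, capped))
```

In this toy data only the bin containing the pile-up at position 150 changes when capping is applied, which is the pattern the capped versus uncapped comparison in the script is meant to reveal.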

# generate hexbin plots between replicates to decide on pooling #

# for each pair, the first replicate is read as "chip" and the second as "input" #

bin_rep = vector("list", 3)

bin_rep[[1]] = readBins(type = c("chip", "input"),
    fileName = c("/bin_cap0/TFx_EtOH_IP_1_2mis_bowtie_uni.txt_fragL200_bin200.txt",
        "/bin_cap0/TFx_EtOH_IP_2_2mis_bowtie_uni.txt_fragL200_bin200.txt"))

bin_rep[[2]] = readBins(type = c("chip", "input"),
    fileName = c("/bin_cap0/TFx_EtOH_IP_1_2mis_bowtie_uni.txt_fragL200_bin200.txt",
        "/bin_cap0/TFx_EtOH_IP_3_2mis_bowtie_uni.txt_fragL200_bin200.txt"))

bin_rep[[3]] = readBins(type = c("chip", "input"),
    fileName = c("/bin_cap0/TFx_EtOH_IP_2_2mis_bowtie_uni.txt_fragL200_bin200.txt",
        "/bin_cap0/TFx_EtOH_IP_3_2mis_bowtie_uni.txt_fragL200_bin200.txt"))

xlabel = c("rep1", "rep1", "rep2"); ylabel = c("rep2", "rep3", "rep3")

for (i in 1:3) {
    a = hexbin(bin_rep[[i]]@tagCount, bin_rep[[i]]@input, xbins = 100)
    plot(a, trans = log, inv = exp, xlab = xlabel[i], ylab = ylabel[i], colramp = rainbow)
}
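A numeric summary can complement the visual comparison of replicates. The following is a minimal sketch on simulated counts, not on the TFx data: the vectors rep1 and rep2 are hypothetical bin-level counts that share an underlying signal, and the use of correlation coefficients as a pooling aid is our assumption rather than part of the protocol.

```r
# Toy sketch: correlation between two simulated replicates (hypothetical data).
set.seed(1)
n_bins <- 5000
# Shared underlying signal per bin, with independent Poisson noise per replicate
signal <- rgamma(n_bins, shape = 2, rate = 0.5)
rep1 <- rpois(n_bins, signal)
rep2 <- rpois(n_bins, signal)

# Pearson correlation on log1p counts reduces the influence of high-count bins
r_pearson <- cor(log1p(rep1), log1p(rep2))
# Spearman correlation depends only on the ranks of the counts
r_spearman <- cor(rep1, rep2, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))
```

With real data, the same two calls could be applied to the tagCount and input slots of each bin_rep element.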

# generate hexbin plots of uncapped and capped bin-level read count data #

# uncapped counts are read as "chip" and capped counts as "input" #

bin_cap = readBins(type = c("chip", "input"),
    fileName = c("/bin_cap0/TFx_EtOH_IP_2mis_bowtie_uni.txt_fragL200_bin200.txt",
        "/bin_cap3/TFx_EtOH_IP_2mis_bowtie_uni.txt_fragL200_bin200.txt"))

a = hexbin(bin_cap@input, bin_cap@tagCount, xbins = 100)

plot(a, trans = log, inv = exp, xlab = "Capped counts", ylab = "Uncapped counts", colramp = rainbow)

# load bin-level files for MOSAiCS analysis #

bin = readBins(type = c("chip", "input"),
    fileName = c("/bin_cap0/TFx_EtOH_IP_2mis_bowtie_uni.txt_fragL200_bin200.txt",
        "/bin_cap0/TFx_EtOH_input_2mis_bowtie_uni.txt_fragL200_bin200.txt"))

# generate hexbin plot to check ChIP enrichment over Input #

a = hexbin(bin@input, bin@tagCount, xbins = 100)

plot(a, trans = log, inv = exp, xlab = "Input", ylab = "ChIP", colramp = rainbow)

# histogram of bin-level files #

plot(bin)

# use Input-only model to fit data #

fit = mosaicsFit(bin, analysisType = "IO", bgEst = "automatic", truncProb = 0.08)

# goodness-of-fit plot of the model #

plot(fit)

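# call peaks #

# thres: the 95th percentile of the ChIP tag counts, passed to mosaicsPeak as a minimum-count filter #

thres = quantile(fit@tagCount, probs = 0.95)

peak = mosaicsPeak(fit, signalModel = "2S", FDR = 0.05, maxgap = 200, minsize = 50, thres = thres)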
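The roles of maxgap and minsize can be illustrated with a toy example in base R. This is a minimal sketch and not the mosaics implementation: it assumes that nearby enriched bins are merged into one peak when the gap between them does not exceed maxgap, and that peaks narrower than minsize are then dropped. Whether the boundary cases are inclusive in mosaicsPeak is not something this sketch establishes. The function name merge_bins and the bin coordinates are hypothetical.

```r
# Toy sketch (not the mosaics implementation): merge enriched bins into peaks.
merge_bins <- function(bin_starts, bin_size = 200, maxgap = 200, minsize = 50) {
  stopifnot(length(bin_starts) > 0)
  bin_starts <- sort(bin_starts)
  bin_ends <- bin_starts + bin_size - 1
  n <- length(bin_starts)
  # Gap in bp between each bin and the bin before it
  gaps <- bin_starts[-1] - bin_ends[-n] - 1
  # Assumption: a new peak starts whenever the gap exceeds maxgap
  peak_id <- cumsum(c(TRUE, gaps > maxgap))
  peaks <- data.frame(
    start = as.numeric(tapply(bin_starts, peak_id, min)),
    end = as.numeric(tapply(bin_ends, peak_id, max))
  )
  peaks$width <- peaks$end - peaks$start + 1
  # Assumption: peaks narrower than minsize are removed
  peaks[peaks$width >= minsize, , drop = FALSE]
}

# Hypothetical start coordinates of bins called as enriched
enriched <- c(0, 200, 500, 2000)
peaks <- merge_bins(enriched, maxgap = 200, minsize = 50)
wide_only <- merge_bins(enriched, maxgap = 200, minsize = 300)
print(peaks)
```

Here the first three bins are joined into a single peak because the gaps between them are at most 100 bp, while the bin at 2000 forms its own peak. Raising minsize to 300 removes that isolated 200 bp peak.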

# export peaks #

export(peak, type = "txt", filename = "TFx_TSpeakList.txt")

export(peak, type = "bed", filename = "TFx_TSpeakList.bed")

# generate wig files #

generateWig(infile = "TFx_EtOH_IP_2mis_bowtie_uni.txt", PET = FALSE, fileFormat = "bowtie",
    outfileLoc = "/bin_cap0/", byChr = FALSE, useChrfile = FALSE, chrfile = NULL,
    fragLen = 200, span = 200, capping = 0)

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Sun, G., Chung, D., Liang, K., Keleş, S. (2013). Statistical Analysis of ChIP-seq Data with MOSAiCS. In: Shomron, N. (eds) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_12

  • DOI: https://doi.org/10.1007/978-1-62703-514-9_12

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-513-2

  • Online ISBN: 978-1-62703-514-9

  • eBook Packages: Springer Protocols
