Tiling Arrays, pp 105-124

Integrative Analysis of ChIP-Chip and ChIP-Seq Dataset

  • Lihua Julie Zhu
Part of the Methods in Molecular Biology book series (MIMB, volume 1067)

Abstract

Epigenetic regulation and interactions between transcription factors and regulatory genomic regions play crucial roles in controlling the transcriptional regulatory networks that drive development, environmental responses, and disease. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) and ChIP followed by genomic tiling microarray hybridization (ChIP-chip) are two of the most widely used technologies for genome-wide identification of DNA-protein interactions and histone modifications in vivo. Many algorithms and tools that identify transcription factor binding sites from ChIP-seq or ChIP-chip datasets have been developed and evaluated. However, binding site identification is only the first step; the ultimate goal is to discover the regulatory network of the transcription factor (TF). Here, we present a common workflow for downstream analysis of ChIP-chip and ChIP-seq data, with an emphasis on annotating binding sites and integrating them with gene expression data to identify direct and indirect targets of the TF. These tools will help with the overall goal of unraveling transcriptional regulatory networks using datasets publicly available in GEO.
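
The integration step mentioned above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the workflow used in the chapter: the `Peak` and `Gene` records, the `nearest_gene` and `classify_targets` helpers, the 5 kb distance cutoff, and all coordinates are hypothetical. The idea is that a differentially expressed gene with a binding site near its transcription start site (TSS) is a candidate direct target, while a differentially expressed gene with no nearby binding site is a candidate indirect target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site coordinate


def nearest_gene(peak, genes):
    """Return (gene, distance) for the TSS closest to the peak midpoint.

    Returns None when no gene lies on the same chromosome as the peak.
    """
    midpoint = (peak.start + peak.end) // 2
    candidates = [g for g in genes if g.chrom == peak.chrom]
    if not candidates:
        return None
    gene = min(candidates, key=lambda g: abs(g.tss - midpoint))
    return gene, abs(gene.tss - midpoint)


def classify_targets(peaks, genes, de_genes, max_distance=5000):
    """Split differentially expressed genes into direct and indirect candidates.

    max_distance is an assumed, arbitrary cutoff (in bp) for calling a gene bound.
    """
    de = set(de_genes)
    bound = set()
    for peak in peaks:
        hit = nearest_gene(peak, genes)
        if hit is not None and hit[1] <= max_distance:
            bound.add(hit[0].name)
    direct = bound & de    # bound and differentially expressed
    indirect = de - bound  # differentially expressed, no nearby peak
    return direct, indirect


# Hypothetical toy data: two peaks, three genes, two differentially expressed genes.
peaks = [Peak("chr1", 900, 1100), Peak("chr2", 50000, 50200)]
genes = [
    Gene("geneA", "chr1", 1500),
    Gene("geneB", "chr1", 90000),
    Gene("geneC", "chr2", 10000),
]
de_genes = {"geneA", "geneC"}

direct, indirect = classify_targets(peaks, genes, de_genes)
print("candidate direct targets:", sorted(direct))
print("candidate indirect targets:", sorted(indirect))
```

In practice the distance cutoff, the choice of nearest TSS versus overlapping features, and the definition of differential expression all affect which genes are called direct targets, so each would need to be chosen for the dataset at hand.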

Key words

ChIP-chip, ChIP-seq, Gene expression, Regulatory network, Microarray, GO enrichment analysis, Pathway analysis, Integrated analysis, Transcription factor, Motif discovery

Notes

Acknowledgment

I would like to thank Dr. Michael Brodsky of the Program in Gene Function and Expression at the University of Massachusetts Medical School for his critical review of the manuscript and his excellent suggestions.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Lihua Julie Zhu (1)
  1. Program in Gene Function and Expression; Program in Bioinformatics and Integrated Biology; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, USA
