Computational Analysis of ChIP-chip Data

  • Hongkai Ji
Part of the Springer Handbooks of Computational Statistics book series (SHCS)


Chromatin immunoprecipitation coupled with genome tiling array hybridization, also known as ChIP-chip, is a powerful technology for identifying protein-DNA interactions in genomes. It is widely used to locate transcription factor binding sites and histone modifications. Data generated by ChIP-chip provide important information on gene regulation. This chapter reviews fundamental issues in ChIP-chip data analysis. Topics include data preprocessing, background correction, normalization, peak detection and motif analysis. Statistical models and principles that significantly improve data analysis are discussed. Popular software tools are briefly introduced.


Keywords: Hidden Markov Model, Probe Intensity, Quantile Normalization, Tiling Array, ChIP Sample



This work is partially supported by the Johns Hopkins Faculty Professional Development Fund to H.J. The author would like to thank Jennifer T. Judy for helpful comments and proofreading the draft of this chapter.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
