Probabilistic Peak Calling and Controlling False Discovery Rate Estimations in Transcription Factor Binding Site Mapping from ChIP-seq

  • Protocol
Computational Biology of Transcription Factor Binding

Part of the book series: Methods in Molecular Biology (MIMB, volume 674)

Abstract

Localizing the binding sites of regulatory proteins is becoming increasingly feasible and accurate. This is due to dramatic progress not only in chromatin immunoprecipitation combined with next-generation sequencing (ChIP-seq) but also in advanced statistical analyses. A fundamental issue, however, is the alarming number of false positive predictions. This problem can be remedied by improved peak calling methods that exploit twin peaks, one on each strand of the DNA, together with kernel density estimators and false discovery rate estimates based on control libraries. Predictions are then filtered by de novo motif discovery in the peak environments. These methods have been implemented in, among others, the Quantitative Enrichment of Sequence Tags (QuEST) software tool of Valouev et al. We demonstrate the prediction of binding sites of the human growth-associated binding protein (GABPα) based on ChIP-seq observations.
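The ideas named in the abstract (strand-specific twin peaks, kernel density estimation, and a control-based false discovery rate) can be illustrated with a minimal sketch. This is not the QuEST implementation: the function names, the Gaussian kernel, the 30 bp bandwidth, the 50 bp strand shift, the 0.05 score threshold, and the toy tag coordinates are all assumptions chosen for illustration. The FDR here is a simple ratio of control to ChIP peak counts and assumes libraries of comparable size.

```python
import math

def tag_density(tags, x, bandwidth):
    """Gaussian kernel density of tag start positions, evaluated at coordinate x."""
    scale = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in tags)

def combined_profile(fwd_tags, rev_tags, start, end, shift, bandwidth=30.0):
    """Add the two strand densities after shifting each toward the binding site.

    Forward-strand tags pile up upstream of the site and reverse-strand tags
    downstream, so the forward density is read at x - shift and the reverse
    density at x + shift.
    """
    return [
        tag_density(fwd_tags, x - shift, bandwidth)
        + tag_density(rev_tags, x + shift, bandwidth)
        for x in range(start, end)
    ]

def call_peaks(profile, start, threshold):
    """Return (coordinate, score) for local maxima scoring at least threshold."""
    peaks = []
    for i in range(1, len(profile) - 1):
        if profile[i] >= threshold and profile[i - 1] < profile[i] >= profile[i + 1]:
            peaks.append((start + i, profile[i]))
    return peaks

def empirical_fdr(chip_peaks, control_peaks):
    """Control peak count divided by ChIP peak count, capped at 1."""
    if not chip_peaks:
        return 0.0
    return min(1.0, len(control_peaks) / len(chip_peaks))

# Toy data (hypothetical): twin peaks flanking a site at coordinate 1000.
SHIFT, THRESHOLD = 50, 0.05
chip_fwd = [940, 945, 950, 950, 955, 960]
chip_rev = [1040, 1045, 1050, 1050, 1055, 1060]
# Sparse control tags that do not form a coherent twin peak.
ctrl_fwd = [820, 1100]
ctrl_rev = [900, 1180]

chip_peaks = call_peaks(combined_profile(chip_fwd, chip_rev, 800, 1200, SHIFT), 800, THRESHOLD)
control_peaks = call_peaks(combined_profile(ctrl_fwd, ctrl_rev, 800, 1200, SHIFT), 800, THRESHOLD)
fdr = empirical_fdr(chip_peaks, control_peaks)
print(chip_peaks, control_peaks, fdr)
```

In this toy example the shifted forward and reverse densities reinforce each other at the site between the two strand peaks, whereas the scattered control tags stay below the threshold. In practice, the threshold would be tuned until the estimated FDR falls below a chosen level.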

References

  1. ENCODE Consortium. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816.

  2. Blanchette, M., Bataille, A.R., Chen, X. et al. (2006) Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 16, 656–668.

  3. Barski, A., and Zhao, K. (2009) Genomic location analysis by ChIP-Seq. J Cell Biochem 107, 11–18.

  4. Carroll, J.S., Meyer, C.A., Song, J. et al. (2006) Genome-wide analysis of estrogen receptor binding sites. Nat Genet 38, 1289–1297.

  5. Kim, T.H., Barrera, L.O., Zheng, M. et al. (2005) A high-resolution map of active promoters in the human genome. Nature 436, 876–880.

  6. Lee, T.I., Jenner, R.G., Boyer, L.A. et al. (2006) Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125, 301–313.

  7. Park, P.J. (2009) ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10, 669–680.

  8. Collas, P. (2009) The state-of-the-art of chromatin immunoprecipitation. Methods Mol Biol 567, 1–25.

  9. Harbison, C.T., Gordon, D.B., Lee, T.I. et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104.

  10. Ozsolak, F., Song, J.S., Liu, X.S. et al. (2007) High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol 25, 244–248.

  11. Cawley, S., Bekiranov, S., Ng, H.H. et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499–509.

  12. Euskirchen, G., Royce, T.E., Bertone, P. et al. (2004) CREB binds to multiple loci on human chromosome 22. Mol Cell Biol 24, 3804–3814.

  13. Mathur, D., Danford, T.W., Boyer, L.A. et al. (2008) Analysis of the mouse embryonic stem cell regulatory networks obtained by ChIP-chip and ChIP-PET. Genome Biol 9, R126.

  14. Johnson, D.S., Li, W., Gordon, D.B. et al. (2008) Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets. Genome Res 18, 393–403.

  15. Kim, J., Bhinge, A.A., Morgan, X.C. et al. (2005) Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat Methods 2, 47–53.

  16. Bhinge, A.A., Kim, J., Euskirchen, G.M. et al. (2007) Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res 17, 910–916.

  17. Quail, M.A., Kozarewa, I., Smith, F. et al. (2008) A large genome center’s improvements to the Illumina sequencing system. Nat Methods 5, 1005–1010.

  18. Margulies, M., Egholm, M., Altman, W.E. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.

  19. Wei, C.L., Wu, Q., Vega, V.B. et al. (2006) A global map of p53 transcription-factor binding sites in the human genome. Cell 124, 207–219.

  20. Johnson, D.S., Mortazavi, A., Myers, R.M. et al. (2007) Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502.

  21. Robertson, G., Hirst, M., Bainbridge, M. et al. (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 4, 651–657.

  22. Zeller, K.I., Zhao, X., Lee, C.W. et al. (2006) Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc Natl Acad Sci USA 103, 17834–17839.

  23. Hamza, M.S., Pott, S., Vega, V.B. et al. (2009) De-novo identification of PPARgamma/RXR binding sites and direct targets during adipogenesis. PLoS One 4, e4907.

  24. Nielsen, R., Pedersen, T.A., Hagenbeek, D. et al. (2008) Genome-wide profiling of PPARgamma:RXR and RNA polymerase II occupancy reveals temporal activation of distinct metabolic pathways and changes in RXR dimer composition during adipogenesis. Genes Dev 22, 2953–2967.

  25. Valouev, A., Johnson, D.S., Sundquist, A. et al. (2008) Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods 5, 829–834.

  26. Ji, H., Jiang, H., Ma, W. et al. (2008) An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol 26, 1293–1300.

  27. Fejes, A.P., Robertson, G., Bilenky, M. et al. (2008) FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology. Bioinformatics 24, 1729–1730.

  28. Zhang, Y., Liu, T., Meyer, C.A. et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.

  29. Jothi, R., Cuddapah, S., Barski, A. et al. (2008) Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res 36, 5221–5231.

  30. Nix, D.A., Courdy, S.J., and Boucher, K.M. (2008) Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks. BMC Bioinformatics 9, 523.

  31. Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351–1359.

  32. Mortazavi, A., Williams, B.A., McCue, K. et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621–628.

  33. Boyle, A.P., Guinney, J., Crawford, G.E. et al. (2008) F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics 24, 2537–2538.

  34. Tuteja, G., White, P., Schug, J. et al. (2009) Extracting transcription factor targets from ChIP-Seq data. Nucleic Acids Res 37, e113.

  35. Rozowsky, J., Euskirchen, G., Auerbach, R.K. et al. (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27, 66–75.

  36. Briguet, A., and Ruegg, M.A. (2000) The Ets transcription factor GABP is required for postsynaptic differentiation in vivo. J Neurosci 20, 5989–5996.

  37. Rosmarin, A.G., Resendes, K.K., Yang, Z. et al. (2004) GA-binding protein transcription factor: a review of GABP as an integrator of intracellular signaling and protein-protein interactions. Blood Cells Mol Dis 32, 143–154.

  38. Temple, M.D., and Murray, V. (2005) Footprinting the ‘essential regulatory region’ of the retinoblastoma gene promoter in intact human cells. Int J Biochem Cell Biol 37, 665–678.

  39. Langmead, B., Trapnell, C., Pop, M. et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.

  40. Rumble, S.M., Lacroute, P., Dalca, A.V. et al. (2009) SHRiMP: accurate mapping of short color-space reads. PLoS Comput Biol 5, e1000386.

  41. Warren, R.L., Sutton, G.G., Jones, S.J. et al. (2007) Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23, 500–501.

  42. Dohm, J.C., Lottaz, C., Borodina, T. et al. (2007) SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 17, 1697–1706.

  43. Slater, G.S., and Birney, E. (2005) Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31.

  44. Silverman, B. (1986) Density estimation for statistics and data analysis. Chapman and Hall, Boca Raton, FL.

  45. Collins, P.J., Kobayashi, Y., Nguyen, L. et al. (2007) The ets-related transcription factor GABP directs bidirectional transcription. PLoS Genet 3, e208.

  46. Benjamini, Y., and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple hypothesis testing. J R Stat Soc B 57, 289–300.

  47. Rhead, B., Karolchik, D., Kuhn, R.M. et al. (2009) The UCSC genome browser database: update 2010. Nucleic Acids Res, doi:10.1093/nar/gkp1939.

  48. Bailey, T.L., Williams, N., Misleh, C. et al. (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34, W369–W373.

  49. Haar, A. (1910) Zur Theorie der orthogonalen Funktionensysteme. Math Ann 69, 331–371.

  50. Hsu, L., Self, S.G., Grove, D. et al. (2005) Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 6, 211–226.

Acknowledgments

IL thanks the NSF Grant EPS-0701892 for funding.

Author information

Corresponding author

Correspondence to Istvan Ladunga.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Jiao, S., Bailey, C.P., Zhang, S., Ladunga, I. (2010). Probabilistic Peak Calling and Controlling False Discovery Rate Estimations in Transcription Factor Binding Site Mapping from ChIP-seq. In: Ladunga, I. (eds) Computational Biology of Transcription Factor Binding. Methods in Molecular Biology, vol 674. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-854-6_10

  • DOI: https://doi.org/10.1007/978-1-60761-854-6_10

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-853-9

  • Online ISBN: 978-1-60761-854-6

  • eBook Packages: Springer Protocols
