A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs

Thomas-Chollier, Morgane; Darbo, Elodie; Herrmann, Carl; Defrance, Matthieu; Thieffry, Denis; van Helden, Jacques

doi:10.1038/nprot.2012.088

Protocol
Published: 26 July 2012

Volume 7, pages 1551–1568, (2012)

Morgane Thomas-Chollier¹, Elodie Darbo², Carl Herrmann², Matthieu Defrance³, Denis Thieffry²,⁴ & Jacques van Helden²,⁵


Abstract

This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation–sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.
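The last two steps of the workflow (positions of predicted sites relative to peak centers, and export as BED tracks) can be illustrated with a minimal sketch. This is not the peak-motifs implementation: the `Peak` and `Site` types, the function names, and the example coordinates are all hypothetical. The sketch assumes 0-based, half-open BED coordinates and a site offset measured from the start of the peak sequence.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based start, BED convention (assumed)
    end: int    # exclusive end


class Site(NamedTuple):
    peak: Peak
    offset: int  # 0-based offset of the site start within the peak sequence
    length: int  # site length in bp
    motif: str   # hypothetical motif identifier
    strand: str  # "+" or "-"


def center_relative_position(site: Site) -> float:
    """Distance from the site midpoint to the peak midpoint.

    Negative values lie to the left of the peak center, positive to the right.
    """
    peak_len = site.peak.end - site.peak.start
    site_mid = site.offset + site.length / 2
    return site_mid - peak_len / 2


def site_to_bed(site: Site) -> str:
    """Format a site as a 6-column BED line in genomic coordinates."""
    start = site.peak.start + site.offset
    end = start + site.length
    fields = [site.peak.chrom, str(start), str(end), site.motif, "0", site.strand]
    return "\t".join(fields)


# Hypothetical example: a 10-bp site centered in a 200-bp peak.
peak = Peak("chr2L", 1000, 1200)
site = Site(peak, offset=95, length=10, motif="motif_1", strand="+")
print(center_relative_position(site))
print(site_to_bed(site))
```

Lines produced this way follow the standard six-column BED layout (chrom, start, end, name, score, strand), so they can be loaded as a custom track in a genome browser.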


**Figure 2: Screenshot of the peak-motifs web form.**

**Figure 3: Input sequence treatment (top) and motif discovery (bottom) options.**

**Figure 4: Options for motif comparisons (top) and predicted sites visualization (bottom).**

**Figure 5: Sequence lengths and composition.**

**Figure 6: Dinucleotide composition and derived background models.**

**Figure 8: Discovered motifs grouped by algorithm.**

**Figure 9: Discovered motifs with motif comparisons.**

**Figure 11: Predicted sites visualized in their genomic contexts on the UCSC genome browser.**

**Figure 12: Motif discovery approaches.**

Accession codes

Accessions

Gene Expression Omnibus

References

Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
Johnson, D.S., Mortazavi, A., Myers, R.M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat. Methods 6, S22–S32 (2009).
Boeva, V. et al. De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis. Nucleic Acids Res. 38, e126 (2010).
Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696 (2011).
Bailey, T.L. DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Rusk, N. Focus on next-generation sequencing data analysis. Nat. Methods 6, S1 (2009).
McPherson, J.D. Next-generation gap. Nat. Methods 6, S2–S5 (2009).
Thomas-Chollier, M. et al. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res. 40, e31 (2012).
Salmon-Divon, M., Dvinge, H., Tammoja, K. & Bertone, P. PeakAnalyzer: genome-wide annotation of chromatin binding and modification loci. BMC Bioinformatics 11, 415 (2010).
Portales-Casamar, E. et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38, D105–D110 (2010).
Wingender, E. The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform. 9, 326–332 (2008).
Gama-Castro, S. et al. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39, D98–D105 (2011).
Medina-Rivera, A. et al. Theoretical and empirical quality assessment of transcription factor-binding motifs. Nucleic Acids Res. 39, 808–824 (2011).
Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).
Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
Fujita, P.A. et al. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 39, D876–D882 (2011).
Fullwood, M.J., Wei, C.L., Liu, E.T. & Ruan, Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19, 521–532 (2009).
Lee, T.I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
Sanford, J.R. et al. Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res. 19, 381–394 (2009).
van Helden, J., del Olmo, M. & Perez-Ortin, J.E. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res. 28, 1000–1010 (2000).
Sand, O., Thomas-Chollier, M., Vervisch, E. & van Helden, J. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services: an example with ChIP-chip data. Nat. Protoc. 3, 1604–1615 (2008).
van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
van Helden, J., Rios, A.F. & Collado-Vides, J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28, 1808–1818 (2000).
Thomas-Chollier, M. et al. RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res. 39, W86–W91 (2011).
Kulakovskiy, I.V., Boeva, V.A., Favorov, A.V. & Makeev, V.J. Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics 26, 2622–2623 (2010).
Agius, P., Arvey, A., Chang, W., Noble, W.S. & Leslie, C. High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol. 6, e1000916 (2010).
Mercier, E. et al. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS ONE 6, e16432 (2011).
Kuttippurathu, L. et al. CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics 27, 715–717 (2010).
van Heeringen, S.J. & Veenstra, G.J. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270–271 (2011).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Sand, O., Turatsinze, J.V. & van Helden, J. Evaluating the prediction of cis-acting regulatory elements in genome sequences. in Modern Genome Annotation: The BioSapiens Network (eds. Frishman, D. & Valencia, A.) (Springer, 2008).
Bradley, R.K. et al. Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol. 8, e1000343 (2010).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2011).
Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).
Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat. Methods 5, 829–834 (2008).
Bergman, C.M., Carlson, J.W. & Celniker, S.E. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 21, 1747–1749 (2005).
Flicek, P. et al. Ensembl 2011. Nucleic Acids Res. 39, D800–D806 (2011).
Harrison, M.M., Li, X.Y., Kaplan, T., Botchan, M.R. & Eisen, M.B. Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266 (2011).
Kanodia, J.S. et al. Pattern formation by graded and uniform signals in the early Drosophila embryo. Biophys. J. 102, 427–433 (2012).
Tsurumi, A. et al. STAT is an essential activator of the zygotic genome in the early Drosophila embryo. PLoS Genet. 7, e1002086 (2011).
Blow, M.J. et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010).
Zhu, L.J. et al. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 39, D111–D117 (2011).
Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat. Protoc. 3, 1589–1603 (2008).
Turatsinze, J.V., Thomas-Chollier, M., Defrance, M. & van Helden, J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3, 1578–1588 (2008).

Acknowledgements

This work was supported by the Alexander von Humboldt Foundation to M.T.-C.; the Agence Nationale de Recherche (ANR) partner of the ERASysBio+ initiative supported under the EU ERA-NET Plus scheme in FP7 to C.H.; ANR Young Researchers Grant 'CardiHox' to C.H.; the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office (project P6/25 (BioMaGNet)); EU-funded Cooperation in Science and Technology (COST) action (BM1006 'SEQAHEAD—Next-Generation Sequencing Data Analysis Network'); FP7 MICROME Collaborative Project ('Microbial genomics and bio-informatics', contract number 222886-2). We acknowledge the colleagues who helped to install and maintain the RSAT Web servers: R. Leplae (ULB, Belgium), R. Zayas-Lagunas (UNAM, Mexico), E. Bongcam-Rudloff (Uppsala, Sweden), F.-X. Théodule (Aix Marseille Université, France), P. Vincens (Ecole Normale Supérieure, France) and F. Joubert (Pretoria, South Africa).

Author information

Authors and Affiliations

Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
Morgane Thomas-Chollier
Technological Advances for Genomics and Clinics, Institut National de la Santé et de la Recherche Médicale (INSERM) U928 and Université de la Méditerranée, Marseille, France
Elodie Darbo, Carl Herrmann, Denis Thieffry & Jacques van Helden
Centro de Ciencias Genomicas, Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Mexico
Matthieu Defrance
Institut de Biologie de l'Ecole Normale Supérieure—Centre National de la Recherche Scientifique Unité Mixte de Recherche (CNRS UMR) 8197 and INSERM U1024, Paris, France
Denis Thieffry
Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium
Jacques van Helden

Contributions

J.v.H., M.T.-C. and M.D. initiated and developed the peak-motifs software tool. E.D., C.H. and D.T. contributed to improve the tool and analyzed the study case for this protocol. All authors edited the manuscript.

Corresponding authors

Correspondence to Morgane Thomas-Chollier or Jacques van Helden.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

Comparison of software tools used for analyzing motifs in ChIP-seq peak sequences. This is an updated version of the Table 1 from the original peak-motifs publication⁹ summarizing the tasks, algorithms and usability properties to compare the different software options for the users. Adapted from Morgane Thomas-Chollier, Carl Herrmann, Matthieu Defrance, Olivier Sand, Denis Thieffry, Jacques van Helden, RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets, Nucleic Acids Research, 2012, 40(4), by permission of Oxford University Press. (PDF 808 kb)

About this article

Cite this article

Thomas-Chollier, M., Darbo, E., Herrmann, C. et al. A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nat Protoc 7, 1551–1568 (2012). https://doi.org/10.1038/nprot.2012.088

Issue Date: August 2012
DOI: https://doi.org/10.1038/nprot.2012.088
Springer Nature Limited
