Abstract
This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation–sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are then scanned to predict transcription factor binding sites and to analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded to the University of California Santa Cruz (UCSC) genome browser for visualization in their genomic context. The protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow runs in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server, and the protocol itself can be followed in about 1 h.
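The scanning and export steps summarized above (predicting binding sites with a motif, measuring their position relative to the peak center, and writing BED lines for the UCSC browser) can be illustrated with a minimal Python sketch. This is not the peak-motifs or RSAT matrix-scan implementation. It assumes a hypothetical count matrix (`COUNTS`, not a real Krüppel motif), an equiprobable background, a pseudocount of 1, an arbitrary score threshold and made-up peak coordinates; all function names are hypothetical. Coordinates follow the BED convention (0-based start, half-open interval).

```python
import math

# Hypothetical 4-position count matrix (rows: positions; columns: A, C, G, T).
# Illustrative values only; consensus is ACTG.
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 1, 0],
    [1, 0, 0, 9],
    [0, 0, 10, 0],
]
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, background=(0.25, 0.25, 0.25, 0.25), pseudo=1.0):
    """Convert counts to log2-odds weights (equiprobable background assumed)."""
    weights = []
    for row in counts:
        total = sum(row) + pseudo * 4
        weights.append([math.log2(((n + pseudo) / total) / bg)
                        for n, bg in zip(row, background)])
    return weights


def score_site(weights, site):
    """Sum per-position weights; return None if the site has a non-ACGT base."""
    if any(b not in BASES for b in site):
        return None
    return sum(weights[i][BASES.index(b)] for i, b in enumerate(site))


def scan_peak(weights, chrom, start, seq, threshold):
    """Scan both strands of one peak sequence.

    Yields (chrom, site_start, site_end, strand, score, offset), where offset is
    the distance between the site center and the peak center, in bp.
    """
    width = len(weights)
    center = len(seq) / 2
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        reverse = site.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, s in (("+", site), ("-", reverse)):
            score = score_site(weights, s)
            if score is not None and score >= threshold:
                offset = (i + width / 2) - center
                yield (chrom, start + i, start + i + width, strand, score, offset)


def offset_histogram(hits, bin_size=50):
    """Count hits per bin of distance to the peak center (a positional profile)."""
    counts = {}
    for hit in hits:
        b = math.floor(hit[5] / bin_size) * bin_size
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))


def to_bed(hits, track_name="predicted_sites"):
    """Format hits as a 6-column BED track preceded by a UCSC track line.

    The weight score is kept in the name field; the BED score column is set
    to 0 because UCSC expects an integer between 0 and 1000 there.
    """
    lines = [f'track name="{track_name}" description="Hypothetical PSSM hits"']
    for chrom, s, e, strand, score, _offset in hits:
        lines.append(f"{chrom}\t{s}\t{e}\tscore={score:.2f}\t0\t{strand}")
    return "\n".join(lines)


if __name__ == "__main__":
    weights = log_odds_matrix(COUNTS)
    # Made-up peak: chromosome arm and start coordinate are illustrative only.
    hits = list(scan_peak(weights, "chr2L", 1000, "GGGACTGGGGCAGTGGG", threshold=5.0))
    print(to_bed(hits))
    print(offset_histogram(hits))
```

In this toy peak, the consensus ACTG occurs once on each strand, 3.5 bp on either side of the peak center. On real data, the same profile would be accumulated over all peaks and compared with a background expectation to assess positional enrichment.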
References
Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
Johnson, D.S., Mortazavi, A., Myers, R.M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat. Methods 6, S22–S32 (2009).
Boeva, V. et al. De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis. Nucleic Acids Res. 38, e126 (2010).
Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696 (2011).
Bailey, T.L. DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Rusk, N. Focus on next-generation sequencing data analysis. Nat. Methods 6, S1 (2009).
McPherson, J.D. Next-generation gap. Nat. Methods 6, S2–S5 (2009).
Thomas-Chollier, M. et al. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res. 40, e31 (2012).
Salmon-Divon, M., Dvinge, H., Tammoja, K. & Bertone, P. PeakAnalyzer: genome-wide annotation of chromatin binding and modification loci. BMC Bioinformatics 11, 415 (2010).
Portales-Casamar, E. et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38, D105–D110 (2010).
Wingender, E. The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform. 9, 326–332 (2008).
Gama-Castro, S. et al. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39, D98–D105 (2011).
Medina-Rivera, A. et al. Theoretical and empirical quality assessment of transcription factor-binding motifs. Nucleic Acids Res. 39, 808–824 (2011).
Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).
Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
Fujita, P.A. et al. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 39, D876–D882 (2011).
Fullwood, M.J., Wei, C.L., Liu, E.T. & Ruan, Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19, 521–532 (2009).
Lee, T.I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
Sanford, J.R. et al. Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res. 19, 381–394 (2009).
van Helden, J., del Olmo, M. & Perez-Ortin, J.E. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res. 28, 1000–1010 (2000).
Sand, O., Thomas-Chollier, M., Vervisch, E. & van Helden, J. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services: an example with ChIP-chip data. Nat. Protoc. 3, 1604–1615 (2008).
van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
van Helden, J., Rios, A.F. & Collado-Vides, J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28, 1808–1818 (2000).
Thomas-Chollier, M. et al. RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res. 39, W86–W91 (2011).
Kulakovskiy, I.V., Boeva, V.A., Favorov, A.V. & Makeev, V.J. Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics 26, 2622–2623 (2010).
Agius, P., Arvey, A., Chang, W., Noble, W.S. & Leslie, C. High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol. 6, e1000916 (2010).
Mercier, E. et al. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS ONE 6, e16432 (2011).
Kuttippurathu, L. et al. CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics 27, 715–717 (2010).
van Heeringen, S.J. & Veenstra, G.J. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270–271 (2011).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Sand, O., Turatsinze, J.V. & van Helden, J. Evaluating the prediction of cis-acting regulatory elements in genome sequences. in Modern Genome Annotation: The BioSapiens Network (eds. Frishman, D. & Valencia, A.) (Springer, 2008).
Bradley, R.K. et al. Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol. 8, e1000343 (2010).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2011).
Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).
Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat. Methods 5, 829–834 (2008).
Bergman, C.M., Carlson, J.W. & Celniker, S.E. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 21, 1747–1749 (2005).
Flicek, P. et al. Ensembl 2011. Nucleic Acids Res. 39, D800–D806 (2011).
Harrison, M.M., Li, X.Y., Kaplan, T., Botchan, M.R. & Eisen, M.B. Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266 (2011).
Kanodia, J.S. et al. Pattern formation by graded and uniform signals in the early Drosophila embryo. Biophys. J. 102, 427–433 (2012).
Tsurumi, A. et al. STAT is an essential activator of the zygotic genome in the early Drosophila embryo. PLoS Genet. 7, e1002086 (2011).
Blow, M.J. et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010).
Zhu, L.J. et al. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 39, D111–D117 (2011).
Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat. Protoc. 3, 1589–1603 (2008).
Turatsinze, J.V., Thomas-Chollier, M., Defrance, M. & van Helden, J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3, 1578–1588 (2008).
Acknowledgements
This work was supported by the Alexander von Humboldt Foundation to M.T.-C.; the Agence Nationale de la Recherche (ANR) partner of the ERASysBio+ initiative supported under the EU ERA-NET Plus scheme in FP7 to C.H.; ANR Young Researchers Grant 'CardiHox' to C.H.; the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office (project P6/25 (BioMaGNet)); EU-funded Cooperation in Science and Technology (COST) action (BM1006 'SEQAHEAD—Next-Generation Sequencing Data Analysis Network'); FP7 MICROME Collaborative Project ('Microbial genomics and bio-informatics', contract number 222886-2). We acknowledge the colleagues who helped to install and maintain the RSAT Web servers: R. Leplae (ULB, Belgium), R. Zayas-Lagunas (UNAM, Mexico), E. Bongcam-Rudloff (Uppsala, Sweden), F.-X. Théodule (Aix Marseille Université, France), P. Vincens (Ecole Normale Supérieure, France) and F. Joubert (Pretoria, South Africa).
Author information
Contributions
J.v.H., M.T.-C. and M.D. initiated and developed the peak-motifs software tool. E.D., C.H. and D.T. contributed to improving the tool and analyzed the case study presented in this protocol. All authors edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Table 1
Comparison of software tools used for analyzing motifs in ChIP-seq peak sequences. This is an updated version of Table 1 from the original peak-motifs publication9, summarizing the tasks, algorithms and usability properties of the different software options so that users can compare them. Adapted from Morgane Thomas-Chollier, Carl Herrmann, Matthieu Defrance, Olivier Sand, Denis Thieffry, Jacques van Helden, RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets, Nucleic Acids Research, 2012, 40(4), by permission of Oxford University Press. (PDF 808 kb)
About this article
Cite this article
Thomas-Chollier, M., Darbo, E., Herrmann, C. et al. A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nat Protoc 7, 1551–1568 (2012). https://doi.org/10.1038/nprot.2012.088
DOI: https://doi.org/10.1038/nprot.2012.088
© Springer Nature Limited