Abstract
A computer program has been developed for identifying clusters of binding sites of different transcription factors (TFs) from the genomic coordinates of ChIP-seq (chromatin immunoprecipitation sequencing) peaks. The statistical features of the distribution of transcription factor binding sites (TFBSs) in the mouse genome, obtained from ChIP-seq experiments in embryonic stem cells, are considered. Clusters containing at least four binding sites of different TFs are identified in the mouse genome, and their localization relative to the regulatory regions of genes is described. Two types of site colocalization are confirmed: clusters containing binding sites of Oct4, Nanog, and Sox2, located in distal regions, and clusters containing n-Myc and c-Myc binding sites, located mainly in the promoter regions of mouse genes. Analysis of new ChIP-seq data on the binding of the TFs Nr5a2, Tbx3, Cep, SRF, and USF1 in the same cell type confirmed the division of TFBS clusters into two types: those containing binding sites of the pluripotency regulators (Oct4, Nanog, and Sox2) and those lacking them. A computer program for the statistical processing of data on the location of sites in genes has been developed; it uses experimental site localization data obtained by ChIP-seq in the mouse and human genomes. With the help of this program, localization patterns of the binding sites of various TFs are detected. The distances between the closest binding sites of the TF group Oct4, Nanog, and Sox2 and the binding sites of other factors within site clusters are calculated as a basis for analyzing the joint binding of protein complexes to DNA. The fraction of ChIP-seq genomic regions containing known TFBS nucleotide motifs is calculated, and the weight matrices for these motifs are recalculated. A correlation between motif presence and ChIP-seq binding intensity is shown. The programs implementing these computational methods for assessing the clustering of binding sites of various TFs in new ChIP-seq data are available from the authors upon request.
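The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' program, which is available only on request. It assumes that each peak is reduced to a single coordinate, that neighbouring peaks are linked when they lie within a hypothetical `max_gap` of 500 bp (the abstract gives no window size), and that "at least four binding sites of various TFs" means at least four distinct factors. The `Peak` type, the function names, and the example coordinates are all illustrative.

```python
from collections import namedtuple

# Hypothetical peak record: chromosome, single peak coordinate, factor name.
Peak = namedtuple("Peak", ["chrom", "pos", "tf"])

# Pluripotency regulators named in the abstract.
CORE = {"Oct4", "Nanog", "Sox2"}


def cluster_peaks(peaks, max_gap=500, min_factors=4):
    """Chain neighbouring peaks on one chromosome that are at most max_gap bp
    apart, then keep chains with at least min_factors distinct TFs.
    max_gap=500 is an assumed value, not taken from the paper."""
    clusters, current = [], []
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.pos)):
        # Start a new chain on a chromosome change or a gap that is too wide.
        if current and (peak.chrom != current[-1].chrom
                        or peak.pos - current[-1].pos > max_gap):
            clusters.append(current)
            current = []
        current.append(peak)
    if current:
        clusters.append(current)
    return [c for c in clusters if len({p.tf for p in c}) >= min_factors]


def has_core(cluster, core=CORE):
    """True if the cluster contains a site of Oct4, Nanog, or Sox2."""
    return any(p.tf in core for p in cluster)


def nearest_core_distances(cluster, core=CORE):
    """For every non-core site, the distance to the closest core site
    in the same cluster; empty if the cluster has no core site."""
    core_pos = [p.pos for p in cluster if p.tf in core]
    if not core_pos:
        return []
    return [(p.tf, min(abs(p.pos - c) for c in core_pos))
            for p in cluster if p.tf not in core]


# Made-up coordinates, for illustration only.
peaks = [
    Peak("chr1", 1000, "Oct4"), Peak("chr1", 1050, "Sox2"),
    Peak("chr1", 1120, "Nanog"), Peak("chr1", 1300, "Tbx3"),
    Peak("chr1", 9000, "n-Myc"), Peak("chr1", 9100, "c-Myc"),
    Peak("chr1", 9250, "SRF"), Peak("chr1", 9400, "USF1"),
    Peak("chr2", 1100, "Nr5a2"),
]

clusters = cluster_peaks(peaks)
for c in clusters:
    print(c[0].chrom, c[0].pos, c[-1].pos, has_core(c), nearest_core_distances(c))
```

Single-linkage chaining of sorted coordinates is only one possible definition of a cluster; a fixed-width window around each peak would give different boundaries, so the result depends on the assumed `max_gap`.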
Additional information
Original Russian Text © A.I. Dergilev, A.M. Spitsina, I.V. Chadaeva, A.V. Svichkarev, F.M. Naumenko, E.V. Kulakova, E.R. Galieva, E.E. Vityaev, M. Chen, Yu.L. Orlov, 2016, published in Vavilovskii Zhurnal Genetiki i Selektsii, 2016, Vol. 20, No. 6, pp. 770–778.
Cite this article
Dergilev, A.I., Spitsina, A.M., Chadaeva, I.V. et al. Computer analysis of colocalization of the TFs’ binding sites in the genome according to the ChIP-seq data. Russ J Genet Appl Res 7, 513–522 (2017). https://doi.org/10.1134/S2079059717050057
DOI: https://doi.org/10.1134/S2079059717050057