Distinguishing between Genomic Regions Bound by Paralogous Transcription Factors

  • Alina Munteanu
  • Raluca Gordân
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7821)


Transcription factors (TFs) regulate gene expression by binding to specific DNA sites in cis-regulatory regions of genes. Most eukaryotic TFs are members of protein families that share a common DNA binding domain and often recognize highly similar DNA sequences. It is not well understood why closely related TFs bind different genomic regions in vivo, despite having the potential to interact with the same DNA sites. Here, we use the Myc/Max/Mad family as a model system to investigate whether interactions with additional proteins (co-factors) can explain why paralogous TFs with highly similar DNA binding preferences interact with different genomic sites in vivo. We use a classification approach to distinguish between targets of c-Myc versus Mad2, using features that reflect the DNA binding specificities of putative co-factors. When applied to c-Myc/Mad2 DNA binding data, our algorithm can distinguish between genomic regions bound uniquely by c-Myc versus Mad2 with 87% accuracy.
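As an illustration of what "features that reflect the DNA binding specificities of putative co-factors" could look like, the sketch below scores a genomic region against a position weight matrix (PWM) for each candidate co-factor and keeps the best log-odds score on either strand as one feature. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the uniform background, the pseudocount, and the max-score summary are all hypothetical choices made for the example, and sequences are assumed to contain only A, C, G, and T.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an ACGT-only sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def log_odds(pwm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities into log2-odds scores.

    `pwm` is a list of dicts, one per motif position, mapping each base
    to a probability. The uniform background and the pseudocount are
    assumptions made for this sketch.
    """
    scores = []
    for column in pwm:
        total = sum(column[b] + pseudocount for b in BASES)
        scores.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return scores


def best_site_score(seq, scores):
    """Best log-odds score over every window on both strands.

    Returns -inf when the sequence is shorter than the motif.
    """
    width = len(scores)
    best = float("-inf")
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            score = sum(scores[i][base] for i, base in enumerate(window))
            best = max(best, score)
    return best


def feature_vector(seq, cofactor_pwms):
    """One feature per co-factor, ordered by co-factor name."""
    return [
        best_site_score(seq, log_odds(pwm))
        for _, pwm in sorted(cofactor_pwms.items())
    ]
```

Feature vectors computed this way for regions bound only by one TF or only by its paralog could then be passed to any standard classifier, such as a support vector machine or a random forest. Other summaries of the score profile (for example, the sum over all windows) would be equally plausible alternatives to the maximum.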


Keywords: Transcription factors; protein binding microarray; ChIP-seq; co-factors; support vector machine; random forest





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alina Munteanu (1)
  • Raluca Gordân (2)
  1. Faculty of Computer Science, Alexandru I. Cuza University, Iasi, Romania
  2. Institute for Genome Sciences and Policy, Departments of Biostatistics & Bioinformatics, Computer Science, and Molecular Genetics and Microbiology, Duke University, Durham, USA
