Transcription Factor Centric Discovery of Regulatory Elements in Mammalian Genomes Using Alignment-Independent Conservation Maps

  • Nilanjana Banerjee
  • Andrea Califano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4205)


The computational identification of DNA binding sites that have high affinity for a specific transcription factor is an important problem that has been only partially addressed, and mainly in prokaryotes and lower eukaryotes. In higher eukaryotes, regulatory regions are longer and DNA binding signatures have relatively low complexity, so methods that address the problem in these organisms are lacking. In this paper, we propose a novel computational framework that combines cellular network reverse engineering, integrative genomics, and comparative genomics to address this problem for a set of human transcription factors. Specifically, we study the regulatory regions of putative orthologous targets of a given transcription factor, obtained by reverse engineering methods, in several mammalian genomes. Highly conserved regions are identified by pattern discovery. Finally, DNA binding sites are inferred from these regions using a standard Position Weight Matrix (PWM) discovery algorithm. By framing the identification of the PWM as an optimization problem over the two parameters of the method, we are able to recover known binding sites for several genes and to propose plausible signatures for genes that have not been previously characterized.
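The last two steps of the abstract (alignment-independent detection of conserved regions, then PWM inference) can be illustrated with a minimal sketch. It is not the authors' implementation: exact k-mer sharing stands in for their pattern discovery step, the log-odds PWM is built from already aligned sites rather than found by a discovery algorithm, and every function name, parameter, and sequence below is a hypothetical chosen for illustration.

```python
import math
from collections import Counter


def conserved_kmers(orthologs, k):
    """Return the k-mers that occur in every sequence, without any alignment."""
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in orthologs
    ]
    return set.intersection(*kmer_sets)


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-column log-odds for one window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (score(pwm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )


# Toy, made-up "orthologous promoter" fragments (assumption, not real data).
human = "TTGACCACGTGAGCT"
mouse = "ACCACGTGTTTAGGC"
rat = "GGCACGTGACCTTAA"

shared = conserved_kmers([human, mouse, rat], k=6)

# Toy aligned sites used to build the matrix (assumption, not real data).
pwm = build_pwm(["CACGTG", "CACGTG", "CATGTG"])
top_score, top_start = best_hit(pwm, human)
print(shared, top_start)
```

In this sketch the k-mer length and the pseudocount are free choices made for the example; they are not meant to correspond to the two parameters the abstract refers to, which it does not name.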


reverse engineering, comparative genomics, DNA binding site analysis, pattern discovery, transcriptional regulation, systems biology



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nilanjana Banerjee (1)
  • Andrea Califano (1)
  1. Department of Biomedical Informatics, Columbia University, New York, USA
