Learning Protein-DNA Interaction Landscapes by Integrating Experimental Data through Computational Models

  • Jianling Zhong
  • Todd Wasson
  • Alexander J. Hartemink
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)


Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors, nucleosomes, and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only noisy information regarding one specific aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise, mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation.
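
As a rough illustration of what "thermodynamic principles of competitive DNA binding" means at a single site, the sketch below computes equilibrium occupancies from Boltzmann weights. It is a textbook single-site simplification, not the framework described in this paper: the function name `site_occupancy`, the protein labels, and the convention that energies are given in units of kT with concentrations relative to a reference are all assumptions made for this example.

```python
import math


def site_occupancy(concentrations, binding_energies):
    """Equilibrium occupancy of one DNA site shared by competing proteins.

    Assumptions (illustrative only): each protein can occupy the site
    exclusively, energies are in units of kT, and concentrations are
    relative to an arbitrary reference concentration.
    """
    # Statistical weight of each bound state: concentration * exp(-E / kT).
    weights = {
        name: concentrations[name] * math.exp(-binding_energies[name])
        for name in concentrations
    }
    # Partition function; the unbound state contributes a weight of 1.
    z = 1.0 + sum(weights.values())
    occupancy = {name: w / z for name, w in weights.items()}
    occupancy["unbound"] = 1.0 / z
    return occupancy


# Hypothetical example: two factors at equal concentration, one binding more tightly.
occ = site_occupancy(
    concentrations={"tf_a": 1.0, "tf_b": 1.0},
    binding_energies={"tf_a": -2.0, "tf_b": 0.0},
)
for state, p in occ.items():
    print(f"{state}: {p:.3f}")
```

Because all states share one partition function, raising the concentration or affinity of one protein necessarily lowers the occupancy of every other state, which is the sense in which binding is competitive.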


Keywords: protein-DNA interaction landscape; thermodynamic modeling; genomic data integration; competitive binding; COMPETE





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jianling Zhong (1)
  • Todd Wasson (2)
  • Alexander J. Hartemink (1, 3)

  1. Computational Biology & Bioinformatics, Duke University, Durham, USA
  2. Lawrence Livermore National Laboratory, Livermore, USA
  3. Department of Computer Science, Duke University, Durham, USA
