Learning Protein-DNA Interaction Landscapes by Integrating Experimental Data through Computational Models
Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors, nucleosomes, and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only noisy information regarding one specific aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the precise binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise, mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation.
Keywords: protein-DNA interaction landscape, thermodynamic modeling, genomic data integration, competitive binding, compete
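The thermodynamic principle of competitive binding mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' framework, which models entire genomic sequences and integrates experimental data. It only shows the Boltzmann-weighted equilibrium for several proteins competing for a single site. The function name `site_occupancy`, the relative (dimensionless) concentrations, the example energies, and the default `kT` value (roughly room temperature, in kcal/mol) are all assumptions made for illustration.

```python
import math


def site_occupancy(concentrations, binding_energies, kT=0.593):
    """Equilibrium occupancy of one DNA site by competing proteins.

    Hypothetical helper for illustration only.
    concentrations: dict of protein name -> relative concentration (assumed unitless)
    binding_energies: dict of protein name -> binding free energy (assumed kcal/mol;
        more negative means tighter binding)
    kT: thermal energy in the same units as the energies (assumed ~298 K)
    """
    # Boltzmann weight of each bound state: concentration * exp(-E / kT).
    weights = {
        name: concentrations[name] * math.exp(-binding_energies[name] / kT)
        for name in concentrations
    }
    # Partition function: the unbound state has weight 1 by convention.
    z = 1.0 + sum(weights.values())
    occupancy = {name: w / z for name, w in weights.items()}
    occupancy["unbound"] = 1.0 / z
    return occupancy


if __name__ == "__main__":
    # Two hypothetical factors and a nucleosome competing for the same site.
    occ = site_occupancy(
        {"TF_A": 1.0, "TF_B": 1.0, "nucleosome": 5.0},
        {"TF_A": -2.0, "TF_B": -1.0, "nucleosome": -0.5},
    )
    for name, p in occ.items():
        print(f"{name}: {p:.3f}")
```

Because every state shares the same partition function, raising the concentration or affinity of one protein necessarily lowers the occupancy of all the others, which is the sense in which binding is competitive.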