Prediction of Transcription Factor Binding Sites by Integrating DNase Digestion and Histone Modification

  • Eduardo G. Gusmão
  • Christoph Dieterich
  • Ivan G. Costa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7409)


The identification of cis-acting elements on DNA is crucial for understanding the complex regulatory networks that govern many cellular mechanisms. However, this task is difficult: an estimated 1,500 different transcription factors (TFs) are encoded in the human genome, each of which can bind to multiple loci, directly or indirectly. The standard computational approach is to represent the binding preference of a transcription factor with a position weight matrix (PWM) and to apply statistical procedures that detect genomic regions with high binding scores. Because most PWMs describe short, degenerate signals, this approach yields a very high number of false-positive hits. Recent research has shown that genome-wide assays reflecting open chromatin, such as DNase digestion or histone modifications, can improve sequence-based detection of the binding locations of transcription factors that are active in a particular cell type. We propose a multivariate hidden Markov model that improves the prediction of transcription factor binding locations by integrating DNase digestion and histone modification data. Compared with existing methods, our methodology improves sensitivity with little or no loss of specificity. This study shows that correctly integrating DNase and histone modification data can improve the predictive power for cis-acting elements, opening the way to more sophisticated studies that use a larger set of epigenetic signals.
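The standard PWM approach described above can be sketched as a log-odds scan. This is a minimal illustration, not the authors' pipeline: the 4-bp count matrix is a hypothetical toy profile (not taken from JASPAR, TRANSFAC or UniPROBE), the uniform background and pseudocount of 0.5 are assumptions, only the forward strand is scanned, and a fixed score threshold stands in for the statistical procedures (e.g. p-value cutoffs) that real tools use.

```python
import math

# Hypothetical toy position frequency matrix (counts) for a 4-bp motif.
# Rows are motif positions; columns are A, C, G, T. Not a real database profile.
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
ALPHABET = "ACGT"


def log_odds_pwm(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.5):
    """Convert a count matrix into a log2-odds position weight matrix."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append([
            math.log2(((c + pseudocount) / total) / bg)
            for c, bg in zip(row, background)
        ])
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = log_odds_pwm(COUNTS)
print(scan("TTACGTAAACGTCC", pwm, threshold=4.0))
```

With a motif this short, exact or near-exact matches occur by chance roughly every few hundred base pairs of random sequence, which illustrates why sequence scores alone produce many false positives genome-wide.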
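The integration idea can be illustrated with Viterbi decoding of a small multivariate hidden Markov model whose emissions are diagonal Gaussians over two aligned, normalized signals (DNase digestion and one histone mark). This is a hedged sketch only: the two-state topology ("background" vs. "active"), all parameter values, and the input signal are hypothetical and chosen for illustration; the abstract does not describe the authors' actual state structure or how their parameters were estimated.

```python
import math

# Hypothetical two-state model: 0 = background, 1 = "active" (open chromatin).
# Each observation is a 2-D vector (DNase signal, histone-mark signal), assumed
# already normalized. Parameters are illustrative, not fitted to any data.
STATES = ("background", "active")
START = [0.9, 0.1]
TRANS = [
    [0.95, 0.05],
    [0.10, 0.90],
]
# Diagonal Gaussian emissions per state: (means, variances) over the two signals.
EMIT = [
    ([0.1, 0.1], [0.05, 0.05]),
    ([1.0, 0.8], [0.20, 0.20]),
]


def log_gauss_diag(x, means, variances):
    """Log-density of vector x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


def viterbi(observations):
    """Most likely state path for a sequence of multivariate observations (log space)."""
    n_states = len(START)
    log_trans = [[math.log(p) for p in row] for row in TRANS]
    scores = [math.log(START[s]) + log_gauss_diag(observations[0], *EMIT[s])
              for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this position.
            best_prev = max(range(n_states), key=lambda p: scores[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans[best_prev][s]
                              + log_gauss_diag(obs, *EMIT[s]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


signal = [(0.1, 0.0), (0.2, 0.1), (1.1, 0.9), (0.9, 0.7), (1.2, 1.0), (0.0, 0.1)]
print(viterbi(signal))
```

On this toy input the decoder labels the three middle positions, where both signals are high, as "active" and the flanks as "background". Because every state scores both signals jointly, a region must look open in the combined evidence rather than in a single assay; candidate PWM hits could then be restricted to "active" segments.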


Keywords: cis-regulatory elements; DNase I-hypersensitive sites; histone modifications; hidden Markov models



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Eduardo G. Gusmão (1)
  • Christoph Dieterich (2)
  • Ivan G. Costa (1)

  1. Center of Informatics, Federal University of Pernambuco, Recife, Brazil
  2. Berlin Institute for Medical Systems Biology, Berlin, Germany