Nucleosome Occupancy Information Improves de novo Motif Discovery

  • Leelavati Narlikar
  • Raluca Gordân
  • Alexander J. Hartemink
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4453)


A complete understanding of transcriptional regulatory processes in the cell requires identification of transcription factor binding sites on a genome-wide scale. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known transcription factor binding sites occur in the genome than are actually functional. Chromatin structure is known to play an important role in guiding transcription factors to those sites that are functional. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling transcription factors to bind DNA in those regions [1]. In this paper, we describe a novel algorithm which employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy; the nucleosome occupancy information comes from a recently published computational model [2]. When a Gibbs sampling algorithm with our informative prior is applied to yeast sequence-sets identified by ChIP-chip [3], the correct motif is found in 50% more cases than with an uninformative uniform prior. Moreover, if nucleosome occupancy information is not available, our informative prior reduces to a new kind of prior that can exploit discriminative information in a purely generative setting.
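
The role a positional prior plays in a Gibbs sampling motif finder can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy position weight matrix, and the prior values are all hypothetical. The sketch assumes a single motif occurrence per sequence and a prior that is supplied as one non-negative weight per candidate start position (for instance, larger where predicted nucleosome occupancy is low). It shows only the resampling step for one sequence, where the prior multiplies the motif-versus-background likelihood ratio.

```python
import math
import random


def position_posterior(seq, pwm, background, prior):
    """Return normalized sampling weights over motif start positions.

    seq        : DNA string over the alphabet ACGT
    pwm        : list of dicts; pwm[k][base] is the probability of `base`
                 at motif column k (must be strictly positive)
    background : dict; background[base] is the background probability
    prior      : list of non-negative weights, one per valid start position
                 (hypothetical stand-in for an occupancy-based prior)
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior must have one entry per valid start position")

    weights = []
    for j in range(n_starts):
        # Log likelihood ratio of the window under the motif vs. background.
        log_ratio = sum(
            math.log(pwm[k][seq[j + k]] / background[seq[j + k]])
            for k in range(width)
        )
        # The prior simply reweights each candidate position.
        weights.append(prior[j] * math.exp(log_ratio))

    total = sum(weights)
    if total == 0:
        raise ValueError("prior assigns zero weight to every position")
    return [w / total for w in weights]


def sample_position(seq, pwm, background, prior, rng=random):
    """Draw one motif start position from the prior-weighted posterior."""
    probs = position_posterior(seq, pwm, background, prior)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy data: two identical "GAC" matches, at positions 0 and 6.
toy_seq = "GACTTTGAC"
toy_background = {b: 0.25 for b in "ACGT"}
toy_pwm = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
uniform_prior = [1.0] * 7
informative_prior = [0.05] * 6 + [0.7]  # hypothetical: position 6 favored

uniform_probs = position_posterior(toy_seq, toy_pwm, toy_background, uniform_prior)
informed_probs = position_posterior(toy_seq, toy_pwm, toy_background, informative_prior)
```

With the uniform prior the two identical matches receive equal weight, so the sampler cannot tell them apart; with the informative prior the match at the favored position dominates. In a full sampler this step would alternate with re-estimating the position weight matrix from the currently selected sites.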


Keywords: Transcription Factor Binding Site; Gibbs Sampling; Nucleosome Position; Nucleosome Occupancy; Informative Prior
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Lee, C., Shibata, Y., Rao, B., Strahl, B., Lieb, J.: Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature Genetics 36(8), 900–905 (2004)
  2. Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thastrom, A., Field, Y., Moore, I., Wang, J., Widom, J.: A genomic code for nucleosome positioning. Nature 442(7104), 772–778 (2006)
  3. Harbison, C., et al.: Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004)
  4. Lee, T., et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)
  5. Liu, X., Noll, D., Lieb, J., Clarke, N.: DIP-chip: Rapid and accurate determination of DNA binding specificity. Genome Research 15(3), 421–427 (2005)
  6. Mukherjee, S., Berger, M., Jona, G., Wang, X., Muzzey, D., Snyder, M., Young, R., Bulyk, M.: Rapid analysis of the DNA binding specificities of transcription factors with DNA microarrays. Nature Genetics 36(12), 1331–1339 (2004)
  7. Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297 (1998)
  8. Kim, S., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J., Eizinger, A., Wylie, B., Davidson, G.: A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092 (2001)
  9. Wasserman, W., Sandelin, A.: Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5(4), 276–287 (2004)
  10. Siggia, E.: Computational methods for transcriptional regulation. Current Opinion in Genetics and Development 15, 214–221 (2005)
  11. Workman, C., Stormo, G.: ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. In: Pac. Symp. Biocomput., pp. 467–478 (2000)
  12. Segal, E., Barash, Y., Simon, I., Friedman, N., Koller, D.: From sequence to expression: A probabilistic framework. In: RECOMB ’02 (2002)
  13. Sinha, S.: Discriminative motifs. In: RECOMB ’02 (2002)
  14. Hong, P., Liu, X., Zhou, Q., Lu, X., Liu, J., Wong, W.: A boosting approach for motif modeling using ChIP-chip data. Bioinformatics 21(11), 2636–2643 (2005)
  15. Sinha, S.: On counting position weight matrix matches in a sequence, with application to discriminative motif finding. Bioinformatics 22(14), e454–e463 (2006)
  16. Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23(1), 137–144 (2005)
  17. Almer, A., Rudolph, H., Hinnen, A., Horz, W.: Removal of positioned nucleosomes from the yeast PHO5 promoter upon PHO5 induction releases additional upstream activating DNA elements. EMBO J. 5, 2689–2696 (1986)
  18. Mai, X., Chou, S., Struhl, K.: Preferential accessibility of the yeast his3 promoter is determined by a general property of the DNA sequence, not by specific elements. Mol. Cell. Biol. 20, 6668–6676 (2000)
  19. Sekinger, E., Moqtaderi, Z., Struhl, K.: Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast. Mol. Cell 18, 735–748 (2005)
  20. Yuan, G., Liu, Y., Dion, M., Slack, M., Wu, L., Altschuler, S., Rando, O.: Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005)
  21. Staden, R.: Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research 12, 505–519 (1984)
  22. Bailey, T., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: ISMB ’94, pp. 28–36. AAAI Press, Menlo Park (1994)
  23. Gelfand, A., Smith, A.: Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409 (1990)
  24. Liu, J.: The collapsed Gibbs sampler with applications to a gene regulation problem. Journal of the American Statistical Association 89, 958–966 (1994)
  25. Liu, J., Neuwald, A., Lawrence, C.: Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. Journal of the American Statistical Association 90, 1156–1170 (1995)
  26. Narlikar, L., Gordân, R., Ohler, U., Hartemink, A.: Informative priors based on transcription factor structural class improve de novo motif discovery. Bioinformatics 22(14), e384–e392 (2006)
  27. Roth, F., Hughes, J., Estep, P., Church, G.: Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotech. 16, 939–945 (1998)
  28. Liu, X., Brutlag, D., Liu, J.: BioProspector: Discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In: Pac. Symp. Biocomput., pp. 127–138 (2001)
  29. Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouze, P., Moreau, Y.: A Gibbs sampling method to detect over-represented motifs in the upstream regions of coexpressed genes. Journal of Computational Biology 9, 447–464 (2002)
  30. Dorrington, R.A., Cooper, T.G.: The DAL82 protein of Saccharomyces cerevisiae binds to the DAL upstream induction sequence (UIS). Nucleic Acids Research 21(16), 3777–3784 (1993)
  31. Jia, Y., Rothermel, B., Thornton, J., Butow, R.A.: A basic helix-loop-helix-leucine zipper transcription complex in yeast functions in a signaling pathway from mitochondria to the nucleus. Molecular and Cellular Biology 17, 1110–1117 (1997)
  32. Liu, X., Brutlag, D., Liu, J.: An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nature Biotech. 20, 835–839 (2002)
  33. Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003)
  34. Bulyk, M., Johnson, P., Church, G.: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Research 30, 1255–1261 (2002)
  35. Agarwal, P., Bafna, V.: Detecting non-adjacent correlations within signals in DNA. In: RECOMB ’98 (1998)
  36. Barash, Y., Elidan, G., Friedman, N., Kaplan, T.: Modeling dependencies in protein-DNA binding sites. In: RECOMB ’03 (2003)
  37. Miller, W., Makova, K., Nekrutenko, A., Hardison, R.: Comparative Genomics. Annu. Rev. Genom. Human. Genet. 5, 15–56 (2004)
  38. Siddharthan, R., Siggia, E., Nimwegen, E.: PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny. PLoS Comput. Biol. 1(7), e67 (2005)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Leelavati Narlikar (1)
  • Raluca Gordân (1)
  • Alexander J. Hartemink (1)

  1. Department of Computer Science, Duke University, Durham, NC 27708-0129
