A Computational Method to Search for DNA Structural Motifs in Functional Genomic Elements

  • Stephen C.J. Parker
  • Aaron Harlap
  • Thomas D. Tullius
Part of the Methods in Molecular Biology book series (MIMB, volume 759)


The rapidly increasing availability of DNA sequence data from modern high-throughput experimental techniques has created the need for computational algorithms to aid in motif discovery in genomic DNA. Such algorithms are typically used to find a statistical representation of the nucleotide sequence of the target site of a DNA-binding protein within a collection of DNA sequences that are thought to contain segments to which the protein is bound. A major assumption of these algorithms is that the protein recognizes the primary order of nucleotides in the sequence. However, proteins can also recognize the three-dimensional shape and structure of DNA. To account for this, we developed a computational method to predict the local structural profiles of any set of DNA sequences and then to search within these profiles for common DNA structural motifs. Here we describe the details of this method and use it to find a DNA structural motif in the Saccharomyces cerevisiae yeast genome that is associated with binding of the transcription factor RLM1, a component of the protein kinase C-mediated MAP kinase pathway.
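
The search step described above can be illustrated with a minimal sketch. It assumes each DNA sequence has already been converted to a numeric structural profile (for example, one predicted hydroxyl radical cleavage value per nucleotide) and then uses a Gibbs-style sampler to look for a window of similar shape in every profile. The function names, the squared-distance score, and the temperature parameter are all hypothetical choices made for illustration; they are not taken from the chapter, whose own implementation is not reproduced here.

```python
import math
import random


def mean_window(profiles, starts, width, skip):
    """Average the currently selected windows of every profile except index `skip`."""
    rows = [p[s:s + width]
            for i, (p, s) in enumerate(zip(profiles, starts)) if i != skip]
    return [sum(col) / len(rows) for col in zip(*rows)]


def window_distance(window, model):
    """Squared Euclidean distance between one window and the model profile."""
    return sum((a - b) ** 2 for a, b in zip(window, model))


def gibbs_structural_motif(profiles, width, iterations=200, temperature=0.1, seed=0):
    """Sample one window start per profile so that the windows share a common shape.

    Assumptions (not from the source): every profile is a list of floats at
    least `width` long, and similarity is scored by squared distance to the
    mean of the other profiles' windows.
    """
    rng = random.Random(seed)
    # Start from a random window in each profile.
    starts = [rng.randrange(len(p) - width + 1) for p in profiles]
    for _ in range(iterations):
        for i, profile in enumerate(profiles):
            # Build the model from all profiles except the held-out one.
            model = mean_window(profiles, starts, width, skip=i)
            n_pos = len(profile) - width + 1
            dists = [window_distance(profile[s:s + width], model)
                     for s in range(n_pos)]
            # Closer windows get exponentially larger sampling weights;
            # subtracting the minimum keeps at least one weight equal to 1.
            best = min(dists)
            weights = [math.exp(-(d - best) / temperature) for d in dists]
            starts[i] = rng.choices(range(n_pos), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Toy profiles: flat background with the same made-up pattern planted
    # at a different offset in each one.
    pattern = [0.9, 0.1, 0.8, 0.2]
    toy = [[0.5] * off + pattern + [0.5] * (12 - off) for off in (2, 7, 4)]
    print(gibbs_structural_motif(toy, width=len(pattern)))
```

Because the sampler is stochastic, different seeds can settle on different windows, and a practical search would typically be repeated from several starting points and the best-scoring alignment kept.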

Key words

Motif discovery; hydroxyl radical; DNA structure; Gibbs sampling; transcription factor; RLM1



Acknowledgments

We thank Eric Bishop for providing the Perl module that is used to predict hydroxyl radical cleavage patterns for any DNA sequence. SCJP was the recipient of a National Academies Ford Foundation Dissertation Fellowship. This work was supported by an ENCODE Technology Development Grant from the National Human Genome Research Institute of the National Institutes of Health to TDT (HG003541).



Copyright information

© Humana Press 2011

Authors and Affiliations

  • Stephen C.J. Parker (1)
  • Aaron Harlap (2)
  • Thomas D. Tullius (3)
  1. Bioinformatics Program, Boston University, Boston, USA
  2. Newton South High School, Newton, USA
  3. Bioinformatics Program, Department of Chemistry, Boston University, Boston, USA
