Using the Gibbs Motif Sampler for Phylogenetic Footprinting

  • William Thompson
  • Sean Conlan
  • Lee Ann McCue
  • Charles E. Lawrence
Part of the Methods in Molecular Biology™ book series (MIMB, volume 395)


The Gibbs Motif Sampler (Gibbs) is a software package used to predict conserved elements in biopolymer sequences. Although the software can be used to locate conserved motifs in protein sequences, its most common use is the prediction of transcription factor binding sites (TFBSs) in promoters upstream of gene sequences. We will describe approaches that use Gibbs to locate TFBSs in a collection of orthologous nucleotide sequences, i.e., phylogenetic footprinting. To illustrate this technique, we present examples that use Gibbs to detect binding sites for the transcription factor LexA in orthologous sequence data from representative species belonging to two different proteobacterial divisions.

Key Words

Gibbs sampling; phylogenetic footprinting; transcription regulation



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • William Thompson (1)
  • Sean Conlan (2)
  • Lee Ann McCue (3)
  • Charles E. Lawrence (4)

  1. Center for Computational Molecular Biology, Brown University, USA
  2. New York Department of Health, The Wadsworth Center, USA
  3. Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, USA
  4. Center for Computational Molecular Biology, Brown University, USA
