Summary
The Gibbs Motif Sampler (Gibbs) is a software package used to predict conserved elements in biopolymer sequences. Although the software can be used to locate conserved motifs in protein sequences, its most common use is the prediction of transcription factor binding sites (TFBSs) in promoters upstream of gene sequences. We describe approaches that use Gibbs to locate TFBSs in a collection of orthologous nucleotide sequences, a technique known as phylogenetic footprinting. To illustrate this technique, we present examples that use Gibbs to detect binding sites for the transcription factor LexA in orthologous sequence data from representative species belonging to two different proteobacterial divisions.
Copyright information
© 2007 Humana Press Inc.
About this protocol
Cite this protocol
Thompson, W., Conlan, S., McCue, L.A., Lawrence, C.E. (2007). Using the Gibbs Motif Sampler for Phylogenetic Footprinting. In: Bergman, N.H. (ed.) Comparative Genomics. Methods in Molecular Biology™, vol 395. Humana Press. https://doi.org/10.1007/978-1-59745-514-5_25
DOI: https://doi.org/10.1007/978-1-59745-514-5_25
Publisher Name: Humana Press
Print ISBN: 978-1-58829-693-1
Online ISBN: 978-1-59745-514-5
eBook Packages: Springer Protocols