
Using the Gibbs Motif Sampler for Phylogenetic Footprinting

Protocol in Comparative Genomics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 395)

Summary

The Gibbs Motif Sampler (Gibbs) is a software package used to predict conserved elements in biopolymer sequences. Although the software can be used to locate conserved motifs in protein sequences, its most common use is the prediction of transcription factor binding sites (TFBSs) in the promoter regions upstream of genes. We describe approaches that use Gibbs to locate TFBSs in a collection of orthologous nucleotide sequences, a technique known as phylogenetic footprinting. To illustrate it, we present examples that use Gibbs to detect binding sites for the transcription factor LexA in orthologous sequence data from representative species of two proteobacterial divisions.
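To make the sampling idea concrete, here is a minimal Python sketch of basic Gibbs site sampling in the spirit of Lawrence et al. (1993). It is not the Gibbs Motif Sampler's code or command-line interface, and every function name is hypothetical. It assumes exactly one site of fixed width per sequence, uppercase ACGT input with no ambiguity codes, a single-nucleotide background estimated from all input positions, and a fixed number of iterations with no convergence check or phylogenetic weighting.

```python
import random

ALPHABET = "ACGT"  # assumption: uppercase DNA with no ambiguity codes


def background(seqs, pseudocount=1.0):
    """Single-nucleotide background frequencies estimated from all input positions."""
    counts = [pseudocount] * len(ALPHABET)
    for seq in seqs:
        for ch in seq:
            counts[ALPHABET.index(ch)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def build_profile(seqs, positions, width, exclude, pseudocount=0.5):
    """Position frequency matrix from the current sites, leaving out sequence `exclude`."""
    counts = [[pseudocount] * len(ALPHABET) for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == exclude:
            continue
        for j in range(width):
            counts[j][ALPHABET.index(seq[pos + j])] += 1
    profile = []
    for row in counts:
        total = sum(row)
        profile.append([c / total for c in row])
    return profile


def site_weight(seq, start, profile, bg):
    """Motif-versus-background likelihood ratio for the window beginning at `start`."""
    weight = 1.0
    for j, row in enumerate(profile):
        k = ALPHABET.index(seq[start + j])
        weight *= row[k] / bg[k]
    return weight


def gibbs_site_sampler(seqs, width, iterations=1000, seed=0):
    """Return one sampled site start per sequence (random-scan Gibbs, one site each)."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Random initial alignment: one window start per sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled this step
        profile = build_profile(seqs, positions, width, exclude=i)
        starts = range(len(seqs[i]) - width + 1)
        weights = [site_weight(seqs[i], p, profile, bg) for p in starts]
        # Draw a new start with probability proportional to its likelihood ratio.
        positions[i] = rng.choices(starts, weights=weights)[0]
    return positions


if __name__ == "__main__":
    # Hypothetical toy data: random background with one planted 8-mer per sequence.
    rng = random.Random(42)
    motif = "TACTGTAT"
    seqs = []
    for _ in range(6):
        s = [rng.choice(ALPHABET) for _ in range(40)]
        start = rng.randrange(40 - len(motif) + 1)
        s[start:start + len(motif)] = motif
        seqs.append("".join(s))
    pos = gibbs_site_sampler(seqs, width=len(motif), iterations=2000)
    print([s[p:p + len(motif)] for s, p in zip(seqs, pos)])
```

Each step removes one sequence's site from the model, rebuilds the profile from the remaining sites, and resamples that site in proportion to the motif-to-background likelihood ratio. A production sampler would typically extend this with features the sketch omits, such as allowing zero or several sites per sequence, scanning the reverse complement, and using higher-order background models.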



Copyright information

© 2007 Humana Press Inc.

About this protocol

Cite this protocol

Thompson, W., Conlan, S., McCue, L.A., Lawrence, C.E. (2007). Using the Gibbs Motif Sampler for Phylogenetic Footprinting. In: Bergman, N.H. (ed.) Comparative Genomics. Methods in Molecular Biology™, vol 395. Humana Press. https://doi.org/10.1007/978-1-59745-514-5_25


  • DOI: https://doi.org/10.1007/978-1-59745-514-5_25

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-693-1

  • Online ISBN: 978-1-59745-514-5

  • eBook Packages: Springer Protocols
