Detecting Regulatory Sites Using PhyloGibbs

  • Rahul Siddharthan
  • Erik van Nimwegen
Part of the Methods in Molecular Biology™ book series (MIMB, volume 395)


PhyloGibbs is a program that uses Gibbs sampling to predict putative binding sites for transcription factors in DNA. It has two notable advances over previous algorithms for this task: it handles phylogenetically related sequences systematically, and it evaluates the significance of each predicted site via statistical sampling. In this article, we explain how to use PhyloGibbs effectively. We describe the essential command-line options in detail and discuss other considerations that arise in practical situations.
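
For orientation, the basic idea behind Gibbs sampling for motif finding can be sketched as below. This is a generic sampler that assumes exactly one site per sequence and a uniform background; it is not PhyloGibbs itself, since it ignores phylogeny and does not assess the significance of sites. The function name gibbs_site_sampler and all of its parameters are illustrative assumptions, and the input is assumed to contain only the uppercase letters A, C, G, and T.

```python
import random


def gibbs_site_sampler(seqs, width, n_iter=200, pseudocount=1.0, seed=0):
    """Toy Gibbs sampler: one site of fixed width per sequence.

    Returns a list with one start position per input sequence.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Count bases at each motif column, leaving out sequence i.
            counts = [{b: pseudocount for b in alphabet} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                site = other[starts[j]:starts[j] + width]
                for k, base in enumerate(site):
                    counts[k][base] += 1
            total = (len(seqs) - 1) + pseudocount * len(alphabet)

            # Weight each window of sequence i by its likelihood ratio:
            # motif model over a uniform background (0.25 per base).
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= (counts[k][seq[p + k]] / total) / 0.25
                weights.append(w)

            # Resample the site position in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Hypothetical toy input, made up for illustration.
seqs = ["TTGACGTCAAT", "CCGACGTCAGG", "AAAGACGTCAT"]
starts = gibbs_site_sampler(seqs, width=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

Because each site is resampled rather than greedily optimized, different seeds can settle on different site sets; running the sampler several times and comparing results is a simple way to see how stable a prediction is.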


Keywords: gene regulation, binding sites, motif finding



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Rahul Siddharthan (1)
  • Erik van Nimwegen (2)

  1. Institute of Mathematical Sciences, India
  2. Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Switzerland
