Abstract
In this review, we discuss the general problem of understanding transcriptional regulation from DNA sequence and prior information. The main tasks we discuss are predicting cis-regulatory modules (CRMs), which are local regions of DNA that contain binding sites for transcription factors (TFs), and predicting individual binding sites. We review various existing methods, and then describe the approach taken by PhyloGibbs, a recent motif-finding algorithm that we developed to predict TF binding sites, and by PhyloGibbs-MP, an extension to PhyloGibbs that tackles other tasks in regulatory genomics, particularly the prediction of CRMs.
Abbreviations
- CRMs: cis-regulatory modules
- MCMC: Markov chain Monte Carlo
- PWMs: position weight matrices
- TFs: transcription factors
Cite this article
Siddharthan, R. Parsing regulatory DNA: General tasks, techniques, and the PhyloGibbs approach. J Biosci 32 (Suppl 1), 863–870 (2007). https://doi.org/10.1007/s12038-007-0086-0
DOI: https://doi.org/10.1007/s12038-007-0086-0