Abstract
Identifying transcription factor binding sites (TFBSs) is essential for elucidating putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single-sequence TFBS annotation to yield “conserved TFBSs”. Most current methods in this field adopt a multi-step approach that treats these two aspects separately. Moreover, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence, so an approach that explicitly takes this into account is desirable. Although a plethora of approaches have been proposed for predicting conserved TFBSs, very few explicitly model the evolutionary properties of TFBSs, and those that do are, in addition, multi-step. Recently, we introduced a novel approach, SimAnn, to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building on the standard Smith-Waterman algorithm for local alignment, SimAnn introduces additional profile states to produce extended, or annotated, alignments: alignments in which some segments are annotated as gaplessly aligned TFBSs (pair-profile hits). Moreover, the parameters related to pair profiles are derived within a sound statistical framework.
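The idea of adding a profile state to Smith-Waterman can be illustrated with a minimal Python sketch. It is not SimAnn's implementation: it assumes a linear gap penalty, arbitrary match/mismatch scores, a hypothetical four-column toy profile (`PROFILE`), and a simplified pair-profile score (`profile_hit_score`, also hypothetical) that treats the two sites as independent draws from the same profile column. The only change to the textbook recursion is one extra option per cell: jump back w cells diagonally and score the gapless block of width w as a pair-profile hit, recorded as `"P"` in the traceback.

```python
import math

# Hypothetical toy profile of width 4 (consensus "AGCT"): one dict of base
# probabilities per binding-site position.
PROFILE = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25                      # assumed uniform background frequency
MATCH, MISMATCH, GAP = 1.0, -1.0, 2.0  # illustrative parameters only


def subst(a, b):
    return MATCH if a == b else MISMATCH


def profile_hit_score(xs, ys):
    """Log2-odds of a gapless block under the profile vs. background.

    Simplification (assumption): both sites are independent draws from the
    same profile column, so no evolutionary coupling is modelled.
    """
    return sum(
        math.log2(col[a] / BACKGROUND) + math.log2(col[b] / BACKGROUND)
        for col, a, b in zip(PROFILE, xs, ys)
    )


def annotated_local_alignment(x, y):
    """Smith-Waterman with an extra gapless pair-profile-hit option.

    Returns (best score, list of ops): "M" aligned pair, "X" gap in y,
    "Y" gap in x, "P" pair-profile hit spanning len(PROFILE) columns.
    """
    w = len(PROFILE)
    n, m = len(x), len(y)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (0.0, None),  # local alignment: start afresh
                (H[i - 1][j - 1] + subst(x[i - 1], y[j - 1]), "M"),
                (H[i - 1][j] - GAP, "X"),
                (H[i][j - 1] - GAP, "Y"),
            ]
            if i >= w and j >= w:  # gapless block ending at (i, j)
                hit = profile_hit_score(x[i - w:i], y[j - w:j])
                options.append((H[i - w][j - w] + hit, "P"))
            H[i][j], move[i][j] = max(options, key=lambda opt: opt[0])
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    # Trace back from the best cell until a fresh start is reached.
    ops, (i, j) = [], best_ij
    while move[i][j] is not None:
        op = move[i][j]
        ops.append(op)
        if op == "M":
            i, j = i - 1, j - 1
        elif op == "X":
            i -= 1
        elif op == "Y":
            j -= 1
        else:
            i, j = i - w, j - w
    return best, ops[::-1]
```

For example, `annotated_local_alignment("CCAGCTCC", "TTAGCTTT")` returns a score of about 14.12 with the traceback `["P"]`: the consensus block is reported as one pair-profile hit because its profile score outweighs four plain matches. Because this toy score ignores divergence time, it cannot tell a conserved site from two unrelated sites, a limitation that an explicit evolutionary model is meant to remove.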
In this article, we extend this approach to explicitly incorporate the evolution of binding sites into the SimAnn framework. We demonstrate the extension in the theoretical derivations using two position-specific evolutionary models that have previously been used to model TFBS evolution. In a simulated setting, we provide a proof of concept, in comparison with the original SimAnn, that the approach works when its underlying assumptions hold. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) with an existing multi-step tool that also considers TFBS evolution.
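How a position-specific evolutionary model changes a pair-profile score can be sketched with an assumed Felsenstein (1981)-style substitution model whose equilibrium frequencies at each binding-site position are set to that position's profile column. The abstract does not name the two models used in the paper, so this choice, the uniform-frequency background model at the same divergence, and the function names `f81_joint` and `evolutionary_pair_profile_score` are illustrative assumptions. Under this model a base is kept with probability e^(-t) and is otherwise resampled from the equilibrium frequencies π, so P(b | a, t) = e^(-t)[a = b] + (1 - e^(-t))π_b, and an aligned pair has joint probability π_a P(b | a, t).

```python
import math

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background equilibrium frequencies


def f81_joint(pi, a, b, t):
    """Joint probability pi_a * P(b | a, t) of an aligned base pair under F81.

    With probability exp(-t) the base is unchanged; otherwise it is
    resampled from the equilibrium frequencies pi. t is a branch length in
    arbitrary units (no rate normalisation) and must be positive.
    """
    keep = math.exp(-t)
    return pi[a] * (keep * (a == b) + (1.0 - keep) * pi[b])


def evolutionary_pair_profile_score(profile, xs, ys, t, background=UNIFORM):
    """Log2-odds of a gapless aligned block.

    Site model: one F81 process per profile column, with the column as its
    equilibrium distribution. Null model: F81 with background frequencies
    at the same divergence t.
    """
    if t <= 0:
        raise ValueError("divergence t must be positive")
    return sum(
        math.log2(f81_joint(col, a, b, t) / f81_joint(background, a, b, t))
        for col, a, b in zip(profile, xs, ys)
    )


if __name__ == "__main__":
    column = {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05}
    for t in (0.01, 0.5, 5.0):
        print(t, round(evolutionary_pair_profile_score([column], "A", "A", t), 3))
```

Under these assumptions the score of an identical pair rises from log2(π_a / 0.25) as t approaches 0, where the second sequence adds almost no independent evidence, towards the two-independent-sites score 2·log2(π_a / 0.25) at large t. In this particular model the mismatch term does not depend on t, because the site and background joint probabilities share the factor (1 - e^(-t)). A function of this kind could serve as the score of a gapless pair-profile hit in an extended Smith-Waterman recursion.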
Although it is widely accepted that binding sites evolve differently from the surrounding sequence, most comparative TFBS identification methods do not explicitly consider this. Additionally, conserved binding sites are usually predicted in a multi-step approach that separates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate the evolutionary relationships of TFBSs. We study how alignments and binding-site predictions interact at varying evolutionary distances and for profiles of varying quality.
Abbreviations
- PSSM: position-specific scoring matrix
- ROC: receiver operating characteristic
- SW: Smith-Waterman
- TFBS: transcription factor binding site
Cite this article
Bais, A.S., Grossmann, S. & Vingron, M. Incorporating evolution of transcription factor binding sites into annotated alignments. J Biosci 32 (Suppl 1), 841–850 (2007). https://doi.org/10.1007/s12038-007-0084-2
DOI: https://doi.org/10.1007/s12038-007-0084-2