Incorporating evolution of transcription factor binding sites into annotated alignments
Abstract
Identifying transcription factor binding sites (TFBSs) is essential for elucidating putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single-sequence TFBS annotation to yield “conserved TFBSs”. Most current methods in this field adopt a multi-step approach that separates these two aspects. Moreover, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, an approach that explicitly takes this difference into account is desirable. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model the evolutionary properties of TFBSs, and those that do are also multi-step. Recently, we introduced SimAnn, a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignment, SimAnn introduces additional states for profiles in order to output extended, or annotated, alignments: alignments in which some segments are annotated as gaplessly aligned TFBSs (pair-profile hits). In addition, the parameters related to the pair profiles are derived in a sound statistical framework.
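SimAnn's starting point is the standard Smith-Waterman recursion for local alignment. As background, here is a minimal sketch of that recursion only, assuming a simple match/mismatch score and a linear gap penalty with arbitrary illustrative values. The function name `smith_waterman` and its parameters are our own; the sketch omits SimAnn's additional profile states and its statistically derived parameters, so it is not the SimAnn algorithm itself.

```python
from typing import Tuple


def smith_waterman(x: str, y: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, int]:
    """Standard Smith-Waterman local alignment score with a linear gap penalty.

    Returns (best local score, end position in x, end position in y),
    with 1-based end positions; (0, 0, 0) if no positive-scoring alignment exists.
    Scoring values are illustrative assumptions, not SimAnn's parameters.
    """
    rows, cols = len(x) + 1, len(y) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at x[i-1], y[j-1]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                   # start a new local alignment here
                H[i - 1][j - 1] + s, # align x[i-1] with y[j-1]
                H[i - 1][j] + gap,   # gap in y
                H[i][j - 1] + gap,   # gap in x
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    return best, best_i, best_j
```

For example, `smith_waterman("GGACGTGG", "TTACGTTT")` returns `(8, 6, 6)`: the shared `ACGT` core ends at position 6 in both sequences, and the mismatching flanks are not included in the local alignment.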
In this article, we extend this approach to explicitly incorporate the evolution of binding sites into the SimAnn framework. We illustrate the extension in the theoretical derivations using two position-specific evolutionary models that have previously been used to model TFBS evolution. In a simulated setting, we provide a proof of concept that the extended approach works given its underlying assumptions, in comparison with the original approach. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) with an existing multi-step tool that also considers TFBS evolution.
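A position-specific evolutionary model lets each binding-site position evolve towards its own base preferences instead of a single background distribution. The sketch below illustrates this idea under assumptions of our own choosing and is not the eSimAnn scoring scheme. Each site position follows a Felsenstein 1981 (F81) substitution process whose stationary frequencies are the corresponding column of a hypothetical profile `PROFILE`, while the background follows the same process with uniform frequencies, which is a Jukes-Cantor model up to a rescaling of t. Under F81 with the rate scaled to 1, the joint probability of an aligned pair (a, b) at divergence t is π_a [e^{-t} δ_ab + (1 - e^{-t}) π_b]. The hypothetical function `pair_site_log_odds` sums the log-ratio of this quantity, profile versus background, over the positions of a gapless pair of sites.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def f81_joint(a: str, b: str, pi: Dict[str, float], t: float) -> float:
    """Joint probability P(a, b) of two aligned bases under an F81 process
    with stationary distribution `pi`, divergence t > 0, and rate scaled to 1."""
    stay = math.exp(-t)  # probability that no substitution event occurred
    same = 1.0 if a == b else 0.0
    return pi[a] * (stay * same + (1.0 - stay) * pi[b])


def pair_site_log_odds(s1: str, s2: str,
                       profile: List[Dict[str, float]], t: float) -> float:
    """Natural-log odds of a gapless aligned pair of sites: position-specific
    F81 (profile columns as stationary frequencies) vs. a uniform background.
    Assumes every profile column gives a nonzero frequency to each base."""
    if not (len(s1) == len(s2) == len(profile)):
        raise ValueError("both sites and the profile must have equal length")
    background = {base: 0.25 for base in BASES}  # uniform pi: Jukes-Cantor-like
    score = 0.0
    for a, b, column in zip(s1, s2, profile):
        score += math.log(f81_joint(a, b, column, t) / f81_joint(a, b, background, t))
    return score


# Hypothetical 4-position profile with consensus "GACT" (made-up frequencies).
PROFILE = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
```

With these made-up frequencies and t = 0.3, an identical pair matching the consensus (`"GACT"`, `"GACT"`) scores higher than a pair with a substitution at a strongly preferred position (`"GACT"`, `"GATT"`), and an identical pair of non-consensus sites scores below zero.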
Although it is widely accepted that binding sites evolve differently from their surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. In addition, prediction of conserved binding sites is typically carried out in a multi-step approach that separates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interact at varying evolutionary distances and for profiles of varying quality.
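The abstract does not say how profile quality is measured. One common proxy, used here purely as an assumption, is the information content of a profile: the relative entropy, in bits, between each column's base frequencies and a background distribution, summed over columns. A minimal sketch, with a uniform background by default and hypothetical example profiles `SHARP` and `DIFFUSE`:

```python
import math
from typing import Dict, List, Optional


def information_content(profile: List[Dict[str, float]],
                        background: Optional[Dict[str, float]] = None) -> float:
    """Sum over columns of the relative entropy KL(column || background), in bits.

    Zero-frequency entries contribute nothing (0 * log 0 is taken as 0).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    total = 0.0
    for column in profile:
        for base, p in column.items():
            if p > 0.0:
                total += p * math.log2(p / background[base])
    return total


# Hypothetical profiles of equal length: one sharp, one diffuse.
SHARP = [{"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}] * 6
DIFFUSE = [{"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20}] * 6
```

A perfectly specific column contributes 2 bits against a uniform DNA background and a uniform column contributes 0, so `SHARP` scores well above `DIFFUSE`. Low-information profiles typically produce more chance hits in background sequence, which is one reason profile quality can be expected to affect binding site predictions.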
Keywords
Alignments; evolutionary models; transcription factor binding sites
Abbreviations used
- PSSM: position-specific scoring matrix
- ROC: receiver operating characteristic
- SW: Smith-Waterman
- TFBS: transcription factor binding site