Journal of Biosciences, Volume 32, Supplement 1, pp 841–850

Incorporating evolution of transcription factor binding sites into annotated alignments

Abstract

Identifying transcription factor binding sites (TFBSs) is essential for elucidating putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single-sequence TFBS annotation to yield “conserved TFBSs”. Most current methods adopt a multi-step approach that treats these two aspects separately. Moreover, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence, so an approach that explicitly takes this difference into account is desirable. Although a plethora of approaches have been proposed for predicting conserved TFBSs, very few explicitly model the evolutionary properties of TFBSs, and those that do are still multi-step. Recently, we introduced a novel approach, SimAnn, that simultaneously aligns a pair of sequences and annotates conserved TFBSs in them. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles and outputs extended, or annotated, alignments: alignments in which some parts are annotated as gaplessly aligned TFBSs (pair-profile hits). Moreover, the pair-profile related parameters are derived in a sound statistical framework.
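The idea of adding a profile state to a local alignment can be illustrated with a minimal sketch. It is not the SimAnn implementation: the function name `local_align_with_profile`, the linear gap penalty, the default scores, and the choice to score a pair-profile hit as the sum of the two single-sequence profile scores are all assumptions made for illustration. The profile is assumed to be a list of per-position dictionaries mapping each base to a log-odds score.

```python
def local_align_with_profile(x, y, pssm, match=1.0, mismatch=-1.0, gap=-2.0):
    """Best local alignment score of x and y (Smith-Waterman style), where a
    gapless block of len(pssm) aligned columns may be scored as a pair-profile
    hit instead of by match/mismatch scores.

    pssm: list of dicts, one per profile position, mapping base -> log-odds
          score (hypothetical format, assumed for this sketch).
    """
    L = len(pssm)
    n, m = len(x), len(y)
    # H[i][j] = best score of a local alignment ending at x[i-1], y[j-1]
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                0.0,                      # start a new local alignment
                H[i - 1][j - 1] + s,      # aligned pair
                H[i - 1][j] + gap,        # gap in y
                H[i][j - 1] + gap,        # gap in x
            ]
            if i >= L and j >= L:
                # Extra "profile" move: the last L columns are a gapless
                # pair-profile hit. Assumption: the hit score is the sum of
                # the profile scores of both sequences.
                hit = sum(
                    col[x[i - L + k]] + col[y[j - L + k]]
                    for k, col in enumerate(pssm)
                )
                candidates.append(H[i - L][j - L] + hit)
            H[i][j] = max(candidates)
            best = max(best, H[i][j])
    return best
```

Because the profile move competes with the ordinary moves in the same recursion, alignment and annotation are decided together rather than in separate steps. A traceback that records which cells used the profile move would recover the annotated blocks; it is omitted here for brevity.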

In this article, we extend this approach to explicitly incorporate the evolution of binding sites into the SimAnn framework. We illustrate the extension by carrying out the theoretical derivations for two position-specific evolutionary models that have previously been used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that, given the underlying assumptions, the extended approach works, and we compare it with the original SimAnn. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution.
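To make the notion of a position-specific evolutionary model concrete, the following sketch uses a Felsenstein-1981-style substitution process whose equilibrium distribution at each site is the corresponding profile column. This is an assumption: the abstract does not name the two models used, and the function names, the time scaling of `t`, and the independent-background null model are hypothetical. Profile columns are assumed to contain pseudocounts, so that no entry is zero.

```python
import math

def f81_transition(pi, t):
    """Transition probabilities P[a][b] for one profile position under an
    F81-style model with equilibrium distribution pi and divergence t:
    P[a][b] = exp(-t) * [a == b] + (1 - exp(-t)) * pi[b]
    """
    decay = math.exp(-t)
    return {
        a: {b: decay * (1.0 if a == b else 0.0) + (1.0 - decay) * pi[b]
            for b in pi}
        for a in pi
    }

def pair_column_log_odds(pi, bg, a, b, t):
    """Log-odds score of the aligned pair (a, b) at one profile position:
    joint probability under the evolving profile column versus two
    independent background bases (assumed null model for this sketch).
    """
    P = f81_transition(pi, t)
    joint = pi[a] * P[a][b]
    return math.log(joint / (bg[a] * bg[b]))
```

At `t = 0` the two sites must be identical, and as `t` grows the second base becomes an independent draw from the profile column, so the score for a conserved consensus pair shrinks with increasing evolutionary distance. Summing such column scores over all profile positions would give one possible pair-profile hit score.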

Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. Additionally, prediction of conserved binding sites is carried out in a multi-step approach that segregates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interplay at varying evolutionary distances and for various profile qualities.

Keywords

Alignments; evolutionary models; transcription factor binding sites

Abbreviations used

PSSM: position-specific scoring matrix

ROC: receiver operating characteristic

SW: Smith-Waterman

TFBS: transcription factor binding site

Copyright information

© Indian Academy of Sciences 2007

Authors and Affiliations

  • Abha S. Bais (1)
  • Steffen Grossmann (1)
  • Martin Vingron (1)
  1. Max Planck Institute for Molecular Genetics, Berlin, Germany