
Incorporating evolution of transcription factor binding sites into annotated alignments

Journal of Biosciences

Abstract

Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single-sequence TFBS annotation to yield “conserved TFBSs”. Most current methods in this field adopt a multi-step approach that segregates the two aspects. Moreover, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence, so it is desirable to have an approach that explicitly takes this into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model the evolutionary properties of TFBSs, and those that do are also multi-step. Recently, we introduced SimAnn, a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building on the standard Smith-Waterman algorithm for local alignment, SimAnn introduces additional states for profiles and outputs extended, or annotated, alignments: alignments in which some segments are annotated as gaplessly aligned TFBSs (pair-profile hits). Moreover, the pair-profile parameters are derived within a sound statistical framework.
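The idea of extending Smith-Waterman with a profile state can be sketched as a dynamic program with one extra move: besides the usual match, mismatch and gap transitions, a cell may be reached by a gapless pair-profile hit spanning the last w columns of both sequences. This is a simplified, score-based illustration under our own assumptions; the names `extended_local_alignment` and `toy_pair_profile`, the linear gap penalty and the toy scores are hypothetical and do not reproduce SimAnn's actual states or its statistically derived parameters.

```python
from typing import Dict, List, Tuple

# Hypothetical pair-profile: one dict per motif column, mapping an aligned
# base pair (x, y) to a log-odds score. Its width is the number of columns.
PairProfile = List[Dict[Tuple[str, str], float]]


def toy_pair_profile(consensus: str, hit: float = 3.0, miss: float = -2.0) -> PairProfile:
    """Toy profile: reward a pair only if both bases equal the consensus base."""
    bases = "ACGT"
    return [{(x, y): (hit if x == y == c else miss) for x in bases for y in bases}
            for c in consensus]


def extended_local_alignment(s: str, t: str, profile: PairProfile,
                             match: float = 1.0, mismatch: float = -1.0,
                             gap: float = -2.0) -> float:
    """Best local alignment score when, besides the Smith-Waterman moves,
    a cell (i, j) can also be reached by a gapless profile hit ending at (i, j)."""
    n, m, w = len(s), len(t), len(profile)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Standard local alignment moves; 0.0 lets a new alignment start here.
            score = max(0.0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            # Extra "profile state": score the last w columns of s and t as one
            # gaplessly aligned binding-site pair.
            if w and i >= w and j >= w:
                hit = sum(profile[k][(s[i - w + k], t[j - w + k])] for k in range(w))
                score = max(score, H[i - w][j - w] + hit)
            H[i][j] = score
            best = max(best, score)
    return best


if __name__ == "__main__":
    s, t = "GGTATAGG", "CCTATACC"
    print(extended_local_alignment(s, t, []))                        # 4.0 (plain Smith-Waterman)
    print(extended_local_alignment(s, t, toy_pair_profile("TATA")))  # 12.0 (with a profile hit)
```

Treating the profile hit as a separate transition means a putative binding-site pair is scored as one unit with its own parameters, rather than column by column with the generic substitution scores.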

In this article, we extend this approach to explicitly incorporate the evolution of binding sites into the SimAnn framework. We demonstrate the extension in the theoretical derivations using two position-specific evolutionary models that have previously been used to model TFBS evolution. In a simulated setting, we provide a proof of concept that the extended approach works under its underlying assumptions, comparing it with the original SimAnn. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs, we compare the new approach (eSimAnn) with an existing multi-step tool that also considers TFBS evolution.
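The abstract does not name the two position-specific models. Assuming one of them is of the Felsenstein (1981) type, a pair-profile column can be derived by taking a PSSM column's base frequencies as the equilibrium frequencies π of that position. Under F81, the probability of base b in one species given base a in the other at distance μt is P(b | a) = e^(-μt)[a = b] + (1 - e^(-μt))π_b, and the joint probability of the aligned pair is π_a P(b | a). The sketch below turns this into a log-odds score against a background F81 model; the function names, the frequencies and the distances are hypothetical, and this is an illustration rather than the parameterization derived in the paper.

```python
import math
from typing import Dict, Tuple

BASES = "ACGT"
Freqs = Dict[str, float]


def f81_prob(pi: Freqs, a: str, b: str, mut: float) -> float:
    """F81 transition probability P(b | a) after distance mut (= mu * t)."""
    decay = math.exp(-mut)
    return decay * (1.0 if a == b else 0.0) + (1.0 - decay) * pi[b]


def joint_pair_prob(pi: Freqs, a: str, b: str, mut: float) -> float:
    """Probability of observing the aligned pair (a, b): pi_a * P(b | a)."""
    return pi[a] * f81_prob(pi, a, b, mut)


def pair_column_log_odds(pi_site: Freqs, mut_site: float,
                         pi_bg: Freqs, mut_bg: float) -> Dict[Tuple[str, str], float]:
    """log2 ratio of site-model to background-model pair probabilities for all
    16 base pairs. Frequencies must be strictly positive (e.g. with pseudocounts)."""
    return {(a, b): math.log2(joint_pair_prob(pi_site, a, b, mut_site)
                              / joint_pair_prob(pi_bg, a, b, mut_bg))
            for a in BASES for b in BASES}


if __name__ == "__main__":
    column = {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05}  # hypothetical PSSM column
    uniform = {b: 0.25 for b in BASES}                     # hypothetical background
    scores = pair_column_log_odds(column, 0.5, uniform, 0.5)
    for pair in [("A", "A"), ("A", "G"), ("C", "G")]:
        print(pair, round(scores[pair], 2))
```

With a peaked column, identical consensus pairs receive positive scores and pairs the motif rarely shows receive negative ones, which is one way position-specific evolution can enter pair-profile scores.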

Although it is widely accepted that binding sites evolve differently from their surrounding sequences, most comparative TFBS identification methods do not explicitly consider this. Additionally, conserved binding sites are usually predicted in a multi-step approach that separates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be extended further to incorporate the evolutionary relationships of TFBSs. We study how alignment and binding-site prediction interact at varying evolutionary distances and for profiles of varying quality.
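To see how the conservation of a binding site and of its surroundings diverge with evolutionary distance, one can simulate sequence pairs. In the minimal stand-in below, each position is kept with probability e^(-μt) and otherwise redrawn from its own equilibrium frequencies, which samples exactly from the F81 transition probabilities. Site positions use concentrated, motif-like frequencies while background positions use uniform ones. The helper names, sequences and frequencies are hypothetical, and this is not the simulation protocol used in the paper.

```python
import math
import random
from typing import Dict, List

BASES = "ACGT"
Freqs = Dict[str, float]


def evolve_f81(seq: str, pis: List[Freqs], mut: float, rng: random.Random) -> str:
    """Sample a descendant of `seq` under F81 with one equilibrium vector per position."""
    keep = math.exp(-mut)  # probability that no substitution event occurs
    out = []
    for base, pi in zip(seq, pis):
        if rng.random() < keep:
            out.append(base)
        else:
            # A substitution event redraws the base from pi (possibly the same base).
            out.append(rng.choices(BASES, weights=[pi[b] for b in BASES])[0])
    return "".join(out)


def mean_identity(seq: str, pis: List[Freqs], mut: float,
                  reps: int = 2000, seed: int = 1) -> float:
    """Average fraction of positions identical between `seq` and a simulated descendant."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        desc = evolve_f81(seq, pis, mut, rng)
        total += sum(x == y for x, y in zip(seq, desc)) / len(seq)
    return total / reps


def motif_column(base: str, p: float = 0.85) -> Freqs:
    """Hypothetical motif column: `p` on the consensus base, the rest spread evenly."""
    return {b: (p if b == base else (1.0 - p) / 3) for b in BASES}


UNIFORM: Freqs = {b: 0.25 for b in BASES}

if __name__ == "__main__":
    site = "TATAAAAG"  # hypothetical consensus, used as the ancestral site
    site_pis = [motif_column(b) for b in site]
    background = "GCTAGCTA"
    bg_pis = [UNIFORM] * len(background)
    for mut in (0.1, 0.5, 1.0, 2.0):
        print(f"mu*t={mut}: site {mean_identity(site, site_pis, mut):.3f}  "
              f"background {mean_identity(background, bg_pis, mut):.3f}")
```

As the distance grows, background identity decays toward 0.25 while a strongly constrained site stays far more conserved, so the two components of an alignment carry different amounts of information at different evolutionary distances.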


Abbreviations

PSSM: position-specific scoring matrix

ROC: receiver operating characteristic

SW: Smith-Waterman

TFBS: transcription factor binding site

References

  • Bais A S, Grossmann S and Vingron M 2007 Simultaneous alignment and annotation of cis-regulatory regions; Bioinformatics 23 e44–e49

  • Berg J, Willmann S and Lässig M 2004 Adaptive evolution of transcription factor binding sites; BMC Evol. Biol. 4 42

  • Chiaromonte F, Yap V B and Miller W 2002 Scoring pairwise genomic sequence alignments; Pac. Symp. Biocomput. 115–126

  • Dermitzakis E T and Clark A G 2002 Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover; Mol. Biol. Evol. 19 1114–1121

  • Durbin R, Eddy S, Krogh A and Mitchison G 1998 Biological sequence analysis (Cambridge: Cambridge University Press)

  • Felsenstein J 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach; J. Mol. Evol. 17 368–376

  • Gerland U and Hwa T 2002 On the selection and evolution of regulatory DNA motifs; J. Mol. Evol. 55 386–400

  • Halpern A L and Bruno W J 1998 Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies; Mol. Biol. Evol. 15 910–917

  • Hillis D M, Moritz C and Mable B K 1996 Molecular systematics (Sunderland, MA: Sinauer Associates)

  • Jukes T H and Cantor C R 1969 Evolution of protein molecules; in Mammalian protein metabolism (ed.) H N Munro (New York: Academic Press) vol. 3, pp 21–132

  • Kotelnikova E A, Makeev V J and Gelfand M S 2005 Evolution of transcription factor DNA binding sites; Gene 347 255–263

  • Lenhard B, Sandelin A, Mendoza L, Engström P, Jareborg N and Wasserman W W 2003 Identification of conserved regulatory elements by comparative genome analysis; J. Biol. 2 13

  • Loots G G and Ovcharenko I 2004 rVISTA 2.0: evolutionary analysis of transcription factor binding sites; Nucleic Acids Res. 32 W217–W221

  • Ludwig M Z, Palsson A, Alekseeva E, Bergman C M, Nathan J and Kreitman M 2005 Functional evolution of a cis-regulatory module; PLoS Biol. 3 e93

  • MacIsaac K D and Fraenkel E 2006 Practical strategies for discovering regulatory DNA sequence motifs; PLoS Comput. Biol. 2 e36

  • Matys V, Fricke E, Geffers R, Gössling E, Haubrock M, Hehl R, Hornischer K, Karas D et al. 2003 TRANSFAC: transcriptional regulation, from patterns to profiles; Nucleic Acids Res. 31 374–378

  • McCue L A, Thompson W, Carmack C S and Lawrence C E 2002 Factors influencing the identification of transcription factor binding sites by cross-species comparison; Genome Res. 12 1523–1532

  • Moses A, Chiang D, Pollard D, Iyer V and Eisen M 2004a Monkey: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model; Genome Biol. 5 R98

  • Moses A M, Chiang D Y, Kellis M, Lander E S and Eisen M B 2003 Position specific variation in the rate of evolution in transcription factor binding sites; BMC Evol. Biol. 3 19

  • Moses A M, Chiang D Y and Eisen M B 2004b Phylogenetic motif detection by expectation maximization on evolutionary mixtures; Pac. Symp. Biocomput. 324–335

  • Mustonen V and Lässig M 2005 Evolutionary population genetics of promoters: predicting binding sites and functional phylogenies; Proc. Natl. Acad. Sci. USA 102 15936–15941

  • Pollard D A, Moses A M, Iyer V N and Eisen M B 2006 Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments; BMC Bioinformat. 7 376

  • Rahmann S, Müller T and Vingron M 2003 On the power of profiles for transcription factor binding site detection; Stat. Appl. Genet. Mol. Biol. 2 Article 7

  • Siddharthan R, Siggia E D and van Nimwegen E 2005 PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny; PLoS Comput. Biol. 1 e67

  • Siggia E D 2005 Computational methods for transcriptional regulation; Curr. Opin. Genet. Dev. 15 214–221

  • Sinha S, Blanchette M and Tompa M 2004 PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences; BMC Bioinformat. 5 170

  • Smith T F and Waterman M S 1981 Identification of common molecular subsequences; J. Mol. Biol. 147 195–197

  • Stormo G D 2000 DNA binding sites: representation and discovery; Bioinformatics 16 16–23

  • Stoye J, Evers D and Meyer F 1998 Rose: generating sequence families; Bioinformatics 14 157–163

  • Sui S J H, Mortimer J R, Arenillas D J, Brumm J, Walsh C J, Kennedy B P and Wasserman W W 2005 oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes; Nucleic Acids Res. 33 3154–3164

  • Wasserman W W and Sandelin A 2004 Applied bioinformatics for the identification of regulatory elements; Nat. Rev. Genet. 5 276–287

  • Wittkopp P J 2006 Evolution of cis-regulatory sequence and function in Diptera; Heredity 97 139–147


Author information


Correspondence to Abha S. Bais.


About this article

Cite this article

Bais, A.S., Grossmann, S. & Vingron, M. Incorporating evolution of transcription factor binding sites into annotated alignments. J Biosci 32 (Suppl 1), 841–850 (2007). https://doi.org/10.1007/s12038-007-0084-2


  • DOI: https://doi.org/10.1007/s12038-007-0084-2
