Motif Yggdrasil: Sampling from a Tree Mixture Model

  • Samuel A. Andersson
  • Jens Lagergren
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide-level correlation coefficient for both synthetic data and 37 bacterial lexA genes.
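The abstract reports results as a nucleotide-level correlation coefficient but does not define it. The sketch below uses the definition common in assessments of motif-finding tools: each sequence position is labelled as inside or outside a binding site, and the coefficient is computed from the resulting true/false positive and negative counts. That this matches the authors' exact evaluation procedure is an assumption, and the function names `mask_from_sites` and `nucleotide_cc` are hypothetical helpers introduced only for illustration.

```python
import math


def mask_from_sites(length, sites, width):
    """Return a per-nucleotide boolean mask for sites of a fixed width.

    `sites` holds 0-based start positions; this helper is illustrative only.
    """
    mask = [False] * length
    for start in sites:
        for i in range(start, min(start + width, length)):
            mask[i] = True
    return mask


def nucleotide_cc(known, predicted):
    """Nucleotide-level correlation coefficient between two boolean masks.

    nCC = (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (a convention assumed here).
    """
    if len(known) != len(predicted):
        raise ValueError("masks must have the same length")
    tp = sum(1 for k, p in zip(known, predicted) if k and p)
    tn = sum(1 for k, p in zip(known, predicted) if not k and not p)
    fp = sum(1 for k, p in zip(known, predicted) if not k and p)
    fn = sum(1 for k, p in zip(known, predicted) if k and not p)
    denom = (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fn * fp) / math.sqrt(denom)


# Toy example: one known site of width 4 in a sequence of length 20.
known = mask_from_sites(20, [5], 4)
exact = nucleotide_cc(known, mask_from_sites(20, [5], 4))    # 1.0
shifted = nucleotide_cc(known, mask_from_sites(20, [7], 4))  # 0.375
```

A prediction that overlaps the true site by half its width scores well below 1, which is why the measure is sensitive to the positional accuracy a tree-aware sampler is meant to improve.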


Keywords: Gibbs Sampler, Upstream Sequence, Dirichlet Distribution, Phylogenetic Footprinting, Motif Candidate





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Samuel A. Andersson (1)
  • Jens Lagergren (1)
  1. Stockholm Bioinformatics Center and School of Computer Science and Communication, Royal Institute of Technology, Stockholm, Sweden
