Fast Algorithms for Inferring Gene-Species Associations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9096)


Microbial biodiversity is typically assessed by sequencing either PCR-amplified marker genes or all genomic DNA from environmental samples. Both approaches rely on the similarity of the sequenced material to known entries in sequence databases. However, amplicons of non-marker genes are often used when the research question concerns both the functional capabilities of a microbial community and its biodiversity. In such cases, a phylogenetic tree is constructed from known and metagenomic sequences, and expert assessment determines the taxonomic groups to which the amplicons belong. Here, instead of relying on sequences of non-marker genes, which are often missing, we use tree reconciliation to obtain a distribution of mappings between genes and species. We describe efficient algorithms for reconstructing gene-species mappings and a Monte Carlo method for inferring these distributions when the number of optimal reconstructions is large. We provide a comparative study of different cost functions, showing that the duplication-loss cost induces mappings of the highest quality. Further, we demonstrate the correctness of our approach using several datasets.
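To make the duplication-loss cost concrete, here is a minimal sketch of the classic lowest-common-ancestor (LCA) reconciliation for a gene tree whose leaf-to-species mapping is fully known. It is not the algorithm of the paper, which addresses the harder case where some gene-species associations are unknown; it only illustrates how a single candidate mapping could be scored. The nested-tuple tree encoding and the names `clusters`, `reconcile`, and `gene_to_species` are assumptions made for this illustration, and both trees are assumed to be rooted and binary.

```python
def clusters(tree):
    """Return the leaf clusters (one frozenset per node) of a nested-tuple tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    left, right = tree
    cl, cr = clusters(left), clusters(right)
    return cl | cr | {frozenset().union(*cl, *cr)}


def reconcile(gene_tree, species_tree, gene_to_species):
    """Count duplications and losses under the LCA mapping (illustrative only)."""
    nodes = clusters(species_tree)  # each species node is identified by its cluster

    def lca(a, b):
        # Smallest species cluster that contains both a and b.
        target = a | b
        return min((c for c in nodes if target <= c), key=len)

    def depth(c):
        # Number of proper ancestors of c in the species tree.
        return sum(1 for other in nodes if c < other)

    dups = losses = 0

    def visit(g):
        nonlocal dups, losses
        if isinstance(g, str):
            return frozenset([gene_to_species[g]])
        ma, mb = visit(g[0]), visit(g[1])
        m = lca(ma, mb)
        is_dup = m in (ma, mb)  # a child maps to the same species node
        dups += is_dup
        for child in (ma, mb):
            gap = depth(child) - depth(m)
            # Below a duplication every skipped edge implies a loss;
            # below a speciation the first edge is free.
            losses += gap if is_dup else gap - 1
        return m

    visit(gene_tree)
    return dups, losses


species = (("a", "b"), "c")
genes = (("a1", "c1"), ("b1", "c2"))
mapping = {"a1": "a", "b1": "b", "c1": "c", "c2": "c"}
print(reconcile(genes, species, mapping))  # (1, 2)
```

In this toy input, the root of the gene tree is a duplication, and each of the two resulting copies has lost one lineage (`b` in one copy, `a` in the other). Identifying species nodes by their clusters keeps the sketch short at the price of linear-time LCA queries; a practical implementation would use a faster LCA structure.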


Keywords: Species Tree, Gene Tree, Horizontal Gene Transfer, Input Tree, mcrA Gene





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
  2. Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
