Fast Heuristics for Resolving Weakly Supported Branches Using Duplication, Transfers, and Losses

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)

Abstract

Weak branch support in a gene tree suggests that the signal in the sequence data is insufficient to resolve a particular branching order. One approach to reducing this uncertainty takes the topology of the species tree into account. Under a maximum parsimony model, the best resolution of the weak branches is the binary tree that minimizes the cost of duplications, transfers, and losses. However, this problem is NP-hard, and the exact algorithm is limited to small, weakly supported areas.
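The parsimony criterion described above can be sketched as a weighted sum of event counts, minimized over candidate binary resolutions. The sketch below is illustrative only: the class `EventCounts`, the functions `dtl_cost` and `cheapest_resolutions`, the per-event weights, and the candidate event counts are all assumptions made for this example, not values or names taken from the paper or from Notung.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCounts:
    """Hypothetical event tally for one reconciled binary resolution."""
    duplications: int
    transfers: int
    losses: int


def dtl_cost(counts, dup_cost=1.5, transfer_cost=3.0, loss_cost=1.0):
    """Weighted parsimony cost; the default weights are illustrative assumptions."""
    return (dup_cost * counts.duplications
            + transfer_cost * counts.transfers
            + loss_cost * counts.losses)


def cheapest_resolutions(candidates):
    """Return the names of all candidates that attain the minimum cost."""
    best = min(dtl_cost(c) for c in candidates.values())
    return sorted(name for name, c in candidates.items() if dtl_cost(c) == best)


# Made-up event counts for three candidate resolutions of one weak region.
candidates = {
    "r1": EventCounts(duplications=2, transfers=0, losses=3),
    "r2": EventCounts(duplications=0, transfers=1, losses=1),
    "r3": EventCounts(duplications=2, transfers=0, losses=1),
}
print(cheapest_resolutions(candidates))
```

In this toy input two candidates tie for the minimum, which mirrors why a method of this kind may report a set of equally good resolutions rather than a single tree. Enumerating candidates explicitly is only feasible for tiny regions; the sketch does not address the search problem itself.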

We present an exact algorithm and several heuristic methods to resolve weak or non-binary gene trees given an undated species tree. These methods generate a set of optimal, binary resolutions that are temporally feasible, as well as event histories corresponding to each binary resolution. We compared the accuracy and runtime of these methods on simulated and biological datasets. The best of these heuristics closely approximate the event cost of the exact method and are much faster in practice. Surprisingly, a heuristic based on duplications and losses provides a good initialization for tree search methods, even when transfers are present. Comparing event costs with Robinson-Foulds (RF) distance, we observed that the two distance measures capture very different information and are poorly correlated.
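For readers unfamiliar with RF distance, the following is a minimal sketch of one common rooted variant: the number of clades found in exactly one of two trees on the same leaf set. The nested-tuple tree encoding and the function names `clades` and `rf_distance` are assumptions made for this illustration; the paper's own implementation may use a different variant or normalization.

```python
def clades(tree):
    """Collect the non-root clades of a rooted tree given as nested tuples.

    Leaves are strings; an internal node is a tuple of child subtrees.
    Each clade is returned as a frozenset of leaf names.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    root = leaves_below(tree)
    found.discard(root)  # the root clade is shared by every tree on these leaves
    return found


def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference between the two clade sets."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(rf_distance(balanced, swapped))
print(rf_distance(balanced, caterpillar))
```

RF distance depends only on the two gene tree topologies, whereas an event cost also depends on the species tree and the event weights, so there is no reason to expect the two measures to rank trees in the same order.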

All methods are implemented in a new release of Notung, a Java-based, cross-platform software package for reconciling and resolving gene trees. Notung is available at http://www.cs.cmu.edu/~durand/Notung.

Keywords

Transfers; Resolve; Rearrange; Non-binary gene tree; Weak branches; Reconciliation; Gene tree corrections

Notes

Acknowledgments

We thank Annette McLeod for help with figures.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
