Do Branch Lengths Help to Locate a Tree in a Phylogenetic Network?
Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often formalized as follows: is a given phylogenetic tree contained in a given phylogenetic network? Recently this tree containment problem has been widely investigated from a computational perspective, but most studies have focused only on the topology of the phylogenies, ignoring a piece of information that, in the case of phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate a tree within a network more precisely, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained in the network with the same topology as the input tree, the one closest to it in terms of branch lengths.
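To make the question concrete, the following is a minimal brute-force sketch of tree containment with branch lengths, not one of the algorithms from the paper. It assumes a network given as a list of `(parent, child, length)` edges, tries every way of keeping one incoming edge per reticulation (a "switching"), prunes branches that lead to no taxon, merges degree-2 nodes by summing their edge lengths, and compares the result with the input tree. The function names, the edge-list format, the example network, and the decision to discard the length above a root left with a single child are all assumptions made for illustration. The running time is exponential in the number of reticulations, in line with the hardness of the general problem.

```python
from itertools import product


def canonical(edges, taxa):
    """Return a hashable form of the weighted tree given by (parent, child, length) edges.

    Branches that reach no taxon are pruned, and nodes left with a single
    child are suppressed by summing the two adjacent edge lengths.
    """
    children = {}
    for u, v, w in edges:
        children.setdefault(u, []).append((v, w))
    heads = {v for _, v, _ in edges}
    root = next(u for u, _, _ in edges if u not in heads)

    def walk(node):
        # Returns (canonical subtree, length still to be added above it), or None.
        if node in taxa:
            return node, 0.0
        alive = []
        for child, w in children.get(node, []):
            result = walk(child)
            if result is not None:
                sub, pending = result
                alive.append((sub, round(w + pending, 9)))
        if not alive:
            return None          # dead branch: no taxon below this node
        if len(alive) == 1:
            return alive[0]      # suppress this node, pass merged length upward
        return frozenset(alive), 0.0

    subtree, _ = walk(root)      # assumption: length above the final root is ignored
    return subtree


def displayed_trees(edges, taxa):
    """Yield the canonical form of the tree obtained from each switching."""
    parents = {}
    for u, v, w in edges:
        parents.setdefault(v, []).append((u, w))
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    base = [(u, v, w) for u, v, w in edges if len(parents[v]) == 1]
    for choice in product(*(parents[r] for r in reticulations)):
        kept = base + [(u, r, w) for r, (u, w) in zip(reticulations, choice)]
        yield canonical(kept, taxa)


def contains(network_edges, tree_edges, taxa):
    """True if some switching yields the input tree with identical branch lengths."""
    target = canonical(tree_edges, taxa)
    return any(tree == target for tree in displayed_trees(network_edges, taxa))


# Hypothetical example: node "h" is a reticulation with parents "u" and "v".
TAXA = {"a", "b", "c"}
NETWORK = [("r", "u", 1.0), ("r", "v", 2.0), ("u", "a", 1.0), ("u", "h", 1.0),
           ("v", "h", 0.5), ("v", "c", 1.0), ("h", "b", 1.0)]
T_MATCH = [("r", "p", 1.0), ("p", "a", 1.0), ("p", "b", 2.0), ("r", "c", 3.0)]
T_OFF = [("r", "p", 1.0), ("p", "a", 1.0), ("p", "b", 2.5), ("r", "c", 3.0)]

print(contains(NETWORK, T_MATCH, TAXA))  # same topology and same lengths
print(contains(NETWORK, T_OFF, TAXA))    # same topology, one length differs
```

In this example both input trees have a topology displayed by the network, but only `T_MATCH` also agrees on every branch length, which illustrates how lengths can narrow down where a tree sits in a network.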
Keywords: Phylogenetic network, Tree containment, Branch lengths, Displayed trees, Computational complexity
This work was partially funded by the CNRS “Projet international de coopération scientifique (PICS)” grant number 230310 (CoCoAlSeq). L. van Iersel was partly funded by the 4TU Applied Mathematics Institute and The Netherlands Organisation for Scientific Research (NWO). F. Pardi is a member of the VIROGENESIS project, which receives funding from the EU’s Horizon 2020 research and innovation programme under grant agreement No 634650.