Abstract
Phylogenetic diversity indices such as the Fair Proportion (FP) index are frequently discussed as prioritization criteria in biodiversity conservation. They rank species according to their contribution to overall diversity by taking into account the unique and shared evolutionary history of each species, as indicated by its placement in an underlying phylogenetic tree. Traditionally, phylogenetic trees were inferred from single genes, and the resulting gene trees were assumed to be a valid estimate of the species tree, i.e., the “true” evolutionary history of the species under consideration. However, it is now common to sequence whole genomes comprising hundreds or thousands of genes, and different genes throughout the genome often have conflicting genealogical histories, resulting in discordance between individual gene trees and the species tree. Here, we analyze the effects of gene and species tree discordance on prioritization decisions based on the FP index. In particular, we consider the ranking order of taxa induced by (i) the FP index on a species tree and (ii) the expected FP index across all gene tree histories associated with the species tree. On the one hand, we show that for particular tree shapes, the two rankings always coincide. On the other hand, we show that for all leaf numbers greater than or equal to five, there exist species trees for which the two rankings differ. Finally, we illustrate the variability in the rankings obtained from the FP index across different gene tree and species tree estimates for an empirical multilocus mammal data set.
Data Availability
Not applicable.
Notes
Note that Degnan et al. (2012a) defined pseudocaterpillar trees for \(n \ge 5\).
Note that we use this data set merely to illustrate the variability of the FP rankings across gene trees and do not attempt to make any statement about the conservation implications for any of these species.
References
Chifman J, Kubatko L (2014) Quartet inference from SNP data under the coalescent model. Bioinformatics 30(23):3317–3324. https://doi.org/10.1093/bioinformatics/btu530
Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24(6):332–340. https://doi.org/10.1016/j.tree.2009.01.009
Degnan JH, Rosenberg NA, Stadler T (2012) A characterization of the set of species trees that produce anomalous ranked gene trees. IEEE/ACM Trans Comput Biol Bioinf 9(6):1558–1568. https://doi.org/10.1109/tcbb.2012.110
Degnan JH, Rosenberg NA, Stadler T (2012) The probability distribution of ranked gene trees on a species tree. Math Biosci 235(1):45–55. https://doi.org/10.1016/j.mbs.2011.10.006
Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biol Cons 61(1):1–10. https://doi.org/10.1016/0006-3207(92)91201-3
Fuchs M, Jin EY (2015) Equality of Shapley value and fair proportion index in phylogenetic trees. J Math Biol 71(5):1133–1147. https://doi.org/10.1007/s00285-014-0853-0
Haake CJ, Kashiwada A, Su FE (2008) The Shapley value of phylogenetic trees. J Math Biol 56(4):479–497. https://doi.org/10.1007/s00285-007-0126-2
Isaac NJ, Turvey ST, Collen B et al (2007) Mammals on the EDGE: conservation priorities based on threat and phylogeny. PLoS ONE 2(3):e296
Maddison WP (1997) Gene trees in species trees. Syst Biol 46(3):523–536. https://doi.org/10.1093/sysbio/46.3.523
Nichols R (2001) Gene trees and species trees are not the same. Trends Ecol Evol 16(7):358–364. https://doi.org/10.1016/S0169-5347(01)02203-0
Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evol 5(5):568–583. https://doi.org/10.1093/oxfordjournals.molbev.a040517
Redding DW (2003) Incorporating genetic distinctness and reserve occupancy into a conservation prioritisation approach. University of East Anglia, UK
Redding DW, Mooers AØ (2006) Incorporating evolutionary measures into conservation prioritization. Conserv Biol 20(6):1670–1678. https://doi.org/10.1111/j.1523-1739.2006.00555.x
Redding DW, Hartmann K, Mimoto A et al (2008) Evolutionarily distinctive species often capture more phylogenetic diversity than expected. J Theor Biol 251(4):606–615. https://doi.org/10.1016/j.jtbi.2007.12.006
Redding DW, Mazel F, Mooers AØ (2014) Measuring evolutionary isolation for conservation. PLoS ONE 9(12):e113490. https://doi.org/10.1371/journal.pone.0113490
Shapley LS (1953) A value for \(n\)–person games. In: Contributions to the theory of games (AM-28), Volume II. Princeton University Press, Princeton, pp 307–317, https://doi.org/10.1515/9781400881970-018
Song S, Liu L, Edwards SV et al (2012) Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Nat Acad Sci 109(37):14942–14947. https://doi.org/10.1073/pnas.1211733109
Springer MS, Gatesy J (2016) The gene tree delusion. Mol Phylogenet Evol 94:1–33. https://doi.org/10.1016/j.ympev.2015.07.018
Stamatakis A (2014) RAxML Version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313
Swofford D (2021) PAUP*: Phylogenetic analysis using parsimony (*and other methods). www.paup.phylosolutions.com, version 4a168
Vane-Wright R, Humphries C, Williams P (1991) What to protect?-Systematics and the agony of choice. Biol Cons 55(3):235–254. https://doi.org/10.1016/0006-3207(91)90030-d
Vellend M, Cornwell WK, Magnuson-Ford K et al (2011) Measuring phylogenetic biodiversity. In: Magurran AE, McGill BJ (eds) Biological Diversity: Frontiers in Measurement and Assessment. Oxford University Press, Oxford, chap 14, pp 194–207
Wolfram Research, Inc. (2020) Mathematica, Version 12.2. Champaign, IL. https://www.wolfram.com/mathematica
Acknowledgements
MF was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. KW was supported by The Ohio State University’s President’s Postdoctoral Scholars Program. All authors thank three anonymous reviewers for detailed comments on an earlier version of this manuscript. Moreover, all authors thank James Degnan for a helpful discussion on the existence of anomalous ranked gene trees.
Funding
MF was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. KW was supported by The Ohio State University’s President’s Postdoctoral Scholars Program.
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest.
Code availability
Not applicable.
Appendices
Appendix A FP indices for the species trees depicted in Figs. 1 and 10
In the following, we give expressions for the FP indices on the species tree and the expected FP indices for the species trees \({{\mathcal {S}}}\), \({{\mathcal {S}}}'\), and \({{\mathcal {S}}}''\) on 5 leaves considered in the main part of this manuscript (Figs. 1 and 10). The expected FP indices were obtained using Mathematica (Wolfram Research, Inc. 2020) to enumerate all gene tree histories in \({{\mathcal {H}}}({{\mathcal {S}}})\), respectively \({{\mathcal {H}}}({{\mathcal {S}}}')\) and \({{\mathcal {H}}}({{\mathcal {S}}}'')\).
Species tree \({{\mathcal {S}}}\) (Fig. 1)
- FP indices on the species tree:
$$\begin{aligned} FP_{{{\mathcal {S}}}}(x_1)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}}(x_2) \\ FP_{{{\mathcal {S}}}}(x_3)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}}(x_4)&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}}(x_5). \end{aligned}$$
- Expected FP indices:
$$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{2} + 1 - \frac{ e^{-\tau _1 - 2 \tau _2 - 2 \tau _3}}{24} - \frac{e^{- \tau _1 - \tau _2 - 2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2 - \tau _3}}{72} - \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{12} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _2}}{9} - \frac{e^{-\tau _1 - \tau _2 - 2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2 - \tau _3}}{12} + \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{18} - \frac{e^{-\tau _3}}{9} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _2}}{9} - \frac{e^{-\tau _1 - 2 \tau _2 - 2 \tau _3}}{24} \\&\quad - \frac{e^{-\tau _1 - \tau _2 - 2 \tau _3}}{24} - \frac{e^{-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}\right] . \end{aligned}$$
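The closed forms for \({{\mathcal {S}}}\) can be evaluated numerically to compare the two rankings. The following Python sketch transcribes the expressions above. The function names and the branch-length values are illustrative assumptions and are not taken from the manuscript.

```python
import math


def fp_species_tree_S(t0, t1, t2, t3):
    """FP indices on species tree S (closed forms transcribed from above)."""
    fp12 = t0 + t1 / 2 + t2 / 2 + t3 / 2
    fp3 = t0 + t1 + t2 + t3 / 3
    fp45 = t0 + t1 + t2 / 2 + t3 / 3
    return {"x1": fp12, "x2": fp12, "x3": fp3, "x4": fp45, "x5": fp45}


def expected_fp_S(t0, t1, t2, t3):
    """Expected FP indices given S (closed forms transcribed from above)."""
    a = math.exp(-t1 - 2 * t2 - 2 * t3)
    b = math.exp(-t1 - t2 - 2 * t3)
    c = math.exp(-t2 - t3)
    d = math.exp(-t1 - t2 - t3)
    f = math.exp(-t3)
    g = math.exp(-t2)
    e12 = (t0 + t1 / 2 + t2 / 2 + t3 / 2 + 1
           - a / 24 - b / 12 + c / 72 - d / 12 + f / 36)
    e3 = (t0 + t1 + t2 + t3 / 3 + 1
          - g / 9 - b / 12 + c / 12 + d / 18 - f / 9)
    e45 = (t0 + t1 + t2 / 2 + t3 / 3 + 1
           - g / 9 - a / 24 - b / 24 - c / 18 + d / 18 + f / 36)
    return {"x1": e12, "x2": e12, "x3": e3, "x4": e45, "x5": e45}


def ranking(values):
    """Leaves sorted by decreasing index; ties are broken by leaf name."""
    return sorted(values, key=lambda leaf: (-values[leaf], leaf))


# Hypothetical branch lengths (coalescent units), chosen only for illustration.
taus = (0.5, 0.1, 0.1, 0.28)
print("FP ranking:         ", ranking(fp_species_tree_S(*taus)))
print("expected FP ranking:", ranking(expected_fp_S(*taus)))
```

For these illustrative values, \(x_4\) and \(x_5\) rank above \(x_1\) and \(x_2\) under the FP index on \({{\mathcal {S}}}\), but below them under the expected FP index. As a consistency check on the transcription, setting \(\tau _1 = \tau _2 = \tau _3 = 0\) reduces every expected FP index to \(\tau _0 + 5/6\), i.e., all five leaves become exchangeable.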
Species tree \({{\mathcal {S}}}'\) (Fig. 10, top)
- FP indices on the species tree:
$$\begin{aligned} FP_{{{\mathcal {S}}}'}(x_1)&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}'}(x_2) \\ FP_{{{\mathcal {S}}}'}(x_3)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}'}(x_4)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}'}(x_5). \end{aligned}$$
- Expected FP indices:
$$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{2} + 1 - \frac{e^{-\tau _1-2\tau _2-2 \tau _3}}{24} - \frac{e^{-\tau _2 - 2 \tau _3}}{12} \\&\quad - \frac{e^{-\tau _2-\tau _3}}{12} + \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{72} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}'\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1- \tau _2}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{12} - \frac{e^{-\tau _3}}{9} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1-\tau _2}}{9} - \frac{e^{-\tau _1-2\tau _2-2\tau _3}}{24} \\&\quad - \frac{e^{-\tau _2-2 \tau _3}}{24} + \frac{e^{-\tau _2 - \tau _3}}{18} - \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{18} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}'\right] . \end{aligned}$$
Species tree \({{\mathcal {S}}}''\) (Fig. 10, bottom)
- FP indices on the species tree:
$$\begin{aligned} FP_{{{\mathcal {S}}}''}(x_1)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}''}(x_2) \\ FP_{{{\mathcal {S}}}''}(x_3)&= \tau _0 + \tau _1 + \frac{\tau _2}{3} + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}''}(x_4)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{3} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}''}(x_5). \end{aligned}$$
- Expected FP indices:
$$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{2} + 1 - \frac{e^{-\tau _2-2\tau _3}}{12} - \frac{e^{-\tau _1-\tau _2-2\tau _3}}{24} \\&\quad + \frac{e^{-\tau _2-\tau _3}}{36} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{72} - \frac{e^{-\tau _3}}{12} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}''\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{3} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{12} \\&\quad - \frac{e^{-\tau _2-\tau _3}}{9} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{12} + \frac{e^{- \tau _3}}{18} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{3} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{24} \\&\quad - \frac{e^{-\tau _1-\tau _2-2 \tau _3}}{24} + \frac{e^{-\tau _2-\tau _3}}{36} - \frac{e^{-\tau _1-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _3}}{18} \\&= {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}''\right] . \end{aligned}$$
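The FP indices on the three species trees can also be recomputed directly from the tree structure: each edge length is divided equally among the leaves descending from that edge, and a leaf's FP index is the sum of its shares along the path to the root. The Python sketch below does this for ultrametric trees given as nested tuples of node heights. The tuple encoding and the function names are assumptions made for illustration, and the three topologies are inferred from the closed forms above, since the figures are not reproduced here.

```python
def fair_proportion(tree):
    """FP index of every leaf.

    A tree is either a leaf name (str, height 0) or a tuple
    (height, child, child, ...), with heights measured from the leaves.
    """
    fp = {}

    def height(node):
        return 0.0 if isinstance(node, str) else node[0]

    def visit(node):
        # Returns the leaves below `node` and credits each leaf with its
        # share of every edge leading from `node` to one of its children.
        if isinstance(node, str):
            fp[node] = 0.0
            return [node]
        leaves = []
        for child in node[1:]:
            below = visit(child)
            share = (node[0] - height(child)) / len(below)
            for leaf in below:
                fp[leaf] += share
            leaves.extend(below)
        return leaves

    visit(tree)
    return fp


def five_leaf_trees(t0, t1, t2, t3):
    """Topologies of S, S', S'' as inferred from the closed forms above."""
    h1 = t0
    h2 = h1 + t1
    h3 = h2 + t2
    h4 = h3 + t3
    return {
        "S": (h4, (h1, "x1", "x2"), (h3, "x3", (h2, "x4", "x5"))),
        "S'": (h4, (h2, "x1", "x2"), (h3, "x3", (h1, "x4", "x5"))),
        "S''": (h4, (h3, "x1", "x2"), (h2, "x3", (h1, "x4", "x5"))),
    }


# Hypothetical branch lengths, chosen only for illustration.
for name, tree in five_leaf_trees(0.5, 0.1, 0.2, 0.3).items():
    print(name, fair_proportion(tree))
```

Because every edge length is split completely among the leaves below it, the FP indices of a tree sum to its total edge length, which gives a quick check on any hand-derived expression.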
Appendix B Species name abbreviations used in Fig. 11
Appendix C Details on species tree estimation for Sect. 4
In order to obtain species tree estimates for the multilocus mammal data set of Song et al. (2012), we downloaded the full sequences of the 447 genes from https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.3629v (file gene447_final.nexus).
After converting to PHYLIP format, a maximum likelihood tree based on the concatenated data was obtained using RAxML version 8.2.10 (Stamatakis 2014) via the following command:
raxmlHPC-AVX -s gene447_final.phy -p 12345 -m GTRGAMMA -o Gal -n raxml_GTR.tre
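The conversion from NEXUS to PHYLIP format is not described further here. As a minimal sketch, assuming a simple MATRIX block without quoted taxon names or inline comments, the alignment could be rewritten in relaxed PHYLIP format as follows. The function name and the toy alignment are hypothetical, and the downloaded file may require a more robust parser.

```python
def nexus_to_relaxed_phylip(nexus_text):
    """Rewrite the MATRIX block of a simple NEXUS alignment as relaxed PHYLIP.

    Handles sequential and interleaved matrices by concatenating all rows
    that share a taxon name. Quoted names and inline comments are NOT handled.
    """
    sequences = {}
    in_matrix = False
    for raw_line in nexus_text.splitlines():
        line = raw_line.strip()
        if not in_matrix:
            in_matrix = line.lower() == "matrix"
            continue
        done = line.endswith(";")  # a semicolon terminates the matrix
        line = line.rstrip(";").strip()
        if line and not line.startswith("["):
            name, *chunks = line.split()
            sequences[name] = sequences.get(name, "") + "".join(chunks)
        if done:
            break
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment with equal-length rows")
    header = f"{len(sequences)} {lengths.pop()}"
    rows = [f"{name} {seq}" for name, seq in sequences.items()]
    return "\n".join([header, *rows]) + "\n"


# Toy interleaved alignment with two taxa (not taken from the data set).
example = """#NEXUS
begin data;
dimensions ntax=2 nchar=8;
format datatype=dna missing=? gap=-;
matrix
Gal ACGT
Hom ACGA

Gal TTGA
Hom T-GA
;
end;
"""
print(nexus_to_relaxed_phylip(example), end="")
```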
Afterwards, maximum likelihood branch lengths under the GTR+\(\Gamma \) model and enforcing a molecular clock were computed in PAUP* (Swofford 2021) via the following commands:
exe gene447_final.nexus
gettrees file=raxml_GTR.tre
set criterion=likelihood
lset nst=6 basefreq=empirical rates=gamma rmatrix=estimate shape=estimate clock=yes
lscores
savetrees RAxML_GTR_MLbranchlengths.tre brlens
Moreover, a species tree using the species tree estimation method SVDQuartets (Chifman and Kubatko 2014) and maximum likelihood branch lengths for this species tree were obtained in PAUP* via the following commands:
exe gene447_final.nexus
svdq
outgroup 35 [species 35 corresponds to Gal]
roottrees rootmethod=outgroup
set criterion=likelihood
lset nst=6 basefreq=empirical rates=gamma rmatrix=estimate shape=estimate clock=yes
lscores
savetrees svdq_GTR_MLbranchlengths.tre brlens
About this article
Cite this article
Wicke, K., Fischer, M. & Kubatko, L. Effects of discordance between species and gene trees on phylogenetic diversity conservation. J. Math. Biol. 86, 13 (2023). https://doi.org/10.1007/s00285-022-01845-w
DOI: https://doi.org/10.1007/s00285-022-01845-w
Keywords
- Biodiversity conservation
- Fair Proportion index
- Gene tree
- Multispecies coalescent
- Phylogenetic diversity
- Species tree