Effects of discordance between species and gene trees on phylogenetic diversity conservation

Journal of Mathematical Biology

Abstract

Phylogenetic diversity indices such as the Fair Proportion (FP) index are frequently discussed as prioritization criteria in biodiversity conservation. They rank species according to their contribution to overall diversity by taking into account the unique and shared evolutionary history of each species as indicated by its placement in an underlying phylogenetic tree. Traditionally, phylogenetic trees were inferred from single genes and the resulting gene trees were assumed to be a valid estimate of the species tree, i.e., the “true” evolutionary history of the species under consideration. However, nowadays it is common to sequence whole genomes comprising hundreds or thousands of genes, and conflicting genealogical histories often exist in different genes throughout the genome, resulting in discordance between individual gene trees and the species tree. Here, we analyze the effects of gene and species tree discordance on prioritization decisions based on the FP index. In particular, we consider the ranking order of taxa induced by (i) the FP index on a species tree, and (ii) the expected FP index across all gene tree histories associated with the species tree. On the one hand, we show that for particular tree shapes, the two rankings always coincide. On the other hand, we show that for all leaf numbers greater than or equal to five, there exist species trees for which the two rankings differ. Finally, we illustrate the variability in the rankings obtained from the FP index across different gene tree and species tree estimates for an empirical multilocus mammal data set.

Data Availability

Not applicable.

Notes

  1. Note that Degnan et al. (2012a) defined pseudocaterpillar trees for \(n \ge 5\).

  2. In particular, we did not re-compute the probability distribution corresponding to Table 1 and it might be the case that not every point in the plotted region has the property that \(P(n_1=1) \cdot P(n_3=2) > P(n_1=2) \cdot P(n_3=1)\), which was used in the proof of Theorem 2 for \(n > 5\).

  3. Note that we use this data set as a mere example for the variability of the FP rankings across gene trees and do not attempt to make any statement about the conservation implications for any of those species.

Acknowledgements

MF was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. KW was supported by The Ohio State University’s President’s Postdoctoral Scholars Program. All authors thank three anonymous reviewers for detailed comments on an earlier version of this manuscript. Moreover, all authors thank James Degnan for a helpful discussion on the existence of anomalous ranked gene trees.

Funding

MF was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. KW was supported by The Ohio State University’s President’s Postdoctoral Scholars Program.

Author information

Corresponding author

Correspondence to Kristina Wicke.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest.

Code availability

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A FP indices for the species trees depicted in Figs. 1 and 10

In the following, we give expressions for the FP indices on the species tree and the expected FP indices for the species trees \({{\mathcal {S}}}\), \({{\mathcal {S}}}'\), and \({{\mathcal {S}}}''\) on 5 leaves considered in the main part of this manuscript (Figs. 1 and 10). The expected FP indices were obtained using Mathematica (Wolfram Research, Inc. 2020) to enumerate all gene tree histories in \({{\mathcal {H}}}({{\mathcal {S}}})\), \({{\mathcal {H}}}({{\mathcal {S}}}')\), and \({{\mathcal {H}}}({{\mathcal {S}}}'')\), respectively.

Species tree \({{\mathcal {S}}}\) (Fig. 1).

  • FP indices on the species tree:

    $$\begin{aligned} FP_{{{\mathcal {S}}}}(x_1)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}}(x_2) \\ FP_{{{\mathcal {S}}}}(x_3)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}}(x_4)&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}}(x_5). \end{aligned}$$
  • Expected FP indices:

    $$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{2} + 1 - \frac{ e^{-\tau _1 - 2 \tau _2 - 2 \tau _3}}{24} - \frac{e^{- \tau _1 - \tau _2 - 2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2 - \tau _3}}{72} - \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{12} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _2}}{9} - \frac{e^{-\tau _1 - \tau _2 - 2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2 - \tau _3}}{12} + \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{18} - \frac{e^{-\tau _3}}{9} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _2}}{9} - \frac{e^{-\tau _1 - 2 \tau _2 - 2 \tau _3}}{24} \\&\quad - \frac{e^{-\tau _1 - \tau _2 - 2 \tau _3}}{24} - \frac{e^{-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}\right] . \end{aligned}$$
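
The species-tree values for \({{\mathcal {S}}}\) can be reproduced mechanically. The following is a minimal Python sketch, not the Mathematica code used for this appendix. It assumes the topology ((x1,x2),(x3,(x4,x5))) with edge lengths read off from the formulas above (Fig. 1 is not reproduced here, so this reading is an assumption), and the names `fair_proportion` and `species_tree_S` are hypothetical. The length of each edge is split equally among the leaves descending from it.

```python
from fractions import Fraction


def leaves(node):
    """Return the leaf names below a node given as (name_or_children, edge_length)."""
    name_or_children, _ = node
    if isinstance(name_or_children, str):
        return [name_or_children]
    return [leaf for child in name_or_children for leaf in leaves(child)]


def fair_proportion(node, fp=None):
    """Add, for every edge, length / (number of leaves below it) to each of those leaves."""
    if fp is None:
        fp = {}
    name_or_children, length = node
    below = leaves(node)
    for leaf in below:
        fp[leaf] = fp.get(leaf, 0) + Fraction(length) / len(below)
    if not isinstance(name_or_children, str):
        for child in name_or_children:
            fair_proportion(child, fp)
    return fp


def species_tree_S(t0, t1, t2, t3):
    """Assumed shape of S: ((x1,x2),(x3,(x4,x5))), edge lengths inferred from the formulas."""
    cherry12 = ([("x1", t0), ("x2", t0)], t1 + t2 + t3)
    cherry45 = ([("x4", t0 + t1), ("x5", t0 + t1)], t2)
    clade345 = ([("x3", t0 + t1 + t2), cherry45], t3)
    return ([cherry12, clade345], 0)  # the root carries no edge


if __name__ == "__main__":
    fp = fair_proportion(species_tree_S(1, 2, 3, 4))
    for leaf in sorted(fp):
        print(leaf, fp[leaf])
```

Exact fractions are used so that the output can be compared term by term with the closed-form expressions rather than up to rounding.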

Species tree \({{\mathcal {S}}}'\) (Fig. 10, top).

  • FP indices on the species tree:

    $$\begin{aligned} FP_{{{\mathcal {S}}}'}(x_1)&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}'}(x_2) \\ FP_{{{\mathcal {S}}}'}(x_3)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}'}(x_4)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}'}(x_5). \end{aligned}$$
  • Expected FP indices:

    $$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{2} + \frac{\tau _3}{2} + 1 - \frac{e^{-\tau _1-2\tau _2-2 \tau _3}}{24} - \frac{e^{-\tau _2 - 2 \tau _3}}{12} \\&\quad - \frac{e^{-\tau _2-\tau _3}}{12} + \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{72} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}'\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1- \tau _2}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{12} \\&\quad + \frac{e^{-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{12} - \frac{e^{-\tau _3}}{9} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}'\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{2} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1-\tau _2}}{9} - \frac{e^{-\tau _1-2\tau _2-2\tau _3}}{24} \\&\quad - \frac{e^{-\tau _2-2 \tau _3}}{24} + \frac{e^{-\tau _2 - \tau _3}}{18} - \frac{e^{-\tau _1 - \tau _2 - \tau _3}}{18} + \frac{e^{-\tau _3}}{36} = {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}'\right] . \end{aligned}$$
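
To see how the two rankings for \({{\mathcal {S}}}'\) behave for concrete branch lengths, the closed-form expressions above can be evaluated numerically. The Python sketch below only transcribes those expressions; the function names and the example values of \(\tau _0, \ldots , \tau _3\) are illustrative assumptions, and no claim is made here about which parameter values lead to differing rankings.

```python
import math


def fp_species(t0, t1, t2, t3):
    """FP indices on the species tree S' (closed forms transcribed from this appendix)."""
    x12 = t0 + t1 + t2 / 2 + t3 / 2
    x3 = t0 + t1 + t2 + t3 / 3
    x45 = t0 + t1 / 2 + t2 / 2 + t3 / 3
    return {"x1": x12, "x2": x12, "x3": x3, "x4": x45, "x5": x45}


def fp_expected(t0, t1, t2, t3):
    """Expected FP indices given S' (closed forms transcribed from this appendix)."""
    e = math.exp
    base = fp_species(t0, t1, t2, t3)
    c12 = (1 - e(-t1 - 2 * t2 - 2 * t3) / 24 - e(-t2 - 2 * t3) / 12
           - e(-t2 - t3) / 12 + e(-t1 - t2 - t3) / 72 + e(-t3) / 36)
    c3 = (1 - e(-t1 - t2) / 9 - e(-t2 - 2 * t3) / 12
          + e(-t2 - t3) / 18 + e(-t1 - t2 - t3) / 12 - e(-t3) / 9)
    c45 = (1 - e(-t1 - t2) / 9 - e(-t1 - 2 * t2 - 2 * t3) / 24
           - e(-t2 - 2 * t3) / 24 + e(-t2 - t3) / 18
           - e(-t1 - t2 - t3) / 18 + e(-t3) / 36)
    correction = {"x1": c12, "x2": c12, "x3": c3, "x4": c45, "x5": c45}
    return {leaf: base[leaf] + correction[leaf] for leaf in base}


def ranking(fp):
    """Leaves sorted by decreasing index; ties are broken by leaf name."""
    return sorted(fp, key=lambda leaf: (-fp[leaf], leaf))


if __name__ == "__main__":
    taus = (1.0, 0.1, 0.1, 0.1)  # hypothetical branch lengths
    print("species tree:", ranking(fp_species(*taus)))
    print("expected:    ", ranking(fp_expected(*taus)))
```

Two limiting cases serve as sanity checks on the transcription: for \(\tau _1 = \tau _2 = \tau _3 = 0\) all five expected values reduce to \(\tau _0 + 5/6\), and for large \(\tau _1, \tau _2, \tau _3\) each expected value exceeds the corresponding species-tree value by approximately 1.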

Species tree \({{\mathcal {S}}}''\) (Fig. 10, bottom).

  • FP indices on the species tree:

    $$\begin{aligned} FP_{{{\mathcal {S}}}''}(x_1)&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{2} = FP_{{{\mathcal {S}}}''}(x_2) \\ FP_{{{\mathcal {S}}}''}(x_3)&= \tau _0 + \tau _1 + \frac{\tau _2}{3} + \frac{\tau _3}{3} \\ FP_{{{\mathcal {S}}}''}(x_4)&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{3} + \frac{\tau _3}{3} = FP_{{{\mathcal {S}}}''}(x_5). \end{aligned}$$
  • Expected FP indices:

    $$\begin{aligned} {{\mathbb {E}}}\left[ FP(x_1) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{2} + 1 - \frac{e^{-\tau _2-2\tau _3}}{12} - \frac{e^{-\tau _1-\tau _2-2\tau _3}}{24} \\&\quad + \frac{e^{-\tau _2-\tau _3}}{36} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{72} - \frac{e^{-\tau _3}}{12} = {{\mathbb {E}}}\left[ FP(x_2) \vert {{\mathcal {S}}}''\right] \\ {{\mathbb {E}}}\left[ FP(x_3) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \tau _1 + \frac{\tau _2}{3} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{12} \\&\quad - \frac{e^{-\tau _2-\tau _3}}{9} + \frac{e^{-\tau _1-\tau _2-\tau _3}}{12} + \frac{e^{- \tau _3}}{18} \\ {{\mathbb {E}}}\left[ FP(x_4) \vert {{\mathcal {S}}}''\right]&= \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{3} + \frac{\tau _3}{3} + 1 - \frac{e^{-\tau _1}}{9} - \frac{e^{-\tau _2-2 \tau _3}}{24} \\&\quad - \frac{e^{-\tau _1-\tau _2-2 \tau _3}}{24} + \frac{e^{-\tau _2-\tau _3}}{36} - \frac{e^{-\tau _1-\tau _2-\tau _3}}{18} + \frac{e^{-\tau _3}}{18} \\&= {{\mathbb {E}}}\left[ FP(x_5) \vert {{\mathcal {S}}}''\right] . \end{aligned}$$
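
As a consistency check on the expressions for \({{\mathcal {S}}}''\) (a worked sum added here for illustration, not part of the original derivation), the FP indices of the five leaves can be added up. The result agrees with the total edge length of the species tree if \(\tau _i\) is read as the length of the interval during which \(5-i\) lineages are present, an interpretation that is assumed from the formulas since the figures are not reproduced here:

$$\begin{aligned} \sum _{i=1}^{5} FP_{{{\mathcal {S}}}''}(x_i)&= 2\left( \tau _0 + \tau _1 + \tau _2 + \frac{\tau _3}{2}\right) + \left( \tau _0 + \tau _1 + \frac{\tau _2}{3} + \frac{\tau _3}{3}\right) + 2\left( \tau _0 + \frac{\tau _1}{2} + \frac{\tau _2}{3} + \frac{\tau _3}{3}\right) \\&= 5\tau _0 + 4\tau _1 + 3\tau _2 + 2\tau _3. \end{aligned}$$

Adding the expected FP indices in the same way, the terms in \(e^{-\tau _2-\tau _3}\), \(e^{-\tau _1-\tau _2-\tau _3}\), and \(e^{-\tau _3}\) cancel, leaving

$$\begin{aligned} \sum _{i=1}^{5} {{\mathbb {E}}}\left[ FP(x_i) \vert {{\mathcal {S}}}''\right]&= 5\tau _0 + 4\tau _1 + 3\tau _2 + 2\tau _3 + 5 - \frac{e^{-\tau _1}}{3} - \frac{e^{-\tau _2-2\tau _3}}{3} - \frac{e^{-\tau _1-\tau _2-2\tau _3}}{6}. \end{aligned}$$

For \(\tau _1 = \tau _2 = \tau _3 = 0\) the correction term equals \(5 - \frac{1}{3} - \frac{1}{3} - \frac{1}{6} = \frac{25}{6}\).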

Appendix B Species name abbreviations used in Fig. 11

Table 2 Species name abbreviations used in Fig. 11

Appendix C Details on species tree estimation for Sect. 4

In order to obtain species tree estimates for the multilocus mammal data set of Song et al. (2012), we downloaded the full sequences of the 447 genes from https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.3629v (file gene447_final.nexus).

After converting to PHYLIP format, a maximum likelihood tree based on the concatenated data was obtained using RAxML version 8.2.10 (Stamatakis 2014) via the following command:

raxmlHPC-AVX -s gene447_final.phy -p 12345 -m GTRGAMMA -o Gal -n raxml_GTR.tre

Afterwards, maximum likelihood branch lengths under the GTR+\(\Gamma \) model and enforcing a molecular clock were computed in PAUP* (Swofford 2021) via the following commands:

exe gene447_final.nexus

gettrees file=raxml_GTR.tre

set criterion=likelihood

lset nst=6 basefreq=empirical rates=gamma rmatrix=estimate shape=estimate clock=yes

lscores

savetrees RAxML_GTR_MLbranchlengths.tre brlens
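
Because clock=yes enforces a molecular clock, the resulting tree should be ultrametric, i.e., every leaf lies at the same distance from the root. One way to check this on a saved tree is sketched below in Python. The (name, length, children) tuple representation, the function names, and the toy trees are illustrative assumptions; reading the PAUP* tree file into this form is not shown.

```python
def leaf_depths(node, depth=0.0, out=None):
    """Collect root-to-leaf path lengths for a tree of (name, length, children) tuples."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_depths(child, depth, out)
    return out


def is_ultrametric(tree, tol=1e-9):
    """True if all root-to-leaf path lengths agree up to the tolerance tol."""
    depths = leaf_depths(tree).values()
    return max(depths) - min(depths) <= tol


# Toy trees with hypothetical labels; the first one is clock-like, the second is not.
clock_like = ("", 0.0, [("", 2.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 3.0, [])])
not_clock_like = ("", 0.0, [("", 2.0, [("A", 1.0, []), ("B", 0.5, [])]), ("C", 3.0, [])])

if __name__ == "__main__":
    print(is_ultrametric(clock_like))
    print(is_ultrametric(not_clock_like))
```

A tolerance is used because branch lengths written to a tree file are rounded, so exact equality of path lengths cannot be expected.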

Moreover, a species tree using the species tree estimation method SVDQuartets (Chifman and Kubatko 2014) and maximum likelihood branch lengths for this species tree were obtained in PAUP* via the following commands:

exe gene447_final.nexus

svdq

outgroup 35 (where species 35 corresponds to Gal)

roottrees rootmethod=outgroup

set criterion=likelihood

lset nst=6 basefreq=empirical rates=gamma rmatrix=estimate shape=estimate clock=yes

lscores

savetrees svdq_GTR_MLbranchlengths.tre brlens

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wicke, K., Fischer, M. & Kubatko, L. Effects of discordance between species and gene trees on phylogenetic diversity conservation. J. Math. Biol. 86, 13 (2023). https://doi.org/10.1007/s00285-022-01845-w

  • DOI: https://doi.org/10.1007/s00285-022-01845-w
