WABI 2014: Algorithms in Bioinformatics pp 187-203

# New Algorithms for Computing Phylogenetic Biodiversity

• Constantinos Tsirogiannis
• Brody Sandel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8701)

## Abstract

A common problem that appears in many case studies in ecology is the following: given a rooted phylogenetic tree $$\mathcal{T}$$ and a subset R of its leaf nodes, we want to compute the distance between the elements in R. A very popular distance measure used for this purpose is the Phylogenetic Diversity (PD), which is defined as the cost of the minimum-weight Steiner tree in $$\mathcal{T}$$ that spans the nodes in R. To analyse the value of the PD for a given set R, it is also important to calculate the variance of this measure. However, the best algorithm known so far for computing the variance of the PD is inefficient: for any input tree $$\mathcal{T}$$ that consists of n nodes, this algorithm has $$\Theta(n^2)$$ running time. Moreover, efficiently computing the variance and higher-order statistical moments is a major open problem for several other phylogenetic measures. We provide the following results:

• We describe a new algorithm that computes the variance of the PD efficiently in practice. This algorithm has $$O(\mathrm{SI}(\mathcal{T}) + \mathrm{DSSI}^2(\mathcal{T}))$$ running time; here $$\mathrm{SI}(\mathcal{T})$$ denotes the Sackin's Index of $$\mathcal{T}$$, and $$\mathrm{DSSI}(\mathcal{T})$$ is a new index whose value depends on how balanced $$\mathcal{T}$$ is.

• We provide for the first time exact formulas for computing the mean and the variance of another popular biodiversity measure, the Mean Nearest Taxon Distance (MNTD). These formulas apply specifically to ultrametric trees. For an ultrametric tree $$\mathcal{T}$$ of n nodes, we show how we can compute the mean of the MNTD in $$O(n)$$ time, and its variance in $$O(\mathrm{SI}(\mathcal{T}) + \mathrm{DSSI}^2(\mathcal{T}))$$ time.

• We introduce a new measure which we call the Core Ancestor Cost (CAC). A major advantage of this measure is that for any integer k > 0 we can compute all first k statistical moments of the CAC in $$O(\mathrm{SI}(\mathcal{T}) + nk + k^2)$$ time in total, using $$O(n + k)$$ space.
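The two quantities that the results above build on, the PD of a leaf set and the Sackin's Index of a tree, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the tree encoding (`Tree`), the function names, and the toy tree `EXAMPLE` are hypothetical. The sketch assumes that the Sackin's Index is taken in its standard form, as the sum of leaf depths counted in edges, and it uses the fact that in a tree the minimum-weight Steiner tree spanning R contains exactly the edges that separate some leaves of R from the others.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tree encoding: node -> list of (child, edge_weight).
# Leaves simply have no entry in the dictionary.
Tree = Dict[str, List[Tuple[str, float]]]


def phylogenetic_diversity(tree: Tree, root: str, sample: Set[str]) -> float:
    """Cost of the minimal subtree connecting the leaves in `sample`.

    Assumes `sample` contains leaf names only.
    """
    total = 0.0

    def count(node: str) -> int:
        # Returns how many sampled leaves lie in the subtree of `node`.
        nonlocal total
        children = tree.get(node, [])
        if not children:
            return 1 if node in sample else 0
        hits = 0
        for child, weight in children:
            below = count(child)
            # An edge is used iff it separates some sampled leaves from others.
            if 0 < below < len(sample):
                total += weight
            hits += below
        return hits

    count(root)
    return total


def sackin_index(tree: Tree, root: str) -> int:
    """Sum of leaf depths, counted in edges from the root (assumed definition)."""
    total = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        children = tree.get(node, [])
        if not children:
            total += depth
        for child, _ in children:
            stack.append((child, depth + 1))
    return total


# Toy ultrametric tree: every leaf is at distance 3 from the root.
EXAMPLE: Tree = {
    "root": [("x", 1.0), ("c", 3.0)],
    "x": [("a", 2.0), ("b", 2.0)],
}
```

On the toy tree, `phylogenetic_diversity(EXAMPLE, "root", {"a", "c"})` sums the edges on the path from a to c, and `sackin_index(EXAMPLE, "root")` adds the depths of the three leaves. The recursive traversal is chosen for brevity; a very deep tree would need an iterative version.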

We have implemented the new algorithms for computing the variance of the PD and of the MNTD, and the statistical moments of the CAC. We conducted experiments on large phylogenetic datasets and we show that our algorithms perform efficiently in practice.
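For small trees, the mean and variance of a measure such as the MNTD can be cross-checked by brute force. The sketch below is only such a reference check, not the efficient algorithms described above, and it runs in exponential time. It assumes that the moments are taken over leaf subsets of a fixed size r drawn uniformly at random, which the abstract does not state. The parent-map encoding, the function names, and the toy tree `EXAMPLE_PARENTS` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance
from typing import Dict, Sequence, Tuple

# Hypothetical encoding: child -> (parent, edge_weight); the root has no entry.
Parents = Dict[str, Tuple[str, float]]


def path_to_root(parents: Parents, node: str) -> Dict[str, float]:
    """Map each ancestor of `node` (itself included) to its distance from `node`."""
    dist = {node: 0.0}
    travelled = 0.0
    while node in parents:
        parent, weight = parents[node]
        travelled += weight
        dist[parent] = travelled
        node = parent
    return dist


def leaf_distance(parents: Parents, u: str, v: str) -> float:
    """Path cost between two nodes, assuming non-negative edge weights."""
    up_u = path_to_root(parents, u)
    up_v = path_to_root(parents, v)
    # Among common ancestors, the lowest one minimises the summed distance.
    return min(up_u[a] + up_v[a] for a in up_u if a in up_v)


def mntd(parents: Parents, sample: Sequence[str]) -> float:
    """Mean, over sampled leaves, of the distance to the nearest other sampled leaf.

    Requires at least two leaves in `sample`.
    """
    return mean(
        min(leaf_distance(parents, u, v) for v in sample if v != u)
        for u in sample
    )


def brute_force_moments(
    parents: Parents, leaves: Sequence[str], r: int
) -> Tuple[float, float]:
    """Mean and population variance of the MNTD over all r-leaf subsets (r >= 2)."""
    values = [mntd(parents, subset) for subset in combinations(leaves, r)]
    return mean(values), pvariance(values)


# Toy ultrametric tree: leaves a and b share parent x; c hangs off the root.
EXAMPLE_PARENTS: Parents = {
    "x": ("root", 1.0),
    "c": ("root", 3.0),
    "a": ("x", 2.0),
    "b": ("x", 2.0),
}
```

Calling `brute_force_moments(EXAMPLE_PARENTS, ["a", "b", "c"], 2)` enumerates the three two-leaf subsets of the toy tree. A check of this kind is practical only for very small trees, which is why exact formulas and output-sensitive algorithms matter for large phylogenies.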



© Springer-Verlag Berlin Heidelberg 2014

## Authors and Affiliations

• Constantinos Tsirogiannis
• Brody Sandel