Phylogenetic trees and Euclidean embeddings

Journal of Mathematical Biology

Abstract

It was recently observed by de Vienne et al. (Syst Biol 60(6):826–832, 2011) that taking the square root of the distances between taxa on a phylogenetic tree allows the taxa to be embedded in Euclidean space. While their justification was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation that provides substantial additional insight. We use this embedding to reinterpret the differences between the neighbor-joining (NJ) and BIONJ tree-building algorithms, providing one illustration of how the embedding reflects tree structure in data.
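
The following is a minimal Python sketch of one elementary way to see why the square-root transformation works; it is not claimed to be the construction used in this article. Give each edge e of a rooted tree its own coordinate, and place a taxon at the point whose coordinate for e is \(\sqrt{w_e}\) (the square root of the edge length) if e lies on the path from the root to that taxon, and 0 otherwise. The edges lying on exactly one of two such root paths are precisely the edges of the path joining the two taxa, so the squared Euclidean distance between two embedded taxa equals their tree distance. The tree, its node names, and its edge lengths are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical rooted tree: each entry maps child -> (parent, edge length).
# The root is "r"; internal nodes are "u" and "v"; the taxa (leaves) are A, B, C, D.
PARENT = {
    "u": ("r", 1.0), "v": ("r", 2.0),
    "A": ("u", 0.5), "B": ("u", 1.5),
    "C": ("v", 0.25), "D": ("v", 3.0),
}
TAXA = ["A", "B", "C", "D"]
EDGES = list(PARENT)  # each edge is named by its child node


def root_path_edges(node):
    """Edges (named by child node) on the path from the root down to `node`."""
    edges = set()
    while node in PARENT:
        edges.add(node)
        node = PARENT[node][0]
    return edges


def tree_distance(a, b):
    """Path length between a and b: edges lying on exactly one of the two root paths."""
    return sum(PARENT[e][1] for e in root_path_edges(a) ^ root_path_edges(b))


def embed(taxon):
    """One coordinate per edge: sqrt(edge length) if the edge lies above the taxon, else 0."""
    above = root_path_edges(taxon)
    return [math.sqrt(PARENT[e][1]) if e in above else 0.0 for e in EDGES]


points = {t: embed(t) for t in TAXA}
for a, b in combinations(TAXA, 2):
    # Euclidean distance between embedded points vs. square root of the tree distance
    print(a, b, round(math.dist(points[a], points[b]), 6), round(math.sqrt(tree_distance(a, b)), 6))
```

Each printed line should show two equal numbers. The dimension of this embedding is the number of edges, which is more than necessary; classical multidimensional scaling applied to the same distances gives an embedding in at most n − 1 dimensions for n taxa.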

Fig. 1
Fig. 2
Fig. 3

Notes

  1. The gap in the argument is as follows. Let D denote the \(n\times n\) matrix of pairwise distances \(d_{ij}\) between taxa on some metric tree, and let \(F=I_{n}-\frac{1}{n}\mathbf{1}\mathbf{1}^{T}\), where \(\mathbf{1}\) is a column of ones. By multidimensional scaling theory, the desired Euclidean embedding (with interpoint distances \(\sqrt{d_{ij}}\)) exists if and only if the “doubly centered” matrix \(H=(-1/2)FDF\) is positive semidefinite. The covariance matrix \(\varSigma\) of the diffusion process on a rooted version of the tree is positive definite, and de Vienne et al. (2011) suggest that \(H=\varSigma\). However, this equality cannot hold: \(\varSigma\) is positive definite, while H is not (since \(F\mathbf{1}=\mathbf{0}\), we have \(H\mathbf{1}=\mathbf{0}\)); and \(\varSigma\) depends on the root location, while H does not. The correct relationship, \(H=F\varSigma F\), was not established. Although the gap can be filled by proving this equality directly, our approach is simpler and more readily yields additional results.
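
One short way to see the relationship \(H=F\varSigma F\): take \(\varSigma_{ij}\) to be the length of the shared part of the root-to-taxon paths of taxa i and j (the covariance for unit-rate Brownian motion started at the root). Then \(d_{ij}=\varSigma_{ii}+\varSigma_{jj}-2\varSigma_{ij}\), i.e. \(D=\mathbf{s}\mathbf{1}^{T}+\mathbf{1}\mathbf{s}^{T}-2\varSigma\) with \(\mathbf{s}\) the diagonal of \(\varSigma\), and since \(F\mathbf{1}=\mathbf{0}\) the first two terms vanish under double centering, leaving \(FDF=-2F\varSigma F\). The Python sketch below checks this numerically for two rootings of the same hypothetical four-taxon tree (the tree and all helper names are illustrative assumptions), and also checks that \(\varSigma\) changes with the root while H does not, and that \(H\mathbf{1}=\mathbf{0}\).

```python
from itertools import product

# Two rootings of the same hypothetical unrooted tree on taxa A, B, C, D.
# Each dict maps child -> (parent, edge length); the root is the node with no parent.
ROOTED_AT_R = {"u": ("r", 1.0), "v": ("r", 2.0),
               "A": ("u", 0.5), "B": ("u", 1.5), "C": ("v", 0.25), "D": ("v", 3.0)}
# Rerooting at u merges the edges u-r and r-v into a single edge u-v of length 3.
ROOTED_AT_U = {"v": ("u", 3.0),
               "A": ("u", 0.5), "B": ("u", 1.5), "C": ("v", 0.25), "D": ("v", 3.0)}
TAXA = ["A", "B", "C", "D"]
N = len(TAXA)


def root_path(parent, node):
    """Map each edge (named by its child node) on the root-to-node path to its length."""
    path = {}
    while node in parent:
        up, length = parent[node]
        path[node] = length
        node = up
    return path


def covariance(parent):
    """Sigma[i][j] = total length shared by the root paths of taxa i and j."""
    paths = [root_path(parent, t) for t in TAXA]
    return [[sum(paths[i][e] for e in paths[i].keys() & paths[j].keys())
             for j in range(N)] for i in range(N)]


def distances(parent):
    """D[i][j] = tree distance: total length of edges on exactly one of the two root paths."""
    paths = [root_path(parent, t) for t in TAXA]
    return [[sum(paths[i].get(e, 0.0) + paths[j].get(e, 0.0)
                 for e in paths[i].keys() ^ paths[j].keys())
             for j in range(N)] for i in range(N)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


# Centering matrix F = I - (1/n) 1 1^T.
F = [[(1.0 if i == j else 0.0) - 1.0 / N for j in range(N)] for i in range(N)]


def doubly_centered(D):
    """H = (-1/2) F D F."""
    return [[-0.5 * x for x in row] for row in matmul(matmul(F, D), F)]


def close(X, Y, tol=1e-9):
    return all(abs(X[i][j] - Y[i][j]) < tol for i, j in product(range(N), repeat=2))


for name, tree in (("root r", ROOTED_AT_R), ("root u", ROOTED_AT_U)):
    Sigma, H = covariance(tree), doubly_centered(distances(tree))
    print(name, "| H == F Sigma F:", close(H, matmul(matmul(F, Sigma), F)),
          "| H 1 == 0:", all(abs(sum(row)) < 1e-9 for row in H))

print("Sigma changes with the root:", not close(covariance(ROOTED_AT_R), covariance(ROOTED_AT_U)))
print("H does not:", close(doubly_centered(distances(ROOTED_AT_R)),
                           doubly_centered(distances(ROOTED_AT_U))))
```

Every check should print True. The sketch does not test positive semidefiniteness of H directly; that follows from \(H=F\varSigma F\), since F is symmetric and \(\mathbf{x}^{T}F\varSigma F\mathbf{x}=(F\mathbf{x})^{T}\varSigma(F\mathbf{x})\ge 0\).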

References

  • Allman ES, Degnan JH, Rhodes JA (2013) Species tree inference by the STAR method and its generalizations. J Comput Biol 20(1):50–61. doi:10.1089/cmb.2012.0101

  • Bruno WJ, Socci ND, Halpern AL (2000) Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 17(1):189–197

  • Critchley F, Fichet B (1994) The partial order by inclusion of the principal classes of dissimilarity on a finite set, and some of their basic properties. In: Classification and dissimilarity analysis. Lecture notes in statistics, vol 93. Springer, New York, pp 5–65

  • de Vienne DM, Aguileta G, Ollier S (2011) Euclidean nature of phylogenetic distance matrices. Syst Biol 60(6):826–832

  • de Vienne DM, Ollier S, Aguileta G (2012) Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol Biol Evol 29(6):1587–1598

  • Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15

  • Gascuel O (1994) A note on Sattath and Tversky’s, Saitou and Nei’s, and Studier and Keppler’s algorithms for inferring phylogenies from evolutionary distances. Mol Biol Evol 11(6):961–963

  • Gascuel O (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14(7):685–695

  • Jewett EM, Rosenberg NA (2012) iGLASS: an improvement to the GLASS method for estimating species trees from gene trees. J Comput Biol 19:293–315

  • Layer M (2014) Phylogenetic trees and Euclidean embeddings. Master’s thesis, University of Alaska Fairbanks

  • Liu L, Yu L (2011) Estimating species trees from unrooted gene trees. Syst Biol 60:661–667

  • Liu L, Yu L, Pearl DK, Edwards SV (2009) Estimating species phylogenies using coalescence times among sequences. Syst Biol 58:468–477

  • Liu L, Yu L, Pearl DK (2010) Maximum tree: a consistent estimator of the species tree. J Math Biol 60:95–106

  • Mossel E, Roch S (2010) Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE/ACM Trans Comput Biol Bioinf 7:166–171

  • Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

  • Studier J, Keppler K (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Mol Biol Evol 5:729–731

Author information

Correspondence to John A. Rhodes.

About this article

Cite this article

Layer, M., Rhodes, J.A. Phylogenetic trees and Euclidean embeddings. J. Math. Biol. 74, 99–111 (2017). https://doi.org/10.1007/s00285-016-1018-0

  • DOI: https://doi.org/10.1007/s00285-016-1018-0
