Bulletin of Mathematical Biology, Volume 81, Issue 2, pp 568–597

# Tropical Principal Component Analysis and Its Application to Phylogenetics

• Ruriko Yoshida
• Leon Zhang
• Xu Zhang

Special Issue: Algebraic Methods in Phylogenetics

## Abstract

Principal component analysis is a widely used method for the dimensionality reduction of a given data set in a high-dimensional Euclidean space. Here we define and analyze two analogues of principal component analysis in the setting of tropical geometry. In one approach, we study the Stiefel tropical linear space of fixed dimension closest to the data points in the tropical projective torus; in the other approach, we consider the tropical polytope with a fixed number of vertices closest to the data points. We then give approximative algorithms for both approaches and apply them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.
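To make the notion of "closest" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the max-plus convention, measures closeness with the tropical metric d(u, v) = max_i(u_i − v_i) − min_i(u_i − v_i), and uses the standard projection formula from the tropical convexity literature to map a point onto a tropical polytope. The function names and the example data are hypothetical.

```python
def tropical_distance(u, v):
    """Tropical metric on the tropical projective torus: max(u - v) - min(u - v)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def project_onto_tropical_polytope(x, vertices):
    """Max-plus projection of x onto the tropical convex hull of `vertices`."""
    # lambda_l = min_i (x_i - v_{l,i}) is the largest shift with lambda_l + v_l <= x.
    lambdas = [min(a - b for a, b in zip(x, v)) for v in vertices]
    # The projection is the coordinatewise maximum of the shifted vertices.
    return [
        max(lam + v[i] for lam, v in zip(lambdas, vertices))
        for i in range(len(x))
    ]


def sum_of_distances(points, vertices):
    """Assumed objective: total tropical distance from each point to its projection."""
    return sum(
        tropical_distance(p, project_onto_tropical_polytope(p, vertices))
        for p in points
    )


# Hypothetical example: a tropical segment in three coordinates and one data point.
vertices = [[0, 0, 0], [0, 3, 1]]
x = [0, 1, 3]
projection = project_onto_tropical_polytope(x, vertices)
print(projection, tropical_distance(x, projection))
```

Under these assumptions, a search over candidate vertex sets would compare values of `sum_of_distances` and keep the set with the smallest total. Adding a constant to every coordinate of a point leaves its tropical distance to any other point unchanged, which is why the metric is well defined on the tropical projective torus.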

## Keywords

Dimensionality reduction, Phylogenomics, Tropical geometry

## Notes

### Acknowledgements

R. Y. was supported by Research Initiation Proposals from the Naval Postgraduate School and NSF Division of Mathematical Sciences 1622369. L. Z. was supported by an NSF Graduate Research Fellowship. X. Z. was supported by travel funding from the Department of Statistics at the University of Kentucky. The authors thank Bernd Sturmfels (UC Berkeley and MPI Leipzig) for many helpful conversations. The authors also thank Daniel Howe (University of Kentucky) for his input on Apicomplexa tree topologies.


© This is a U.S. government work and its text is not subject to copyright protection in the United States; however, its text may be subject to foreign copyright protection 2018
