Dimensionality Reduction Techniques for Visualizing Morphometric Data: Comparing Principal Component Analysis to Nonlinear Methods
Principal component analysis (PCA) is the most widely used dimensionality reduction technique in the biological sciences, and is commonly employed to create 2D visualizations of geometric morphometric data. However, interesting biological information may be lost or misrepresented in these plots because PCA cannot summarize nonlinear dependencies between variables. Nonlinear alternatives exist, but their effectiveness has never been tested on morphometric data. Here, the performance of PCA on the task of visualizing morphometric variation is compared to that of four nonlinear techniques: Sammon Mapping, Isomap, Locally Linear Embedding, and Laplacian Eigenmaps. Performance is assessed on the basis of global and local preservation of pairwise distances for a variety of simulated and empirical datasets. The relative performance of PCA varies as a function of the distribution of variation, the complexity, and the size of the datasets. Overall, nonlinear methods preserve small differences between morphologies better than PCA does.
Keywords: Data visualization, Morphological variation, Multivariate data, Theoretical biology
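The assessment described in the abstract, preservation of pairwise distances at global and local scales, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the global score is the Pearson correlation between original and embedded pairwise distances, the local score is the average overlap of each specimen's k nearest neighbors, the function names are hypothetical, and the data are random numbers standing in for aligned shape variables. Only PCA is shown; a nonlinear embedding could be scored by passing its coordinates to the same two functions.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pca_embed(X, n_components=2):
    """Project mean-centered data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def global_preservation(X, Y):
    """Assumed global score: correlation of original vs. embedded distances."""
    iu = np.triu_indices(X.shape[0], k=1)
    d_orig = pairwise_distances(X)[iu]
    d_emb = pairwise_distances(Y)[iu]
    return float(np.corrcoef(d_orig, d_emb)[0, 1])


def local_preservation(X, Y, k=5):
    """Assumed local score: mean fraction of k nearest neighbors retained."""
    d_orig = pairwise_distances(X)
    d_emb = pairwise_distances(Y)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(d_orig, np.inf)
    np.fill_diagonal(d_emb, np.inf)
    nn_orig = np.argsort(d_orig, axis=1)[:, :k]
    nn_emb = np.argsort(d_emb, axis=1)[:, :k]
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_orig, nn_emb)]
    return float(np.mean(overlap))


# Hypothetical stand-in for morphometric data: 60 specimens, 10 shape
# variables with unequal variance (random numbers, not real landmarks).
rng = np.random.default_rng(0)
shapes = rng.normal(size=(60, 10)) * np.linspace(3.0, 0.2, 10)

embedding = pca_embed(shapes, n_components=2)
print("global:", round(global_preservation(shapes, embedding), 3))
print("local: ", round(local_preservation(shapes, embedding, k=5), 3))
```

Both scores lie between -1 and 1 (global) or 0 and 1 (local), with higher values indicating that the 2D plot distorts the original distances less. Reporting them separately reflects the point made in the abstract: a method can rank well globally while still misplacing small differences between similar morphologies.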
Thanks to D. Fowler and H. Larsson for advice, as well as A. Beauvais-Lacasse and A. Huot for help with coding. I am grateful to the Natural Sciences and Engineering Research Council of Canada (CGS-D) and le Fonds de recherche du Québec - Nature et technologies (BX3) for funding.
Compliance with Ethical Standards
Conflict of interest
The author has no conflicts of interest to declare.