Size correction in biology: how reliable are approaches based on (common) principal component analysis?
Morphological traits typically scale with the overall body size of an organism. A meaningful comparison of trait values among individuals or populations that differ in size therefore requires size correction. A frequently applied size correction method involves subjecting the set of n morphological traits of interest to (common) principal component analysis [(C)PCA], and treating the first principal component [(C)PC1] as a latent size variable. The remaining variation (PC2–PCn) is considered size-independent and interpreted biologically. I here analyze simulated data and natural datasets to demonstrate that this (C)PCA-based size correction generates systematic statistical artifacts. Artifacts arise even when all traits are tightly correlated with overall size, and they are particularly strong when the magnitude of variance is heterogeneous among the traits, and when the traits under study are few. (C)PCA-based approaches are therefore inappropriate for size correction and should be abandoned in favor of methods using univariate general linear models with an adequate independent body size metric as covariate. As I demonstrate, (C)PC1 extracted from a subset of traits, not themselves subjected to size correction, can provide such a size metric.
Keywords: Bias, Body size, Morphology, Multivariate statistics, Shape
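The contrast drawn in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in the paper: the function names, sample size, trait slopes, and noise levels are all hypothetical choices made for illustration. Three focal traits depend on a latent size variable with independent noise of unequal magnitude, so any correlation among properly size-corrected traits should be close to zero. The sketch compares residual correlations after discarding PC1 with those obtained from per-trait linear models that use PC1 of a separate set of auxiliary traits as the size covariate.

```python
import numpy as np


def pc1_scores(traits):
    """Return PC1 loadings and scores from a covariance-matrix PCA."""
    centered = traits - traits.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    loadings = eigvecs[:, -1]  # eigh sorts eigenvalues in ascending order
    return loadings, centered @ loadings


def pca_size_correct(traits):
    """PCA-based correction: remove PC1 and keep the remaining variation."""
    centered = traits - traits.mean(axis=0)
    loadings, scores = pc1_scores(traits)
    return centered - np.outer(scores, loadings)


def regression_size_correct(traits, size_metric):
    """Fit one linear model per trait on the size metric; return residuals."""
    design = np.column_stack([np.ones_like(size_metric), size_metric])
    coef, *_ = np.linalg.lstsq(design, traits, rcond=None)
    return traits - design @ coef


def max_offdiag_corr(values):
    """Largest absolute pairwise correlation among the columns."""
    corr = np.corrcoef(values, rowvar=False)
    return np.abs(corr[~np.eye(corr.shape[0], dtype=bool)]).max()


def simulate(n_ind=500, seed=0):
    """Hypothetical data: all traits track latent size, noise is independent."""
    rng = np.random.default_rng(seed)
    size = rng.normal(0.0, 1.0, n_ind)
    focal_sd = np.array([0.1, 0.1, 0.4])  # assumed heterogeneous trait variance
    focal = size[:, None] + rng.normal(0.0, 1.0, (n_ind, 3)) * focal_sd
    aux = size[:, None] + rng.normal(0.0, 0.05, (n_ind, 4))  # size-metric traits
    return focal, aux


if __name__ == "__main__":
    focal, aux = simulate()
    _, size_metric = pc1_scores(aux)  # independent size metric from other traits
    pca_res = pca_size_correct(focal)
    reg_res = regression_size_correct(focal, size_metric)
    print("max |r| after PCA-based correction:", max_offdiag_corr(pca_res))
    print("max |r| after regression correction:", max_offdiag_corr(reg_res))
```

Under these assumptions, the residuals left after removing PC1 are confined to a subspace of one fewer dimension than the number of traits, so they are necessarily correlated with one another even though the simulated size-independent variation is not. The regression residuals should show only weak correlations, reflecting sampling error and the small measurement error in the size metric.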
This paper benefited greatly from comments and suggestions by F. James Rohlf, Ben Bolker, Thom DeWitt, Dan Bolnick, Dolph Schluter, Pedro Peres-Neto, and an anonymous reviewer. Dan Bolnick kindly shared his stickleback data, and Philadelphia Airport provided electricity and a rocking chair. Funding was provided by an Ambizione fellowship from the Swiss National Science Foundation (grant PZ00P3_126391/1), and by the Research Fund of the University of Basel.