Several recent papers have reported problems with between-group Principal Component Analysis (bgPCA). The method inflates the differences between groups and can even display entirely artificial differences when none exist, for example when it is applied to tables of random numbers with many variables (columns) and few individuals (rows). Cross-validation has recently been proposed as a way to circumvent this problem. Here we present tools and functions of the ade4 package for the R statistical software that compute a bgPCA, test for the presence of statistically significant groups, perform a cross-validation of the analysis, and compute the associated statistics. We also describe how to use these functions to avoid the spurious groups problem. Several examples, including a real data set and tables of random numbers, are used to validate this approach under various experimental and numerical conditions. The integrated framework of the duality diagram, as implemented in ade4, allows this approach to be extended to multivariate analysis methods other than principal component analysis, which could prove useful for other types of variables. The R code and the real data table used for the computations and graphs in this paper are available as supplementary material.
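The spurious groups effect and the two remedies named above (a permutation test and leave-one-out cross-validation) can be illustrated on a table of random numbers. The sketch below is not the ade4 implementation and is not written in R: it is a minimal standard-library Python illustration. It assumes two equal-sized groups, so that the between-group analysis has a single axis (the direction joining the two group means), and uniform row weights. All function names are hypothetical. The "own side rate" is only a crude separation measure chosen for this illustration, and the permutation statistic (between-group inertia over total inertia) is likewise an assumption of the sketch.

```python
import random

def col_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center(rows, mu):
    return [[x - m for x, m in zip(r, mu)] for r in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bg_axis(rows, groups):
    """Unit vector from the group-0 mean to the group-1 mean.
    With two groups this is the only between-group axis."""
    m0 = col_means([r for r, g in zip(rows, groups) if g == 0])
    m1 = col_means([r for r, g in zip(rows, groups) if g == 1])
    d = [b - a for a, b in zip(m0, m1)]
    norm = dot(d, d) ** 0.5
    return [x / norm for x in d]

def apparent_scores(rows, groups):
    """Scores of all individuals on an axis computed from all of them."""
    x = center(rows, col_means(rows))
    axis = bg_axis(x, groups)
    return [dot(r, axis) for r in x]

def loo_scores(rows, groups):
    """Leave-one-out scores: each individual is projected, as a
    supplementary row, on an axis computed without it."""
    out = []
    for i in range(len(rows)):
        train = rows[:i] + rows[i + 1:]
        train_g = groups[:i] + groups[i + 1:]
        mu = col_means(train)
        axis = bg_axis(center(train, mu), train_g)
        out.append(dot([x - m for x, m in zip(rows[i], mu)], axis))
    return out

def own_side_rate(scores, groups):
    """Fraction of individuals on their own group's side of the origin."""
    hits = sum((s > 0) == (g == 1) for s, g in zip(scores, groups))
    return hits / len(groups)

def inertia_ratio(rows, groups):
    """Between-group inertia divided by total inertia."""
    x = center(rows, col_means(rows))
    total = sum(dot(r, r) for r in x)
    between = 0.0
    for g in set(groups):
        members = [r for r, h in zip(x, groups) if h == g]
        m = col_means(members)
        between += len(members) * dot(m, m)
    return between / total

def permutation_pvalue(rows, groups, n_perm, rng):
    """Share of random relabellings with a ratio at least as large as observed."""
    observed = inertia_ratio(rows, groups)
    shuffled = list(groups)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if inertia_ratio(rows, shuffled) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Random table: few individuals (rows), many variables (columns), no true groups.
rng = random.Random(1)
n, p = 20, 500
table = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
groups = [0] * (n // 2) + [1] * (n // 2)

apparent = own_side_rate(apparent_scores(table, groups), groups)
crossval = own_side_rate(loo_scores(table, groups), groups)
ratio = inertia_ratio(table, groups)
pval = permutation_pvalue(table, groups, 199, rng)
print(f"apparent own-side rate:        {apparent:.2f}")
print(f"cross-validated own-side rate: {crossval:.2f}")
print(f"between/total inertia ratio:   {ratio:.3f}")
print(f"permutation p-value:           {pval:.3f}")
```

With many more columns than rows, the apparent scores separate the two arbitrary groups almost perfectly, while the cross-validated scores should stay close to chance. The inertia ratio remains small and the permutation test gives no systematic evidence of groups, which is the contrast the abstract describes.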
We are very grateful to Julien Claude and an anonymous reviewer of our manuscript, who helped us improve it substantially.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Thioulouse, J., Renaud, S., Dufour, AB. et al. Overcoming the Spurious Groups Problem in Between-Group PCA. Evol Biol 48, 458–471 (2021). https://doi.org/10.1007/s11692-021-09550-0
Keywords
- Geometric morphometrics
- Multivariate analysis
- Between-group analysis
- Spurious groups
- Random permutation test
- Leave-one-out cross-validation