Overcoming the Spurious Groups Problem in Between-Group PCA

Abstract

Several recent papers have pointed out problems with between-group Principal Component Analysis (bgPCA). This method inflates the differences between groups and can even display completely artificial differences where none exist, for example when it is applied to tables of random numbers with many variables (columns) and few individuals (rows). Cross-validation has recently been proposed as a way to circumvent this problem. Here we present several functions of the ade4 package for the R statistical software that compute a bgPCA, test whether the groups differ significantly, perform a cross-validation of this analysis, and compute associated statistics. We also describe how to use these functions to avoid running into the spurious groups problem. Several examples, including a real data set and tables of random numbers, are used to validate this approach under various experimental and numerical conditions. The integrated framework of the duality diagram, as implemented in ade4, allows this approach to be extended to multivariate analysis methods other than principal component analysis, which could prove useful for other types of variables. The R code and the real data table used for the computations and graphs in this paper are available as supplementary material.
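To make the spurious groups problem concrete, here is a minimal Python/NumPy sketch. It is not the ade4 implementation (which is in R) and does not reproduce the paper's statistics. It assumes bgPCA is computed as a PCA of the size-weighted group means of the column-centred table, tests the groups with a permutation test on the ratio of between-group to total inertia, and contrasts fitted versus leave-one-out cross-validated scores. The function names, the use of two axes, and the choice of nearest-centroid classification accuracy as a summary are illustrative assumptions; `groups` is assumed to be a NumPy array of labels.

```python
import numpy as np


def bgpca_axes(X, groups, n_axes=2):
    """PCA of the size-weighted group means of the column-centred table X.

    Returns the column means used for centring and a (p x k) matrix of axes,
    with k <= number of groups - 1.
    """
    center = X.mean(axis=0)
    Xc = X - center
    labels = np.unique(groups)
    means = np.array([Xc[groups == g].mean(axis=0) for g in labels])
    weights = np.array([np.mean(groups == g) for g in labels])
    # Scaling rows by sqrt(weight) makes the SVD a weighted PCA of the means.
    _, _, vt = np.linalg.svd(np.sqrt(weights)[:, None] * means,
                             full_matrices=False)
    k = min(n_axes, len(labels) - 1)
    return center, vt[:k].T


def between_ratio(X, groups):
    """Between-group inertia divided by total inertia."""
    Xc = X - X.mean(axis=0)
    between = sum(np.sum(groups == g) * np.sum(Xc[groups == g].mean(axis=0) ** 2)
                  for g in np.unique(groups))
    return between / np.sum(Xc ** 2)


def permutation_test(X, groups, n_perm=999, rng=None):
    """Compare the observed ratio with ratios after shuffling group labels."""
    rng = np.random.default_rng(rng)
    obs = between_ratio(X, groups)
    sims = np.array([between_ratio(X, rng.permutation(groups))
                     for _ in range(n_perm)])
    p_value = (np.sum(sims >= obs) + 1) / (n_perm + 1)
    return obs, p_value


def nearest_centroid(train_scores, train_groups, test_scores):
    """Assign each test row to the closest training group centroid."""
    labels = np.unique(train_groups)
    cents = np.array([train_scores[train_groups == g].mean(axis=0)
                      for g in labels])
    d2 = ((test_scores[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d2, axis=1)]


def fitted_accuracy(X, groups):
    """Accuracy when the same individuals both build and are scored on the axes."""
    center, axes = bgpca_axes(X, groups)
    scores = (X - center) @ axes
    return np.mean(nearest_centroid(scores, groups, scores) == groups)


def loo_accuracy(X, groups):
    """Leave-one-out: the axes are rebuilt without the individual being scored."""
    n = len(groups)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        center, axes = bgpca_axes(X[keep], groups[keep])
        train = (X[keep] - center) @ axes
        test = (X[i:i + 1] - center) @ axes
        hits += int(nearest_centroid(train, groups[keep], test)[0] == groups[i])
    return hits / n


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 200))                     # few rows, many random columns
    groups = np.repeat(np.array(["A", "B", "C"]), 10)  # meaningless labels
    print("fitted accuracy:", fitted_accuracy(X, groups))
    print("LOO accuracy:   ", loo_accuracy(X, groups))
    print("ratio, p-value: ", permutation_test(X, groups, rng=2))
```

On pure noise like this, the fitted accuracy is typically close to 1 while the leave-one-out accuracy falls to roughly chance level (about 1/3 with three groups). The apparent separation comes from each individual pulling its own group mean, and hence the axes, towards itself; genuine group differences, by contrast, survive cross-validation.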

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

References

  1. Ackermann, R., & Cheverud, J. (2000). Phenotypic covariance structure in tamarins (Genus Saguinus): a comparison of variation patterns using matrix correlation and common principal component analysis. American Journal of Physical Anthropology, 111, 489–501.

  2. Adams, D. C., & Otárola-Castillo, E. (2013). geomorph: An R package for the collection and analysis of geometric morphometric shape data. Methods in Ecology and Evolution, 4, 393–399.

  3. Almécija, S., Tallman, M., Alba, D. M., Pina, M., Moyà-Solà, S., & Jungers, W. L. (2013). The femur of Orrorin tugenensis exhibits morphometric affinities with both Miocene apes and later hominins. Nature Communications, 4, 2888.

  4. Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166–173.

  5. Bookstein, F. L. (1991). Morphometric tools for landmark data: Geometry and biology. Cambridge (UK): Cambridge University Press.

  6. Bookstein, F. L. (1997). Landmark methods for forms without landmarks: Morphometrics of group differences in outline shape. Medical Image Analysis, 1, 225–243.

  7. Bookstein, F. L. (2019). Pathologies of between-groups principal components analysis in geometric morphometrics. Evolutionary Biology, 46, 271–302.

  8. Braga, J., ter Braak, C. J. F., Thuiller, W., & Dray, S. (2018). Integrating spatial and phylogenetic information in the fourth-corner analysis to test trait-environment relationships. Ecology, 99(12), 2667–2674. https://doi.org/10.1002/ecy.2530

  9. Cardini, A., & Polly, P. D. (2020). Cross-validated between group PCA scatterplots: A solution to spurious group separation? Evolutionary Biology, 47, 85–95.

  10. Cardini, A., O’Higgins, P., & Rohlf, F. J. (2019). Seeing distinct groups where there are none: Spurious patterns from between-group PCA. Evolutionary Biology, 46, 303–316.

  11. Chiari, Y., & Claude, J. (2012). Morphometric identification of individuals when there are more shape variables than reference specimens: A case study in Galápagos tortoises. Comptes Rendus Biologies, 335, 62–68. https://doi.org/10.1016/j.crvi.2011.10.007

  12. Collyer, M. L., & Adams, D. C. (2018). RRPP: An R package for fitting linear models to high-dimensional data using residual randomization. Methods in Ecology and Evolution, 9(7), 1772–1779. https://doi.org/10.1111/2041-210X.13029

  13. Crabot, J., Clappe, S., Dray, S., & Datry, T. (2019). Testing the Mantel statistic with a spatially-constrained permutation procedure. Methods in Ecology and Evolution, 10, 532–540. https://doi.org/10.1111/2041-210X.13141

  14. Cucchi, T., Kovács, Z., Berthon, R., Orth, A., Bonhomme, F., Evin, A., Siahsarvie, R., Darvish, J., Bakhshaliyev, V., & Marro, C. (2013). Landmark methods for forms without landmarks: Morphometrics of group differences in outline shape. Biological Journal of the Linnean Society, 108, 917–928.

  15. Cucchi, T., Mohaseb, A., Peigné, S., Debue, K., Orlando, L., & Mashkour, M. (2017). Detecting taxonomic and phylogenetic signals in equid cheek teeth: Towards new palaeontological and archaeological proxies. Royal Society Open Science, 4, 160997.

  16. Culhane, A. C., Perrière, G., Considine, E. C., Cotter, T. G., & Higgins, D. G. (2002). Between-group analysis of microarray data. Bioinformatics, 18, 1600–1608.

  17. Debat, V., Bégin, M., Legout, H., & David, J. R. (2003). Allometric and nonallometric components of Drosophila wing shape respond differently to developmental temperature. Evolution, 57(12), 2773–2784.

  18. Dianat, M., Darvish, J., Cornette, R., Aliabadian, M., & Nicolas, V. (2017). Evolutionary history of the Persian jird, Meriones persicus, based on genetics, species distribution modelling and morphometric data. Journal of Zoological Systematics and Evolutionary Research, 55(1), 29–45. https://doi.org/10.1111/jzs.12145

  19. Dolédec, S., & Chessel, D. (1987). Rythmes saisonniers et composantes stationnelles en milieu aquatique. I- Description d’un plan d’observations complet par projection de variables. Acta Oecologica Oecologia Generalis, 8, 403–426.

  20. Dolédec, S., & Chessel, D. (1989). Rythmes saisonniers et composantes stationnelles en milieu aquatique. II- Prise en compte et élimination d’effets dans un tableau faunistique. Acta Oecologica Oecologia Generalis, 10, 207–232.

  21. Dray, S., Pavoine, S., & Aguirre de Carcer, D. (2015). Considering external information to improve the phylogenetic comparison of microbial communities: A new approach based on constrained Double Principal Coordinates Analysis (cDPCoA). Molecular Ecology Resources, 15, 242–249. https://doi.org/10.1111/1755-0998.12300

  22. Evin, A., Cucchi, T., Cardini, A., Vidarsdottir, U. S., Larson, G., & Dobney, K. (2013). The long and winding road: Identifying pig domestication through molar size and shape. Journal of Archaeological Science, 40, 735–743.

  23. Franquet, E., Dolédec, S., & Chessel, D. (1995). Using multivariate analyses for separating spatial and temporal effects within species-environment relationships. Hydrobiologia, 300/301, 425–431.

  24. Gunz, P., Ramsier, M., Kuhrig, M., Hublin, J. J., & Spoor, F. (2012). The mammalian bony labyrinth reconsidered, introducing a comprehensive geometric morphometric approach. Journal of Anatomy, 220, 529–543.

  25. Harbers, H., Neaux, D., Ortiz, K., Blanc, B., Schafberg, R., Haruda, A., et al. (2020). The mark of captivity: Plastic responses in the ankle bone of a wild ungulate (Sus scrofa). Royal Society Open Science. https://doi.org/10.1098/rsos.192039

  26. Jamniczky, H., & Hallgrímsson, B. (2009). A comparison of covariance structure in wild and laboratory muroid crania. Evolution, 63, 1540–1556.

  27. Klingenberg, C. P. (2011). MorphoJ: An integrated software package for geometric morphometrics. Molecular Ecology Resources, 11, 353–357.

  28. Kovarovic, K., Aiello, L. C., Cardini, A., & Lockwood, C. A. (2011). Discriminant function analyses in archaeology: Are classification rates too good to be true? Journal of Archaeological Science, 38, 3006–3018.

  29. Ledevin, R., & Koyabu, D. (2019). Patterns and constraints of craniofacial variation in colobine monkeys: Disentangling the effects of phylogeny, allometry and diet. Evolutionary Biology, 46, 14–34. https://doi.org/10.1007/s11692-019-09469-7

  30. Leinonen, T., Cano, J., Mäkinen, H., & Merilä, J. (2006). Contrasting patterns of body shape and neutral genetic divergence in marine and lake populations of threespine sticklebacks. Journal of Evolutionary Biology, 19, 1803–1812.

  31. Mitteroecker, P., & Bookstein, F. (2011). Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evolutionary Biology, 38(1), 100–114.

  32. Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P. R., O’Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H., Szoecs, E., & Wagner, H. (2019). vegan: Community ecology package. R package version 2.5-6. https://CRAN.R-project.org/package=vegan

  33. R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/

  34. Renaud, S., & Auffray, J. (2013). The direction of main phenotypic variance as a channel to morphological evolution: case studies in murine rodents. Hystrix, The Italian Journal of Mammalogy, 24, 85–93.

  35. Renaud, S., Pantalacci, S., & Auffray, J. (2011). Differential evolvability along lines of least resistance of upper and lower molars in island house mice. PLoS ONE. https://doi.org/10.1371/journal.pone.0018951

  36. Renaud, S., Dufour, A., Hardouin, E., Ledevin, R., & Auffray, J. (2015). Once upon multivariate analyses: When they tell several stories about biological evolution. PLoS ONE. https://doi.org/10.1371/journal.pone.0132801

  37. Renaud, S., Alibert, P., & Auffray, J. (2017a). Impact of hybridization on shape, variation and covariation of the mouse molar. Evolutionary Biology, 44, 69–81.

  38. Renaud, S., Hardouin, E., Quéré, J., & Chevret, P. (2017b). Morphometric variations at an ecological scale: Seasonal and local variations in feral and commensal house mice. Mammalian Biology, 87, 1–12.

  39. Renaud, S., Ledevin, R., Souquet, L., Gomes Rodrigues, H., Ginot, S., Agret, S., Claude, J., Herrel, A., & Hautier, L. (2018). Evolving teeth within a stable masticatory apparatus in Orkney mice. Evolutionary Biology, 45, 405–424.

  40. Rohlf, F., & Slice, D. (1990). Extensions of the Procrustes method for the optimal superimposition of landmarks. Systematic Zoology, 39, 40–59.

  41. Rohlf, F. J. (2021). Why clusters and other patterns can seem to be found in analyses of high-dimensional data. Evolutionary Biology, 48, 1–16.

  42. Sanchez, G. (2013). DiscriMiner: Tools of the trade for discriminant analysis. R package version 0.1-29. https://CRAN.R-project.org/package=DiscriMiner

  43. Schlager, S. (2017). Morpho and Rvcg – Shape analysis. In R. S. Li, G. Szekely, & G. Zheng (Eds.), Statistical shape and deformation analysis (pp. 217–256). New York: Academic Press.

  44. Schluter, D. (1996). Adaptive radiation along genetic lines of least resistance. Evolution, 50, 1766–1774.

  45. Siberchicot, A., Julien-Laferrière, A., Dufour, A. B., Thioulouse, J., & Dray, S. (2017). adegraphics: An S4 lattice-based package for the representation of multivariate data. The R Journal, 9, 198–212.

  46. Souquet, L., Chevret, P., Ganem, G., Auffray, J. C., Ledevin, R., Agret, S., Hautier, L., & Renaud, S. (2019). Back to the wild: Does feralization affect the mandible of non-commensal house mice (Mus musculus domesticus)? Biological Journal of the Linnean Society, 126, 471–486.

  47. Thioulouse, J., Chessel, D., Dolédec, S., & Olivier, J. (1997). ADE-4: A multivariate analysis and graphical display software. Statistics and Computing, 7(1), 75–83.

  48. Thioulouse, J., Dray, S., Dufour, A., Siberchicot, A., Jombart, T., & Pavoine, S. (2018). Multivariate analysis of ecological data with ade4. New York: Springer.

  49. Valenzuela-Lamas, S., Baylac, M., Cucchi, T., & Vigne, J. D. (2011). House mouse dispersal in Iron Age Spain: A geometric morphometrics appraisal. Biological Journal of the Linnean Society, 102, 483–497.

  50. Viscosi, V., & Cardini, A. (2011). Leaf morphology, taxonomy and geometric morphometrics: A simplified protocol for beginners. PLoS ONE. https://doi.org/10.1371/journal.pone.0025630

  51. Wagner, H. H., & Dray, S. (2015). Generating spatially constrained null models for irregularly spaced data using Moran spectral randomization methods. Methods in Ecology and Evolution, 6(10), 1169–1178. https://doi.org/10.1111/2041-210X.12407

  52. Weinberg, S. L., & Darlington, R. B. (1976). Canonical analysis when number of variables is large relative to sample size. Journal of Educational Statistics, 1(4), 313–332.


Acknowledgements

We are very grateful to Julien Claude and an anonymous reviewer, who helped us substantially improve the manuscript.

Author information

Corresponding author

Correspondence to Jean Thioulouse.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

About this article

Cite this article

Thioulouse, J., Renaud, S., Dufour, AB. et al. Overcoming the Spurious Groups Problem in Between-Group PCA. Evol Biol 48, 458–471 (2021). https://doi.org/10.1007/s11692-021-09550-0

Keywords

  • Geometric morphometrics
  • Multivariate analysis
  • ade4
  • Between-group analysis
  • Spurious groups
  • Random permutation test
  • Leave-one-out cross-validation