Plant Ecology, Volume 216, Issue 5, pp 657–667

Principal component analysis with missing values: a comparative survey of methods

Article

DOI: 10.1007/s11258-014-0406-z

Cite this article as:
Dray, S. & Josse, J. Plant Ecol (2015) 216: 657. doi:10.1007/s11258-014-0406-z

Abstract

Principal component analysis (PCA) is a standard technique to summarize the main structures of a data table containing the measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produces matrices with a large proportion of missing values. We present several techniques to take into account or estimate (impute) missing values in PCA and compare them using theoretical considerations. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, their potential pitfalls, and the challenges that remain to be addressed.

Keywords

Imputation, Ordination, PCA, Traits

Supplementary material

11258_2014_406_MOESM1_ESM.pdf (74 kb)
Electronic supplementary material 1 (PDF 74 kb)
11258_2014_406_MOESM2_ESM.tex (59 kb)
Electronic supplementary material 2 (TEX 59 kb)

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Université de Lyon, Lyon, France
  2. Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
  3. Applied Mathematics Department, Agrocampus Ouest, Rennes, France
