Multiple imputation in principal component analysis

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The available methods for handling missing values in principal component analysis provide only point estimates of the parameters (axes and components) and of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways of visualizing the uncertainty due to missing values on the principal component analysis results are described. The first consists of projecting the imputed data sets onto a reference configuration as supplementary elements, to assess the stability of the individuals (respectively, of the variables). The second consists of performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one with a Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
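The two visualization strategies can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' algorithm, which the abstract does not detail. The imputed data sets are drawn by adding Gaussian noise to a rank-k PCA fit, with a standard deviation estimated from the residuals on the observed cells. This reflects only the prediction noise and ignores the uncertainty in the estimated axes and components. The function names (`pca_fit`, `iterative_pca_impute`, `multiple_impute`, `procrustes_rotate`) and the simulated data are hypothetical, and the Procrustes step uses a rotation only, with no scaling.

```python
import numpy as np

def pca_fit(X, k):
    """Rank-k PCA of X: returns column means, scores (n x k), loadings (p x k)."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, U[:, :k] * s[:k], Vt[:k].T

def iterative_pca_impute(X, k, n_iter=200, tol=1e-8):
    """Fill NaNs by alternating a rank-k PCA fit and re-imputation."""
    miss = np.isnan(X)
    Xc = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu, F, V = pca_fit(Xc, k)
        fitted = mu + F @ V.T
        new = np.where(miss, fitted, X)            # observed cells never change
        converged = np.sum((new - Xc) ** 2) < tol
        Xc = new
        if converged:
            break
    return Xc, fitted

def multiple_impute(X, k, m=20, seed=0):
    """Assumption: imputed value = PCA fit + Gaussian noise (residual sd)."""
    miss = np.isnan(X)
    Xc, fitted = iterative_pca_impute(X, k)
    sigma = (X - fitted)[~miss].std()              # residuals on observed cells
    rng = np.random.default_rng(seed)
    draws = [np.where(miss, fitted + rng.normal(0.0, sigma, X.shape), X)
             for _ in range(m)]
    return Xc, draws

def procrustes_rotate(A, B):
    """Rotate configuration A onto B (orthogonal Procrustes, no scaling)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return A @ (U @ Vt)

# Simulated low-rank data with about 10% of the cells removed at random.
rng = np.random.default_rng(1)
n, p, k = 50, 6, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
X[rng.random((n, p)) < 0.1] = np.nan

# Reference configuration: PCA of the single completed data set.
X_ref, imputed = multiple_impute(X, k, m=20)
mu, F_ref, V_ref = pca_fit(X_ref, k)

# Strategy 1: project each imputed data set as supplementary individuals.
supp = np.stack([(Xb - mu) @ V_ref for Xb in imputed])

# Strategy 2: PCA on each imputed data set, then Procrustes rotation.
rot = np.stack([procrustes_rotate(pca_fit(Xb, k)[1], F_ref) for Xb in imputed])
```

Both `supp` and `rot` hold one cloud of m points per individual. Plotting these clouds around the reference coordinates `F_ref` gives a picture of how much each individual's position depends on the imputed values.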


Author information

Corresponding author

Correspondence to Julie Josse.

About this article

Cite this article

Josse, J., Pagès, J. & Husson, F. Multiple imputation in principal component analysis. Adv Data Anal Classif 5, 231–246 (2011). https://doi.org/10.1007/s11634-011-0086-7

  • DOI: https://doi.org/10.1007/s11634-011-0086-7
