Single Imputation Via Chunk-Wise PCA

  • Conference paper
  • First Online:
Data Analysis and Rationality in a Complex World (IFCS 2019)

Abstract

The straightforward application of Principal Component Analysis (PCA) to incomplete data sets is not possible, and practitioners often remove or ignore observations that contain at least one missing value. Three main strategies can be distinguished for applying PCA to a data set with missing entries: (i) impute the missing entries before applying PCA; (ii) obtain the PCA solution while ignoring the missing entries; and (iii) obtain the PCA solution while dealing explicitly with the missing entries. Methods implementing the latter strategy have been reviewed and, among them, the iterative PCA (iPCA) approach has been shown to be preferable. This paper proposes a chunk-wise implementation of iPCA suitable for tall data sets, that is, data sets with many observations. In the proposed approach, each data chunk is imputed according to the data analyzed so far. The proposed procedure is compared to batch iPCA and to a naive implementation that imputes each data chunk independently. In a series of experiments, we consider different data sets and missing data mechanisms.
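The abstract contrasts three procedures: batch iPCA, a chunk-wise variant that imputes each chunk using the data analyzed so far, and a naive variant that imputes chunks independently. The following is a minimal sketch of these ideas in Python with NumPy, under stated assumptions: the rank `k` is fixed in advance, missing entries (NaN) are initialized with column means, no regularization is applied, and the chunk-wise version summarizes past data with a running mean and scatter matrix (merged with the standard pairwise-update formula) rather than an incremental SVD. The function names `ipca_impute`, `chunkwise_ipca_impute`, and `naive_chunk_impute` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def ipca_impute(X, k, n_iter=500, tol=1e-12):
    """Batch iterative PCA imputation (unregularized sketch).

    Missing entries start at their column means; each pass fits a rank-k
    PCA to the completed matrix and overwrites only the missing entries
    with the reconstruction, until the update becomes negligible.
    """
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    Z = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
        fit = mu + (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k reconstruction
        delta = np.sum((Z[miss] - fit[miss]) ** 2)
        Z[miss] = fit[miss]  # observed entries are never modified
        if delta < tol:
            break
    return Z


def _merge_stats(n_a, mean_a, M_a, B):
    """Merge running (count, mean, scatter matrix) with a complete block B."""
    n_b = B.shape[0]
    mean_b = B.mean(axis=0)
    M_b = (B - mean_b).T @ (B - mean_b)
    n = n_a + n_b
    d = mean_b - mean_a
    mean = mean_a + d * n_b / n
    M = M_a + M_b + np.outer(d, d) * n_a * n_b / n
    return n, mean, M


def chunkwise_ipca_impute(chunks, k, n_iter=500, tol=1e-12):
    """Chunk-wise sketch: impute each chunk from the data analyzed so far.

    Assumption: past chunks are summarized only by their (already imputed)
    running mean and scatter matrix; they are not revisited.
    """
    p = np.asarray(chunks[0]).shape[1]
    n, mean, M = 0, np.zeros(p), np.zeros((p, p))
    out = []
    for C in chunks:
        C = np.array(C, dtype=float)
        miss = np.isnan(C)
        init = mean if n > 0 else np.nanmean(C, axis=0)
        Z = np.where(miss, init, C)
        for _ in range(n_iter):
            # PCA of past data plus the current (provisionally completed) chunk
            _, mu, S = _merge_stats(n, mean, M, Z)
            _, V = np.linalg.eigh(S)  # eigenvalues in ascending order
            V = V[:, ::-1][:, :k]     # top-k principal axes
            fit = mu + (Z - mu) @ V @ V.T
            delta = np.sum((Z[miss] - fit[miss]) ** 2)
            Z[miss] = fit[miss]
            if delta < tol:
                break
        n, mean, M = _merge_stats(n, mean, M, Z)  # absorb the imputed chunk
        out.append(Z)
    return np.vstack(out)


def naive_chunk_impute(chunks, k):
    """Naive variant: impute every chunk on its own, ignoring earlier chunks."""
    return np.vstack([ipca_impute(C, k) for C in chunks])
```

With a single chunk, `chunkwise_ipca_impute([X], k)` reduces to `ipca_impute(X, k)`, because the eigenvectors of the scatter matrix of the centered data are its right singular vectors. The running-statistics design keeps memory proportional to the number of variables rather than the number of observations, which is the point of processing tall data in chunks.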



Author information


Correspondence to Alfonso Iodice D’Enza.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Iodice D’Enza, A., Palumbo, F., Markos, A. (2021). Single Imputation Via Chunk-Wise PCA. In: Chadjipadelis, T., Lausen, B., Markos, A., Lee, T.R., Montanari, A., Nugent, R. (eds) Data Analysis and Rationality in a Complex World. IFCS 2019. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-60104-1_9
