Single Imputation Via Chunk-Wise PCA

Iodice D’Enza, Alfonso; Palumbo, Francesco; Markos, Angelos

doi:10.1007/978-3-030-60104-1_9

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Included in the following conference series:

Conference of the International Federation of Classification Societies

Abstract

Principal Component Analysis (PCA) cannot be applied directly to incomplete data sets, and practitioners often remove or ignore observations that contain at least one missing value. Three main strategies can be distinguished for applying PCA to a data set with missing entries: (i) impute the missing values before applying PCA; (ii) obtain the PCA solution while ignoring the missing values; and (iii) obtain the PCA solution while dealing explicitly with the missing values. Methods implementing the third strategy have been reviewed and, among them, the iterative PCA (iPCA) approach has been shown to be preferable. This paper proposes a chunk-wise implementation of iPCA suitable for tall data sets, that is, data sets with many observations. In the proposed approach, each data chunk is imputed according to the data analyzed so far. The proposed procedure is compared with batch iPCA and with a naive implementation that imputes each data chunk independently. The experiments consider different data sets and missing data mechanisms.
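
The three procedures contrasted in the abstract (batch iPCA, chunk-wise imputation informed by earlier data, and naive per-chunk imputation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ipca_impute`, `chunkwise_impute`, and `naive_impute`, the fixed `rank`, the column-mean initialization, and the stopping rule are all assumptions made for illustration. In particular, the chunk-wise variant below simply re-decomposes the stacked rows seen so far, which is an assumption of this sketch; an implementation aimed at tall data would presumably avoid that cost.

```python
import numpy as np


def ipca_impute(X, rank=2, n_iter=200, tol=1e-8):
    """Batch iterative PCA imputation (illustrative sketch).

    Missing entries are NaN. Assumes every column has at least one
    observed value and rank <= min(X.shape).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialize missing cells with the column means of the observed cells.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Rank-limited PCA reconstruction of the currently filled data.
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        # Only missing cells are overwritten; observed cells stay fixed.
        updated = np.where(missing, approx, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled


def chunkwise_impute(X, chunk_size, rank=2):
    """Impute each chunk using the rows already processed (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    done = np.empty((0, X.shape[1]))
    for start in range(0, X.shape[0], chunk_size):
        chunk = X[start:start + chunk_size]
        # Earlier rows are already complete, so only the new chunk's
        # missing cells are updated inside ipca_impute.
        done = ipca_impute(np.vstack([done, chunk]), rank=rank)
    return done


def naive_impute(X, chunk_size, rank=2):
    """Impute each chunk independently, ignoring all other chunks."""
    X = np.asarray(X, dtype=float)
    parts = [ipca_impute(X[i:i + chunk_size], rank=rank)
             for i in range(0, X.shape[0], chunk_size)]
    return np.vstack(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 5))  # rank-2 data
    X = np.where(rng.random(Z.shape) < 0.1, np.nan, Z)      # ~10% missing
    for f in (chunkwise_impute, naive_impute):
        err = np.sqrt(np.mean((f(X, chunk_size=20) - Z) ** 2))
        print(f.__name__, "RMSE:", err)
```

The only difference between the two chunked variants is what each chunk is allowed to see: `chunkwise_impute` carries the previously completed rows forward, while `naive_impute` fits every chunk in isolation.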

Author information

Authors and Affiliations

Università degli Studi di Napoli Federico II, Napoli, Italy
Alfonso Iodice D’Enza & Francesco Palumbo
Democritus University of Thrace, Xanthi, Greece
Angelos Markos

Corresponding author

Correspondence to Alfonso Iodice D’Enza.

Editor information

Editors and Affiliations

Department of Political Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
Theodore Chadjipadelis
Department of Mathematical Sciences, University of Essex, Colchester, UK
Berthold Lausen
School of Education, Democritus University of Thrace, Alexandroupolis, Greece
Angelos Markos
Department of Data Science and Statistics, Korea National Open University, Seoul, Korea (Republic of)
Tae Rim Lee
Department of Statistical Sciences “Paolo Fortunati”, University of Bologna, Bologna, Italy
Angela Montanari
Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
Rebecca Nugent

About this paper

Cite this paper

Iodice D’Enza, A., Palumbo, F., Markos, A. (2021). Single Imputation Via Chunk-Wise PCA. In: Chadjipadelis, T., Lausen, B., Markos, A., Lee, T.R., Montanari, A., Nugent, R. (eds) Data Analysis and Rationality in a Complex World. IFCS 2019. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-60104-1_9

DOI: https://doi.org/10.1007/978-3-030-60104-1_9
Published: 15 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60103-4
Online ISBN: 978-3-030-60104-1
eBook Packages: Mathematics and Statistics (R0)