Robust Principal Component Analysis of Data with Missing Values

Kärkkäinen, Tommi; Saarela, Mirka

doi:10.1007/978-3-319-21024-7_10


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition


Abstract

Principal component analysis is one of the most popular machine learning and data mining techniques. With its origins in statistics, it is used in numerous applications. However, there appears to have been little systematic testing and assessment of principal component analysis on erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, in particular, for estimating the relative importance of the principal components in explaining the variability of the data. Computational experiments first focus on carefully designed simulated tests in which the ground truth is known and can be used to assess the accuracy of the different methods. In addition, the methods are applied to and evaluated on an educational data set.
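To make concrete what "relative importance of the principal components" can mean when some values are missing, the following is a minimal sketch only. It assumes a classical, non-robust baseline: a covariance matrix estimated from pairwise available values, whose eigenvalues are normalised to sum to one. The function name `explained_variance_ratios` and the synthetic data are hypothetical illustrations and are not the robust estimators proposed in the chapter.

```python
import numpy as np


def explained_variance_ratios(X):
    """Share of total variance per principal component, for data with NaNs.

    Assumption: classical covariance from pairwise available values.
    This is a baseline sketch, not the robust approach of the chapter.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value is observed
    mean = np.nanmean(X, axis=0)             # column means over observed values
    # Centre the data; missing entries become 0 so they add nothing to sums.
    Xc = np.where(mask, X - mean, 0.0)
    # counts[i, j] = number of rows where both variable i and j are observed.
    counts = mask.T.astype(float) @ mask.astype(float)
    cov = (Xc.T @ Xc) / np.maximum(counts - 1.0, 1.0)
    # Symmetric eigenvalue problem; reverse to get descending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # A pairwise estimate need not be positive semidefinite; clip negatives.
    eigvals = np.clip(eigvals, 0.0, None)
    return eigvals / eigvals.sum()


# Synthetic example: three independent variables with standard deviations
# 3, 1 and 0.2, and roughly 10 % of the entries removed at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.2])
X[rng.random(X.shape) < 0.1] = np.nan
ratios = explained_variance_ratios(X)
print(ratios)
```

Because the sample mean and covariance are sensitive to outliers, a baseline of this kind is only a point of comparison for robust alternatives.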


Notes

1. Available at http://www.oecd.org/pisa/pisaproducts/.


Acknowledgments

The authors would like to thank Professor Tuomo Rossi for many helpful discussions on the contents of the paper.

Author information

Authors and Affiliations

Department of Mathematical Information Technology, University of Jyväskylä, 40014, Jyväskylä, Finland
Tommi Kärkkäinen & Mirka Saarela


Corresponding author

Correspondence to Mirka Saarela.

Editor information

Editors and Affiliations

IBaI, Leipzig, Germany
Petra Perner


About this paper

Cite this paper

Kärkkäinen, T., Saarela, M. (2015). Robust Principal Component Analysis of Data with Missing Values. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_10


DOI: https://doi.org/10.1007/978-3-319-21024-7_10
Published: 01 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21023-0
Online ISBN: 978-3-319-21024-7
eBook Packages: Computer Science, Computer Science (R0)
