Robust Principal Component Analysis of Data with Missing Values

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Principal component analysis is one of the most popular machine learning and data mining techniques. Having its origins in statistics, it is used in numerous applications. However, there appears to have been little systematic testing and assessment of principal component analysis on erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, in particular, for estimating the relative importance of each principal component in explaining the data variability. Computational experiments first focus on carefully designed simulated tests where the ground truth is known and can be used to assess the accuracy of the different methods. In addition, the methods are applied to and evaluated on an educational data set.
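As a point of reference for what "relative importance of the principal components" means on incomplete data, the following is a minimal sketch of a classical, non-robust baseline. It is not one of the approaches proposed in the paper, which this abstract does not describe. The function name `explained_variance_ratios`, the NaN encoding of missing values, and the pairwise-available covariance estimate are all assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratios(X):
    """Share of total variance carried by each principal component.

    X is an (n_samples, n_features) array; missing entries are NaN.
    Hypothetical helper: a classical available-case estimate, not robust.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Center each column with the mean of its observed entries.
    centered = X - np.nanmean(X, axis=0)

    # Zero out missing entries so they contribute nothing to the products.
    Z = np.where(observed, centered, 0.0)

    # counts[j, k] = number of rows where features j and k are both observed.
    M = observed.astype(float)
    counts = M.T @ M

    # Pairwise-available covariance; the maximum guards against division by zero.
    cov = (Z.T @ Z) / np.maximum(counts - 1.0, 1.0)

    # Eigenvalues of the symmetric matrix, largest first.
    eigvals = np.linalg.eigvalsh(cov)[::-1]

    # A pairwise estimate need not be positive semidefinite, so clip negatives.
    eigvals = np.clip(eigvals, 0.0, None)
    return eigvals / eigvals.sum()


# Simulated data with known variances 9, 1 and 0.04, then 10% of entries removed.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])
X[rng.random(X.shape) < 0.1] = np.nan
ratios = explained_variance_ratios(X)
print(ratios)
```

Because the ground-truth variances are known in this simulation, the printed ratios can be compared with the true shares, which mirrors the kind of simulated test the abstract mentions. Both the mean and the covariance used here are sensitive to outliers, which is the weakness that robust formulations aim to address.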

Notes

  1. Available at http://www.oecd.org/pisa/pisaproducts/.

Acknowledgments

The authors would like to thank Professor Tuomo Rossi for many helpful discussions on the contents of the paper.

Author information

Corresponding author

Correspondence to Mirka Saarela.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kärkkäinen, T., Saarela, M. (2015). Robust Principal Component Analysis of Data with Missing Values. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science (LNAI), vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_10

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science (R0)
