Abstract
Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some scaling better than others to high-dimensional problems. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. With very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with the Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
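As a rough illustration of the problem setting (not the authors' algorithm), PCA with missing values can be posed as a matrix factorization X ≈ AS fitted by minimizing the squared reconstruction error over the observed entries only, with an L2 penalty standing in for the regularized variant mentioned in the abstract. All variable names, sizes, and constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank data with most entries missing (NaN marks a gap).
n, d, c = 100, 20, 3                     # samples, dimensions, components
X_full = rng.normal(size=(n, c)) @ rng.normal(size=(c, d))
mask = rng.random((n, d)) < 0.2          # ~20% observed, ~80% missing
X = np.where(mask, X_full, np.nan)

# PCA as factorization X ≈ A @ S, fit by gradient descent on the
# squared error over observed entries plus an L2 penalty (a stand-in
# for the regularization discussed in the abstract).
A = rng.normal(scale=0.1, size=(n, c))
S = rng.normal(scale=0.1, size=(c, d))
lam, lr = 0.1, 0.02
for _ in range(3000):
    # Residual is zero wherever the entry is unobserved.
    R = np.where(mask, A @ S - np.nan_to_num(X), 0.0)
    A -= lr * (R @ S.T + lam * A)
    S -= lr * (A.T @ R + lam * S)

# Generalization error: reconstruction accuracy on the *unobserved* entries.
rmse = np.sqrt(np.mean((A @ S - X_full)[~mask] ** 2))
```

With very sparse masks the number of free parameters in A and S can approach the number of observed values, which is exactly where the overfitting discussed in the abstract appears and where regularization or VB learning becomes important.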
Keywords
- Principal Component Analysis
- Large Scale Problem
- Principal Component Analysis Model
- Restricted Boltzmann Machine
- Imputation Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Raiko, T., Ilin, A., Karhunen, J. (2007). Principal Component Analysis for Large Scale Problems with Lots of Missing Values. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science(), vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science, Computer Science (R0)