AgFlow: fast model selection of penalized PCA via implicit regularization effects of gradient flow

Abstract

Principal component analysis (PCA) has been widely used as an effective technique for feature extraction and dimension reduction. In the high-dimension, low-sample-size setting, one may prefer modified principal components with penalized loadings, where the penalty is chosen automatically via model selection among candidate models with varying penalties. Earlier work (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006; Gaynanova et al. in J Comput Graph Stat 26(2):379–387, 2017) proposed penalized PCA and showed the feasibility of model selection in \(\ell _2\)-penalized PCA through the solution path of Ridge regression; however, this approach is extremely time-consuming because it requires intensive matrix inversions. In this paper, we propose a fast model selection method for penalized PCA, named approximated gradient flow (AgFlow), which lowers the computational complexity by exploiting the implicit regularization effect of (stochastic) gradient flow (Ali et al. in: The 22nd international conference on artificial intelligence and statistics, pp 1370–1378, 2019; Ali et al. in: International conference on machine learning, 2020) and obtains the complete solution path of \(\ell _2\)-penalized PCA under varying \(\ell _2\)-regularization. Extensive experiments on real-world datasets show that AgFlow outperforms existing methods (Oja and Karhunen in J Math Anal Appl 106(1):69–84, 1985; Hardt and Price in: Advances in neural information processing systems, pp 2861–2869, 2014; Shamir in: International conference on machine learning, pp 144–152, PMLR, 2015; and the vanilla Ridge estimators) in terms of computation cost.
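
To make the mechanism concrete, here is a minimal sketch, assuming a plain least-squares objective, of the implicit regularization effect that AgFlow builds on: gradient descent with a small step size approximates continuous-time gradient flow, and the iterate at time \(t\) behaves like the explicit Ridge estimator with penalty \(\lambda \approx 1/t\) (Ali et al., 2019), so a single run yields an entire regularization path with no matrix inversion per penalty value. This is not the authors' AgFlow implementation; all names and parameters below are illustrative.

```python
# Minimal sketch (illustrative, not the authors' AgFlow code): the
# implicit regularization of gradient flow on least squares.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                        # high dimension, low sample size
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

eta, n_steps = 1e-3, 2000             # small step size mimics continuous flow
beta = np.zeros(p)
path = []                             # whole solution path from a single run
for k in range(1, n_steps + 1):
    beta += eta * X.T @ (y - X @ beta) / n   # gradient step: no matrix inverse
    path.append((k * eta, beta.copy()))      # (time t, iterate beta(t))

# The iterate at time t is close to the Ridge estimator with lambda = 1/t,
# which by contrast needs one p-by-p linear solve per candidate penalty.
t, beta_t = path[-1]
lam = 1.0 / t
beta_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
print(np.linalg.norm(beta_t - beta_ridge) / np.linalg.norm(beta_ridge))
```

AgFlow exploits this correspondence to recover the full \(\ell _2\)-penalized PCA solution path, with the stopping time of the flow playing the role of the \(\ell _2\) penalty during model selection.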


References

  • Ali, A., Dobriban, E., & Tibshirani, R. J. (2020). The implicit regularization of stochastic gradient flow for least squares. In International conference on machine learning (pp. 233–244). PMLR.

  • Ali, A., Kolter, J. Z., & Tibshirani, R. J. (2019). A continuous-time view of early stopping for least squares regression. In The 22nd international conference on artificial intelligence and statistics (pp 1370–1378).

  • Arora, R., Cotter, A., Livescu, K., & Srebro, N. (2012). Stochastic optimization for PCA and PLS. In 2012 50th annual allerton conference on communication, control, and computing (allerton) (pp. 861–868). IEEE.

  • Balsubramani, A., Dasgupta, S., & Freund, Y. (2013). The fast convergence of incremental PCA. In F. Bach & D. Blei (Eds.), Advances in neural information processing systems (pp. 3174–3182).

  • Candes, E., Tao, T., et al. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35(6), 2313–2351.

  • De Sa, C., Re, C., & Olukotun, K. (2015). Global convergence of stochastic gradient descent for some non-convex matrix problems. In International conference on machine learning (pp. 2332–2341).

  • Dutta, A., Hanzely, F., & Richtárik, P. (2019). A nonconvex projection method for robust PCA. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, pp. 1468–1476).

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  • Friedman, J., & Popescu, B. E. (2003). Gradient directed regularization for linear regression and classification. Technical report, Statistics Department, Stanford University.

  • Friedman, J., & Popescu, B. E. (2004). Gradient directed regularization. Unpublished manuscript. http://www-stat.stanford.edu/hf/ftp/pathlite.pdf. Accessed 24 June 2021.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.

  • Gaynanova, I., Booth, J. G., & Wells, M. T. (2017). Penalized versus constrained generalized eigenvalue problems. Journal of Computational and Graphical Statistics, 26(2), 379–387.

  • Golub, G., & Van Loan, C. (2013). Matrix computations (4th ed.). Baltimore: Johns Hopkins University Press.

  • Haber, E., Horesh, L., & Tenorio, L. (2008). Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Problems, 24(5), 055012.

  • Hardt, M., & Price, E. (2014). The noisy power method: a meta algorithm with applications. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2861–2869). MIT Press.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Berlin: Springer.

  • Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.

  • Hoerl, A. E., Kannard, R. W., & Baldwin, K. F. (1975). Ridge regression: Some simulations. Communications in Statistics-Theory and Methods, 4(2), 105–123.

  • Huang, G. B., Mattar, M., Berg, T., & Learned-Miller, E. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on faces in 'Real-Life' Images: Detection, alignment, and recognition.

  • Jolliffe, I. T. (1986). Principal components in regression analysis. In I. T. Jolliffe (Ed.), Principal component analysis (pp. 129–155). Springer.

  • LeCun, Y., Jackel, L. D., Bottou, L., Brunot, A., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Muller, U. A., Sackinger, E., & Simard, P. (1995). Comparison of learning algorithms for handwritten digit recognition. In International conference on artificial neural networks (Vol. 60, pp. 53–60). Perth, Australia.

  • Lee, Y. K., Lee, E. R., & Park, B. U. (2012). Principal component analysis in very high-dimensional spaces. Statistica Sinica, 22(3), 933–956.

  • Mitliagkas, I., Caramanis, C., & Jain, P. (2013). Memory limited, streaming PCA. In C. J. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (pp. 2886–2894).

  • Mohammed, A. A., Minhas, R., Jonathan Wu, Q. M., & Sid-Ahmed, M. A. (2011). Human face recognition based on multidimensional PCA and extreme learning machine. Pattern Recognition, 44(10–11), 2588–2597.

  • Oja, E., & Karhunen, J. (1985). On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106(1), 69–84.

  • Shamir, O. (2015). A stochastic PCA and SVD algorithm with an exponential convergence rate. In International conference on machine learning (pp. 144–152). PMLR.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

  • Witten, D. M., & Tibshirani, R. (2009). Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 615–636.

  • Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.

  • Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17(9), 763–774.

  • Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.

  • Zhu, Z., Ong, Y.-S., & Dash, M. (2007). Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognition, 40(11), 3236–3248.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

  • Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265–286.

Author information

Corresponding author

Correspondence to Haoyi Xiong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editors: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann.

About this article

Cite this article

Jiang, H., Xiong, H., Wu, D. et al. AgFlow: fast model selection of penalized PCA via implicit regularization effects of gradient flow. Mach Learn 110, 2131–2150 (2021). https://doi.org/10.1007/s10994-021-06025-3


Keywords

  • Model selection
  • Gradient flow
  • Implicit regularization
  • Penalized PCA
  • Ridge