Principal Component Analysis for Exponential Family Data

  • Chapter
Advances in Principal Component Analysis

Abstract

This chapter reviews exponential family principal component analysis (ePCA), a family of statistical methods for reducing the dimension of large-scale data that are not real-valued, such as user ratings of items in e-commerce, categorical and count genetic data in bioinformatics, and digital images in computer vision. The ePCA framework extends traditional PCA to modern datasets that contain a variety of data types. A sparse version of ePCA further helps to overcome model inconsistency and improves interpretability when applied to high-dimensional data. The chapter discusses model formulations and solution strategies for ePCA and sparse ePCA, illustrated with real-world applications.
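
To make the idea of PCA on non-real-valued data concrete, the following is a minimal sketch for the binary (Bernoulli) case. It assumes the common formulation in which the matrix of natural parameters is approximated by a low-rank product and fitted by minimizing the Bernoulli negative log-likelihood. The function name `bernoulli_epca`, the plain gradient-descent updates, the step size, and the toy data are illustrative assumptions, not the algorithms presented in the chapter.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def bernoulli_epca(X, rank=2, steps=300, lr=0.01, seed=0):
    """Hypothetical helper: fit Theta = A @ V.T to binary X by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.01 * rng.standard_normal((n, rank))  # low-dimensional scores
    V = 0.01 * rng.standard_normal((d, rank))  # loadings
    losses = []
    for _ in range(steps):
        Theta = A @ V.T  # natural parameters (log-odds)
        # Bernoulli negative log-likelihood: sum of log(1 + exp(theta)) - x * theta
        losses.append(float(np.sum(np.logaddexp(0.0, Theta) - X * Theta)))
        G = sigmoid(Theta) - X  # gradient of the loss with respect to Theta
        # Simultaneous update: both right-hand sides use the old A and V.
        A, V = A - lr * (G @ V), V - lr * (G.T @ A)
    return A, V, losses

# Toy binary data with rank-one structure in the log-odds (assumed for illustration).
rng = np.random.default_rng(1)
u = rng.choice([-1.0, 1.0], size=(40, 1))
v = rng.choice([-1.0, 1.0], size=(1, 10))
X = (rng.random((40, 10)) < sigmoid(3.0 * u @ v)).astype(float)

A, V, losses = bernoulli_epca(X)
```

Replacing the log-partition term `np.logaddexp(0.0, Theta)` and the mean function `sigmoid` with those of another exponential family member (for example `np.exp(Theta)` for both in the Poisson case) gives the corresponding variant for count data under the same assumptions.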

Author information

Correspondence to Xiaoning Qian.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Lu, M., He, K., Huang, J.Z., Qian, X. (2018). Principal Component Analysis for Exponential Family Data. In: Naik, G. (ed.) Advances in Principal Component Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-10-6704-4_8

  • DOI: https://doi.org/10.1007/978-981-10-6704-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6703-7

  • Online ISBN: 978-981-10-6704-4

  • eBook Packages: Engineering, Engineering (R0)