Sparse correspondence analysis for large contingency tables

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose sparse variants of correspondence analysis (CA) for large contingency tables, such as the document-term matrices used in text mining. By seeking solutions with many zero coefficients, sparse CA addresses the difficulty of interpreting CA results when the table is large. Since CA is a doubly weighted PCA (for rows and columns), or equivalently a weighted generalized SVD, we adapt known sparse versions of these methods, with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases, depending on whether sparseness is required for both rows and columns or for only one set.
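The link between CA and a sparse SVD can be illustrated with a minimal sketch in plain Python. It is not the authors' algorithm: it assumes one possible sparse technique, the soft-thresholded alternating updates of the penalized matrix decomposition (Witten, Tibshirani and Hastie, 2009), used here in its penalized form with hand-picked penalties. The sketch forms the CA standardized residual matrix S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}, whose ordinary SVD is equivalent to the weighted generalized SVD of CA, and extracts one sparse singular triple (d, u, v). Because the row masses r_i are positive and u_i = sqrt(r_i) a_i, where a_i is a standard row coordinate, a zero in u is also a zero in the CA row coordinates. All function names and the penalties `lam_u`, `lam_v` are hypothetical; orthogonality across several dimensions and the tuning of the sparseness parameters are not covered.

```python
import math


def standardized_residuals(table):
    """CA standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}.

    The ordinary SVD of S gives the CA solution. Assumes every row and
    column of `table` has a positive total.
    """
    n = sum(sum(row) for row in table)
    P = [[x / n for x in row] for row in table]   # correspondence matrix
    r = [sum(row) for row in P]                    # row masses
    c = [sum(col) for col in zip(*P)]              # column masses
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    return S, r, c


def soft_threshold(x, lam):
    """Elementwise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]


def unit(x):
    """Scale x to unit Euclidean norm; raise if every entry is zero."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("penalty too large: all coefficients were set to zero")
    return [v / norm for v in x]


def sparse_rank_one(S, lam_u, lam_v, iters=500, tol=1e-12):
    """Alternating maximization (toward a local optimum) of
    u'Sv - lam_u*|u|_1 - lam_v*|v|_1 subject to |u|_2 <= 1, |v|_2 <= 1.

    Hypothetical helper; returns (d, u, v) with d = u'Sv.
    With lam_u = lam_v = 0 it reduces to power iteration for the
    leading singular triple of S.
    """
    St = [list(col) for col in zip(*S)]
    # Start from the row of S with the largest norm so that S v is nonzero.
    v = unit(max(S, key=lambda row: sum(x * x for x in row)))
    for _ in range(iters):
        u = unit(soft_threshold([sum(a * b for a, b in zip(row, v)) for row in S], lam_u))
        v_new = unit(soft_threshold([sum(a * b for a, b in zip(col, u)) for col in St], lam_v))
        converged = max(abs(a - b) for a, b in zip(v, v_new)) < tol
        v = v_new
        if converged:
            break
    d = sum(u[i] * sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(u)))
    return d, u, v


def row_principal_coordinates(d, u, r):
    """Principal row coordinates f_i = d * u_i / sqrt(r_i); zeros in u stay zero."""
    return [d * ui / math.sqrt(ri) for ui, ri in zip(u, r)]


if __name__ == "__main__":
    # Toy table: the third row and third column are proportional to the margins.
    table = [[9, 1, 5], [1, 9, 5], [5, 5, 5]]
    S, r, c = standardized_residuals(table)
    d, u, v = sparse_rank_one(S, lam_u=0.01, lam_v=0.01)
    print(round(d, 4), row_principal_coordinates(d, u, r))
```

In the toy table the third row and column carry no association, so their residuals vanish and soft-thresholding sets their coefficients exactly to zero. Within this sketch, setting one penalty to zero (for example `lam_v=0`) makes only one set sparse, which mirrors the one-sided case mentioned above.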

Notes

  1. There are 45 presidents, but speech data for William Henry Harrison and James Garfield are missing.

Author information

Correspondence to Gilbert Saporta.

Appendix: The 45 presidents of the United States

1. George Washington
2. John Adams
3. Thomas Jefferson
4. James Madison
5. James Monroe
6. John Quincy Adams
7. Andrew Jackson
8. Martin Van Buren
9. William H. Harrison
10. John Tyler
11. James Knox Polk
12. Zachary Taylor
13. Millard Fillmore
14. Franklin Pierce
15. James Buchanan
16. Abraham Lincoln
17. Andrew Johnson
18. Ulysses S. Grant
19. Rutherford B. Hayes
20. James A. Garfield
21. Chester A. Arthur
22. Grover Cleveland
23. Benjamin Harrison
24. Grover Cleveland
25. William McKinley
26. Theodore Roosevelt
27. William H. Taft
28. Woodrow Wilson
29. Warren Harding
30. Calvin Coolidge
31. Herbert Hoover
32. Franklin D. Roosevelt
33. Harry S. Truman
34. Dwight D. Eisenhower
35. John F. Kennedy
36. Lyndon B. Johnson
37. Richard Nixon
38. Gerald R. Ford
39. Jimmy Carter
40. Ronald Reagan
41. George H.W. Bush
42. William J. Clinton
43. George W. Bush
44. Barack Obama
45. Donald Trump

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, R., Niang, N., Saporta, G. et al. Sparse correspondence analysis for large contingency tables. Adv Data Anal Classif 17, 1037–1056 (2023). https://doi.org/10.1007/s11634-022-00531-5

  • DOI: https://doi.org/10.1007/s11634-022-00531-5