Sparse correspondence analysis for large contingency tables

Liu, Ruiping; Niang, Ndeye; Saporta, Gilbert; Wang, Huiwen

doi:10.1007/s11634-022-00531-5

Regular Article
Published: 02 January 2023

Advances in Data Analysis and Classification, volume 17, pages 1037–1056 (2023)

Abstract

We propose sparse variants of correspondence analysis (CA) for large contingency tables such as the document-term matrices used in text mining. By seeking many zero coefficients, sparse CA remedies the difficulty of interpreting CA results when the table is large. Since CA is a doubly weighted PCA (for rows and columns) or a weighted generalized SVD, we adapt known sparse versions of these methods, with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases, depending on whether sparseness is required for both rows and columns or for only one set.
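
To make the "weighted generalized SVD" reading of CA concrete, the following minimal sketch computes classical (non-sparse) CA as a plain SVD of the standardized residuals of a contingency table, and then shows how soft-thresholding can drive column coefficients to exactly zero. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the toy table and the penalty value `lam` are hypothetical, and the rank-one update is a generic soft-thresholding scheme of the kind found in the sparse PCA literature. It does not address orthogonality between successive axes or the tuning of the sparseness parameters, which the abstract identifies as the paper's specific developments.

```python
import numpy as np

def correspondence_analysis(N):
    """Classical CA of a contingency table N via an ordinary SVD."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # principal row coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal column coordinates
    return sv, row_coords, col_coords, S

def soft_threshold(x, lam):
    """Shrink each entry towards zero by lam; entries below lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank_one(S, lam, n_iter=100):
    """Rank-one approximation u v^T of S with a soft-thresholded column vector v.

    Illustrative assumption: alternating updates with ||u|| = 1, started
    from the leading left singular vector of S.
    """
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    u = U[:, 0]
    v = np.zeros(S.shape[1])
    for _ in range(n_iter):
        v = soft_threshold(S.T @ u, lam)
        if not v.any():                  # penalty too strong: everything is zero
            break
        u = S @ v
        u = u / np.linalg.norm(u)
    return u, v

# Hypothetical 4 x 3 toy table (rows could be documents, columns terms).
N_toy = np.array([[20, 5, 1],
                  [3, 18, 4],
                  [2, 6, 25],
                  [10, 9, 8]])

sv, row_coords, col_coords, S = correspondence_analysis(N_toy)
u, v = sparse_rank_one(S, lam=0.05)
print("singular values:", sv)
print("number of zero column coefficients:", int((v == 0).sum()))
```

In this sketch, setting `lam=0` recovers the first ordinary CA axis, while a larger `lam` zeroes more column coefficients at the cost of explained inertia. The squared singular values sum to the total inertia, that is, the chi-square statistic of the table divided by its grand total.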

Notes

There are 45 presidents, but the speech data of presidents William Henry Harrison and James Garfield are missing.

Author information

Authors and Affiliations

School of Applied Science, Beijing Information Science and Technology University, Beijing, China
Ruiping Liu
CEDRIC Lab, Conservatoire national des arts et métiers, 292 rue Saint Martin, 75003, Paris, France
Ndeye Niang & Gilbert Saporta
School of Economics and Management and Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China
Huiwen Wang

Corresponding author

Correspondence to Gilbert Saporta.

Appendix: The 45 presidents of the United States

1. George Washington	16. Abraham Lincoln	31. Herbert Hoover
2. John Adams	17. Andrew Johnson	32. Franklin D. Roosevelt
3. Thomas Jefferson	18. Ulysses S. Grant	33. Harry S. Truman
4. James Madison	19. Rutherford B. Hayes	34. Dwight D. Eisenhower
5. James Monroe	20. James A. Garfield	35. John F. Kennedy
6. John Quincy Adams	21. Chester A. Arthur	36. Lyndon B. Johnson
7. Andrew Jackson	22. Grover Cleveland	37. Richard Nixon
8. Martin Van Buren	23. Benjamin Harrison	38. Gerald R. Ford
9. William H. Harrison	24. Grover Cleveland	39. Jimmy Carter
10. John Tyler	25. William McKinley	40. Ronald Reagan
11. James Knox Polk	26. Theodore Roosevelt	41. George H.W. Bush
12. Zachary Taylor	27. William H. Taft	42. William J. Clinton
13. Millard Fillmore	28. Woodrow Wilson	43. George W. Bush
14. Franklin Pierce	29. Warren Harding	44. Barack Obama
15. James Buchanan	30. Calvin Coolidge	45. Donald Trump

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, R., Niang, N., Saporta, G. et al. Sparse correspondence analysis for large contingency tables. Adv Data Anal Classif 17, 1037–1056 (2023). https://doi.org/10.1007/s11634-022-00531-5

Received: 22 November 2021
Revised: 08 December 2022
Accepted: 13 December 2022
Published: 02 January 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11634-022-00531-5
