Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis
A common approach to handling missing values in multivariate exploratory data analysis is to minimize the loss function over all non-missing elements. This can be achieved with EM-type algorithms in which the missing values are imputed iteratively while the axes and components are estimated. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. Results are promising compared with other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s homogeneity analysis framework.
Keywords: Multiple correspondence analysis; Categorical data; Missing values; Imputation; Regularization
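The iterative imputation idea summarized in the abstract can be illustrated on a plain numeric matrix. The sketch below is a minimal, generic iterative PCA imputation loop, not the missMDA implementation and not the MCA-specific version, which would apply the same idea to a weighted indicator matrix. The function name `iterative_pca_impute`, its parameters, and the shrinkage rule (noise variance estimated from the discarded singular values) are assumptions made for illustration.

```python
import numpy as np


def iterative_pca_impute(X, n_components=1, regularized=True,
                         max_iter=500, tol=1e-8):
    """Fill NaN cells of X by alternating a low-rank PCA fit and imputation.

    Hypothetical helper for illustration only; observed cells are never changed.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialization: replace each missing cell by its column mean,
    # computed on the observed cells only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(max_iter):
        # Estimation step: PCA (via SVD) of the centered, completed matrix.
        mean = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        d_kept = d[:n_components].copy()
        if regularized and len(d) > n_components:
            # Assumed shrinkage rule: estimate the noise variance from the
            # discarded singular values and shrink the kept ones accordingly.
            sigma2 = np.mean(d[n_components:] ** 2)
            d_kept = np.maximum(d_kept ** 2 - sigma2, 0.0) / d_kept

        # Imputation step: low-rank reconstruction, used only on missing cells.
        fitted = (U[:, :n_components] * d_kept) @ Vt[:n_components] + mean
        new_filled = np.where(missing, fitted, X)

        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break

    return filled
```

Calling `iterative_pca_impute(X, regularized=False)` gives the unregularized loop, which fits the observed cells as closely as the chosen number of components allows. With `regularized=True` the reconstruction is shrunk toward the column means, which is one simple way to temper the overfitting the abstract refers to.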
- GIFI, A. (1981), Non-linear Multivariate Analysis, Leiden: D.S.W.O.-Press.
- HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2001), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer Series in Statistics.
- JOSSE, J., PAGÈS, J., and HUSSON, F. (2009), “Gestion des Données Manquantes en Analyse en Composantes Principales”, Journal de la Société Française de Statistique, 150, 28–51.
- LÊ, S., JOSSE, J., and HUSSON, F. (2008), “FactoMineR: An R Package for Multivariate Analysis”, Journal of Statistical Software, 25(1), 1–18.
- LITTLE, R.J.A., and RUBIN, D.B. (1987, 2002), Statistical Analysis with Missing Data, New York: Wiley Series in Probability and Statistics.
- MEULMAN, J. (1982), Homogeneity Analysis of Incomplete Data, Leiden: D.S.W.O.-Press.
- NORA-CHOUTEAU, C. (1974), Une Méthode de Reconstitution et d’Analyse de Données Incomplètes, unpublished PhD thesis, Université Pierre et Marie Curie.
- R DEVELOPMENT CORE TEAM, (2010), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org/.
- SCHAFER, J.L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall/CRC.
- TAKANE, Y., and HWANG, H. (2006), “Regularized Multiple Correspondence Analysis”, in Multiple Correspondence Analysis and Related Methods, eds. J. Blasius and M. J. Greenacre, Chapman & Hall, pp. 259–279.
- VAN DER HEIJDEN, P.G.M., and ESCOFIER, B. (2003), “Multiple Correspondence Analysis with Missing Data”, in Recherches sur l’Analyse des Correspondances, pp. 152–170.
- VERMUNT, J.K., VAN GINKEL, J.R., VAN DER ARK, L.A., and SIJTSMA, K. (2008), “Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis”, Sociological Methodology, 33, 369–397.