Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis

Journal of Classification

Abstract

A common approach to dealing with missing values in multivariate exploratory data analysis consists of minimizing the loss function over all non-missing elements. This can be achieved with EM-type algorithms, in which the missing values are imputed iteratively while the axes and components are estimated. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. The results are promising compared with other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s homogeneity analysis framework.
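
The abstract describes the regularized iterative MCA only in words. As an illustration, here is a minimal Python/NumPy sketch of an EM-type imputation of the indicator matrix, written under our own assumptions rather than taken from the paper or from missMDA. We assume that MCA is computed as the SVD of the indicator matrix centred by the category proportions and scaled by their square roots (as in correspondence analysis), that each of the S retained singular values is shrunk by the factor (λ_s - σ²)/λ_s, that σ² is the mean of the remaining non-trivial eigenvalues (MCA has at most J - K of them for J categories and K variables), and that a missing answer is finally replaced by the category with the largest imputed value. The function names `indicator_matrix`, `regularized_imca` and `most_plausible` are hypothetical.

```python
import numpy as np


def indicator_matrix(data, categories):
    """Build the indicator (disjunctive) table Z of a categorical dataset.

    data: list of rows; each entry is a category label, or None when missing.
    categories: one list of category labels per variable.
    A missing answer sets every column of that variable to NaN.
    """
    sizes = [len(c) for c in categories]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    Z = np.zeros((len(data), int(offsets[-1])))
    for i, row in enumerate(data):
        for k, value in enumerate(row):
            if value is None:
                Z[i, offsets[k]:offsets[k + 1]] = np.nan
            else:
                Z[i, offsets[k] + categories[k].index(value)] = 1.0
    return Z, offsets


def regularized_imca(Z, n_vars, ncp, max_iter=1000, tol=1e-10):
    """Hypothetical sketch of a regularized iterative MCA imputation.

    Assumes every category is observed at least once, so all proportions > 0.
    """
    missing = np.isnan(Z)
    J = Z.shape[1]
    # Start from the observed proportion of each category in the missing cells.
    Z_hat = np.where(missing, np.nanmean(Z, axis=0), Z)
    for _ in range(max_iter):
        p = Z_hat.mean(axis=0)               # category proportions (column margins)
        A = (Z_hat - p) / np.sqrt(p)         # centred and scaled by the margins, as in CA
        U, d, Vt = np.linalg.svd(A, full_matrices=False)
        lam = d ** 2                         # eigenvalues up to a constant factor
        rest = lam[ncp:J - n_vars]           # at most J - K non-trivial dimensions
        sigma2 = rest.mean() if rest.size else 0.0  # assumed noise estimate
        top = lam[:ncp]
        # Shrink each kept singular value by (lambda_s - sigma2) / lambda_s.
        shrink = np.clip((top - sigma2) / np.maximum(top, 1e-12), 0.0, 1.0)
        A_fit = (U[:, :ncp] * (d[:ncp] * shrink)) @ Vt[:ncp]
        Z_fit = p + A_fit * np.sqrt(p)       # back to the indicator scale
        Z_new = np.where(missing, Z_fit, Z)  # observed cells are never changed
        delta = np.sum((Z_new - Z_hat) ** 2)
        Z_hat = Z_new
        if delta < tol:
            break
    return Z_hat


def most_plausible(data, categories, Z_hat, offsets):
    """Complete the data with the category that has the largest imputed value."""
    completed = [list(row) for row in data]
    for i, row in enumerate(data):
        for k, value in enumerate(row):
            if value is None:
                block = Z_hat[i, offsets[k]:offsets[k + 1]]
                completed[i][k] = categories[k][int(np.argmax(block))]
    return completed


# Usage: three perfectly associated variables, one missing answer.
data = [("a", "a", None)] + [("a", "a", "a")] * 3 + [("b", "b", "b")] * 4
categories = [["a", "b"]] * 3
Z, offsets = indicator_matrix(data, categories)
Z_hat = regularized_imca(Z, n_vars=3, ncp=1)
completed = most_plausible(data, categories, Z_hat, offsets)
print(completed[0])
print(Z_hat[0].round(3))
```

In this sketch the imputed values of a variable always sum to one for each individual, so they can be read as fuzzy category memberships, and observed cells are never modified, which mirrors the idea of fitting the model on the non-missing elements only.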

References

  • BENZÉCRI, J-P. (1973), L’Analyse des Données, Tome II: L’Analyse des Correspondances, Paris: Dunod.

  • BRO, R., KJELDAHL, K., SMILDE, A. K., and KIERS, H. A. L. (2008), “Cross-validation of Component Models: A Critical Look at Current Methods”, Analytical and Bioanalytical Chemistry, 390, 1241–1251.

  • DE LEEUW, J., and VAN DER HEIJDEN, P. G. M. (1988), “Correspondence Analysis of Incomplete Contingency Tables”, Psychometrika, 53, 223–233.

  • DEMPSTER, A. P., LAIRD, N. M., and RUBIN, D. B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society B, 39, 1–38.

  • ESCOFIER, B. (1987), “Traitement des Questionnaires avec Non Réponse, Analyse des Correspondances avec Marges Modifiée et Analyse Multicanonique avec Contrainte”, Publications de l’Institut de Statistique de l’Université de Paris, 32, 33–70.

  • ESCOUFIER, Y. (1973), “Le Traitement des Variables Vectorielles”, Biometrics, 29, 751–760.

  • GABRIEL, K. R., and ZAMIR, S. (1979), “Lower Rank Approximation of Matrices by Least Squares with Any Choice of Weights”, Technometrics, 21, 236–246.

  • GIFI, A. (1981), Non-linear Multivariate Analysis, Leiden: D.S.W.O.-Press.

  • GREENACRE, M. (1984), Theory and Applications of Correspondence Analysis, London: Academic Press.

  • GREENACRE, M. (1988), “Correspondence Analysis of Multivariate Categorical Data by Weighted Least-squares”, Biometrika, 75, 457–477.

  • GREENACRE, M., and BLASIUS, J. (2006), Multiple Correspondence Analysis and Related Methods, London: Chapman & Hall/CRC.

  • GREENACRE, M., and PARDO, R. (2006), “Subset Correspondence Analysis: Visualizing Relationships Among a Selected Set of Response Categories from a Questionnaire Survey”, Sociological Methods and Research, 35(2), 193–218.

  • HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2001), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer Series in Statistics.

  • HOERL, A. E., and KENNARD, R. W. (1970), “Ridge Regression: Biased Estimation for Nonorthogonal Problems”, Technometrics, 12, 55–67.

  • HUSSON, F. and JOSSE, J. (2010), missMDA: Handling Missing Values With/In Multivariate Data Analysis (Principal Component Methods), R package version 1.2, http://www.agrocampus-ouest.fr/math/husson, http://www.agrocampus-ouest.fr/math/josse.

  • HUSSON, F., JOSSE, J., LÊ, S., and MAZET, J. (2011), FactoMineR: Multivariate Exploratory Data Analysis and Data Mining with R, R package version 1.16, http://factominer.free.fr, http://www.agrocampus-ouest.fr/math/.

  • ILIN, A., and RAIKO, T. (2010), “Practical Approaches to Principal Component Analysis in the Presence of Missing Values”, Journal of Machine Learning Research, 11, 1957–2000.

  • JOSSE, J., PAGÈS, J., and HUSSON, F. (2008), “Testing the Significance of the RV Coefficient”, Computational Statistics and Data Analysis, 53, 82–91.

  • JOSSE, J., PAGÈS, J., and HUSSON, F. (2009), “Gestion des Données Manquantes en Analyse en Composantes Principales”, Journal de la Société Française de Statistique, 150, 28–51.

  • KIERS, H. A. L. (1997), “Weighted Least Squares Fitting Using Ordinary Least Squares Algorithms”, Psychometrika, 62, 251–266.

  • LÊ, S., JOSSE, J., and HUSSON, F. (2008), “FactoMineR: An R Package for Multivariate Analysis”, Journal of Statistical Software, 25(1), 1–18.

  • LEBART, L., MORINEAU, A., and WARWICK, K. M. (1984), Multivariate Descriptive Statistical Analysis, New York: Wiley.

  • LITTLE, R. J. A., and RUBIN, D. B. (1987, 2002), Statistical Analysis with Missing Data, New York: Wiley Series in Probability and Statistics.

  • MEULMAN, J. (1982), Homogeneity Analysis of Incomplete Data, Leiden: D.S.W.O.-Press.

  • NISHISATO, S. (1980), Analysis of Categorical Data: Dual Scaling and its Applications, Toronto: University of Toronto Press.

  • NORA-CHOUTEAU, C. (1974), Une Méthode de Reconstitution et d’Analyse de Données Incomplètes, unpublished PhD thesis, Université Pierre et Marie Curie.

  • R DEVELOPMENT CORE TEAM, (2010), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org/.

  • RUBIN, D. B. (1976), “Inference and Missing Data”, Biometrika, 63, 581–592.

  • SCHAFER, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall/CRC.

  • SCHAFER, J. L., and GRAHAM, J. W. (2002), “Missing Data: Our View of the State of the Art”, Psychological Methods, 7, 147–177.

  • SMILDE, A. K., KIERS, H. A. L., BIJLSMA, S., RUBINGH, C. M., and VAN ERK, M. J. (2009), “Matrix Correlations for High-dimensional Data: The Modified RV-coefficient”, Bioinformatics, 25, 401–405.

  • TAKANE, Y., and HWANG, H. (2002), “Generalized Constrained Canonical Correlation Analysis”, Multivariate Behavioral Research, 37, 163–195.

  • TAKANE, Y., and HWANG, H. (2006), “Regularized Multiple Correspondence Analysis”, in Multiple Correspondence Analysis and Related Methods, eds. J. Blasius and M. J. Greenacre, Chapman & Hall, pp. 259–279.

  • TAKANE, Y., and OSHIMA-TAKANE, Y. (2003), “Relationships Between Two Methods for Dealing with Missing Data in Principal Component Analysis”, Behaviormetrika, 30, 145–154.

  • TENENHAUS, M., and YOUNG, F. W. (1985), “An Analysis and Synthesis of Multiple Correspondence Analysis, Optimal Scaling, Dual Scaling, Homogeneity Analysis and Other Methods for Quantifying Categorical Multivariate Data”, Psychometrika, 50, 91–119.

  • TIPPING, M., and BISHOP, C. M. (1999), “Probabilistic Principal Component Analysis”, Journal of the Royal Statistical Society B, 61, 611–622.

  • VAN DER HEIJDEN, P. G. M., and ESCOFIER, B. (2003), “Multiple Correspondence Analysis with Missing Data”, in Recherches sur l’Analyse des Correspondances, pp. 152–170.

  • VERMUNT, J. K., VAN GINKEL, J. R., VAN DER ARK, L. A., and SIJTSMA, K. (2008), “Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis”, Sociological Methodology, 33, 369–397.

Author information

Correspondence to Julie Josse.

About this article

Cite this article

Josse, J., Chavent, M., Liquet, B. et al. Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis. J Classif 29, 91–116 (2012). https://doi.org/10.1007/s00357-012-9097-0

  • DOI: https://doi.org/10.1007/s00357-012-9097-0
