Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis

Journal of Classification

Abstract

Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with other methods. More specifically, we study how these methods behave when order changes of components and sign reversals of component loadings occur, as in the case of near-equal eigenvalues or data having almost as many counterindicative items as indicative items. The simulations show that other proposed methods may either run into serious problems or be unable to adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates for the PCA loadings.
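The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that each imputed data set yields a variables-by-components loading matrix and that the matrices are matched by orthogonal rotation or reflection only; the abstract does not say whether translation or scaling is also used, so their omission is an assumption here. The function names `procrustes_rotation` and `gpa_centroid` are hypothetical and are not taken from the paper.

```python
import numpy as np


def procrustes_rotation(A, target):
    """Return the orthogonal Q that minimizes ||A @ Q - target||_F.

    Q may include a reflection, so it can undo sign reversals and
    order changes of components.
    """
    U, _, Vt = np.linalg.svd(A.T @ target)
    return U @ Vt


def gpa_centroid(loadings, max_iter=100, tol=1e-10):
    """Rotation-only Generalized Procrustes sketch (assumed variant).

    loadings: list of (variables x components) arrays, one per imputed
    data set. Returns the centroid and the list of rotated matrices.
    """
    rotated = [np.asarray(L, dtype=float) for L in loadings]
    centroid = rotated[0].copy()  # arbitrary starting target
    prev_loss = np.inf
    for _ in range(max_iter):
        # Rotate every loading matrix towards the current centroid.
        rotated = [L @ procrustes_rotation(L, centroid) for L in rotated]
        # The centroid is the element-wise mean of the rotated matrices.
        centroid = np.mean(rotated, axis=0)
        loss = sum(np.sum((L - centroid) ** 2) for L in rotated)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return centroid, rotated


if __name__ == "__main__":
    # Illustrative loadings only; these numbers are not from the paper.
    base = np.array([[0.8, 0.1], [0.7, 0.2], [0.75, -0.1],
                     [0.1, 0.8], [0.2, 0.7], [-0.1, 0.75]])
    variants = [
        base,
        base[:, ::-1],                 # components in swapped order
        base * np.array([1.0, -1.0]),  # sign reversal of component 2
    ]
    centroid, rotated = gpa_centroid(variants)
    print(np.round(centroid, 3))
```

In this sketch the centroid inherits the orientation of the first loading matrix, which is an arbitrary choice. The rotated matrices in `rotated` give, for each variable, one point per imputed data set; a convex hull around those points (for example with `scipy.spatial.ConvexHull`) would be one way to draw the uncertainty regions mentioned above.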

Author information

Correspondence to Joost R. van Ginkel.

Cite this article

van Ginkel, J.R., Kroonenberg, P.M. Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis. J Classif 31, 242–269 (2014). https://doi.org/10.1007/s00357-014-9154-y

  • DOI: https://doi.org/10.1007/s00357-014-9154-y
