Abstract
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically using permutation tests. We compare a new strategy with one used in previous research, which permutes the columns (variables) of a data matrix independently and concurrently, thus destroying the entire correlational structure of the data. That strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but it is not suitable for assessing the significance of the contribution of single variables. As an alternative, we propose permuting one variable at a time while keeping the other variables fixed. We compare the two approaches in a simulation study, considering proportions of Type I and Type II error, and we use two corrections for multiple testing: the Bonferroni correction and control of the False Discovery Rate (FDR). For assessing the significance of the variance accounted for by the variables, permuting one variable at a time, combined with FDR correction, yields the most favorable results. This optimal strategy is applied to an empirical data set, and the results are compared with bootstrap confidence intervals.
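The one-variable-at-a-time strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: PCA is run on the correlation matrix, a variable's variance accounted for (VAF) is taken as the sum of its squared loadings over the retained components, the p-value is computed as (exceedances + 1) / (permutations + 1), and the function names `vaf_per_variable`, `permute_one_variable_pvalues`, and `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def vaf_per_variable(X, n_components):
    """VAF per variable: sum of squared loadings on the leading components.

    Assumption: PCA on the correlation matrix of X (rows = cases, columns = variables).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    return (loadings ** 2).sum(axis=1)


def permute_one_variable_pvalues(X, n_components=2, n_perm=199, seed=0):
    """Permute one column at a time, keeping all other columns fixed."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    observed = vaf_per_variable(X, n_components)
    pvals = np.empty(n_vars)
    for j in range(n_vars):
        Xp = X.copy()
        exceed = 0
        for _ in range(n_perm):
            Xp[:, j] = rng.permutation(X[:, j])       # only column j is shuffled
            if vaf_per_variable(Xp, n_components)[j] >= observed[j]:
                exceed += 1
        # Assumed p-value convention: the observed value counts as one permutation.
        pvals[j] = (exceed + 1) / (n_perm + 1)
    return observed, pvals


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject
```

Calling `permute_one_variable_pvalues(X, n_components=2)` on a cases-by-variables array and passing the resulting p-values to `benjamini_hochberg` gives one FDR-corrected decision per variable. The strategy from previous research would instead shuffle every column independently within each permutation, which removes all correlations among the variables rather than only those involving variable j.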
Cite this article
Linting, M., van Os, B.J. & Meulman, J.J. Statistical Significance of the Contribution of Variables to the PCA solution: An Alternative Permutation Strategy. Psychometrika 76, 440–460 (2011). https://doi.org/10.1007/s11336-011-9216-6
DOI: https://doi.org/10.1007/s11336-011-9216-6