Psychometrika, Volume 82, Issue 1, pp. 186–209

Considering Horn’s Parallel Analysis from a Random Matrix Theory Point of View


Abstract

Horn’s parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy–Widom test, (ii) its use to test higher-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy–Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factor model. For the principal component model, the Tracy–Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy–Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.
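The following Python sketch illustrates one common variant of Horn’s parallel analysis for principal components of a covariance matrix. It is not the article’s implementation. The null reference data are generated by independently permuting each column of the data (the permutation approach discussed by Buja and Eyuboglu, 1992). The function name `parallel_analysis`, the number of replicates and the 95th-percentile threshold are illustrative assumptions.

```python
import numpy as np


def parallel_analysis(X, n_sim=500, quantile=0.95, seed=0):
    """Sketch of Horn-style parallel analysis on the sample covariance matrix.

    Reference (null) data are obtained by permuting each column of X
    independently. This removes the correlations between variables while
    keeping each variable's values and variance. Returns the observed
    eigenvalues l_1 >= ... >= l_p, the per-position reference quantiles and
    the number of leading components whose eigenvalue exceeds its reference.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    _, p = X.shape
    # Observed eigenvalues of the sample covariance matrix C, in decreasing order
    obs = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    sims = np.empty((n_sim, p))
    for b in range(n_sim):
        # Permute every column independently to build one null data set
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        sims[b] = np.linalg.eigvalsh(np.cov(Xp, rowvar=False))[::-1]
    # Reference for the k-th eigenvalue: a quantile of the k-th null eigenvalue.
    # The abstract argues against this position-by-position comparison for k > 1.
    ref = np.quantile(sims, quantile, axis=0)
    exceeds = obs > ref
    # Retain components up to (not including) the first that fails to exceed its reference
    k = p if exceeds.all() else int(np.argmin(exceeds))
    return obs, ref, k


if __name__ == "__main__":
    # Hypothetical simulated data: one strong component plus unit-variance noise
    rng = np.random.default_rng(1)
    f = rng.standard_normal((200, 1))
    X = f @ np.full((1, 10), 1.5) + rng.standard_normal((200, 10))
    obs, ref, k = parallel_analysis(X, n_sim=200)
    print("retained components:", k)
```

Permuted columns keep each variable's total variance, including the part due to a strong first component. This is one reason why reference values for the higher-order eigenvalues can be hard to interpret (compare Green et al., 2012).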

Keywords

Covariance matrix, principal component analysis, common factor analysis, number of principal components, number of common factors

List of Symbols

\({\mathbf {X}}\): Matrices (bold font, uppercase)

\(x_{ij}\): Element of \({\mathbf {X}}\) in the \(i\)-th row, \(j\)-th column

\({\varvec{\Sigma }}\): Population covariance matrix

\({\mathbf {C}}\): Sample covariance matrix

\(\lambda _k\): \(k\)-th eigenvalue of the population covariance matrix

\(l_k\): \(k\)-th eigenvalue of the sample covariance matrix

\(L_k\): Tracy–Widom statistic for \(l_k\)

\(s\): Argument of the Tracy–Widom cdf and pdf
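The statistic \(L_k\) standardizes an eigenvalue so that it can be compared with the Tracy–Widom distribution. Consider the largest eigenvalue of \({\mathbf {X}}^\top {\mathbf {X}}\), where \({\mathbf {X}}\) is \(n \times p\) with i.i.d. standard normal entries. For this case, Johnstone (2001) centers the eigenvalue with \(\mu _{np} = (\sqrt{n-1} + \sqrt{p})^2\) and scales it with \(\sigma _{np} = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}\).

The Python sketch below applies these constants to test \({\varvec{\Sigma }} = \sigma ^2{\mathbf {I}}\) using the first eigenvalue. It rests on several assumptions:
- rows of \({\mathbf {X}}\) are zero-mean and Gaussian, and \(\sigma ^2\) is known;
- the function names are hypothetical;
- the critical values are rounded Tracy–Widom (\(\beta = 1\)) quantiles taken from published tables.

If \({\mathbf {C}}\) is taken as \({\mathbf {X}}^\top {\mathbf {X}}/n\), the eigenvalue used here equals \(n\,l_1\). The article’s own test for higher-order components is not reproduced.

```python
import numpy as np

# Approximate upper quantiles of the Tracy–Widom distribution for real data
# (beta = 1), rounded from published tables such as Johnstone (2001).
TW1_UPPER_QUANTILES = {0.95: 0.98, 0.99: 2.02}


def johnstone_center_scale(n, p):
    """Centering and scaling constants of Johnstone (2001) for the largest
    eigenvalue of X'X, where X is n x p with i.i.d. N(0, 1) entries."""
    a, b = np.sqrt(n - 1.0), np.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def tw_largest_eigenvalue_test(X, noise_var=1.0, level=0.95):
    """Sketch of a test of H0: Sigma = noise_var * I against a larger first
    eigenvalue. Assumes zero-mean Gaussian rows and a known noise_var."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Largest eigenvalue of X'X / noise_var (eigvalsh sorts ascending)
    l1 = np.linalg.eigvalsh(X.T @ X / noise_var)[-1]
    mu, sigma = johnstone_center_scale(n, p)
    L1 = (l1 - mu) / sigma  # approximately Tracy–Widom (beta = 1) under H0
    return L1, bool(L1 > TW1_UPPER_QUANTILES[level])


if __name__ == "__main__":
    # Hypothetical null data: 200 observations of 20 independent N(0, 1) variables
    rng = np.random.default_rng(0)
    print(tw_largest_eigenvalue_test(rng.standard_normal((200, 20))))
```

For mean-centred data, \({\mathbf {X}}^\top {\mathbf {X}}\) has \(n - 1\) rather than \(n\) degrees of freedom, so a common adjustment is to pass \(n - 1\) in place of \(n\) to `johnstone_center_scale`.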

References

  1. Airy, G. (1838). On the intensity of light in the neighbourhood of a caustic. Transactions of the Cambridge Philosophical Society, 6, 379–402.
  2. Baik, J., Ben Arous, G., & Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, 33, 1643–1697.
  3. Baik, J., & Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6), 1382–1408.
  4. Bao, Z., Pan, G., & Zhou, W. (2012). Tracy–Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability, 17(88), 1–32.
  5. Barelds, D. P., & Dijkstra, P. (2010). Narcissistic personality inventory: Structure of the adapted Dutch version. Scandinavian Journal of Psychology, 51(2), 132–138.
  6. Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Statistical Psychology, 3(2), 77–85.
  7. Bornemann, F. (2009). On the numerical evaluation of distributions in random matrix theory: A review. arXiv:0904.1581.
  8. Bornemann, F. (2010). On the numerical evaluation of Fredholm determinants. Mathematics of Computation, 79(270), 871–915.
  9. Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4), 509–540.
  10. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.
  11. Ceulemans, E., & Kiers, H. A. (2006). Selecting among three-mode principal component models of different types and complexities: A numerical convex hull based method. British Journal of Mathematical and Statistical Psychology, 59(1), 133–150.
  12. Chiani, M. (2012). Distribution of the largest eigenvalue for real Wishart and Gaussian random matrices and a simple approximation for the Tracy–Widom distribution. arXiv:1209.3394.
  13. Crawford, A. V., Green, S. B., Levy, R., Lo, W.-J., Scott, L., Svetina, D., et al. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70(6), 885–901.
  14. Deming, W. E. (1966). Some theory of sampling. New York: Courier Dover Publications.
  15. DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11, 189–212.
  16. Dinno, A. (2009). Exploring the sensitivity of Horn’s parallel analysis to the distributional form of random data. Multivariate Behavioral Research, 44(3), 362–388.
  17. Efron, B., & Tibshirani, R. J. (1993). The bootstrap estimate of standard error. In An introduction to the bootstrap (pp. 45–59). New York: Springer.
  18. Efron, B. (1994). Missing data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463–475.
  19. Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272.
  20. Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39(2), 291–314.
  21. Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn’s parallel analysis with ordinal variables. Psychological Methods, 18(4), 454.
  22. Glorfeld, L. W. (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377–393.
  23. Green, S. B., Levy, R., Thompson, M. S., Lu, M., & Lo, W.-J. (2012). A proposed solution to the problem with using completely random data to assess the number of factors with parallel analysis. Educational and Psychological Measurement, 72(3), 357–374. doi:10.1177/0013164411422252
  24. Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19(2), 149–161.
  25. Harding, M. C. (2008). Explaining the single factor bias of arbitrage pricing models in finite samples. Economics Letters, 99(1), 85–88.
  26. Hastings, S., & McLeod, J. (1980). A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation. Archive for Rational Mechanics and Analysis, 73(1), 31–51.
  27. Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139–164.
  28. Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191–205.
  29. Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.
  30. Humphreys, L. G., & Montanelli, R. G., Jr. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10(2), 193–205.
  31. Jackson, D. A. (1993). Stopping rules in principal components analysis: A comparison of heuristical and statistical approaches. Ecology, 74(8), 2204–2214.
  32. Johnstone, I. M. (2006). High dimensional statistical inference and random matrices. arXiv:math/0611589.
  33. Johnstone, I. M., Ma, Z., Perry, P. O., & Shahram, M. (2009). RMTstat: Distributions, statistics and tests derived from random matrix theory [Computer software manual] (R package version 0.2).
  34. Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2), 295–327.
  35. Jolliffe, I. (2005). Principal component analysis. New York: Wiley.
  36. Karoui, N. E. (2003). On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity. arXiv:math/0309355.
  37. Karoui, N. E. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, 35, 663–714.
  38. Kendall, M. G., & Yule, G. U. (1950). An introduction to the theory of statistics. London: Charles Griffin & Company.
  39. Koster, M., Timmerman, M. E., Nakken, H., Pijl, S. J., & van Houten, E. J. (2009). Evaluating social participation of pupils with special needs in regular primary schools. European Journal of Psychological Assessment, 25(4), 213–222.
  40. Kritchman, S., & Nadler, B. (2008). Determining the number of components in a factor model from limited noisy data. Chemometrics and Intelligent Laboratory Systems, 94(1), 19–32.
  41. Kuppens, P., Ceulemans, E., Timmerman, M. E., Diener, E., & Kim-Prieto, C. (2006). Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience. Journal of Cross-Cultural Psychology, 37(5), 491–515.
  42. Ledesma, R. D., & Valero-Mora, P. (2007). Determining the number of factors to retain in EFA: An easy-to-use computer program for carrying out parallel analysis. Practical Assessment, Research & Evaluation, 12(2), 1–11.
  43. Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The hull method for selecting the number of common factors. Multivariate Behavioral Research, 46(2), 340–364.
  44. Pan, G. (2012). Comparison between two types of large sample covariance matrices. In Institut Henri Poincaré: Ann.
  45. Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2(12), e190.
  46. Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4), 1617.
  47. Paul, D., & Aue, A. (2014). Random matrix theory in statistics: A review. Journal of Statistical Planning and Inference, 150, 1–29.
  48. Peres-Neto, P. R., Jackson, D. A., & Somers, K. M. (2005). How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis, 49(4), 974–997.
  49. Pillai, N. S., & Yin, J. (2012). Edge universality of correlation matrices. The Annals of Statistics, 40(3), 1737–1763.
  50. Raskin, R., & Hall, C. (1979). A narcissistic personality inventory. Psychological Reports, 45(2), 590.
  51. Raskin, R., & Terry, H. (1988). A principal-components analysis of the narcissistic personality inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54(5), 890.
  52. Rice, S., & Church, M. (1996). Sampling surficial fluvial gravels: The precision of size distribution percentile estimates. Journal of Sedimentary Research, 66(3), 654–665.
  53. Saccenti, E., Smilde, A. K., Westerhuis, J. A., & Hendriks, M. M. (2011). Tracy–Widom statistic for the largest eigenvalue of autoscaled real matrices. Journal of Chemometrics, 25(12), 644–652.
  54. Saccenti, E., & Camacho, J. (2015). Determining the number of components in principal components analysis: A comparison of statistical, crossvalidation and approximated methods. Chemometrics and Intelligent Laboratory Systems, 149(Part A), 99–116.
  55. Saccenti, E., & Timmerman, M. E. (2016). Approaches to sample size determination for multivariate data: Applications to PCA and PLS-DA of omics data. Journal of Proteome Research, 15, 2379–2393.
  56. Smits, I. A., Timmerman, M. E., & Meijer, R. R. (2012). Exploratory Mokken scale analysis as a dimensionality assessment tool: Why scalability does not imply unidimensionality. Applied Psychological Measurement, 36(6), 516–539.
  57. Soshnikov, A. (2002). A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. Journal of Statistical Physics, 108(5), 1033–1056.
  58. Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
  59. Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16(2), 209.
  60. Tracy, C. A., & Widom, H. (2009). The distributions of random matrix theory and their applications. In New trends in mathematical physics (pp. 753–765). New York: Springer.
  61. Tracy, C. A., & Widom, H. (1993). Level-spacing distributions and the Airy kernel. Physics Letters B, 305(1), 115–118.
  62. Tracy, C. A., & Widom, H. (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics, 159(1), 151–174.
  63. Tracy, C. A., & Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177(3), 727–754.
  64. Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421–459.
  65. Wilderjans, T. F., Ceulemans, E., & Meers, K. (2013). CHull: A generic convex-hull-based model selection method. Behavior Research Methods, 45(1), 1–15.
  66. Wishart, J. (1928). The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 32–52.
  67. Zwick, W. R., & Velicer, W. F. (1982). Factors influencing four rules for determining the number of components to retain. Multivariate Behavioral Research, 17(2), 253–269.

Copyright information

© The Psychometric Society 2016

Authors and Affiliations

  1. Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
  2. Department Psychometrics & Statistics, University of Groningen, Groningen, The Netherlands
