Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison

  • Marlies Vervloet
  • Wim Van den Noortgate
  • Eva Ceulemans

Abstract

Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and the predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor over and above the effects of the other predictors. Moreover, as the number of predictors grows, it becomes likely that the predictors will be highly collinear, which makes the estimates of the regression weights unstable (i.e., the “bouncing beta” problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses of each method are. In this article, we focus on the extent to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that, for a typical behavioral dataset, ESEM (using the BIC for model selection) succeeds in this regard more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
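
The instability of OLS weights under collinearity, and the stabilizing effect of regressing on a summarizing variable, can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the data-generating model (one factor, four predictors, noise levels), the function names, and the use of a plain first principal component as the summarizing variable are all choices made for this example. It is neither PCovR nor ESEM, both of which optimize reduction and prediction simultaneously.

```python
import numpy as np

def simulate(n, rng, noise=0.1):
    # Assumed toy model: one underlying factor drives four highly
    # collinear predictors and the criterion.
    factor = rng.normal(size=n)
    X = factor[:, None] + noise * rng.normal(size=(n, 4))
    y = factor + 0.5 * rng.normal(size=n)
    return X, y

def ols_weights(X, y):
    # OLS weights on centered data (centering removes the intercept).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.lstsq(Xc, yc, rcond=None)[0]

def component_weight(X, y):
    # Reduce the predictors to their first principal component, then
    # regress the criterion on that single summarizing variable.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loading = vt[0] * np.sign(vt[0].sum())  # fix the arbitrary sign
    t = Xc @ loading
    return (t @ yc) / (t @ t)

def weight_spread(n_reps=200, n=100, seed=0):
    # Standard deviation of the estimated weights across replications.
    rng = np.random.default_rng(seed)
    ols, comp = [], []
    for _ in range(n_reps):
        X, y = simulate(n, rng)
        ols.append(ols_weights(X, y))
        comp.append(component_weight(X, y))
    return np.std(ols, axis=0), np.std(comp)

ols_sd, comp_sd = weight_spread()
print("SD of OLS weights across replications:", ols_sd.round(2))
print("SD of component weight across replications:", round(float(comp_sd), 3))
```

In this toy setting, the individual OLS weights vary far more from sample to sample than the weight of the single component, which is the "bouncing beta" pattern the abstract describes.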

Keywords

Principal-covariate regression; Exploratory structural equation modeling; Multicollinearity; Dimension reduction

Notes

Author note

The research leading to the results reported in this article was supported in part by the Research Fund of KU Leuven (GOA/15/003); by the Interuniversity Attraction Poles program, financed by the Belgian government (IAP/P7/06); and by a postdoctoral fellowship awarded to M.V. by the Research Council of KU Leuven (PDM/17/071). For the simulations, we used the infrastructure of the VSC–Flemish Supercomputer Center, funded by the Hercules foundation and the Flemish government, Department EWI.

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Marlies Vervloet (1)
  • Wim Van den Noortgate (1)
  • Eva Ceulemans (1)
  1. KU Leuven, Leuven, Belgium