Abstract
Dimension reduction by means of Principal Component Analysis (PCA) is often employed to obtain a reduced set of components that preserves the largest possible part of the total variance of the observed variables. Several methodologies have been proposed either to improve the interpretation of PCA results (e.g., by means of orthogonal or oblique rotations, or shrinkage methods) or to model oblique components or factors with a hierarchical structure, as in Bi-factor and Higher-Order Factor Analysis. In this paper, we propose a new methodology, called Hierarchical Disjoint Principal Component Analysis (HierDPCA), which builds a hierarchy of disjoint principal components of maximum variance, associated with disjoint groups of observed variables, from Q components up to a single, general one. HierDPCA also allows the type of relationship between the disjoint principal components of two consecutive levels to be chosen, from the lowest level upwards: the component correlation is tested at each level, and the approach changes from reflective to formative when this correlation is not statistically significant. The methodology is formulated in a semi-parametric least-squares framework, and a coordinate descent algorithm is proposed to estimate the model parameters. A simulation study and two real applications illustrate the empirical properties of the proposed methodology.
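As a rough illustration of the building block that a hierarchy of disjoint components starts from, the sketch below computes one principal component per group of variables for a fixed, user-supplied partition. This is a minimal sketch under stated assumptions, not the HierDPCA estimator: in the paper the partition and the hierarchy are estimated by a coordinate descent algorithm, which is not reproduced here. The function name `disjoint_pca`, the `labels` encoding of the partition, and the synthetic data are hypothetical. The correlation matrix `R` is the kind of quantity the abstract says is tested per level; the specific test used by the authors is not implemented.

```python
import numpy as np


def disjoint_pca(X, labels):
    """First principal component of each disjoint group of variables (hypothetical helper).

    X      : (n, p) data matrix, assumed column-centred.
    labels : length-p integer array; labels[j] is the group (0..Q-1) of variable j.
    Returns the (p, Q) loading matrix A (one non-zero entry per row),
    the (n, Q) component scores Y = X A, and the variance of each component.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    Q = int(labels.max()) + 1
    A = np.zeros((p, Q))
    for q in range(Q):
        idx = np.flatnonzero(labels == q)
        S = X[:, idx].T @ X[:, idx] / n        # covariance block of group q
        _, eigvec = np.linalg.eigh(S)          # eigenvalues in ascending order
        a = eigvec[:, -1]                      # leading eigenvector, unit norm
        A[idx, q] = a if a.sum() >= 0 else -a  # sign fixed only for readability
    Y = X @ A
    var = (Y ** 2).sum(axis=0) / n             # largest eigenvalue of each block
    return A, Y, var


# Usage on synthetic data: two groups of three variables, each driven by its own factor.
rng = np.random.default_rng(0)
f = rng.standard_normal((200, 2))
X = np.hstack([f[:, [0]] + 0.3 * rng.standard_normal((200, 3)),
               f[:, [1]] + 0.3 * rng.standard_normal((200, 3))])
X -= X.mean(axis=0)
A, Y, var = disjoint_pca(X, [0, 0, 0, 1, 1, 1])
R = np.corrcoef(Y, rowvar=False)  # correlation between the two disjoint components
```

Because each column of `A` is non-zero only on its own group, the columns are orthonormal and the retained variance is simply the sum of the largest eigenvalues of the within-group covariance blocks.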
Notes
\(\mathbf {g}\) stands for \(\mathbf {Y}_{1}\), i.e., the general concept identified at the last level of the hierarchy.
Available at http://bstat.jp/en_material/.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Proofs
This appendix provides the proofs of Eqs. (11) and (13), introduced in Sect. 3.1.
Eq. (11) can be proved by recalling that \(\mathbf {Y}_{q} = \mathbf {X}\mathbf {B}_{q}\mathbf {V}_{q}\) for \(q = M, \dots , Q\), together with the additivity of the trace, its invariance under cyclic permutations, and constraint (6) of HierDPCA.
Proof
\(\square\)
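The trace properties invoked for Eq. (11) can be checked numerically. The sketch below is not the proof of Eq. (11), whose statement is not reproduced in this section; it only verifies the generic identities on random data. It assumes, hypothetically, that the loadings have the form \(\mathbf {A} = \mathbf {B}\mathbf {V}\), with \(\mathbf {V}\) a binary membership matrix and \(\mathbf {B}\) diagonal, and that constraint (6) amounts to orthonormal columns of \(\mathbf {A}\) (unit-norm weights within each group). The group labels, weights and permutation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 6
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)

# Hypothetical membership matrix V (p x Q): variable j belongs to group labels[j].
labels = np.array([0, 0, 1, 1, 1, 2])
Q = int(labels.max()) + 1
V = np.zeros((p, Q))
V[np.arange(p), labels] = 1.0

# Diagonal weight matrix B, rescaled so that each group's weights have unit norm
# (assumed form of the orthonormality constraint: (BV)'(BV) = I).
w = rng.standard_normal(p)
for q in range(Q):
    w[labels == q] /= np.linalg.norm(w[labels == q])
B = np.diag(w)

A = B @ V
Y = X @ A

total = np.trace(Y.T @ Y)                                  # tr(A'X'XA)
per_group = sum(Y[:, q] @ Y[:, q] for q in range(Q))        # additivity over components
cyclic = np.trace(X.T @ X @ A @ A.T)                        # tr(A'X'XA) = tr(X'XAA')
perm = np.array([2, 0, 1])                                  # relabel the groups
Y_perm = X @ B @ V[:, perm]
permuted = np.trace(Y_perm.T @ Y_perm)                      # unchanged by relabelling
```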
The proof of (13) follows by recalling that the difference between two nested partitions \(\mathbf {V}_{q}\) and \(\mathbf {V}_{q-1}\) is given in (12).
Proof
\(\square\)
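Nested partitions such as \(\mathbf {V}_{q}\) and \(\mathbf {V}_{q-1}\) can be illustrated by fusing two groups of the finer partition. The representation \(\mathbf {V}_{q-1} = \mathbf {V}_{q}\mathbf {M}\), with a hypothetical \(Q \times (Q-1)\) binary matrix \(\mathbf {M}\) that merges two columns, is an assumption made here for illustration and is not necessarily the form of Eq. (12). Which groups HierDPCA actually merges is decided by its variance criterion, which this sketch does not implement; `merge_matrix` and `is_nested` are hypothetical helpers.

```python
import numpy as np


def membership(labels, Q):
    """Binary p x Q membership matrix of a partition of p variables into Q groups."""
    V = np.zeros((len(labels), Q))
    V[np.arange(len(labels)), labels] = 1.0
    return V


def merge_matrix(Q, i, j):
    """Hypothetical Q x (Q-1) binary matrix that fuses group j into group i (i != j)."""
    keep = [k for k in range(Q) if k != j]
    M = np.zeros((Q, Q - 1))
    for new, old in enumerate(keep):
        M[old, new] = 1.0
    M[j, keep.index(i)] = 1.0
    return M


def is_nested(V_fine, V_coarse):
    """True if every group of V_fine lies inside a single group of V_coarse."""
    overlap = V_fine.T @ V_coarse  # number of variables shared by each pair of groups
    return bool(np.all((overlap > 0).sum(axis=1) == 1))


V4 = membership(np.array([0, 0, 1, 2, 2, 3]), 4)
V3 = V4 @ merge_matrix(4, 1, 3)  # fuse groups 1 and 3 of the finer partition
```

Repeating such merges from \(Q\) groups down to one yields the sequence of nested partitions that ends with the single general component.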
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cavicchia, C., Vichi, M. & Zaccaria, G. Hierarchical disjoint principal component analysis. AStA Adv Stat Anal 107, 537–574 (2023). https://doi.org/10.1007/s10182-022-00458-4
DOI: https://doi.org/10.1007/s10182-022-00458-4