Hierarchical disjoint principal component analysis

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Dimension reduction by means of Principal Component Analysis (PCA) is often employed to obtain a reduced set of components that preserves the largest possible part of the total variance of the observed variables. Several methodologies have been proposed either to improve the interpretation of PCA results (e.g., by means of orthogonal or oblique rotations, or shrinkage methods), or to model oblique components or factors with a hierarchical structure, as in Bi-factor and Higher-Order Factor analyses. In this paper, we propose a new methodology, called Hierarchical Disjoint Principal Component Analysis (HierDPCA), that builds a hierarchy of disjoint principal components of maximum variance associated with disjoint groups of observed variables, from Q components up to a unique, general one. HierDPCA also allows choosing the type of relationship among the disjoint principal components of two consecutive levels, from the lowest upwards, by testing the component correlation at each level and switching from a reflective to a formative approach when this correlation is not statistically significant. The methodology is formulated in a semi-parametric least-squares framework, and a coordinate descent algorithm is proposed to estimate the model parameters. A simulation study and two real applications are presented to highlight the empirical properties of the proposed methodology.

Notes

  1. \(\mathbf {g}\) stands for \(\mathbf {Y}_{1}\), i.e., the general concept identified at the last level of the hierarchy.

  2. Available at http://bstat.jp/en_material/.

  3. Available at https://composite-indicators.jrc.ec.europa.eu/asem-sustainable-connectivity/repository.

Author information

Corresponding author

Correspondence to Giorgia Zaccaria.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Proofs

This appendix provides the proofs of Eqs. (11) and (13) stated in Sect. 3.1.

Eq. (11) can be proved by recalling that \(\mathbf {Y}_{q} = \mathbf {X}\mathbf {B}_{q}\mathbf {V}_{q}\) for \(q = M, \dots , Q\), and by using the additivity of the trace, its invariance under cyclic permutations, and constraint (6) of HierDPCA.

Proof

$$\begin{aligned}&\sum _{q = M}^{Q} ||\mathbf {X} - \mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} + \sum _{q = M}^{Q} ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} = \sum _{q = M}^{Q} (||\mathbf {X} - \mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} + ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2}) \\&= \sum _{q = M}^{Q} \left\{ \text {tr}[(\mathbf {X} - \mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{})^{\prime }(\mathbf {X} - \mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{})]+\text {tr}[(\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{})^{\prime }(\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{})] \right\} \\&= \sum _{q = M}^{Q} \big [ \text {tr}(\mathbf {X}^{\prime }\mathbf {X}^{})-\text {tr}(\mathbf {X}^{\prime }\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}) - \text {tr}(\mathbf {B}_{q}^{}\mathbf {V}_{q}^{}\mathbf {Y}_{q}^{\prime }\mathbf {X}^{})+2 \, \text {tr}(\mathbf {B}_{q}^{}\mathbf {V}_{q}^{}\mathbf {Y}_{q}^{\prime }\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}) \big ] \\&= \sum _{q = M}^{Q} [\text {tr}(\mathbf {X}^{\prime }\mathbf {X}^{}) - \text {tr}(\mathbf {X}^{\prime }\mathbf {X}^{}\mathbf {B}_{q}^{}\mathbf {V}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}) - \text {tr}(\mathbf {B}_{q}^{}\mathbf {V}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}\mathbf {X}^{\prime }\mathbf {X}^{}) \\& \quad +2 \, \text {tr}(\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}\mathbf {X}^{\prime }\mathbf {X}^{}\mathbf {B}_{q}^{}\mathbf {V}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}\mathbf {B}_{q}^{}\mathbf {V}_{q}^{})] \\&= \sum _{q = M}^{Q} \text {tr}(\mathbf {X}^{\prime }\mathbf {X}^{}) = \sum _{q = M}^{Q} ||\mathbf {X}||^{2} = (Q - M + 1)||\mathbf {X}||^{2}. \end{aligned}$$

\(\square\)
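The identity just proved can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes \(\mathbf {V}_{q}\) to be a J x q binary membership matrix and \(\mathbf {B}_{q}\) a J x J diagonal matrix, and it assumes that constraint (6) amounts to \(\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}\mathbf {B}_{q}\mathbf {V}_{q} = \mathbf {I}_{q}\), which is inferred from the last step of the proof rather than quoted from the paper. The helper names `disjoint_loadings` and `decomposition_total`, the simulated data and the example partitions are hypothetical, and the loadings are random rather than estimated by HierDPCA.

```python
import numpy as np


def disjoint_loadings(labels, rng):
    """Return (V, B): V is a J x q binary membership matrix, B a J x J diagonal
    matrix rescaled so that V' B B V = I_q (assumed form of constraint (6))."""
    labels = np.asarray(labels)
    J, q = labels.size, labels.max() + 1
    V = np.zeros((J, q))
    V[np.arange(J), labels] = 1.0
    b = rng.normal(size=J)                     # arbitrary loadings, not estimated ones
    for k in range(q):
        group = labels == k
        b[group] /= np.linalg.norm(b[group])   # unit norm within each group
    return V, np.diag(b)


def decomposition_total(X, V, B):
    """||X - Y V' B||^2 + ||Y V' B||^2 with Y = X B V (Frobenius norms)."""
    Y = X @ B @ V
    fitted = Y @ V.T @ B
    return np.linalg.norm(X - fitted) ** 2 + np.linalg.norm(fitted) ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                   # simulated data: n = 50, J = 6

# Nested partitions of the 6 variables for levels q = 3, 2, 1 (M = 1, Q = 3).
partitions = {3: [0, 0, 1, 1, 2, 2], 2: [0, 0, 0, 0, 1, 1], 1: [0] * 6}

total = 0.0
for q, labels in partitions.items():
    V, B = disjoint_loadings(labels, rng)
    assert np.allclose(V.T @ B @ B @ V, np.eye(q))    # assumed constraint (6)
    total += decomposition_total(X, V, B)

expected = len(partitions) * np.linalg.norm(X) ** 2   # (Q - M + 1) ||X||^2
print(np.isclose(total, expected))
```

Because the columns of \(\mathbf {B}_{q}\mathbf {V}_{q}\) are orthonormal under the assumed constraint, \(\mathbf {Y}_{q}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}\) is an orthogonal projection of \(\mathbf {X}\), so the check does not depend on the particular loadings drawn.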

The proof of (13) follows by recalling that the difference between two nested partitions \(\mathbf {V}_{q}\) and \(\mathbf {V}_{q-1}\), written \(I_{d}(\mathbf {V}_{q},\mathbf {V}_{q-1})\) in the last line of the proof, is defined in (12).

Proof

$$\begin{aligned}&\sum _{q = 1}^{Q}||\mathbf {X}-\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} = \sum _{q = 1}^{Q}||\mathbf {X}-\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2}+Q \, ||\mathbf {X}||^{2}- Q \, ||\mathbf {X}||^{2} \\&= - \left[ \sum _{q = 1}^{Q} (||\mathbf {X}||^{2}-||\mathbf {X}-\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2})\right] + Q \, ||\mathbf {X}||^{2} {\mathop {=}\limits ^{\text {Eq. (11)}}} - \sum _{q = 1}^{Q} ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} + Q \, ||\mathbf {X}||^{2} \\&= - \sum _{q = 1}^{Q-1} ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} - ||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} + Q \, ||\mathbf {X}||^{2} \\&= - \sum _{q = 1}^{Q-1} ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} - ||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} + Q \, (||\mathbf {X}-\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2}+||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2}) \\&= Q \, ||\mathbf {X}-\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} - \sum _{q = 1}^{Q-1} ||\mathbf {Y}_{q}^{}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}^{}||^{2} +(Q-1) \, ||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} \\&= Q \, ||\mathbf {X}-\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} - ||\mathbf {Y}_{1}^{}\mathbf {V}_{1}^{\prime }\mathbf {B}_{1}^{}||^{2}- ||\mathbf {Y}_{2}^{}\mathbf {V}_{2}^{\prime }\mathbf {B}_{2}^{}||^{2}- \dots -||\mathbf {Y}_{Q-1}^{}\mathbf {V}_{Q-1}^{\prime }\mathbf {B}_{Q-1}^{}||^{2} \\& \quad +(Q-1) \, ||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} \\&= Q \, ||\mathbf {X}-\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} + ||\mathbf {Y}_{2}^{}\mathbf {V}_{2}^{\prime }\mathbf {B}_{2}^{}||^{2} - ||\mathbf {Y}_{1}^{}\mathbf {V}_{1}^{\prime }\mathbf {B}_{1}^{}||^{2} + 2 \, (||\mathbf {Y}_{3}^{}\mathbf {V}_{3}^{\prime }\mathbf {B}_{3}^{}||^{2}-||\mathbf {Y}_{2}^{}\mathbf {V}_{2}^{\prime }\mathbf {B}_{2}^{}||^{2}) \\& \quad + \dots + (Q-1) \, (||\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2}- ||\mathbf {Y}_{Q-1}^{}\mathbf {V}_{Q-1}^{\prime }\mathbf {B}_{Q-1}^{}||^{2}) \\&= Q \, ||\mathbf {X}-\mathbf {Y}_{Q}^{}\mathbf {V}_{Q}^{\prime }\mathbf {B}_{Q}^{}||^{2} + \sum _{q = 2}^{Q}(q-1) \, I_{d}(\mathbf {V}_{q},\mathbf {V}_{q-1}). \end{aligned}$$

\(\square\)
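The least obvious step of this proof is the regrouping of the explained-variance terms into weighted differences between consecutive levels. As a small, hedged illustration, the Python sketch below checks that regrouping on arbitrary numbers. Here `a[q-1]` stands for \(||\mathbf {Y}_{q}\mathbf {V}_{q}^{\prime }\mathbf {B}_{q}||^{2}\); the function names `telescoped` and `expanded` and the numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def telescoped(a):
    """sum_{q=2}^{Q} (q - 1) * (a_q - a_{q-1}), where a[q-1] holds a_q."""
    Q = len(a)
    return sum((q - 1) * (a[q - 1] - a[q - 2]) for q in range(2, Q + 1))


def expanded(a):
    """(Q - 1) * a_Q - sum_{q=1}^{Q-1} a_q, the form before regrouping."""
    Q = len(a)
    return (Q - 1) * a[-1] - sum(a[:-1])


# Arbitrary stand-ins for the explained variances at levels q = 1, ..., 4.
a = np.array([1.0, 2.5, 4.0, 7.0])
print(np.isclose(telescoped(a), expanded(a)))
```

For each intermediate level q, the term a_q receives weight (q - 1) from its own difference and weight -q from the next one, which leaves the net coefficient -1 that appears in the expanded sum.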

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cavicchia, C., Vichi, M. & Zaccaria, G. Hierarchical disjoint principal component analysis. AStA Adv Stat Anal 107, 537–574 (2023). https://doi.org/10.1007/s10182-022-00458-4

  • DOI: https://doi.org/10.1007/s10182-022-00458-4
