
Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data


Abstract

In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although ordinary least squares regression will suffice in many cases, the prediction problem is sometimes more challenging, for three reasons. First, multiple highly collinear predictors may be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations nested within persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all three challenges simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1–3):155–164, 1992) and CR (Späth in Computing 22(4):367–373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.


Notes

  1. We generated 360 datasets by manipulating the number of clusters (at three levels: two, three, and four clusters), the cluster sizes (at two levels: (a) one small cluster containing 20 % of the level-2 units and (b) one large cluster containing 80 % of the level-2 units; in both cases, the remaining level-2 units were spread equally across the other clusters), and the \(\alpha \)-level (at six levels: .001, .01, .05, .15, .25, and .35); all other factors were held constant (i.e., 50 level-2 units with 25 level-1 units each, 20 predictors, two relevant and six irrelevant components, the same cluster-specific regression weights \({\varvec{p}}_{{\varvec{Y}}}^{k^\mathrm{{true}}}\) as in Table 2, and 20 % noise in \({\varvec{X}}\) and in \({\varvec{y}}\)); we used ten replications for each cell of the design. The recovery of the underlying clustering, quantified by the adjusted Rand index (ARI; see Sect. 4.3.2, and the first code sketch after these notes), can be considered reasonable, as the mean ARI value equals .93, whereas the mean ARI for the comparable conditions in the simulation study with equal-sized underlying clusters amounts to .96. Recovery, however, drops when the number of clusters increases (i.e., mean ARI of .94, .95, and .89 for two, three, and four clusters, respectively) and/or when there is one large cluster next to multiple small ones (i.e., mean ARI of .99 and .87 for the first and second level of the cluster size factor, respectively).

  2. Note that taking .0000001 as the convergence criterion might be too strict, resulting in a considerable lengthening of the computation time. A possible solution is to use \(.0000001\, \left\| {\varvec{X}} \right\| ^{2}\) instead (i.e., to look at the proportional rather than the absolute decrease in fit); a small sketch of this stopping rule follows these notes.

  3. Note that step three of the PCCR algorithm consists of a nested iterative procedure, in which the outer iterations update the level-2 unit memberships and the inner iterations update \({\varvec{W}}, {\varvec{P}}_{{\varvec{X}}}\), and \({\varvec{p}}_{{\varvec{Y}}}^{k}\), conditional on \({\varvec{C}}\).

  4. Another option is to solve the constrained problem directly (instead of the unconstrained one) by means of an iterative majorization approach (Kiers & ten Berge, 1992). The technical details of this approach are available upon request from the authors. The results of a pilot simulation study showed that both algorithms in most cases arrive at the same final solution, but that the majorization approach takes more computation time. Therefore, the majorization approach is not discussed further in this manuscript.

  5. When the worst fitting level-2 unit is the only level-2 unit in its cluster, we move on to the level-2 unit with the second worst fit, and so on (see the corresponding sketch after these notes).

  6. The results of a pilot study in which the number of components and clusters (i.e., two and four), the amount of noise in \({\varvec{X}}\) and \({\varvec{y}}\) (i.e., 10 and 40 %), and the cluster sizes (see Brusco & Cradit, 2001; Steinley, 2003) were manipulated indeed reveal that PCCR and a sequential approach perform equally well when all components are strong and relevant.

  7. \({\varvec{W}}^\mathrm{{true}}\) can be computed from \({\varvec{T}}^\mathrm{{true}}\) as follows: \({\varvec{W}}^\mathrm{{true}}=\left( {\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{X}}^\mathrm{{true}} \right) ^{-1}{\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{T}}^\mathrm{{true}}\), with \({\varvec{X}}^\mathrm{{true}}={\varvec{T}}^\mathrm{{true}}{\varvec{P}}_{{\varvec{X}}}^{\mathrm{{true}}^{\prime }}\). A numerical sketch of this computation follows these notes.

  8. This choice of cluster-specific regression weights implies that the clusters are clearly separated in terms of their underlying cluster-specific regression models. To quantify the degree of cluster separation, we computed the ratio \( z = \frac{\mathop {\sum }\nolimits _{r = 1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{{true}}} - p_{{\varvec{Y}}r}^{k^{\prime }\mathrm{{true}}} \right| }{\mathop {\sum }\nolimits _{r = 1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{{true}}} \right| }\) for each pair of clusters \(\left( k,k^{\prime } \right) \), with \(p_{{\varvec{Y}}r}^{k\,\mathrm{{true}}}\) \(\left( p_{{\varvec{Y}}r}^{k^{\prime }\mathrm{{true}}} \right) \) indicating the \(r\mathrm{th}\) true regression weight for cluster \(k\) \(\left( k^{\prime } \right) \), and \(z>.50\) implying that clusters are well separated (Kiers & Smilde, 2007); a code sketch of this ratio follows these notes. For the generated datasets with two clusters, the z-ratio equals 1.23, whereas the mean z-ratio (computed over all possible cluster pairs) for the three- and four-cluster datasets equals 1.57 (with separate values of 1.23, 1.82, and 1.67) and 4.47 (with separate values of 1.23, 2.64, 8.76, 1.15, 7.33, and 5.71), respectively. Note that the clusters in the three- and four-cluster conditions are thus more clearly separated than those in the two-cluster conditions. Notice further that the larger mean z-ratio for the solutions with four clusters is mainly due to the fourth cluster being clearly separated from the other three (the z-ratios among the other three clusters lie in the range of the z-ratios for the two- and three-cluster datasets).

  9. We believe that models with more than four clusters and/or four components are too complex for our data (i.e., 48 countries and 34 variables). In particular, the fit of these complex models is not substantially larger than the fit of less complex models. Furthermore, we did not consider solutions with one cluster and/or one component as such models are too simplistic.

  10. These computation times are based on the use of a single core. Modern computers, however, can use up to four cores simultaneously, reducing the computation time to 2 h.
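As an illustration of the recovery measure used in note 1, the following minimal sketch computes the adjusted Rand index with scikit-learn; the label vectors are hypothetical and the paper does not specify the authors' own implementation.

```python
# Hedged sketch for note 1: quantifying how well a clustering is recovered
# with the adjusted Rand index (ARI). The label vectors below are made up;
# scikit-learn is assumed, as the paper does not specify an implementation.
from sklearn.metrics import adjusted_rand_score

true_labels      = [0, 0, 0, 1, 1, 2, 2, 2]   # true cluster of each level-2 unit
recovered_labels = [1, 1, 1, 0, 0, 2, 2, 2]   # clustering found by the algorithm

# ARI equals 1 for perfect recovery (up to relabeling) and is about 0 for a
# random partition; here recovery is perfect despite the permuted labels.
print(adjusted_rand_score(true_labels, recovered_labels))  # prints 1.0
```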
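The proportional convergence criterion suggested in note 2 can be implemented as follows; the function name and loss convention are ours, not the authors'.

```python
import numpy as np

def has_converged(loss_prev, loss_curr, X, tol=1e-7):
    """Proportional stopping rule from note 2: stop when the decrease in the
    loss falls below tol * ||X||^2 rather than below the absolute tol.
    (Names and calling convention are illustrative, not the authors' code.)"""
    return (loss_prev - loss_curr) < tol * np.linalg.norm(X) ** 2
```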
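The selection rule of note 5 (skip level-2 units that are the sole member of their cluster) can be sketched as below; all names are hypothetical.

```python
import numpy as np

def worst_fitting_movable_unit(misfit, labels):
    """Return the index of the level-2 unit with the largest misfit,
    skipping any unit that is the only member of its cluster (note 5);
    returns None if every cluster is a singleton."""
    labels = np.asarray(labels)
    for i in np.argsort(misfit)[::-1]:        # largest misfit first
        if np.sum(labels == labels[i]) > 1:   # not a singleton cluster
            return int(i)
    return None
```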
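A numerical sketch of the computation in note 7, with illustrative dimensions. Because the noiseless \({\varvec{X}}^\mathrm{{true}}\) has fewer components than predictors, \({\varvec{X}}^{\prime }{\varvec{X}}\) is singular, so the Moore-Penrose pseudoinverse stands in for the plain inverse here; this is our adaptation, not the authors' code.

```python
import numpy as np

# Note 7 sketch: with X_true = T_true P_X_true', the true weights follow from
# the normal equations W_true = (X'X)^{-1} X' T. With rank(X_true) = 8 < 20,
# X'X is singular, so the pseudoinverse gives the minimum-norm solution.
rng = np.random.default_rng(0)
T_true = np.linalg.qr(rng.standard_normal((100, 8)))[0]   # orthonormal scores
P_X_true = rng.standard_normal((20, 8))                   # true loadings
X_true = T_true @ P_X_true.T                              # noiseless predictors

W_true = np.linalg.pinv(X_true) @ T_true
assert np.allclose(X_true @ W_true, T_true)               # scores are recovered
```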
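Finally, the cluster separation ratio of note 8 reduces to a one-liner; the function name and example weights are hypothetical.

```python
import numpy as np

def z_ratio(p_k, p_k_prime):
    """Separation measure from note 8: the summed absolute difference between
    the true regression weights of clusters k and k', relative to the summed
    absolute weights of cluster k; z > .50 indicates well-separated clusters
    (Kiers & Smilde, 2007)."""
    p_k, p_k_prime = np.asarray(p_k, float), np.asarray(p_k_prime, float)
    return np.abs(p_k - p_k_prime).sum() / np.abs(p_k).sum()

# Hypothetical example with R = 3 regression weights per cluster:
print(z_ratio([1.0, -0.5, 0.5], [0.2, 0.5, -0.5]))  # 2.8 / 2.0 = 1.4
```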

References

  • Arminger, G., & Stein, P. (1997). Finite mixtures of covariance structure models with regressors: Loglikelihood function, minimum distance estimation, fit indices, and a complex example. Sociological Methods & Research, 26(2), 148–182. doi:10.1177/0049124197026002002.

  • Brusco, M. J., & Cradit, J. D. (2001). A variable selection heuristic for K-means clustering. Psychometrika, 66, 249–270. doi:10.1007/BF02294838.

  • Brusco, M. J., Cradit, J. D., Steinley, D., & Fox, G. L. (2008). Cautionary remarks on the use of clusterwise regression. Multivariate Behavioral Research, 43(1), 29–49. doi:10.1080/00273170701836653.

  • Brusco, M. J., Cradit, J. D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40(2), 225–234.

  • Ceulemans, E., & Kiers, H. A. L. (2009). Discriminating between strong and weak structures in three-mode principal component analysis. British Journal of Mathematical & Statistical Psychology, 62, 601–620. doi:10.1348/000711008X369474.

  • Ceulemans, E., Kuppens, P., & Van Mechelen, I. (2012). Capturing the structure of distinct types of individual differences in the situation-specific experience of emotions: The case of anger. European Journal of Personality, 26, 484–495. doi:10.1002/per.847.

  • Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and individual differences therein. Psychometrika, 73, 107–124. doi:10.1007/s11336-007-9024-1.

  • Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis: An evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.

  • Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. doi:10.1037/0033-2909.112.1.155.

  • Coxe, K. L. (1986). Principal components regression analysis. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of statistical sciences (pp. 181–184). New York: Wiley.

  • de Jong, S., & Kiers, H. A. L. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14(1–3), 155–164. doi:10.1016/0169-7439(92)80100-I.

  • De Roover, K., Ceulemans, E., Timmerman, M. E., Vansteelandt, K., Stouten, J., & Onghena, P. (2012). Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data. Psychological Methods, 17, 100–119. doi:10.1037/a0025385.

  • DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5, 249–282.

  • DeSarbo, W. S., & Edwards, E. A. (1996). Typologies of compulsive buying behavior: A constrained clusterwise regression approach. Journal of Consumer Psychology, 5(3), 231–262.

  • DeSarbo, W. S., Oliver, R. L., & Rangaswamy, A. (1989). A simulated annealing methodology for clusterwise linear regression. Psychometrika, 54(4), 707–736.

  • Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  • Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.

  • Kiers, H. A. L. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden: DSWO Press.

  • Kiers, H. A. L., & Smilde, A. (2007). A comparison of various methods for multivariate regression with highly collinear variables. Statistical Methods & Applications, 16(2), 193–228. doi:10.1007/s10260-006-0025-5.

  • Kiers, H. A. L., & ten Berge, J. M. F. (1992). Minimization of a class of matrix trace functions by means of refined majorization. Psychometrika, 57, 371–382.

  • Korth, B., & Tucker, L. R. (1975). The distribution of chance congruence coefficients from simulated data. Psychometrika, 40(3), 361–372.

  • Kroonenberg, P. M. (1983). Three-mode principal component analysis: Theory and applications. Leiden: DSWO Press.

  • Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.

  • Kuppens, P., Ceulemans, E., Timmerman, M. E., Diener, E., & Kim-Prieto, C. (2006). Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience. Journal of Cross-Cultural Psychology, 37, 491–515. doi:10.1177/0022022106290474.

  • Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 1–18.

  • Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 329–358.

  • Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modeling: A comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318. doi:10.1080/02664760903030213.

  • Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley.

  • Späth, H. (1979). Algorithm 39: Clusterwise linear regression. Computing, 22(4), 367–373. doi:10.1007/BF02265317.

  • Späth, H. (1981). Correction to algorithm 39: Clusterwise linear regression. Computing, 26(3), 275. doi:10.1007/BF02243486.

  • Steinley, D. (2003). Local optima in K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304. doi:10.1037/1082-989X.8.3.294.

  • Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386–396. doi:10.1037/1082-989X.9.3.386.

  • Stormshak, E. A., Bierman, K. L., Bruschi, C., Dodge, K. A., & Coie, J. D., The Conduct Problems Prevention Research Group. (1999). The relation between behavior problems and peer preference in different classroom contexts. Child Development, 70(1), 169–182.

  • ten Berge, J. M. F. (1977). Orthogonal Procrustes rotation for two or more matrices. Psychometrika, 42(2), 267–276.

  • Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, DC: Department of the Army.

  • van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling and transformations: Improving the biological information content of metabolomics data. BMC Genomics, 7(1), 142–157. doi:10.1186/1471-2164-7-142.

  • van den Berg, R. A., Van Mechelen, I., Wilderjans, T. F., Van Deun, K., Kiers, H. A. L., & Smilde, A. K. (2009). Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics, 10, 340. doi:10.1186/1471-2105-10-340.

  • Van Deun, K., Smilde, A. K., van der Werf, M. J., Kiers, H. A. L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10, 246. doi:10.1186/1471-2105-10-246.

  • Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in principal covariates regression. Chemometrics and Intelligent Laboratory Systems, 123, 36–43. doi:10.1016/j.chemolab.2013.02.005.

  • Wedel, M., & DeSarbo, W. S. (1995). A mixture likelihood approach for generalized linear models. Journal of Classification, 12, 21–55.

  • Wilderjans, T. F., & Ceulemans, E. (2013). Clusterwise Parafac to identify heterogeneity in three-way data. Chemometrics and Intelligent Laboratory Systems, 129, 87–97. doi:10.1016/j.chemolab.2013.09.010.

  • Wilderjans, T. F., Ceulemans, E., & Kuppens, P. (2012). Clusterwise HICLAS: A generic modeling strategy to trace similarities and differences in multi-block binary data. Behavior Research Methods, 44, 532–545. doi:10.3758/s13428-011-0166-9.

  • Wilderjans, T. F., Ceulemans, E., & Van Mechelen, I. (2009). Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes. Computational Statistics and Data Analysis, 53, 1086–1098. doi:10.1016/j.csda.2008.09.031.

  • Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & van den Berg, R. A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical and Statistical Psychology, 64, 277–290. doi:10.1348/000711010X513263.

  • Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate analysis (pp. 391–420). New York: Academic Press.


Acknowledgements

The first author is a postdoctoral Fellow of the Fund for Scientific Research (FWO) Flanders (Belgium). The research leading to the results reported in this paper was sponsored in part by the Research Fund of KU Leuven (GOA/15/003) and by the Interuniversity Attraction Poles programme financed by the Belgian Government (IAP/P7/06).

Author information


Correspondence to Tom Frans Wilderjans.

Appendices

Appendix 1: INDORT Rotation Toward Orthogonality per Level-2 Unit

The aim of the INDORT rotation is to find the orthonormal rotation matrix \({\varvec{U}} \, \left( {{\varvec{U}}^{\prime }{\varvec{U}} = {\varvec{UU}}^{\prime } = {\varvec{I}}} \right) \) that minimizes the following loss function:

$$\begin{aligned} f = \mathop \sum \limits _{{i = 1}}^{I} \left\| {\varvec{U}}^{\prime }{\varvec{W}}^{\prime }{\varvec{X}}_{i}^{\prime } {\varvec{X}}_{i} {\varvec{WU}} - {\varvec{D}}_{i}\right\| ^{2} = \mathop \sum \limits _{{i = 1}}^{I} \left\| {\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i}\right\| ^{2} , \end{aligned}$$

where \({\varvec{D}}_{i} \) is a diagonal matrix containing the variances of the component scores for level-2 unit i. This loss function equals zero when the component scores are orthogonal both across level-2 units (i.e., \({\varvec{T}}^{\prime }{\varvec{T}}={\varvec{I}}N\)), which is always the case here since \({\varvec{T}}\) was constrained to be orthogonal (see Sect. 2.2), and within each level-2 unit i (i.e., \({\varvec{T}}_{i}^{\prime }{\varvec{T}}_{i} ={\varvec{I}}N_{i}\)). This function can be rewritten as:

$$\begin{aligned} \mathop \sum \limits _{{i = 1}}^{I} \left\| {\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i}\right\| ^{2}= & {} \mathop \sum \limits _{{i = 1}}^{I} {\varvec{tr}}\left[ {\left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) \left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) ^{\prime } } \right] \nonumber \\= & {} \mathop \sum \limits _{{i = 1}}^{I} {\varvec{tr}}\left[ {\left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) {\varvec{U}}^{\prime }{\varvec{U}}\left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) ^{\prime } {\varvec{U}}^{\prime }{\varvec{U}}} \right] \nonumber \\= & {} \mathop \sum \limits _{{i = 1}}^{I} {\varvec{tr}}\left[ {{\varvec{U}}\left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) {\varvec{U}}^{\prime }{\varvec{U}}\left( {{\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{U}} - {\varvec{D}}_{i} } \right) ^{\prime } {\varvec{U}}^{\prime }} \right] \nonumber \\= & {} \mathop \sum \limits _{{i = 1}}^{I} {\varvec{tr}} \left[ \left( {\varvec{UU}}^{\prime } {\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{UU}}^{\prime } - {\varvec{UD}}_{i} {\varvec{U}}^{\prime } \right) \left( {\varvec{UU}}^{\prime }{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} {\varvec{UU}}^{\prime } - {\varvec{UD}}_{i}^{\prime } {\varvec{U}}^{\prime } \right) \right] \nonumber \\= & {} \mathop \sum \limits _{{i = 1}}^{I} {\varvec{tr}}\left[ {\left( {{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} - {\varvec{UD}}_{i} {\varvec{U}}^{\prime }} \right) \left( {{\varvec{T}}_{i}^{\prime } {\varvec{T}}_{i} - {\varvec{UD}}_{i}^{\prime } {\varvec{U}}^{\prime }} \right) } \right] \end{aligned}$$

yielding the INDORT loss function, for which an alternating least squares estimation approach exists (Kroonenberg, 1983; Kiers, 1989).
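To make the criterion concrete, here is a minimal NumPy sketch (ours, with hypothetical names) that evaluates the INDORT loss for a given orthonormal rotation \({\varvec{U}}\); the loss-minimizing \({\varvec{D}}_{i}\) is simply the diagonal of the rotated cross-product matrix. The alternating least squares updates themselves can be found in Kroonenberg (1983) and Kiers (1989) and are not reproduced here.

```python
import numpy as np

def indort_loss(U, T_blocks):
    """Evaluate the INDORT loss for an orthonormal rotation U. For each
    level-2 unit i, D_i is taken as the diagonal of U' T_i' T_i U, which is
    the diagonal matrix minimizing ||U' T_i' T_i U - D_i||^2; the loss is
    zero when the rotated scores are orthogonal within every unit."""
    f = 0.0
    for T_i in T_blocks:                    # T_blocks[i]: scores of unit i
        S = U.T @ (T_i.T @ T_i) @ U         # rotated cross-product matrix
        D = np.diag(np.diag(S))             # optimal diagonal D_i
        f += np.linalg.norm(S - D, "fro") ** 2
    return f
```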

Appendix 2: Unconstrained Minimization of \({\varvec{W}}\)

To update \({\varvec{W}}\) (without any constraint), the loss function in \(\left( 4 \right) \) can be rewritten as follows:

$$\begin{aligned} L_{2}\left( {\varvec{W}} \right)= & {} \beta \sum \limits _{i=1}^I \left\| {\varvec{X}}_{i}-{\varvec{X}}_{i}{\varvec{W}}{\varvec{P}}_{{\varvec{X}}}^{\prime } \right\| ^{2} +\left( 1-\beta \right) \sum \limits _{i=1}^I \left\| {\varvec{y}}_{i}-\sum \limits _{k=1}^K {{\varvec{c}}_{ik}{\varvec{X}}_{i}{\varvec{W}}{\varvec{p}}_{{\varvec{Y}}}^{k^{\prime }}} \right\| ^{2} \nonumber \\= & {} \sum \limits _{k=1}^K \left\| \sqrt{\beta }{\varvec{X}}_{k}-\sqrt{\beta }{\varvec{X}}_{k}{\varvec{W}}{\varvec{P}}_{{\varvec{X}}}^{\prime } \right\| ^{2} +\sum \limits _{k=1}^K \left\| \sqrt{1-\beta } {\varvec{y}}_{k}-\sqrt{1-\beta } {\varvec{X}}_{k}{\varvec{W}}{\varvec{p}}_{{\varvec{Y}}}^{k^{\prime }} \right\| ^{2}\nonumber \\= & {} \sum \limits _{k=1}^K \left\| \mathrm{vec}\left( \sqrt{\beta }{\varvec{X}}_{k} \right) -\sqrt{\beta }\left( {\varvec{P}}_{{\varvec{X}}}\otimes {\varvec{X}}_{k} \right) \mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2} \nonumber \\&+\sum \limits _{k=1}^K \left\| \mathrm{vec}\left( \sqrt{1-\beta } {\varvec{y}}_{k} \right) -\sqrt{1-\beta } \left( {\varvec{p}}_{{\varvec{Y}}}^{k}\otimes {\varvec{X}}_{k} \right) \mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2}\nonumber \\= & {} \left\| \left[ \begin{array}{c} \mathrm{vec}\left( \sqrt{\beta }{\varvec{X}}_{1} \right) \\ \ldots \\ \mathrm{vec}\left( {\sqrt{\beta }{\varvec{X}}}_{K} \right) \\ \end{array} \right] -\left[ \begin{array}{c} \sqrt{\beta }\left( {\varvec{P}}_{{\varvec{X}}}\otimes {\varvec{X}}_{1} \right) \\ \ldots \\ \sqrt{\beta }\left( {\varvec{P}}_{{\varvec{X}}}\otimes {\varvec{X}}_{K} \right) \\ \end{array} \right] \mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2}+\left\| \left[ \begin{array}{c} \mathrm{vec}\left( \sqrt{1-\beta } {\varvec{y}}_{1} \right) \\ \ldots \\ \mathrm{vec}\left( {\sqrt{1-\beta } {\varvec{y}}}_{K} \right) \\ \end{array} \right] \right. \nonumber \\&-\left. \left[ \begin{array}{c} \sqrt{1-\beta } \left( {\varvec{p}}_{{\varvec{Y}}}^{1}\otimes {\varvec{X}}_{1} \right) \\ \ldots \\ \sqrt{1-\beta } \left( {\varvec{p}}_{{\varvec{Y}}}^{K}\otimes {\varvec{X}}_{K} \right) \\ \end{array} \right] \mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2}\nonumber \\= & {} \left\| \left[ \begin{array}{c} \mathrm{vec}\left( \sqrt{\beta }{\varvec{X}}_{1} \right) \\ \ldots \\ \mathrm{vec}\left( {\sqrt{\beta }{\varvec{X}}}_{K} \right) \\ \mathrm{vec}\left( \sqrt{1-\beta } {\varvec{y}}_{1} \right) \\ \ldots \\ \mathrm{vec}\left( {\sqrt{1-\beta } {\varvec{y}}}_{K} \right) \\ \end{array} \right] -\left[ \begin{array}{c} \sqrt{\beta }\left( {\varvec{P}}_{{\varvec{X}}}\otimes {\varvec{X}}_{1} \right) \\ \ldots \\ \sqrt{\beta }\left( {\varvec{P}}_{{\varvec{X}}}\otimes {\varvec{X}}_{K} \right) \\ \sqrt{1-\beta } \left( {\varvec{p}}_{{\varvec{Y}}}^{1}\otimes {\varvec{X}}_{1} \right) \\ \ldots \\ \sqrt{1-\beta } \left( {\varvec{p}}_{{\varvec{Y}}}^{K}\otimes {\varvec{X}}_{K} \right) \\ \end{array} \right] \mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2}\nonumber \\= & {} \left\| {\varvec{y}}^{*}-{\varvec{Z}}^{*}\mathrm{vec}\left( {\varvec{W}} \right) \right\| ^{2}. \end{aligned}$$

Minimizing this loss function over \({\varvec{W}}\), conditional on \({\varvec{P}}_{{\varvec{X}}} , {\varvec{p}}_{{\varvec{Y}}}^{k} \), and \({\varvec{C}}\), thus boils down to solving a multivariate multiple linear regression problem, for which the following closed-form solution exists:

$$\begin{aligned} \mathrm{vec}\left( {\varvec{W}} \right) = \left( {{\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*} } \right) ^{{ - 1}} {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*}. \end{aligned}$$

The matrix \({\varvec{W}}\) is obtained by devectorizing \(\mathrm{vec}\left( {\varvec{W}} \right) \). Note that \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*} \) and \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*} \) can be simplified as follows:

$$\begin{aligned} {\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*}= & {} \left[ {\begin{array}{c} {\sqrt{\beta }\left( {{\varvec{P}}_{{\varvec{X}}} \otimes {\varvec{X}}_{1} } \right) } \\ \ldots \\ {\sqrt{\beta }\left( {{\varvec{P}}_{{\varvec{X}}} \otimes {\varvec{X}}_{K} } \right) } \\ {\sqrt{1 - \beta } \left( {{\varvec{p}}_{{\varvec{Y}}}^{1} \otimes {\varvec{X}}_{1} } \right) } \\ \ldots \\ {\sqrt{1 - \beta } \left( {{\varvec{p}}_{{\varvec{Y}}}^{K} \otimes {\varvec{X}}_{K} } \right) } \\ \end{array} } \right] ^{\prime } \left[ {\begin{array}{c} {\sqrt{\beta }\left( {{\varvec{P}}_{{\varvec{X}}} \otimes {\varvec{X}}_{1} } \right) } \\ \ldots \\ {\sqrt{\beta }\left( {{\varvec{P}}_{{\varvec{X}}} \otimes {\varvec{X}}_{K} } \right) } \\ {\sqrt{1 - \beta } \left( {{\varvec{p}}_{{\varvec{Y}}}^{1} \otimes {\varvec{X}}_{1} } \right) } \\ \ldots \\ {\sqrt{1 - \beta } \left( {{\varvec{p}}_{{\varvec{Y}}}^{K} \otimes {\varvec{X}}_{K} } \right) } \\ \end{array} } \right] \nonumber \\= & {} \mathop \sum \limits _{{k = 1}}^{K} \left[ {\left( {\sqrt{\beta }{\varvec{P}}_{{\varvec{X}}}^{\prime } \sqrt{\beta }{\varvec{P}}_{{\varvec{X}}} + \sqrt{1 - \beta } {\varvec{p}}_{{\varvec{Y}}}^{{k^{\prime }}} \sqrt{1 - \beta } {\varvec{p}}_{{\varvec{Y}}}^{k} } \right) \otimes {\varvec{X}}_{k}^{\prime } {\varvec{X}}_{k} } \right] \nonumber \\= & {} \mathop \sum \limits _{{k = 1}}^{K} \left[ {\left( {\beta {\varvec{P}}_{{\varvec{X}}}^{\prime } {\varvec{P}}_{{\varvec{X}}} + \left( {1 - \beta } \right) {\varvec{p}}_{{\varvec{Y}}}^{{k^{\prime }}} {\varvec{p}}_{{\varvec{Y}}}^{k} } \right) \otimes {\varvec{X}}_{k}^{\prime } {\varvec{X}}_{k} } \right] , \nonumber \\ {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*}= & {} \mathop \sum \limits _{{k = 1}}^{K} \left[ {\left( {\sqrt{\beta }{\varvec{P}}_{{\varvec{X}}}^{\prime } \otimes {\varvec{X}}_{k}^{\prime } } \right) \mathrm{vec}\left( {\sqrt{\beta }{\varvec{X}}_{k} } \right) + \left( {\sqrt{1 - \beta }\, {\varvec{p}}_{{\varvec{Y}}}^{{k^{\prime }}} \otimes {\varvec{X}}_{k}^{\prime } } \right) \mathrm{vec}\left( {\sqrt{1 - \beta }\, {\varvec{y}}_{k} } \right) } \right] \nonumber \\= & {} \mathrm{vec}\left[ {\beta \left( {\mathop \sum \nolimits _{{k = 1}}^{K} {\varvec{X}}_{k}^{\prime } {\varvec{X}}_{k} } \right) {\varvec{P}}_{{\varvec{X}}} + \left( {1 - \beta } \right) \mathop \sum \nolimits _{{k = 1}}^{K} \left( {{\varvec{X}}_{k}^{\prime } {\varvec{y}}_{k} {\varvec{p}}_{{\varvec{Y}}}^{k} } \right) } \right] . \end{aligned}$$
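These simplified expressions translate directly into code. The sketch below (NumPy, with variable names that are ours, not the authors') accumulates \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*} \) and \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*} \) cluster by cluster and solves the normal equations for \(\mathrm{vec}\left( {\varvec{W}} \right) \); a least squares solver replaces the explicit inverse to tolerate possible rank deficiency.

```python
import numpy as np

def update_W(X_blocks, y_blocks, P_X, p_Y, beta):
    """Unconstrained closed-form update of W (Appendix 2). X_blocks[k]
    (N_k x J) and y_blocks[k] (length N_k) stack the level-1 units of
    cluster k; P_X is J x R and p_Y[k] is the length-R cluster-specific
    weight vector. vec() is column-major throughout, matching
    vec(X W P') = (P kron X) vec(W)."""
    J, R = P_X.shape
    ZtZ = np.zeros((J * R, J * R))
    Zty = np.zeros(J * R)
    for X_k, y_k, p_k in zip(X_blocks, y_blocks, p_Y):
        p_k = np.asarray(p_k, float).reshape(1, R)
        XtX = X_k.T @ X_k
        G = beta * P_X.T @ P_X + (1 - beta) * (p_k.T @ p_k)   # R x R
        ZtZ += np.kron(G, XtX)                                # Z*'Z* term
        Zty += (beta * XtX @ P_X
                + (1 - beta) * np.outer(X_k.T @ y_k, p_k.ravel())
                ).ravel(order="F")                            # vec(.), Z*'y* term
    # lstsq instead of a plain inverse, in case Z*'Z* is rank deficient
    w, *_ = np.linalg.lstsq(ZtZ, Zty, rcond=None)
    return w.reshape(J, R, order="F")                         # devectorize vec(W)
```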


Cite this article

Wilderjans, T.F., Vande Gaer, E., Kiers, H.A.L. et al. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data. Psychometrika 82, 86–111 (2017). https://doi.org/10.1007/s11336-016-9522-0

