Abstract
In the behavioral sciences, many research questions pertain to a regression problem, in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors may be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested within persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all three simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1–3):155–164, 1992) and CR (Späth in Computing 22(4):367–373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
Notes
We generated 360 datasets by manipulating the number of clusters (at three levels: two, three, and four clusters), the cluster sizes (at two levels: either one small cluster containing 20 % of the level-2 units or one large cluster containing 80 % of the level-2 units, with the remaining level-2 units spread equally across the other clusters) and the \(\alpha \)-level (at six levels: .001, .01, .05, .15, .25 and .35); all other factors were held constant (i.e., 50 level-2 units with 25 level-1 units each, 20 predictors, two relevant and six irrelevant components, the same cluster-specific regression weights \({\varvec{p}}_{{\varvec{Y}}}^{k^\mathrm{{true}}}\) as in Table 2 and 20 % noise in \({\varvec{X}}\) and in \({\varvec{y}})\); we used ten replications for each cell of the design. The recovery of the underlying clustering, quantified by the adjusted Rand index (ARI; see Sect. 4.3.2), can be considered reasonable, as the mean ARI value equals .93, whereas the mean ARI for the comparable conditions in the simulation study with equal-sized underlying clusters amounts to .96. Recovery, however, drops in the four-cluster conditions (i.e., mean ARI of .94, .95 and .89 for two, three, and four clusters, respectively) and/or when there is one large cluster next to multiple small ones (i.e., mean ARI of .99 and .87 for the first and second level of the cluster size factor, respectively).
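For reference, the Hubert–Arabie ARI used above can be computed directly from two partitions. The following pure-Python sketch is ours (the function name is illustrative, not from the authors' code); it implements the standard contingency-table formula from Hubert and Arabie (1985):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions.
    Returns 1 for identical partitions and ~0 for chance-level agreement.
    Illustrative sketch; assumes the two partitions are non-trivial."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # contingency-table cells n_ij
    rows = Counter(labels_a)                   # row sums n_i.
    cols = Counter(labels_b)                   # column sums n_.j
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)
```

Note that the ARI is invariant to relabeling of the clusters, which is why it suits recovery comparisons between an estimated and a true partition.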
Note that taking .0000001 as the convergence criterion might be too strict, resulting in considerably longer computation times. A possible solution is to use \(.0000001\, \left\| {\varvec{X}} \right\| ^{2}\) instead (i.e., to consider the proportional rather than the absolute decrease in fit).
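The proportional criterion can be sketched as follows (a minimal illustration; the function and variable names are our own, not from the PCCR implementation):

```python
def converged(prev_loss, curr_loss, x_norm_sq, tol=1e-7):
    """Stop when the decrease in loss falls below tol * ||X||^2,
    i.e., a proportional rather than an absolute convergence criterion.
    Scaling by the squared norm of X makes the threshold data-dependent,
    so the same tol works across datasets of different magnitude."""
    return (prev_loss - curr_loss) < tol * x_norm_sq
```

With `tol=1e-7` and `x_norm_sq=1.0` this reduces to the absolute criterion mentioned in the note.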
Note that step three of the PCCR algorithm consists of a double (nested) iterative procedure of which the outer iterations pertain to updating the level-2 unit memberships and the inner iterations to updating \({\varvec{W}}, {\varvec{P}}_{{\varvec{X}}}\), and \({\varvec{p}}_{{\varvec{Y}}}^{k}\), conditional on \({\varvec{C}}\).
Another option is to directly solve the constrained problem (instead of the unconstrained one) by means of an iterative majorization approach (Kiers & ten Berge, 1992). The technical details of this approach are available upon request from the authors. The results of a pilot simulation study showed that both algorithms in most cases lead to the same final solution, but that the majorization approach consumes more time. Therefore, the majorization approach is not further discussed in this manuscript.
When the worst-fitting level-2 unit is the only level-2 unit in its cluster, we move on to the level-2 unit with the second worst fit, and so on.
The results of a pilot study, in which the number of components and clusters (i.e., two and four), the amount of noise in \({\varvec{X}}\) and \({\varvec{y}}\) (i.e., 10 and 40 %), and the cluster sizes (see Brusco & Cradit, 2001; Steinley, 2003) were manipulated, indeed reveal that PCCR and a sequential approach perform equally well when all components are strong and relevant.
\({\varvec{W}}^\mathrm{{true}}\) can be computed from \({\varvec{T}}^\mathrm{{true}}\) as follows: \({\varvec{W}}^\mathrm{{true}}=\left( {\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{X}}^\mathrm{{true}} \right) ^{-1}{\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{T}}^\mathrm{{true}}\), with \({\varvec{X}}^\mathrm{{true}}={\varvec{T}}^\mathrm{{true}}{\varvec{P}}_{{\varvec{X}}}^{\mathrm{{true}}^{\prime }}\).
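This computation can be illustrated numerically as follows (a sketch with hypothetical dimensions; since \({\varvec{X}}^\mathrm{{true}}={\varvec{T}}^\mathrm{{true}}{\varvec{P}}_{{\varvec{X}}}^{\mathrm{{true}}^{\prime }}\) has rank equal to the number of components, which is smaller than the number of predictors, we take the inverse as a Moore–Penrose pseudoinverse):

```python
import numpy as np

# Illustrative dimensions: 50 units, R = 2 components, J = 20 predictors.
rng = np.random.default_rng(0)
T_true = np.linalg.qr(rng.standard_normal((50, 2)))[0]  # orthonormal scores
P_X = rng.standard_normal((20, 2))                      # component loadings
X_true = T_true @ P_X.T                                 # X^true = T^true P_X'

# W^true = (X^true' X^true)^- X^true' T^true; pinv handles the rank deficiency.
W_true = np.linalg.pinv(X_true.T @ X_true) @ X_true.T @ T_true

# The recovered weights reproduce the true scores: X^true W^true = T^true.
print(np.allclose(X_true @ W_true, T_true))  # True
```

The final check holds because the column space of \({\varvec{X}}^\mathrm{{true}}\) coincides with that of \({\varvec{T}}^\mathrm{{true}}\), so the regression projects the scores onto themselves.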
This choice of cluster-specific regression weights implies that the clusters are clearly separated in terms of their underlying cluster-specific regression models. To quantify the degree of cluster separation, we computed the ratio \( z = \frac{\sum _{r=1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{true}} - p_{{\varvec{Y}}r}^{k^{\prime }\,\mathrm{true}} \right| }{\sum _{r=1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{true}} \right| }\) for each pair of clusters, with \(p_{{\varvec{Y}}r}^{k\,\mathrm{true}}\) \(\left( p_{{\varvec{Y}}r}^{k^{\prime }\,\mathrm{true}} \right) \) denoting the \(r\mathrm{th}\) true regression weight for cluster \(k\) \(\left( k^{\prime } \right) \) and \(z>.50\) implying that the clusters are well separated (Kiers & Smilde, 2007). For the generated datasets with two clusters, the z-ratio equals 1.23, whereas the mean z-ratio (computed over all possible cluster pairs) equals 1.57 for the three-cluster datasets (with separate values of 1.23, 1.82, and 1.67) and 4.47 for the four-cluster datasets (with separate values of 1.23, 2.64, 8.76, 1.15, 7.33, and 5.71). Hence, in the three- and four-cluster conditions the clusters are more clearly separated than in the two-cluster conditions. Notice further that the larger mean z-ratio for the solutions with four clusters is mainly due to the fourth cluster being clearly separated from the other three clusters (the z-ratios among the other three clusters are in the range of the z-ratios for the two- and three-cluster datasets).
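The z-ratio can be computed as follows (an illustrative sketch with a function name of our own; note that the denominator involves the weights of cluster \(k\) only, so the ratio is not symmetric in \(k\) and \(k^{\prime }\)):

```python
def z_ratio(p_k, p_kprime):
    """Cluster-separation ratio z = sum_r |p_k[r] - p_k'[r]| / sum_r |p_k[r]|
    (Kiers & Smilde, 2007). Values above .50 indicate that the two
    cluster-specific regression models are well separated."""
    numerator = sum(abs(a - b) for a, b in zip(p_k, p_kprime))
    denominator = sum(abs(a) for a in p_k)
    return numerator / denominator
```

For identical weight vectors the ratio is 0; for the orthogonal pair `[2, 0]` versus `[0, 2]` it equals 2, well above the .50 threshold.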
We believe that models with more than four clusters and/or four components are too complex for our data (i.e., 48 countries and 34 variables). In particular, the fit of these complex models is not substantially larger than the fit of less complex models. Furthermore, we did not consider solutions with one cluster and/or one component as such models are too simplistic.
These computations assume the use of a single core. Modern computers, however, can use up to four cores simultaneously, reducing the computation time to 2 h.
References
Arminger, G., & Stein, P. (1997). Finite mixtures of covariance structure models with regressors: Loglikelihood function, minimum distance estimation, fit indices, and a complex example. Sociological Methods & Research, 26(2), 148–182. doi:10.1177/0049124197026002002.
Brusco, M. J., & Cradit, J. D. (2001). A variable selection heuristic for K-means clustering. Psychometrika, 66, 249–270. doi:10.1007/BF02294838.
Brusco, M. J., Cradit, J. D., Steinley, D., & Fox, G. L. (2008). Cautionary remarks on the use of clusterwise regression. Multivariate Behavioral Research, 43(1), 29–49. doi:10.1080/00273170701836653.
Brusco, M. J., Cradit, J. D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40(2), 225–234.
Ceulemans, E., & Kiers, H. A. L. (2009). Discriminating between strong and weak structures in three-mode principal component analysis. British Journal of Mathematical & Statistical Psychology, 62, 601–620. doi:10.1348/000711008X369474.
Ceulemans, E., Kuppens, P., & Van Mechelen, I. (2012). Capturing the structure of distinct types of individual differences in the situation-specific experience of emotions: The case of anger. European Journal of Personality, 26, 484–495. doi:10.1002/per.847.
Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and individual differences therein. Psychometrika, 73, 107–124. doi:10.1007/s11336-007-9024-1.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis: an evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. doi:10.1037/0033-2909-112-1-155.
Coxe, K. L. (1986). Principal components regression analysis. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of statistical sciences (pp. 181–184). New York: Wiley.
de Jong, S., & Kiers, H. A. L. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14(1–3), 155–164. doi:10.1016/0169-7439(92)80100-I.
De Roover, K., Ceulemans, E., Timmerman, M. E., Vansteelandt, K., Stouten, J., & Onghena, P. (2012). Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data. Psychological Methods, 17, 100–119. doi:10.1037/a0025385.
DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5, 249–282.
DeSarbo, W. S., & Edwards, E. A. (1996). Typologies of compulsive buying behavior: A constrained clusterwise regression approach. Journal of Consumer Psychology, 5(3), 231–262.
DeSarbo, W. S., Oliver, R. L., & Rangaswamy, A. (1989). A simulated annealing methodology for clusterwise linear regression. Psychometrika, 54(4), 707–736.
Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kiers, H. A. L. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden: DSWO Press.
Kiers, H. A. L., & Smilde, A. (2007). A comparison of various methods for multivariate regression with highly collinear variables. Statistical Methods & Applications, 16(2), 193–228. doi:10.1007/s10260-006-0025-5.
Kiers, H. A. L., & ten Berge, J. M. F. (1992). Minimization of a class of matrix trace functions by means of refined majorization. Psychometrika, 57, 371–382.
Korth, B., & Tucker, L. R. (1975). The distribution of chance congruence coefficients from simulated data. Psychometrika, 40(3), 361–372.
Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.
Kroonenberg, P. M. (1983). Three-mode principal component analysis: Theory and applications. Leiden: DSWO Press.
Kuppens, P., Ceulemans, E., Timmerman, M. E., Diener, E., & Kim-Prieto, C. (2006). Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience. Journal of Cross-Cultural Psychology, 37, 491–515. doi:10.1177/0022022106290474.
Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 1–18.
Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 329–358.
Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modeling: a comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318. doi:10.1080/02664760903030213.
Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley.
Späth, H. (1979). Algorithm 39: Clusterwise linear regression. Computing, 22(4), 367–373. doi:10.1007/BF02265317.
Späth, H. (1981). Correction to algorithm 39: Clusterwise linear regression. Computing, 26(3), 275–275. doi:10.1007/BF02243486.
Steinley, D. (2003). Local optima in K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304. doi:10.1037/1082-989X.8.3.294.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted rand index. Psychological Methods, 9(3), 386–396. doi:10.1037/1082-989X.9.3.386.
Stormshak, E. A., Bierman, K. L., Bruschi, C., Dodge, K. A., & Coie, J. D., The Conduct Problems Prevention Research Group. (1999). The relation between behavior problems and peer preference in different classroom contexts. Child Development, 70(1), 169–182.
ten Berge, J. M. F. (1977). Orthogonal procrustes rotation for two or more matrices. Psychometrika, 42(2), 267–276.
Tucker, L. R. (1951). A method for synthesis of factor analysis studies. Personnel Research Section Rapport #984. Washington, DC: Department of the Army.
van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7(1), 142–157. doi:10.1186/1471-2164-7-142.
van den Berg, R. A., Van Mechelen, I., Wilderjans, T. F., Van Deun, K., Kiers, H. A. L., & Smilde, A. K. (2009). Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics, 10, 340. doi:10.1186/1471-2105-10-340.
Van Deun, K., Smilde, A. K., van der Werf, M. J., Kiers, H. A. L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10, 246. doi:10.1186/1471-2105-10-246.
Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in principal covariates regression. Chemometrics and Intelligent Laboratory Systems, 123, 36–43. doi:10.1016/j.chemolab.2013.02.005.
Wedel, M., & DeSarbo, W. S. (1995). A mixture likelihood approach for generalized linear models. Journal of Classification, 12, 21–55.
Wilderjans, T. F., & Ceulemans, E. (2013). Clusterwise Parafac to identify heterogeneity in three-way data. Chemometrics and Intelligent Laboratory Systems, 129, 87–97. doi:10.1016/j.chemolab.2013.09.010.
Wilderjans, T. F., Ceulemans, E., & Kuppens, P. (2012). Clusterwise HICLAS: A generic modeling strategy to trace similarities and differences in multi-block binary data. Behavior Research Methods, 44, 532–545. doi:10.3758/s13428-011-0166-9.
Wilderjans, T. F., Ceulemans, E., & Van Mechelen, I. (2009). Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes. Computational Statistics and Data Analysis, 53, 1086–1098. doi:10.1016/j.csda.2008.09.031.
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & van den Berg, R. A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical and Statistical Psychology, 64, 277–290. doi:10.1348/000711010X513263.
Wold, H. (1966). Estimation of principal component and related methods by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate analysis (pp. 391–420). New York: Academic Press.
Acknowledgements
The first author is a postdoctoral Fellow of the Fund of Scientific Research (FWO) Flanders (Belgium). The research leading to the results reported in this paper was sponsored in part by the Research Fund of KU Leuven (GOA/15/003) and by the Interuniversity Attraction Poles programme financed by the Belgian Government (IAP/P7/06).
Appendices
Appendix 1: INDORT Rotation Toward Orthogonality per Level-2 Unit
The aim of the INDORT rotation is to find the orthonormal rotation matrix \({\varvec{U}} \, \left( {{\varvec{U}}^{\prime }{\varvec{U}} = {\varvec{UU}}^{\prime } = {\varvec{I}}} \right) \) that minimizes the following loss function:
$$f\left( {\varvec{U}} \right) =\sum _{i=1}^{I} \left\| {\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime }{\varvec{T}}_{i} {\varvec{U}}-{\varvec{D}}_{i} \right\| ^{2},$$
where \({\varvec{D}}_{i} \) is a diagonal matrix containing the variances of the component scores for level-2 unit i. This loss function equals zero when the component scores are simultaneously orthogonal across level-2 units (i.e., \({\varvec{T}}^{\prime }{\varvec{T}}=N{\varvec{I}})\), which will always hold since we imposed \({\varvec{T}}\) to be orthogonal (see Sect. 2.2), and orthogonal within each level-2 unit i (i.e., \({\varvec{T}}_{i}^{\prime }{\varvec{T}}_{i} =N_{i}{\varvec{I}} )\). This function can be rewritten as:
yielding the INDORT loss function, for which an alternating least squares estimation approach exists (Kroonenberg, 1983; Kiers, 1989).
Appendix 2: Unconstrained Minimization of \({\varvec{W}}\)
To update \({\varvec{W}}\) (without any constraint), the loss function in \(\left( 4 \right) \) can be rewritten as follows:
It appears that updating this loss function over \({\varvec{W}}\) conditional on \({\varvec{P}}_{{\varvec{X}}} , {\varvec{p}}_{{\varvec{Y}}}^{k} \), and \({\varvec{C}}\) boils down to solving a multivariate multiple linear regression problem, for which the following closed-form solution exists:
$$\mathrm{vec}\left( {\varvec{W}} \right) =\left( {\varvec{Z}}^{{*^{\prime }}}{\varvec{Z}}^{*} \right) ^{-1}{\varvec{Z}}^{{*^{\prime }}}{\varvec{y}}^{*}.$$
The matrix \({\varvec{W}}\) can be obtained by devectorizing \(\mathrm{vec}\left( {{\varvec{W}}} \right) \) into a matrix. Note that \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*} \) and \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*} \) can be simplified as follows:
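The regression update of this type can be illustrated as follows. The sketch below uses random stand-ins for the design matrix \({\varvec{Z}}^{*}\) and outcome \({\varvec{y}}^{*}\) (their exact construction follows the paper and is not reproduced here), and shows the generic closed-form solution alongside the numerically preferable least-squares route:

```python
import numpy as np

# Stand-ins for the design matrix Z* and outcome y* of the rewritten
# loss function (100 rows, 5 unknowns; dimensions are illustrative).
rng = np.random.default_rng(1)
Z_star = rng.standard_normal((100, 5))
y_star = rng.standard_normal(100)

# Closed form: vec(W) = (Z*' Z*)^{-1} Z*' y* (the normal equations).
vec_w = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ y_star)

# Equivalent but numerically more stable: solve the least-squares
# problem directly without forming Z*' Z*.
vec_w_lstsq = np.linalg.lstsq(Z_star, y_star, rcond=None)[0]

print(np.allclose(vec_w, vec_w_lstsq))  # True
```

In practice the resulting vector would then be devectorized into the matrix \({\varvec{W}}\), as described above.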
Wilderjans, T.F., Vande Gaer, E., Kiers, H.A.L. et al. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data. Psychometrika 82, 86–111 (2017). https://doi.org/10.1007/s11336-016-9522-0