Abstract
In the behavioral sciences, many research questions pertain to a regression problem, in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors may be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested within persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all three simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1–3):155–164, 1992) and CR (Späth in Computing 22(4):367–373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
Notes
We generated 360 datasets by manipulating the number of clusters (at three levels: two, three, and four clusters), the cluster sizes (at two levels: either one small cluster containing 20 % of the level-2 units or one large cluster containing 80 % of the level-2 units, with the remaining level-2 units spread equally across the other clusters) and the \(\alpha \)-level (at six levels: .001, .01, .05, .15, .25 and .35); all other factors were held constant (i.e., 50 level-2 units with 25 level-1 units each, 20 predictors, two relevant and six irrelevant components, the same cluster-specific regression weights \({\varvec{p}}_{{\varvec{Y}}}^{k^\mathrm{{true}}}\) as in Table 2 and 20 % noise in \({\varvec{X}}\) and in \({\varvec{y}})\); we used ten replications for each cell of the design. The recovery of the underlying clustering, quantified by the adjusted Rand index (ARI; see Sect. 4.3.2), can be considered reasonable, as the mean ARI value equals .93, whereas the mean ARI for the comparable conditions in the simulation study with equal-sized underlying clusters amounts to .96. Recovery, however, drops in the four-cluster conditions (i.e., mean ARI of .94, .95 and .89 for two, three, and four clusters, respectively) and/or when there is one large cluster next to multiple small ones (i.e., mean ARI of .99 and .87 for the first and second level of the cluster size factor, respectively).
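For reference, the Hubert–Arabie ARI used above can be computed directly from two partitions. The following pure-Python sketch is ours (the function name is illustrative, not from the authors' code); it implements the standard contingency-table formula from Hubert and Arabie (1985):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions.
    Returns 1 for identical partitions and ~0 for chance-level agreement.
    Illustrative sketch; assumes the two partitions are non-trivial."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # contingency-table cells n_ij
    rows = Counter(labels_a)                   # row sums n_i.
    cols = Counter(labels_b)                   # column sums n_.j
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)
```

Note that the ARI is invariant to relabeling of the clusters, which is why it suits recovery comparisons between an estimated and a true partition.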
Note that taking .0000001 as the convergence criterion might be too strict, resulting in considerably longer computation times. A possible solution is to use \(.0000001\, \left\| {\varvec{X}} \right\| ^{2}\) instead (i.e., to consider the proportional rather than the absolute decrease in fit).
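The proportional criterion can be sketched as follows (a minimal illustration; the function and variable names are our own, not from the PCCR implementation):

```python
def converged(prev_loss, curr_loss, x_norm_sq, tol=1e-7):
    """Stop when the decrease in loss falls below tol * ||X||^2,
    i.e., a proportional rather than an absolute convergence criterion.
    Scaling by the squared norm of X makes the threshold data-dependent,
    so the same tol works across datasets of different magnitude."""
    return (prev_loss - curr_loss) < tol * x_norm_sq
```

With `tol=1e-7` and `x_norm_sq=1.0` this reduces to the absolute criterion mentioned in the note.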
Note that step three of the PCCR algorithm consists of a double (nested) iterative procedure of which the outer iterations pertain to updating the level-2 unit memberships and the inner iterations to updating \({\varvec{W}}, {\varvec{P}}_{{\varvec{X}}}\), and \({\varvec{p}}_{{\varvec{Y}}}^{k}\), conditional on \({\varvec{C}}\).
Another option is to directly solve the constrained problem (instead of the unconstrained one) by means of an iterative majorization approach (Kiers & ten Berge, 1992). The technical details of this approach are available upon request from the authors. The results of a pilot simulation study showed that both algorithms in most cases lead to the same final solution, but that the majorization approach consumes more time. Therefore, the majorization approach is not further discussed in this manuscript.
When the worst-fitting level-2 unit is the only level-2 unit in its cluster, we move on to the level-2 unit with the second worst fit, and so on.
The results of a pilot study, in which the number of components and clusters (i.e., two and four), the amount of noise in \({\varvec{X}}\) and \({\varvec{y}}\) (i.e., 10 and 40 %), and the cluster sizes (see Brusco & Cradit, 2001; Steinley, 2003) were manipulated, indeed reveal that PCCR and a sequential approach perform equally well when all components are strong and relevant.
\({\varvec{W}}^\mathrm{{true}}\) can be computed from \({\varvec{T}}^\mathrm{{true}}\) as follows: \({\varvec{W}}^\mathrm{{true}}=\left( {\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{X}}^\mathrm{{true}} \right) ^{-1}{\varvec{X}}^{\mathrm{{true}}^{\prime }}{\varvec{T}}^\mathrm{{true}}\), with \({\varvec{X}}^\mathrm{{true}}={\varvec{T}}^\mathrm{{true}}{\varvec{P}}_{{\varvec{X}}}^{\mathrm{{true}}^{\prime }}\).
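This computation can be illustrated numerically as follows (a sketch with hypothetical dimensions; since \({\varvec{X}}^\mathrm{{true}}={\varvec{T}}^\mathrm{{true}}{\varvec{P}}_{{\varvec{X}}}^{\mathrm{{true}}^{\prime }}\) has rank equal to the number of components, which is smaller than the number of predictors, we take the inverse as a Moore–Penrose pseudoinverse):

```python
import numpy as np

# Illustrative dimensions: 50 units, R = 2 components, J = 20 predictors.
rng = np.random.default_rng(0)
T_true = np.linalg.qr(rng.standard_normal((50, 2)))[0]  # orthonormal scores
P_X = rng.standard_normal((20, 2))                      # component loadings
X_true = T_true @ P_X.T                                 # X^true = T^true P_X'

# W^true = (X^true' X^true)^- X^true' T^true; pinv handles the rank deficiency.
W_true = np.linalg.pinv(X_true.T @ X_true) @ X_true.T @ T_true

# The recovered weights reproduce the true scores: X^true W^true = T^true.
print(np.allclose(X_true @ W_true, T_true))  # True
```

The final check holds because the column space of \({\varvec{X}}^\mathrm{{true}}\) coincides with that of \({\varvec{T}}^\mathrm{{true}}\), so the regression projects the scores onto themselves.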
This choice of cluster-specific regression weights implies that the clusters are clearly separated in terms of their underlying cluster-specific regression models. To quantify the degree of cluster separation, we computed the ratio \( z = \frac{\sum _{r=1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{true}} - p_{{\varvec{Y}}r}^{k^{\prime }\,\mathrm{true}} \right| }{\sum _{r=1}^{R} \left| p_{{\varvec{Y}}r}^{k\,\mathrm{true}} \right| }\) for each pair of clusters, with \(p_{{\varvec{Y}}r}^{k\,\mathrm{true}}\) \(\left( p_{{\varvec{Y}}r}^{k^{\prime }\,\mathrm{true}} \right) \) denoting the \(r\mathrm{th}\) true regression weight for cluster \(k\) \(\left( k^{\prime } \right) \) and \(z>.50\) implying that the clusters are well separated (Kiers & Smilde, 2007). For the generated datasets with two clusters, the z-ratio equals 1.23, whereas the mean z-ratio (computed over all possible cluster pairs) equals 1.57 for the three-cluster datasets (with separate values of 1.23, 1.82, and 1.67) and 4.47 for the four-cluster datasets (with separate values of 1.23, 2.64, 8.76, 1.15, 7.33, and 5.71). Hence, in the three- and four-cluster conditions the clusters are more clearly separated than in the two-cluster conditions. Notice further that the larger mean z-ratio for the solutions with four clusters is mainly due to the fourth cluster being clearly separated from the other three clusters (the z-ratios among the other three clusters are in the range of the z-ratios for the two- and three-cluster datasets).
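The z-ratio can be computed as follows (an illustrative sketch with a function name of our own; note that the denominator involves the weights of cluster \(k\) only, so the ratio is not symmetric in \(k\) and \(k^{\prime }\)):

```python
def z_ratio(p_k, p_kprime):
    """Cluster-separation ratio z = sum_r |p_k[r] - p_k'[r]| / sum_r |p_k[r]|
    (Kiers & Smilde, 2007). Values above .50 indicate that the two
    cluster-specific regression models are well separated."""
    numerator = sum(abs(a - b) for a, b in zip(p_k, p_kprime))
    denominator = sum(abs(a) for a in p_k)
    return numerator / denominator
```

For identical weight vectors the ratio is 0; for the orthogonal pair `[2, 0]` versus `[0, 2]` it equals 2, well above the .50 threshold.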
We believe that models with more than four clusters and/or four components are too complex for our data (i.e., 48 countries and 34 variables). In particular, the fit of these complex models is not substantially larger than the fit of less complex models. Furthermore, we did not consider solutions with one cluster and/or one component as such models are too simplistic.
These computations assume the use of a single core. Modern computers, however, can use up to four cores simultaneously, reducing the computation time to 2 h.
References
Arminger, G., & Stein, P. (1997). Finite mixtures of covariance structure models with regressors: Loglikelihood function, minimum distance estimation, fit indices, and a complex example. Sociological Methods & Research, 26(2), 148–182. doi:10.1177/0049124197026002002.
Brusco, M. J., & Cradit, J. D. (2001). A variable selection heuristic for K-means clustering. Psychometrika, 66, 249–270. doi:10.1007/BF02294838.
Brusco, M. J., Cradit, J. D., Steinley, D., & Fox, G. L. (2008). Cautionary remarks on the use of clusterwise regression. Multivariate Behavioral Research, 43(1), 29–49. doi:10.1080/00273170701836653.
Brusco, M. J., Cradit, J. D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40(2), 225–234.
Ceulemans, E., & Kiers, H. A. L. (2009). Discriminating between strong and weak structures in three-mode principal component analysis. British Journal of Mathematical & Statistical Psychology, 62, 601–620. doi:10.1348/000711008X369474.
Ceulemans, E., Kuppens, P., & Van Mechelen, I. (2012). Capturing the structure of distinct types of individual differences in the situation-specific experience of emotions: The case of anger. European Journal of Personality, 26, 484–495. doi:10.1002/per.847.
Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and individual differences therein. Psychometrika, 73, 107–124. doi:10.1007/s11336-007-9024-1.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis: an evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. doi:10.1037/0033-2909-112-1-155.
Coxe, K. L. (1986). Principal components regression analysis. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of statistical sciences (pp. 181–184). New York: Wiley.
de Jong, S., & Kiers, H. A. L. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14(1–3), 155–164. doi:10.1016/0169-7439(92)80100-I.
De Roover, K., Ceulemans, E., Timmerman, M. E., Vansteelandt, K., Stouten, J., & Onghena, P. (2012). Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data. Psychological Methods, 17, 100–119. doi:10.1037/a0025385.
DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5, 249–282.
DeSarbo, W. S., & Edwards, E. A. (1996). Typologies of compulsive buying behavior: A constrained clusterwise regression approach. Journal of Consumer Psychology, 5(3), 231–262.
DeSarbo, W. S., Oliver, R. L., & Rangaswamy, A. (1989). A simulated annealing methodology for clusterwise linear regression. Psychometrika, 54(4), 707–736.
Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kiers, H. A. L. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden: DSWO Press.
Kiers, H. A. L., & Smilde, A. (2007). A comparison of various methods for multivariate regression with highly collinear variables. Statistical Methods & Applications, 16(2), 193–228. doi:10.1007/s10260-006-0025-5.
Kiers, H. A. L., & ten Berge, J. M. F. (1992). Minimization of a class of matrix trace functions by means of refined majorization. Psychometrika, 57, 371–382.
Korth, B., & Tucker, L. R. (1975). The distribution of chance congruence coefficients from simulated data. Psychometrika, 40(3), 361–372.
Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.
Kroonenberg, P. M. (1983). Three-mode principal component analysis: Theory and applications. Leiden: DSWO Press.
Kuppens, P., Ceulemans, E., Timmerman, M. E., Diener, E., & Kim-Prieto, C. (2006). Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience. Journal of Cross-Cultural Psychology, 37, 491–515. doi:10.1177/0022022106290474.
Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 1–18.
Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 329–358.
Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modeling: a comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318. doi:10.1080/02664760903030213.
Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley.
Späth, H. (1979). Algorithm 39: Clusterwise linear regression. Computing, 22(4), 367–373. doi:10.1007/BF02265317.
Späth, H. (1981). Correction to algorithm 39: Clusterwise linear regression. Computing, 26(3), 275–275. doi:10.1007/BF02243486.
Steinley, D. (2003). Local optima in K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304. doi:10.1037/1082-989X.8.3.294.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted rand index. Psychological Methods, 9(3), 386–396. doi:10.1037/1082-989X.9.3.386.
Stormshak, E. A., Bierman, K. L., Bruschi, C., Dodge, K. A., & Coie, J. D., The Conduct Problems Prevention Research Group. (1999). The relation between behavior problems and peer preference in different classroom contexts. Child Development, 70(1), 169–182.
ten Berge, J. M. F. (1977). Orthogonal procrustes rotation for two or more matrices. Psychometrika, 42(2), 267–276.
Tucker, L. R. (1951). A method for synthesis of factor analysis studies. Personnel Research Section Rapport #984. Washington, DC: Department of the Army.
van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7(1), 142–157. doi:10.1186/1471-2164-7-142.
van den Berg, R. A., Van Mechelen, I., Wilderjans, T. F., Van Deun, K., Kiers, H. A. L., & Smilde, A. K. (2009). Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics, 10, 340. doi:10.1186/1471-2105-10-340.
Van Deun, K., Smilde, A. K., van der Werf, M. J., Kiers, H. A. L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10, 246. doi:10.1186/1471-2105-10-246.
Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in principal covariates regression. Chemometrics and Intelligent Laboratory Systems, 123, 36–43. doi:10.1016/j.chemolab.2013.02.005.
Wedel, M., & DeSarbo, W. S. (1995). A mixture likelihood approach for generalized linear models. Journal of Classification, 12, 21–55.
Wilderjans, T. F., & Ceulemans, E. (2013). Clusterwise Parafac to identify heterogeneity in three-way data. Chemometrics and Intelligent Laboratory Systems, 129, 87–97. doi:10.1016/j.chemolab.2013.09.010.
Wilderjans, T. F., Ceulemans, E., & Kuppens, P. (2012). Clusterwise HICLAS: A generic modeling strategy to trace similarities and differences in multi-block binary data. Behavior Research Methods, 44, 532–545. doi:10.3758/s13428-011-0166-9.
Wilderjans, T. F., Ceulemans, E., & Van Mechelen, I. (2009). Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes. Computational Statistics and Data Analysis, 53, 1086–1098. doi:10.1016/j.csda.2008.09.031.
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & van den Berg, R. A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical and Statistical Psychology, 64, 277–290. doi:10.1348/000711010X513263.
Wold, H. (1966). Estimation of principal component and related methods by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate analysis (pp. 391–420). New York: Academic Press.
Acknowledgements
The first author is a postdoctoral Fellow of the Fund of Scientific Research (FWO) Flanders (Belgium). The research leading to the results reported in this paper was sponsored in part by the Research Fund of KU Leuven (GOA/15/003) and by the Interuniversity Attraction Poles programme financed by the Belgian Government (IAP/P7/06).
Appendices
Appendix 1: INDORT Rotation Toward Orthogonality per Level-2 Unit
The aim of the INDORT rotation is to find the orthonormal rotation matrix \({\varvec{U}} \, \left( {{\varvec{U}}^{\prime }{\varvec{U}} = {\varvec{UU}}^{\prime } = {\varvec{I}}} \right) \) that minimizes the following loss function:
$$f\left( {\varvec{U}} \right) =\sum _{i=1}^{I} \left\| {\varvec{U}}^{\prime }{\varvec{T}}_{i}^{\prime }{\varvec{T}}_{i} {\varvec{U}}-{\varvec{D}}_{i} \right\| ^{2},$$
where \({\varvec{D}}_{i} \) is a diagonal matrix containing the variances of the component scores for level-2 unit i. This loss function equals zero when the component scores are simultaneously orthogonal across level-2 units (i.e., \({\varvec{T}}^{\prime }{\varvec{T}}=N{\varvec{I}})\), which will always hold since we imposed \({\varvec{T}}\) to be orthogonal (see Sect. 2.2), and orthogonal within each level-2 unit i (i.e., \({\varvec{T}}_{i}^{\prime }{\varvec{T}}_{i} =N_{i}{\varvec{I}} )\). This function can be rewritten as:
yielding the INDORT loss function, for which an alternating least squares estimation approach exists (Kroonenberg, 1983; Kiers, 1989).
Appendix 2: Unconstrained Minimization of \({\varvec{W}}\)
To update \({\varvec{W}}\) (without any constraint), the loss function in \(\left( 4 \right) \) can be rewritten as follows:
It appears that updating this loss function over \({\varvec{W}}\) conditional on \({\varvec{P}}_{{\varvec{X}}} , {\varvec{p}}_{{\varvec{Y}}}^{k} \), and \({\varvec{C}}\) boils down to solving a multivariate multiple linear regression problem, for which the following closed-form solution exists:
$$\mathrm{vec}\left( {\varvec{W}} \right) =\left( {\varvec{Z}}^{{*^{\prime }}}{\varvec{Z}}^{*} \right) ^{-1}{\varvec{Z}}^{{*^{\prime }}}{\varvec{y}}^{*}.$$
The matrix \({\varvec{W}}\) can be obtained by devectorizing \(\mathrm{vec}\left( {{\varvec{W}}} \right) \) into a matrix. Note that \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{Z}}^{*} \) and \( {\varvec{Z}}^{{*^{\prime }}} {\varvec{y}}^{*} \) can be simplified as follows:
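The regression update of this type can be illustrated as follows. The sketch below uses random stand-ins for the design matrix \({\varvec{Z}}^{*}\) and outcome \({\varvec{y}}^{*}\) (their exact construction follows the paper and is not reproduced here), and shows the generic closed-form solution alongside the numerically preferable least-squares route:

```python
import numpy as np

# Stand-ins for the design matrix Z* and outcome y* of the rewritten
# loss function (100 rows, 5 unknowns; dimensions are illustrative).
rng = np.random.default_rng(1)
Z_star = rng.standard_normal((100, 5))
y_star = rng.standard_normal(100)

# Closed form: vec(W) = (Z*' Z*)^{-1} Z*' y* (the normal equations).
vec_w = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ y_star)

# Equivalent but numerically more stable: solve the least-squares
# problem directly without forming Z*' Z*.
vec_w_lstsq = np.linalg.lstsq(Z_star, y_star, rcond=None)[0]

print(np.allclose(vec_w, vec_w_lstsq))  # True
```

In practice the resulting vector would then be devectorized into the matrix \({\varvec{W}}\), as described above.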
Wilderjans, T.F., Vande Gaer, E., Kiers, H.A.L. et al. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data. Psychometrika 82, 86–111 (2017). https://doi.org/10.1007/s11336-016-9522-0