Disaggregating level-specific effects in cross-classified multilevel models

  • Original Manuscript
  • Behavior Research Methods

Abstract

In psychology and other fields, data often have a cross-classified structure, whereby observations are nested within multiple types of non-hierarchical clusters (e.g., repeated measures cross-classified by persons and stimuli). This paper discusses ways that, in cross-classified multilevel models, slopes of lower-level predictors can implicitly reflect an ambiguous blend of multiple effects (for instance, a purely observation-level effect as well as a unique between-cluster effect for each type of cluster). The possibility of conflating multiple effects of lower-level predictors is well recognized for non-cross-classified multilevel models, but has not been fully discussed or clarified for cross-classified contexts. Consequently, in published cross-classified modeling applications, this possibility is almost always ignored, and researchers routinely specify models that conflate multiple effects. In this paper, we show why this common practice can be problematic, and show how to disaggregate level-specific effects in cross-classified models. We provide a novel suite of options that include fully cluster-mean-centered, partially cluster-mean-centered, and contextual effect models, each of which provides a unique interpretation of model parameters. We further clarify how to avoid both fixed and random conflation, the latter of which is widely misunderstood even in non-cross-classified models. We provide simulation results showing the possible deleterious impact of such conflation in cross-classified models, and walk through pedagogical examples to illustrate the disaggregation of level-specific effects. We conclude by considering additional model complexities that can arise with cross-classification, providing guidance for researchers in choosing among model specifications, and describing newly available software to aid researchers who wish to disaggregate effects in practice.

Notes

  1. In certain situations, researchers may prefer to estimate such a conflated slope, as we further consider in the Discussion section.

  2. Note that disaggregating level-specific effects of level 1 variables is still relevant for categorical predictors, both for hierarchical models (see Yaremych et al., 2021) and cross-classified models. We focused this review on continuous level 1 predictors, as centering categorical predictors is already rarely done even in hierarchical models. Additionally, with categorical variables, there is often no between-cluster variation in the level 1 variable (e.g., a within-subjects "condition" dummy variable in which all subjects have an equal number of 1s and 0s), and in this context, there are no possible between-cluster effects.

  3. Throughout this paper, we use uppercase J and K to denote the cluster type (i.e., J clusters vs. K clusters), and lowercase j and k to index individual clusters of each type (i.e., j takes on integer values from 1 to the total number of J clusters, and k takes on integer values from 1 to the total number of K clusters).

  4. To yield a more interpretable intercept, one could additionally center each of the three predictors in Eq. (4) by their respective grand means, such that the zero point of each IV is then the grand mean; doing so would not change the overall fit of the model nor any of the slopes, and would allow \({\gamma }_{00}\) to be interpreted as the across-J-and-K-cluster expected value of the outcome when all predictors are at their mean.

  5. In cases in which there are multiple observations for cluster combinations jk, one could also estimate an interaction random component, \({u}_{0jk}\), upon which we elaborate in the Discussion (for our illustrative example with only one observation per jk combination, we could not include this term, as it would be completely confounded with the level 1 error term; Bolger & Laurenceau, 2013).

  6. For simplicity, we’ve focused here on two-way cross-classification (i.e., two types of clusters), which is the most common type of cross-classified data structure. The general idea of disaggregation vs. conflation, however, applies to models with any number of cross-classifications (such as a three-way cross-classification; see, e.g., Pruitt et al., 2014). In models with more than two cross-classifications, researchers can create fully cluster-mean-centered variables by mean-centering by all cluster types, or can specify partially cluster-mean-centered models by mean-centering using a subset of cluster types, or can specify a contextual effect model by leaving level 1 variables uncentered (or centered-by-a-constant) and adding each cluster mean as a separate predictor.

  7. Here we use the term “bias” to mean the difference between the average sample-estimated slope of the level 1 variable and the true population-generating within-cluster effect (\({\gamma }_{w}\)). We are thus assuming the within-cluster effect is the desired estimand, but in the Discussion we consider the possibility that researchers may instead be interested in estimating a conflated effect.

  8. The conditional ICC for J clusters is defined as \({\tau }_{00\left(J\right)}/({\tau }_{00\left(J\right)}+{\tau }_{00\left(K\right)}+{\sigma }^{2})\) and quantifies the proportion of unexplained outcome variance (i.e., that not accounted for by the predictors) that is between J clusters. The conditional ICC for K clusters is similarly defined as \({\tau }_{00\left(K\right)}/({\tau }_{00\left(J\right)}+{\tau }_{00\left(K\right)}+{\sigma }^{2})\). Note that the unconditional ICCs (quantifying the overall outcome variance that is between each cluster type) implicitly vary across conditions as a function of the magnitude of between-cluster effects (larger effects imply larger unconditional ICCs).

  9. At the extremes of the x-axis, the bias induced by the between-person effect in the uncentered x model is slightly less than what is observed at moderate values on the x-axis. Though this is a minor point and not a focus of the manuscript, this is because the relative weighting of the three effects is also influenced by the relative amount of explained variance at each level (as previously noted in hierarchical models; Raudenbush & Bryk, 2002; Rights, 2022), and at the extremes of the x-axis, the between-person effect is of largest magnitude.

  10. As an alternative approach in unbalanced designs, Raudenbush (2009) suggested researchers could create a level 1 independent variable that was independent of both cluster types by partialing out possible cluster-level effects (by either regressing x on a set of dummy codes representing membership in each cluster of each type and extracting the residuals, or using an alternative and less computationally intensive approach; see pp. 488–489). Raudenbush (2009) did not, however, consider inclusion of cluster means of the level 1 variable as separate predictors (nor consider the concept of random conflation). In cases wherein researchers wish to examine only the within-cluster effect, the Raudenbush (2009) approach can be used to ensure a disaggregated within-cluster effect. In supplemental simulations, we generated data with imbalance, and found only trivial differences in estimation of the within-cluster effect via the Raudenbush (2009) approach vs. via fully cluster-mean-centering.

  11. It is important to note that, despite the seemingly modest cluster-level sample sizes and cell sizes, the results in Fig. 7 are still from very large datasets with many level 1 observations (the smallest being 25 \(\times\) 25 \(\times\) 5 = 3125 total level 1 units). It is thus not surprising that the conflated effects are so similar to the within effect. As a follow-up investigation, we generated data in which the cell size was still 5, but the number of J and K clusters was either 5 or 25. As shown in Online Appendix B Figure OA3, the degree of conflation is much more pronounced than that shown in Fig. 7.

  12. Note that here we focus on fixed slope model specification for two reasons: 1) to provide a simple and concise illustration of the substantively distinct within-cluster vs. between-cluster effects, and 2) because the full suite of possible random slope specifications encountered convergence issues, which can be common for cross-classified models with small numbers of clusters. In practice, however, researchers should be mindful to consider adding random slopes when possible, as failing to do so when there is underlying slope heterogeneity can yield inaccurate standard errors and high type I error rates (Barr et al., 2013; Schielzeth & Forstmeier, 2009). For further demonstration, we provide corresponding random slope specifications and discuss differences across models in Online Appendix C.

  13. In each model, we also grand-mean-centered each level-specific portion of trait similarity, so that the estimate of the fixed component of the intercept can be interpreted as the estimated across-person-and-trait-pair average face correlation (see footnote 4).

  14. Xie et al. (2021), in fact, specified a person-mean-centered model that did not include the trait-pair mean of trait similarity and, as such, the slope of person-mean-centered trait similarity was conflated with the between-trait-pair effect. Nonetheless, their partially conflated slope was not markedly dissimilar from the unconflated slope, and thus substantive conclusions were not greatly compromised.

  15. For instance, for the simulation conditions described earlier in Fig. 3 in which level-specific effects are equivalent (i.e., the 0 point on the x-axis for each plot), the uncentered-x model had a slightly lower across-sample standard deviation of estimates than the unconflated model (going from left to right and top to bottom in Fig. 3, the standard deviation for the uncentered-x vs. fully cluster-mean-centered models was 0.064 vs. 0.069, 0.031 vs. 0.032, 0.031 vs. 0.032, and 0.018 vs. 0.018, respectively).

  16. These two heteroscedastic random effect terms can be estimated in any software that allows the slope of a cluster mean to vary at its own level (e.g., including a random effect for the J-cluster mean across J clusters; this is estimable, for instance, with the lmer function in the lme4 package in R; Bates et al., 2015). Note that this atypical specification may encounter estimation issues (e.g., non-convergence) due to the complexity of the random effect structure. For more details on modeling such heteroscedasticity in hierarchical contexts in practice, see Rights & Sterba (2023).
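As a small illustration of the conditional ICC definitions in note 8, the following sketch computes both ratios from a set of variance components. The function name and the numeric values are hypothetical and chosen only for illustration; they are not taken from the simulation conditions.

```python
def conditional_iccs(tau00_J, tau00_K, sigma2):
    """Return (ICC_J, ICC_K) as defined in note 8.

    tau00_J, tau00_K: random-intercept variances for J and K clusters.
    sigma2: level 1 residual variance.
    """
    total = tau00_J + tau00_K + sigma2
    return tau00_J / total, tau00_K / total


# Hypothetical variance components (not values from the paper).
icc_J, icc_K = conditional_iccs(tau00_J=0.2, tau00_K=0.1, sigma2=0.7)
print(icc_J, icc_K)
```

Because both ratios share the same denominator, they sum to the proportion of unexplained variance that lies between clusters of either type.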

References

  • Abdel Magid, H. S., Milliren, C. E., Pettee Gabriel, K., & Nagata, J. M. (2021). Disentangling individual, school, and neighborhood effects on screen time among adolescents and young adults in the United States. Preventive Medicine, 142, 106357. https://doi.org/10.1016/j.ypmed.2020.106357

  • Akaeda, N. (2021). Welfare states and the health impact of social capital: Focusing on the crowding-out and crowding-in perspectives. Social Indicators Research, 157(3). https://doi.org/10.1007/s11205-021-02679-7

  • Barker, K. M., Subramanian, S. V., Berkman, L., Austin, S. B., & Evans, C. R. (2019). Adolescent sexual initiation: A cross-classified multilevel analysis of peer group-, school-, and neighborhood-level influences. Journal of Adolescent Health, 65(3), 390–396. https://doi.org/10.1016/j.jadohealth.2019.03.002

  • Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.

  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.

  • Besliu, C. (2022). Complexity in insurance selection: Cross-classified multilevel analysis of experimental data. Journal of Behavioral and Experimental Finance, 100713. https://doi.org/10.1016/j.jbef.2022.100713

  • Bickel, R. (2007). Multilevel analysis for applied research: It’s just regression! Guilford Press.

  • Bolger, N., & Laurenceau, J. P. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research. Guilford Press.

  • Bozorgmehr, K., Maier, W., Brenner, H., Saum, K., Stock, C., Miksch, A., Holleczek, B., Szecsenyi, J., & Razum, O. (2015). Social disparities in disease management programmes for coronary heart disease in Germany: A cross-classified multilevel analysis. Journal of Epidemiology and Community Health (1979), 69(11), 1091–1101. https://doi.org/10.1136/jech-2014-204506

  • Cronbach, L. J. (1976). Research on classrooms and schools: Formulation of questions, design and analysis. Stanford University.

  • Goldstein, H. (2011). Multilevel statistical models. Wiley.

  • Gooty, J., & Yammarino, F. J. (2016). The Leader-Member exchange relationship: A multisource, cross-level investigation. Journal of Management, 42(4), 915–935. https://doi.org/10.1177/0149206313503009

  • Graubard, B. I., & Korn, E. L. (1994). Regression analysis with clustered data. Statistics in Medicine, 13(5–7), 509–522. https://doi.org/10.1002/sim.4780130514

  • Hoffman, L., & Walters, R. W. (2022). Catching up on multilevel modeling. Annual Review of Psychology, 73(1), 659–689. https://doi.org/10.1146/annurev-psych-020821-103525

  • Hofmann, D. A., & Gavin, M. B. (1998). Centering decisions in hierarchical linear models: Implications for research in organizations. Journal of Management, 24(5), 623–641. https://doi.org/10.1016/s0149-2063(99)80077-4

  • Hox, J. J., & Roberts, J. K. (2011). Handbook of advanced multilevel analysis. Routledge.

  • Hox, J. J., Moerbeek, M., & van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. Routledge.

  • Im, M. H., Kim, E. S., Kwok, O.-M., Yoon, M., & Willson, V. L. (2016). Impact of not addressing partially cross-classified multilevel structure in testing measurement invariance: A Monte Carlo study. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.00328

  • Jiang, J., & Wang, P. (2021). Which generation is more likely to participate in society? A longitudinal analysis. Social Indicators Research, 162(1). https://doi.org/10.1007/s11205-021-02830-4

  • Johnson, B. D. (2011). Cross-classified multilevel models: An application to the criminal case processing of indicted terrorists. Journal of Quantitative Criminology, 28(1), 163–189. https://doi.org/10.1007/s10940-011-9157-3

  • Johnson, P. E. (2019, February 18). Using rockchalk for Regression Analysis. https://cran.r-hub.io/web/packages/rockchalk/vignettes/rockchalk.pdf

  • Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69.

  • Kim, Y. S. (2016). Examination of the relative effects of neighborhoods and schools on juvenile delinquency: A multilevel cross-classified model approach. Deviant Behavior, 37(10), 1196–1214. https://doi.org/10.1080/01639625.2016.1170537

  • Kim, Y.-S., Petscher, Y., Foorman, B. R., & Zhou, C. (2010). The contributions of phonological awareness and letter-name knowledge to letter-sound acquisition—a cross-classified multilevel model approach. Journal of Educational Psychology, 102(2), 313–326. https://doi.org/10.1037/a0018449

  • Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13(3), 203–229. https://doi.org/10.1037/a0012869

  • Luo, W., & Kwok, O. M. (2010). Proportional reduction of prediction error in cross-classified random effects models. Sociological Methods & Research, 39(2), 188–205. https://doi.org/10.1177/0049124110384062

  • Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.

  • McNeish, D., Stapleton, L. M., & Silverman, R. D. (2017). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22, 114–140. https://doi.org/10.1037/met0000078

  • Pedersen, W., Bakken, A., & von Soest, T. (2018). Neighborhood or school? Influences on alcohol consumption and heavy episodic drinking among urban adolescents. Journal of Youth and Adolescence, 47(10), 2073–2087. https://doi.org/10.1007/s10964-017-0787-0

  • Preacher, K. J., Zyphur, M. J., & Zhang, Z. (2010). A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods, 15(3), 209–233. https://doi.org/10.1037/a0020141

  • Preacher, K. J., Zhang, Z., & Zyphur, M. J. (2016). Multilevel structural equation models for assessing moderation within and across levels of analysis. Psychological Methods, 21(2), 189–205. https://doi.org/10.1037/met0000052

  • Pruitt, S. L., Leonard, T., Zhang, S., Schootman, M., Halm, E. A., & Gupta, S. (2014). Physicians, clinics, and neighborhoods: Multiple levels of influence on colorectal cancer screening. Cancer Epidemiology, Biomarkers & Prevention, 23(7), 1346–1355.

  • Rasbash, J., & Goldstein, H. (1994). Efficient analysis of mixed hierarchical and cross-classified random structures using a multilevel model. Journal of Educational and Behavioral Statistics, 19(4), 337–350. https://doi.org/10.3102/10769986019004337

  • Raudenbush, S. W. (2009). Adaptive centering with random effects: An alternative to the fixed effects model for studying time-varying treatments in school settings. Education Finance and Policy, 4(4), 468–491. https://doi.org/10.1162/edfp.2009.4.4.468

  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Sage Publications.

  • Reise, S. P., & Duan, N. (Eds.). (2009). Multilevel modeling: Methodological advances, issues, and applications. Psychology Press.

  • Reiss, M. V., & Tsvetkova, M. (2019). Perceiving education from Facebook profile pictures. New Media & Society, 22(3), 146144481986867. https://doi.org/10.1177/1461444819868678

  • Rights, J. D. (2022). Aberrant distortion of variance components in multilevel models under conflation of level-specific effects. Psychological Methods. https://doi.org/10.1037/met0000514

  • Rights, J. D., & Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods, 24(3), 309–338. https://doi.org/10.1037/met0000184

  • Rights, J. D., & Sterba, S. K. (2021). Effect size measures for longitudinal growth analyses: Extending a framework of multilevel model R-squareds to accommodate heteroscedasticity, autocorrelation, nonlinearity, and alternative centering strategies. New Directions for Child and Adolescent Development, 2021(175), 65–110. https://doi.org/10.1002/cad.20387

  • Rights, J. D., & Sterba, S. K. (2023). On the common but problematic specification of conflated random slopes in multilevel models. Multivariate Behavioral Research. Advance online publication, 1–28. https://doi.org/10.1080/00273171.2023.2174490

  • Rights, J. D., Preacher, K. J., & Cole, D. A. (2020). The danger of conflating level-specific effects of control variables when primary interest lies in level-2 effects. British Journal of Mathematical and Statistical Psychology, 73, 194–211.

  • Sakai-Bizmark, R., Richmond, T. K., Kawachi, I., Elliott, M. N., Davies, S. L., Tortolero Emery, S., Peskin, M., Milliren, C. E., & Schuster, M. A. (2020). School social capital and tobacco experimentation among adolescents: Evidence from a cross-classified multilevel, longitudinal analysis. Journal of Adolescent Health, 66(4), 431–438. https://doi.org/10.1016/j.jadohealth.2019.10.022

  • Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support: Overconfident estimates in mixed models. Behavioral Ecology, 20(2), 416–420. https://doi.org/10.1093/beheco/arn145

  • Scott, A. J., & Holt, D. (1982). The effect of two-stage sampling on ordinary least squares methods. Journal of the American Statistical Association, 77(380), 848–854.

  • Shi, Y., Leite, W., & Algina, J. (2010). The impact of omitting the interaction between crossed factors in cross-classified random effects modelling. The British Journal of Mathematical and Statistical Psychology, 63(Pt 1), 1–15. https://doi.org/10.1348/000711008X398968

  • Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage.

  • Vinas-Forcade, J., Mels, C., Van Houtte, M., Valcke, M., & Derluyn, I. (2020). Can failure be prevented? Using longitudinal data to identify at-risk students upon entering secondary school. British Educational Research Journal, 47(1), 205–225. https://doi.org/10.1002/berj.3683

  • Wickham, R. E., Hardy, K. K., Buckman, H. L., & Lepovic, E. (2021). Comparing cross-classified mixed effects and Bayesian structural equations modeling for stimulus sampling designs: A simulation study. Journal of Experimental Social Psychology, 92, 104062. https://doi.org/10.1016/j.jesp.2020.104062

  • Xie, S. Y., Flake, J. K., Stolier, R. M., Freeman, J. B., & Hehman, E. (2021). Facial impressions are predicted by the structure of group stereotypes. Psychological Science, 32(12), 1979–1993. https://doi.org/10.1177/09567976211024259

  • Yaremych, H. E., Preacher, K. J., & Hedeker, D. (2021). Centering categorical predictors in multilevel models: Best practices and interpretation. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000434

Author information

Corresponding author

Correspondence to Yingchi Guo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Science Statement

The empirical data used in this manuscript are from an existing study with open-access data (Xie et al., 2021). The code with which we re-analyzed these data is included in our Appendix C.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1617 KB)

Supplementary file2 (DOCX 1494 KB)

Appendices

Appendix A: Interpretation of coefficients in fixed-slope fully cluster-mean-centered, partially cluster-mean-centered, and contextual effect cross-classified models

In Appendix A, we provide mathematical derivations to show how to interpret the fixed coefficients in the cross-classified model specifications listed in Table 1. In particular, we show which coefficients are unconflated, which are conflated, and, for the conflated coefficients, which effects they implicitly average. This can be determined by re-expressing each model as a regression of \({y}_{ijk}\) on \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), \({x}_{\cdot j\cdot }\), and \({x}_{\cdot \cdot k}\). When the model is written this way, the slopes represent the model-implied within-cluster, between-J-cluster, and between-K-cluster effects, respectively.
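To make the three re-expressed predictors concrete, the following minimal sketch builds them for a balanced design with one observation per jk cell. The array layout, cluster counts, and simulated x values are assumptions made purely for illustration. It then checks that the within component has the same mean in every J cluster and every K cluster, so that it carries no between-cluster variation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_J, n_K = 6, 5                        # hypothetical numbers of J and K clusters
x = rng.normal(size=(n_J, n_K))        # x[j, k]: one observation per jk cell

x_J = x.mean(axis=1, keepdims=True)    # J-cluster means, x_.j.
x_K = x.mean(axis=0, keepdims=True)    # K-cluster means, x_..k
x_within = x - x_J - x_K               # x_ijk - x_.j. - x_..k

# In this balanced layout, every cluster mean of the within component
# equals minus the grand mean of x, i.e., it is constant across clusters.
print(x_within.mean(axis=1))           # one value per J cluster
print(x_within.mean(axis=0))           # one value per K cluster
print(-x.mean())
```

With unbalanced data the cluster means of the within component are generally no longer constant, which is one reason note 10 discusses alternatives for unbalanced designs.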

Uncentered x (or centered-by-a-constant x) models

The first uncentered x model noted in Table 1 includes only \({x}_{ijk}\) and yields a slope that is a conflated blend of \({\gamma }_{w}\), \({\gamma }_{b(J)}\), and \({\gamma }_{b(K)}\); this was shown in Eq. (6) by re-expressing the model as a regression of \({y}_{ijk}\) on \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), \({x}_{\cdot j\cdot }\), and \({x}_{\cdot \cdot k}\), and clarifying that the model-implied within-cluster, between-J-cluster, and between-K-cluster effects were equivalent. Following a similar procedure, we can consider the model in row 2 of Table 1 that includes as predictors both \({x}_{ijk}\) and \({x}_{\cdot j\cdot }\):

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{c(K)}{x}_{ijk}+{\gamma }_{bcc(J)}{x}_{\cdot j\cdot }+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\ \ ={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{c(K)}({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})+{\gamma }_{bcc(J)}{x}_{\cdot j\cdot }+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\,\,\,\,\,={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({\gamma }_{bcc(J)}+{\gamma }_{c(K)}){x}_{\cdot j\cdot }+{\gamma }_{c(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(23)

This re-expression in Eq. (23) shows that the within-cluster effect is constrained to equal the between-K-cluster effect, as both are given by \({\gamma }_{c(K)}\) (i.e., the re-expressed/model-implied slope of \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) = the re-expressed/model-implied slope of \({x}_{\cdot \cdot k}\) = \({\gamma }_{c(K)}\)). Hence, this \({\gamma }_{c(K)}\) coefficient is a conflated blend/weighted average of the underlying \({\gamma }_{w}\) and \({\gamma }_{b(K)}\) defined in the general unconflated model in Eq. (4). The between-J-cluster effect is given by \({\gamma }_{bcc(J)}+{\gamma }_{c(K)}\). Note that, since there is no implicit constraint on the directly estimated slope of \({x}_{\cdot j\cdot }\) (i.e., \({\gamma }_{bcc(J)}\)) in this model, the between-J-cluster effect is appropriately recovered, i.e., \({\gamma }_{bcc(J)}+{\gamma }_{c(K)}={\gamma }_{b(J)}\). However, \({\gamma }_{bcc(J)}\) is itself of questionable utility, as it does not represent the true underlying contextual effect for J clusters (i.e., the difference between the between-J-cluster effect and the within-cluster effect, \({\gamma }_{b(J)}-{\gamma }_{w}\)), and instead is a conflated contextual effect, representing the difference between the between-J-cluster effect and the conflated effect \({\gamma }_{c(K)}\). The degree to which \({\gamma }_{bcc(J)}\) differs from the true contextual effect, \({\gamma }_{bc(J)}\), is equal to the difference between \({\gamma }_{w}\) and \({\gamma }_{c(K)}\), as shown here:

$$\begin{array}{l}{\gamma }_{bcc(J)}={\gamma }_{b(J)}-{\gamma }_{c(K)}\\ \qquad\ \ ={\gamma }_{bc(J)}+({\gamma }_{w}-{\gamma }_{c(K)})\end{array}$$
(24)
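The relation in Eq. (24) can be checked numerically with a simplified sketch. As a loudly stated assumption, the code below uses ordinary least squares on simulated balanced data (one observation per jk cell) rather than a cross-classified mixed model, so the weights behind the conflated slope differ from those of the mixed model; the algebraic structure of the re-expression is the same. The helper `ols_slopes` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_J, n_K = 30, 20                      # hypothetical cluster counts
g_w, g_bJ, g_bK = 0.5, 1.5, -1.0       # hypothetical within, between-J, between-K effects

x = rng.normal(size=(n_J, 1)) + rng.normal(size=(1, n_K)) + rng.normal(size=(n_J, n_K))
x_J = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)   # x_.j.
x_K = np.broadcast_to(x.mean(axis=0, keepdims=True), x.shape)   # x_..k
x_w = x - x_J - x_K
y = g_w * x_w + g_bJ * x_J + g_bK * x_K + rng.normal(size=x.shape)


def ols_slopes(y, *predictors):
    """OLS slopes with the intercept dropped (stand-in for the fixed effects)."""
    X = np.column_stack([np.ones(y.size)] + [p.ravel() for p in predictors])
    return np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1:]


w, b_J, b_K = ols_slopes(y, x_w, x_J, x_K)   # unconflated reference fit
c_K, bcc_J = ols_slopes(y, x, x_J)           # row 2 of Table 1: x and the J-cluster mean

bc_J = b_J - w                               # unconflated contextual effect for J
print(bcc_J, bc_J + (w - c_K))               # the two sides of Eq. (24)
print(bcc_J + c_K, b_J)                      # between-J effect is still recovered
print(w, c_K, b_K)                           # c_K lies between w and b_K
```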

These same general ideas apply to the model in row 3 of Table 1 that instead includes slopes of \({x}_{ijk}\) and \({x}_{\cdot \cdot k}\), which can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{c(J)}{x}_{ijk}+{\gamma }_{bcc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\ \ ={\gamma }_{00}+{\gamma }_{c(J)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{c(J)}({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})+{\gamma }_{bcc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\,\,\,\,={\gamma }_{00}+{\gamma }_{c(J)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{c(J)}{x}_{\cdot j\cdot }+({\gamma }_{bcc(K)}+{\gamma }_{c(J)}){x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(25)

Here, \({\gamma }_{c(J)}\) is a conflated blend of \({\gamma }_{w}\) and \({\gamma }_{b(J)}\), and \({\gamma }_{bcc(K)}\) is a conflated contextual effect, i.e., \({\gamma }_{bcc(K)}={\gamma }_{b(K)}-{\gamma }_{c(J)}\), which can also be written as \({\gamma }_{bcc(K)}={\gamma }_{bc(K)}+({\gamma }_{w}-{\gamma }_{c(J)})\).

The last uncentered x model in Table 1, i.e., the contextual effect model with both cluster means, can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{w}{x}_{ijk}+{\gamma }_{bc(J)}{x}_{\cdot j\cdot }+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\ \ ={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{w}({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})+{\gamma }_{bc(J)}{x}_{\cdot j\cdot }+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\,\,\,\,={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({\gamma }_{bc(J)}+{\gamma }_{w}){x}_{\cdot j\cdot }+({\gamma }_{bc(K)}+{\gamma }_{w}){x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(26)

In this model, there are no implicit constraints placed across the level-specific effects, and as such, it accurately recovers each of these effects and is likelihood-equivalent to the fully cluster-mean-centered model presented in Eq. (4). This model is a reparameterization of the fully cluster-mean-centered model in which \({\gamma }_{bc(J)}+{\gamma }_{w}\) and \({\gamma }_{bc(K)}+{\gamma }_{w}\) are equal to \({\gamma }_{b(J)}\) and \({\gamma }_{b(K)}\), respectively, and thus \({\gamma }_{bc(J)}\) is the unconflated contextual effect for J clusters (\({\gamma }_{bc(J)}={\gamma }_{b(J)}-{\gamma }_{w}\)) and \({\gamma }_{bc(K)}\) is the unconflated contextual effect for K clusters (\({\gamma }_{bc(K)}={\gamma }_{b(K)}-{\gamma }_{w}\)).
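The reparameterization described above can be illustrated with a short numerical sketch. As an assumption made for simplicity, it fits both specifications by ordinary least squares on simulated balanced data instead of fitting the mixed model; because the two predictor sets span the same column space, the fitted values coincide and the coefficients map onto each other exactly as stated. All names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_J, n_K = 25, 25                      # hypothetical cluster counts, one observation per cell
x = rng.normal(size=(n_J, 1)) + rng.normal(size=(1, n_K)) + rng.normal(size=(n_J, n_K))
x_J = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)   # x_.j.
x_K = np.broadcast_to(x.mean(axis=0, keepdims=True), x.shape)   # x_..k
x_w = x - x_J - x_K
y = 0.3 * x_w + 1.0 * x_J - 0.5 * x_K + rng.normal(size=x.shape)


def design(*predictors):
    """Design matrix with an intercept column followed by the given predictors."""
    return np.column_stack([np.ones(x.size)] + [p.ravel() for p in predictors])


X_centered = design(x_w, x_J, x_K)     # fully cluster-mean-centered specification
X_context = design(x, x_J, x_K)        # contextual effect specification
b_centered = np.linalg.lstsq(X_centered, y.ravel(), rcond=None)[0]
b_context = np.linalg.lstsq(X_context, y.ravel(), rcond=None)[0]

_, w, b_J, b_K = b_centered
_, w_ctx, bc_J, bc_K = b_context
same_fit = np.allclose(X_centered @ b_centered, X_context @ b_context)

print(same_fit)                        # identical fitted values
print(w_ctx, w)                        # same within-cluster slope
print(bc_J, b_J - w)                   # contextual effect for J
print(bc_K, b_K - w)                   # contextual effect for K
```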

Partially cluster-mean-centered models

In rows 5–8 of Table 1, we present models with predictor \({x}_{ijk}\) cluster-mean-centered by only the J-cluster means (\({x}_{\cdot j\cdot }\) s). The first such partially cluster-mean-centered model in row 5 of Table 1 includes \({x}_{ijk}-{x}_{\cdot j\cdot }\) as the only predictor, which can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot })+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\ \ ={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{c(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(27)

In this model, the slope of \({x}_{ijk}-{x}_{\cdot j\cdot }\), \({\gamma }_{c(K)}\), is a conflated blend of the underlying \({\gamma }_{w}\) and \({\gamma }_{b(K)}\), given that the within-cluster effect and between-K-cluster effect are both constrained to equal \({\gamma }_{c(K)}\), whereas the between-J-cluster effect is implicitly constrained to be 0.

In row 6 of Table 1, both \({x}_{ijk}-{x}_{\cdot j\cdot }\) and \({x}_{\cdot j\cdot }\) are included as predictors in the model, which can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot })+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\ \ ={\gamma }_{00}+{\gamma }_{c(K)}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{c(K)}{x}_{\cdot \cdot k}+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(28)

Here, \({\gamma }_{c(K)}\) is still a conflated blend of the underlying \({\gamma }_{w}\) and \({\gamma }_{b(K)}\), as the within-effect and between-K-cluster effect are still constrained to be equivalent. The slope of the J-cluster mean, \({x}_{\cdot j\cdot }\), accurately represents the between-J-cluster effect, \({\gamma }_{b(J)}\).

Including the K-cluster mean (\({x}_{\cdot \cdot k}\)) instead of the J-cluster mean (\({x}_{\cdot j\cdot }\)) yields the model in row 7 of Table 1, which can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot })+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad \ \ ={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{w}{x}_{\cdot \cdot k}+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad\,\,\,\, ={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({\gamma }_{w}+{\gamma }_{bc(K)}){x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(29)

This model places no implicit constraints on either the within-cluster effect or the between-K-cluster effect: the former is represented by the slope of \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), and the latter is equal to \({\gamma }_{bc(K)}+{\gamma }_{w}\), where \({\gamma }_{bc(K)}\) is the K-cluster contextual effect. The between-J-cluster effect is implicitly constrained to be zero.

The last partially cluster-mean-centered model in Table 1 includes x cluster-mean-centered by J clusters (i.e., \({x}_{ijk}-{x}_{\cdot j\cdot }\)) and the two cluster means as predictors, which can be re-expressed as:

$$\begin{array}{l}{y}_{ijk}={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot })+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad \ \ ={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{w}{x}_{\cdot \cdot k}+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+{\gamma }_{bc(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\\ \quad \,\,\,\, ={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+({\gamma }_{w}+{\gamma }_{bc(K)}){x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}\end{array}$$
(30)

This model places no implicit constraints across level-specific effects and is a reparameterization of the fully cluster-mean-centered model represented in Eq. (4). The slopes of \({x}_{ijk}-{x}_{\cdot j\cdot }\), \({x}_{\cdot j\cdot }\), and \({x}_{\cdot \cdot k}\) denote the within-cluster effect, the between-J-cluster effect, and the K-cluster contextual effect, respectively.
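As a purely illustrative numeric example of this reparameterization (the values are hypothetical and are not estimates from any analysis in this paper), suppose that \({\gamma }_{w}=0.5\) and \({\gamma }_{b(K)}=0.8\). The slope of \({x}_{\cdot \cdot k}\) in Eq. (30) would then be the K-cluster contextual effect, and the between-K-cluster effect is recovered by adding the within-cluster effect back:

$$\begin{array}{l}{\gamma }_{bc(K)}={\gamma }_{b(K)}-{\gamma }_{w}=0.8-0.5=0.3\\ {\gamma }_{w}+{\gamma }_{bc(K)}=0.5+0.3=0.8={\gamma }_{b(K)}\end{array}$$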

Note that, in Table 1, we do not include models that are cluster-mean-centered by K-cluster means, as these would yield the same analytic results as those shown for the models cluster-mean-centered by J-cluster means (just switching all K to J and all k to j).

Fully cluster-mean-centered models

Fully cluster-mean-centered models include the purely within-cluster component of x, \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), as the level 1 predictor. As such, there are no constraints placed across the within-cluster, between-J-cluster, and between-K-cluster effects, as shown in rows 9–12 of Table 1. The first fully cluster-mean-centered model depicted in row 9 of Table 1 simply includes \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) as the only predictor:

$${y}_{ijk}={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{u}_{0j}+{u}_{0k}+{e}_{ijk}$$
(31)

Here the within-cluster effect is represented by the slope of \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), and both between-cluster effects are implicitly constrained to be zero. Adding the J-cluster mean of x to Eq. (31) yields the model in row 10 of Table 1:

$${y}_{ijk}={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{b(J)}{x}_{\cdot j\cdot }+{u}_{0j}+{u}_{0k}+{e}_{ijk}$$
(32)

This allows for a non-zero between-J-cluster effect, given as the slope of \({x}_{\cdot j\cdot }\). Similarly, adding instead the K-cluster mean of x yields the model in row 11 of Table 1:

$${y}_{ijk}={\gamma }_{00}+{\gamma }_{w}({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{\gamma }_{b(K)}{x}_{\cdot \cdot k}+{u}_{0j}+{u}_{0k}+{e}_{ijk}$$
(33)

This allows for a non-zero between-K-cluster effect, given as the slope of \({x}_{\cdot \cdot k}\). Last, including both cluster means yields the model presented in Eq. (4), which can directly estimate each of the three possible effects of x.

Appendix B Interpretation of random slopes in fully cluster-mean-centered, partially cluster-mean-centered, and contextual effect cross-classified models

In Appendix B, we provide mathematical derivations to show how to interpret the random slopes in various cross-classified model specifications, similar to what was shown in Appendix A for fixed-slope models. Here, by re-expressing the random portion of the model as a regression of \({y}_{ijk}\) on \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), \({x}_{\cdot j\cdot }\), and \({x}_{\cdot \cdot k}\), we can see which random components are unconflated and which are conflated.

We first define what can be termed the homoscedastic maximal random effect structure that includes J-cluster and K-cluster random effects for \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), a K-cluster random effect for \({x}_{\cdot j\cdot }\), and a J-cluster random effect for \({x}_{\cdot \cdot k}\). The random effect portion of the model (i.e., the cluster-level error terms) is thus given as:

$$\begin{array}{c}RE={u}_{0j}+{u}_{0k}+({u}_{wj}+{u}_{wk})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{u}_{bk}{x}_{\cdot j\cdot }+{u}_{bj}{x}_{\cdot \cdot k}\\ \left[\begin{array}{c}{u}_{0j}\\ {u}_{wj}\\ {u}_{bj}\\ {u}_{0k}\\ {u}_{wk}\\ {u}_{bk}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccc}{\tau }_{00(J)}& & & & & \\ {\tau }_{0w(J)}& {\tau }_{ww(J)}& & & & \\ {\tau }_{0b(J)}& {\tau }_{wb(J)}& {\tau }_{bb(J)}& & & \\ 0& 0& 0& {\tau }_{00(K)}& & \\ 0& 0& 0& {\tau }_{0w(K)}& {\tau }_{ww(K)}& \\ 0& 0& 0& {\tau }_{0b(K)}& {\tau }_{wb(K)}& {\tau }_{bb(K)}\end{array}\right]\right)\end{array}$$
(34)

Here we use \({u}_{w}\) terms to represent cluster-specific deviations in the effect of the fully cluster-mean-centered portion of x, and \({u}_{b}\) terms to represent cluster-specific deviations in the effect of the cluster-mean portion of x, noting that the effect of the J-cluster mean can vary across K clusters, and vice versa.

We also define the heteroscedastic maximal random effect structure that adds a J-cluster random effect for \({x}_{\cdot j\cdot }\) and a K-cluster random effect for \({x}_{\cdot \cdot k}\), given as:

$$\begin{array}{c}RE={u}_{0j}+{u}_{0k}+({u}_{wj}+{u}_{wk})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{bk}+{u}_{hj}){x}_{\cdot j\cdot }+({u}_{bj}+{u}_{hk}){x}_{\cdot \cdot k}\\ \left[\begin{array}{c}{u}_{0j}\\ {u}_{wj}\\ {u}_{bj}\\ {u}_{hj}\\ {u}_{0k}\\ {u}_{wk}\\ {u}_{bk}\\ {u}_{hk}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccccc}{\tau }_{00(J)}& & & & & & & \\ {\tau }_{0w(J)}& {\tau }_{ww(J)}& & & & & & \\ {\tau }_{0b(J)}& {\tau }_{wb(J)}& {\tau }_{bb(J)}& & & & & \\ {\tau }_{0h(J)}& {\tau }_{wh(J)}& {\tau }_{bh(J)}& {\tau }_{hh(J)}& & & & \\ 0& 0& 0& 0& {\tau }_{00(K)}& & & \\ 0& 0& 0& 0& {\tau }_{0w(K)}& {\tau }_{ww(K)}& & \\ 0& 0& 0& 0& {\tau }_{0b(K)}& {\tau }_{wb(K)}& {\tau }_{bb(K)}& \\ 0& 0& 0& 0& {\tau }_{0h(K)}& {\tau }_{wh(K)}& {\tau }_{bh(K)}& {\tau }_{hh(K)}\end{array}\right]\right)\end{array}$$
(35)

Note that this specification assumes that there is heteroscedasticity at level 2, in that the variance of the cluster means of y is allowed to follow a quadratic function of the cluster means of x. Specifically:

$$\begin{array}{l}\mathrm{var}({y}_{\cdot j\cdot }|{x}_{\cdot j\cdot })=\mathrm{var}({u}_{0j}+{u}_{hj}{x}_{\cdot j\cdot }|{x}_{\cdot j\cdot })\\ \qquad \qquad \quad\,\,={\tau }_{00(J)}+2{x}_{\cdot j\cdot }{\tau }_{0h(J)}+{x}_{\cdot j\cdot }^{2}{\tau }_{hh(J)}\end{array}$$
(36)

and

$$\begin{array}{l}\mathrm{var}({y}_{\cdot \cdot k}|{x}_{\cdot \cdot k})=\mathrm{var}({u}_{0k}+{u}_{hk}{x}_{\cdot \cdot k}|{x}_{\cdot \cdot k})\\ \qquad \qquad\quad\,\,\,\,={\tau }_{00(K)}+2{x}_{\cdot \cdot k}{\tau }_{0h(K)}+{x}_{\cdot \cdot k}^{2}{\tau }_{hh(K)}\end{array}$$
(37)

(note that, in both Eqs. (36) and (37), every term in the model other than the two random effect terms is a constant and hence can be excluded from the variance expression). Thus, the random effects \({u}_{hj}\) and \({u}_{hk}\), where the “h” stands for “heteroscedasticity” (see Footnote 16), can be seen as mechanisms for modeling heteroscedasticity at level 2, and they therefore have a very different interpretation from a typical random effect (see Rights & Sterba, 2023, for more details in a hierarchical context). Such random effect terms are highly atypical, even in hierarchical contexts, but it is necessary to consider this specification to explicate some of the issues of random conflation in certain models, as certain random effect structures are constrained versions of this heteroscedastic structure.
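To make the quadratic form of this level 2 variance function concrete, the following minimal Python sketch evaluates it at several cluster-mean values. It relies on the standard identity \(\mathrm{var}(a+bx)=\mathrm{var}(a)+2x\,\mathrm{cov}(a,b)+{x}^{2}\mathrm{var}(b)\) for a fixed x. The function name `level2_variance` and the parameter values are hypothetical and serve only as an illustration.

```python
def level2_variance(x_mean, tau_00, tau_0h, tau_hh):
    """Variance of u_0 + u_h * x_mean for a fixed cluster mean x_mean.

    tau_00 = var(u_0), tau_hh = var(u_h), tau_0h = cov(u_0, u_h).
    Uses var(a + b*x) = var(a) + 2*x*cov(a, b) + x**2 * var(b).
    """
    return tau_00 + 2.0 * x_mean * tau_0h + x_mean ** 2 * tau_hh

# Hypothetical parameter values, chosen only for illustration.
tau_00, tau_0h, tau_hh = 1.0, 0.2, 0.5

# Evaluate the variance function at a few cluster-mean values of x.
x_means = (-1.0, 0.0, 1.0, 2.0)
variances = [level2_variance(xm, tau_00, tau_0h, tau_hh) for xm in x_means]
```

With these illustrative values the implied variance is smallest near a cluster mean of zero and grows as the cluster mean moves away from it, which is the heteroscedasticity that the \({u}_{h}\) terms introduce.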

Uncentered x (or centered-by-a-constant x) models

We start with the most basic and standard random slope specification, that is, the one in which there is a J-cluster and K-cluster random effect for the uncentered x:

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k}){x}_{ijk}\\ \quad \,\,\,\,={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k})({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})\\ \quad \,\,\,={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k}){x}_{\cdot j\cdot }+({u}_{1j}+{u}_{1k}){x}_{\cdot \cdot k}\end{array}$$
(38)

Comparing this re-expression to the more general heteroscedastic maximal specification in Eq. (35) clarifies that the former is a constrained version of the latter, and makes the implicit constraints that each possible J-cluster random slope term is exactly equal (i.e., \({u}_{wj}={u}_{bj}={u}_{hj}\), each represented by \({u}_{1j}\)) and that each possible K-cluster random slope term is exactly equal (i.e., \({u}_{wk}={u}_{bk}={u}_{hk}\), each represented by \({u}_{1k}\)). In terms of constraints on the actual model parameters (noting that the error terms are not themselves parameters), these are more apparent when considering the random effect covariance matrix of this uncentered x model:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{0k}\\ {u}_{1k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccc}{\tau }_{00(J)}& & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & \\ 0& 0& {\tau }_{00(K)}& \\ 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}\end{array}\right]\right)$$
(39)

Here we can see that the variance of each of the three possible J-cluster random slope terms is given by the same parameter (\({\tau }_{11(J)}\)), as is the variance of each of the K-cluster random slope terms (\({\tau }_{11(K)}\)), and hence this model places the four model constraints of \(\mathrm{var}({u}_{wj})=\mathrm{var}({u}_{bj})\), \(\mathrm{var}({u}_{wj})=\mathrm{var}({u}_{hj})\), \(\mathrm{var}({u}_{wk})=\mathrm{var}({u}_{bk})\), and \(\mathrm{var}({u}_{wk})=\mathrm{var}({u}_{hk})\). It also assumes that the three pairwise combinations of the J-cluster terms have a correlation of 1, as do the three pairwise combinations of K-cluster terms, and hence the model places six additional constraints that each of \({\mathrm{corr}}({u}_{wj},{u}_{bj})\), \({\mathrm{corr}}({u}_{wj},{u}_{hj})\), \({\mathrm{corr}}({u}_{bj},{u}_{hj})\), \({\mathrm{corr}}({u}_{wk},{u}_{bk})\), \({\mathrm{corr}}({u}_{wk},{u}_{hk})\), and \({\mathrm{corr}}({u}_{bk},{u}_{hk})\) is equal to 1 (and since all error terms by definition have the same mean of 0, these constraints, together with the equal-variance constraints above, hold if and only if \({u}_{wj}={u}_{bj}={u}_{hj}\) and \({u}_{wk}={u}_{bk}={u}_{hk}\)). Last, as there is only one intercept-slope covariance term for J clusters and one for K clusters in Eq. (39), this uncentered x model also places the four additional constraints that \({\mathrm{corr}}({u}_{0j},{u}_{wj})={\mathrm{corr}}({u}_{0j},{u}_{bj})\), \({\mathrm{corr}}({u}_{0j},{u}_{wj})={\mathrm{corr}}({u}_{0j},{u}_{hj})\), \({\mathrm{corr}}({u}_{0k},{u}_{wk})={\mathrm{corr}}({u}_{0k},{u}_{bk})\), and \({\mathrm{corr}}({u}_{0k},{u}_{wk})={\mathrm{corr}}({u}_{0k},{u}_{hk})\). Thus, there are 14 constraints in total that, when placed on the general heteroscedastic random effect structure in Eq. (35), yield the uncentered x random effect specification in Eqs. (38) and (39).
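One way to cross-check such constraint counts is to compare the number of free variance and covariance parameters across specifications. The sketch below does this bookkeeping in Python under the assumption, taken from Eqs. (35) and (39), that random effects are correlated within a cluster type and uncorrelated across cluster types. The function names are hypothetical.

```python
def n_cov_params(n_random_terms):
    """Number of unique variances and covariances among n correlated random terms."""
    return n_random_terms * (n_random_terms + 1) // 2

def n_constraints(n_terms_per_cluster_type, n_terms_general=4, n_cluster_types=2):
    """Difference in free covariance parameters relative to the heteroscedastic
    maximal structure in Eq. (35), which has four random terms
    (u_0, u_w, u_b, u_h) for each of the two cluster types."""
    general = n_cluster_types * n_cov_params(n_terms_general)
    restricted = n_cluster_types * n_cov_params(n_terms_per_cluster_type)
    return general - restricted

# Uncentered x model in Eq. (39): two random terms (u_0, u_1) per cluster type.
constraints_two_terms = n_constraints(2)
# Specifications with three random terms (u_0, u_1, u_2) per cluster type.
constraints_three_terms = n_constraints(3)
```

The first count matches the 14 constraints enumerated above, and the second matches the eight constraints reported in this appendix for the specifications that add one cluster-mean random slope per cluster type.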

In terms of interpreting the random portion of this uncentered x model, the random slope term for each cluster type is some weighted average of the \({u}_{w}\), \({u}_{b}\), and \({u}_{h}\) terms, and hence implicitly reflects some combination of the across-cluster slope variability in the within-cluster effect and in the between-cluster effect of x, as well as possible heteroscedasticity at level 2. That is, the \({\tau }_{11(J)}\) term defines the heterogeneity in the effect of both \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) and \({x}_{\cdot \cdot k}\) and is also involved in the model-implied variance of the cluster means of y conditional on the cluster means of x, i.e.,

$$\begin{array}{l}\mathrm{var}({y}_{\cdot j\cdot }|{x}_{\cdot j\cdot })=\mathrm{var}({u}_{0j}+{u}_{1j}{x}_{\cdot j\cdot }|{x}_{\cdot j\cdot })\\ \qquad \qquad \quad \,\,\, ={\tau }_{00(J)}+2{x}_{\cdot j\cdot }{\tau }_{01(J)}+{x}_{\cdot j\cdot }^{2}{\tau }_{11(J)}\end{array}$$
(40)

Similarly, the \({\tau }_{11(K)}\) term defines the heterogeneity in the effect of both \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) and \({x}_{\cdot j\cdot }\) and is also involved in the model-implied variance of the cluster means of y conditional on the cluster means of x, i.e.,

$$\begin{array}{l}\mathrm{var}({y}_{\cdot \cdot k}|{x}_{\cdot \cdot k})=\mathrm{var}({u}_{0k}+{u}_{1k}{x}_{\cdot \cdot k}|{x}_{\cdot \cdot k})\\ \qquad \qquad \quad \,\,\,\,={\tau }_{00(K)}+2{x}_{\cdot \cdot k}{\tau }_{01(K)}+{x}_{\cdot \cdot k}^{2}{\tau }_{11(K)}\end{array}$$
(41)

We next consider an uncentered x random slope specification that additionally adds J-cluster random slopes for \({x}_{\cdot \cdot k}\) and K-cluster random slopes for \({x}_{\cdot j\cdot }\), and is thus given as:

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k}){x}_{ijk}+{u}_{2k}{x}_{\cdot j\cdot }+{u}_{2j}{x}_{\cdot \cdot k}\\ \quad\ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k})({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})+{u}_{2k}{x}_{\cdot j\cdot }+{u}_{2j}{x}_{\cdot \cdot k}\\ \quad \ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k}+{u}_{2k}){x}_{\cdot j\cdot }+({u}_{1j}+{u}_{2j}+{u}_{1k}){x}_{\cdot \cdot k}\end{array}$$
(42)

This re-expression clarifies what each random slope term represents in relation to the more general heteroscedastic random effect specification, that is:

$$\begin{array}{l}{u}_{wj}={u}_{1j}\\ {u}_{bj}\ ={u}_{1j}+{u}_{2j}\\ {u}_{hj}\,={u}_{1j}\\ {u}_{wk}={u}_{1k}\\ {u}_{bk}\,={u}_{1k}+{u}_{2k}\\ {u}_{hk}\,={u}_{1k} \end{array}$$
(43)

Hence, this model still places the constraint that the \({u}_{w}\) terms are equal to the corresponding \({u}_{h}\) terms, each given by \({u}_{1}\). However, because there is no implicit constraint that involves the random slope terms for \({x}_{\cdot j\cdot }\) and \({x}_{\cdot \cdot k}\) (\({u}_{2k}\) and \({u}_{2j}\), respectively), the \({u}_{b}\) terms are adequately recovered in this model. The \({u}_{2k}\) term then represents a sort of conflated “contextual effect” error term in that it denotes the K-cluster-specific difference between the underlying \({u}_{bk}\) and the (conflated) \({u}_{1k}\) term (i.e., \({u}_{2k}={u}_{bk}-{u}_{1k}\)), and likewise the \({u}_{2j}\) term denotes the J-cluster-specific difference between the underlying \({u}_{bj}\) and the (conflated) \({u}_{1j}\) term (i.e., \({u}_{2j}={u}_{bj}-{u}_{1j}\)).

In terms of the specific constraints on the model parameters, we can consider the random effect covariance matrix of this model:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccc}{\tau }_{00(J)}& & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & \\ 0& 0& 0& {\tau }_{00(K)}& & \\ 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& \\ 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}\end{array}\right]\right)$$
(44)

This model places eight constraints in total on the general heteroscedastic specification in Eq. (35), namely \(\mathrm{var}({u}_{wj})=\mathrm{var}({u}_{hj})\), \(\mathrm{var}({u}_{wk})=\mathrm{var}({u}_{hk})\), \({\mathrm{corr}}({u}_{wj},{u}_{hj})=1\), \({\mathrm{corr}}({u}_{wk},{u}_{hk})=1\), \({\mathrm{corr}}({u}_{0j},{u}_{wj})={\mathrm{corr}}({u}_{0j},{u}_{hj})\), \({\mathrm{corr}}({u}_{0k},{u}_{wk})={\mathrm{corr}}({u}_{0k},{u}_{hk})\), \({\mathrm{corr}}({u}_{bj},{u}_{wj})={\mathrm{corr}}({u}_{bj},{u}_{hj})\), and \({\mathrm{corr}}({u}_{bk},{u}_{wk})={\mathrm{corr}}({u}_{bk},{u}_{hk})\). In interpreting the random effect variances, the random slope term for \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) for each cluster type represents some weighted average of the \({u}_{w}\) and \({u}_{h}\) terms, and hence the \({\tau }_{11}\) variances implicitly reflect some combination of the across-cluster slope variability in the within-cluster effect of x as well as possible heteroscedasticity at level 2 (noting that the model-implied variances of the cluster means of y conditional on the cluster means of x are equal to the expressions given in Eqs. (40) and (41)). The random slope variances of the cluster means (\({\tau }_{22(J)}\) and \({\tau }_{22(K)}\)) reflect variability in the conflated contextual effect terms (\({u}_{2j}\) and \({u}_{2k}\)) defined above.

Last, we consider the uncentered x model that adds to the expression in Eq. (42) a J-cluster random slope term for \({x}_{\cdot j\cdot }\) and a K-cluster random slope term for \({x}_{\cdot \cdot k}\):

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k}){x}_{ijk}+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}\\ \quad\ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k})({x}_{\cdot j\cdot }+{x}_{\cdot \cdot k})+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}\\ \quad\,\,\,\, ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{3j}+{u}_{1k}+{u}_{2k}){x}_{\cdot j\cdot }+({u}_{1j}+{u}_{2j}+{u}_{1k}+{u}_{3k}){x}_{\cdot \cdot k}\end{array}$$
(45)

This re-expression clarifies what each random slope term represents in relation to the more general heteroscedastic random effect specification, that is:

$$\begin{array}{l}{u}_{wj}={u}_{1j}\\ {u}_{bj}\, ={u}_{1j}+{u}_{2j}\\ {u}_{hj}\, ={u}_{1j}+{u}_{3j}\\ {u}_{wk}\,={u}_{1k}\\ {u}_{bk}\, ={u}_{1k}+{u}_{2k}\\ {u}_{hk}\,={u}_{1k}+{u}_{3k} \end{array}$$
(46)

Here, there are no implicit constraints in relation to the heteroscedastic maximal specification, and thus each of the \({u}_{w}\), \({u}_{b}\), and \({u}_{h}\) terms is adequately recovered in this model. The \({u}_{2}\) terms represent a sort of unconflated “contextual effect” error term in that they denote the cluster-specific difference between the underlying \({u}_{b}\) and \({u}_{w}\) terms (i.e., \({u}_{2}={u}_{b}-{u}_{w}\)). The random effect covariance matrix is given as:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{3j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\\ {u}_{3k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccccc}{\tau }_{00(J)}& & & & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & & & \\ {\tau }_{03(J)}& {\tau }_{13(J)}& {\tau }_{23(J)}& {\tau }_{33(J)}& & & & \\ 0& 0& 0& 0& {\tau }_{00(K)}& & & \\ 0& 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& & \\ 0& 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}& \\ 0& 0& 0& 0& {\tau }_{03(K)}& {\tau }_{13(K)}& {\tau }_{23(K)}& {\tau }_{33(K)}\end{array}\right]\right)$$
(47)

This specification is likelihood-equivalent to the general heteroscedastic specification in Eq. (35), and the random slope variances of \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\) have the same interpretation across both parameterizations. The random slope variances of the level 2 predictors in Eq. (47), however, have a unique interpretation here in that they represent the variance in the contextual effect terms mentioned above.
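The relations in Eq. (46) are a linear map from \(({u}_{0},{u}_{1},{u}_{2},{u}_{3})\) to \(({u}_{0},{u}_{w},{u}_{b},{u}_{h})\) within each cluster type, so the covariance matrix in Eq. (47) can be converted to the corresponding block of Eq. (35). The following Python sketch illustrates this for one cluster type. The covariance values and the helper names `matmul` and `transpose` are hypothetical and are used only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][n] for m in range(len(b)))
             for n in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

# Rows follow Eq. (46); columns are ordered (u_0, u_1, u_2, u_3).
A = [
    [1, 0, 0, 0],  # u_0 = u_0
    [0, 1, 0, 0],  # u_w = u_1
    [0, 1, 1, 0],  # u_b = u_1 + u_2
    [0, 1, 0, 1],  # u_h = u_1 + u_3
]

# Hypothetical covariance matrix of (u_0, u_1, u_2, u_3) for one cluster type.
T_u = [
    [1.0, 0.1, 0.0, 0.0],
    [0.1, 0.5, -0.2, 0.1],
    [0.0, -0.2, 0.4, 0.0],
    [0.0, 0.1, 0.0, 0.3],
]

# Implied covariance matrix of (u_0, u_w, u_b, u_h): A * T_u * A'.
T_general = matmul(matmul(A, T_u), transpose(A))
```

For example, the implied variance of \({u}_{b}\) is \({\tau }_{11}+2{\tau }_{12}+{\tau }_{22}\) and the implied variance of \({u}_{h}\) is \({\tau }_{11}+2{\tau }_{13}+{\tau }_{33}\). Because the matrix `A` is invertible, the map runs in both directions, which is consistent with the two parameterizations being likelihood-equivalent.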

Partially cluster-mean-centered models

The most basic partially cluster-mean-centered model includes random effects of both cluster types for the J-cluster-mean-centered x, without any random effects for the cluster means:

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot })\\ \quad\ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k}){x}_{\cdot \cdot k} \end{array}$$
(48)

The re-expression shows what each random slope term represents in relation to the general heteroscedastic random effect model shown in Eq. (35):

$$\begin{array}{l}{u}_{wj}={u}_{1j}\\ {u}_{bj}\,={u}_{1j}\\ {u}_{hj}\,=0\\ {u}_{wk}={u}_{1k}\\ {u}_{bk}\,=0\\ {u}_{hk}={u}_{1k} \end{array}$$
(49)

Hence, this model places the constraints that \({u}_{wj}\) equals \({u}_{bj}\) (both given by \({u}_{1j}\)) and that \({u}_{wk}\) equals \({u}_{hk}\) (both given as \({u}_{1k}\)). Additionally, the J-cluster heteroscedastic component, \({u}_{hj}\), is constrained to be zero, as is the K-cluster between-cluster effect component, \({u}_{bk}\). The random effect covariance matrix of this model is given as:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{0k}\\ {u}_{1k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccc}{\tau }_{00(J)}& & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & \\ 0& 0& {\tau }_{00(K)}& \\ 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}\end{array}\right]\right)$$
(50)

This model places a total of 14 constraints on the general heteroscedastic specification in Eq. (35), namely: \(\mathrm{var}({u}_{wj})=\mathrm{var}({u}_{bj})\), \(\mathrm{var}({u}_{wk})=\mathrm{var}({u}_{hk})\), \(\mathrm{var}({u}_{hj})=0\), \(\mathrm{var}({u}_{bk})=0\), \({\mathrm{corr}}({u}_{wj},{u}_{bj})=1\), \({\mathrm{corr}}({u}_{wk},{u}_{bk})=0\), \({\mathrm{corr}}({u}_{wj},{u}_{hj})=0\), \({\mathrm{corr}}({u}_{wk},{u}_{hk})=1\), \({\mathrm{corr}}({u}_{bj},{u}_{hj})=0\), \({\mathrm{corr}}({u}_{bk},{u}_{hk})=0\), \({\mathrm{corr}}({u}_{0j},{u}_{wj})={\mathrm{corr}}({u}_{0j},{u}_{bj})\), \({\mathrm{corr}}({u}_{0k},{u}_{bk})=0\), \({\mathrm{corr}}({u}_{0j},{u}_{hj})=0\), and \({\mathrm{corr}}({u}_{0k},{u}_{wk})={\mathrm{corr}}({u}_{0k},{u}_{hk})\). In interpreting the random effect variances, \({\tau }_{11(J)}\) implicitly reflects some combination of the variability in the within-cluster and between-J-cluster effects of x, and \({\tau }_{11(K)}\) reflects some combination of the variability in the within-cluster effect as well as possible K-cluster heteroscedasticity at level 2.

We next consider the partially cluster-mean-centered model that additionally adds J-cluster random slopes for \({x}_{\cdot \cdot k}\) and K-cluster random slopes for \({x}_{\cdot j\cdot }\), and is thus given as:

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot })+{u}_{2k}{x}_{\cdot j\cdot }+{u}_{2j}{x}_{\cdot \cdot k}\\\quad \ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k}){x}_{\cdot \cdot k}+{u}_{2k}{x}_{\cdot j\cdot }+{u}_{2j}{x}_{\cdot \cdot k}\\ \quad \ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{u}_{2k}{x}_{\cdot j\cdot }+({u}_{1j}+{u}_{1k}+{u}_{2j}){x}_{\cdot \cdot k}\end{array}$$
(51)

What each random slope term represents in relation to the general heteroscedastic random effect specification is shown as follows:

$$\begin{array}{l}{u}_{wj}={u}_{1j}\\ {u}_{bj}\,={u}_{1j}+{u}_{2j}\\ {u}_{hj}\,=0\\ {u}_{wk}={u}_{1k}\\ {u}_{bk}\,={u}_{2k}\\ {u}_{hk}={u}_{1k} \end{array}$$
(52)

Hence, the model still places the constraint that \({u}_{wk}\) is equal to \({u}_{hk}\) (both given as \({u}_{1k}\)), and that \({u}_{hj}\) equals zero. However, there is no implicit constraint involving the random slope terms for either \({x}_{\cdot j\cdot }\) or \({x}_{\cdot \cdot k}\) (\({u}_{2k}\) and \({u}_{2j}\), respectively), and thus the \({u}_{b}\) terms are recovered. Specifically, \({u}_{2k}={u}_{bk}\), and the \({u}_{2j}\) term represents the previously defined conflated contextual effect, i.e., the J-cluster-specific difference between the underlying \({u}_{bj}\) and the conflated \({u}_{1j}\) term (i.e., \({u}_{2j}={u}_{bj}-{u}_{1j}\)). The random effect covariance matrix of this model is given as:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccc}{\tau }_{00(J)}& & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & \\ 0& 0& 0& {\tau }_{00(K)}& & \\ 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& \\ 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}\end{array}\right]\right)$$
(53)

This model places eight constraints on the general heteroscedastic specification in Eq. (35), namely \(\mathrm{var}({u}_{hj})=0\), \(\mathrm{var}({u}_{wk})=\mathrm{var}({u}_{hk})\), \({\mathrm{corr}}({u}_{wj},{u}_{hj})=0\), \({\mathrm{corr}}({u}_{wk},{u}_{hk})=1\), \({\mathrm{corr}}({u}_{bj},{u}_{hj})=0\), \({\mathrm{corr}}({u}_{wk},{u}_{bk})={\mathrm{corr}}({u}_{bk},{u}_{hk})\), \({\mathrm{corr}}({u}_{0j},{u}_{hj})=0\), and \({\mathrm{corr}}({u}_{0k},{u}_{wk})={\mathrm{corr}}({u}_{0k},{u}_{hk})\). The two random slope variances for the level 1 predictor \({x}_{ijk}-{x}_{\cdot j\cdot }\) have different interpretations: since \({u}_{1j}\) represents \({u}_{wj}\), the \({\tau }_{11(J)}\) variance reflects across-J-cluster variability in the within-cluster effect of x; since \({u}_{1k}\), on the other hand, represents some weighted average of \({u}_{wk}\) and \({u}_{hk}\), the \({\tau }_{11(K)}\) variance reflects across-K-cluster variability in some combination of the within-cluster effect as well as possible K-cluster heteroscedasticity at level 2. The random slope variance for the J-cluster mean (\({\tau }_{22(K)}\)) reflects across-K-cluster variability in the between-J-cluster effect, whereas the random slope variance for the K-cluster mean (\({\tau }_{22(J)}\)) reflects variability in the previously defined conflated contextual effect term (\({u}_{2j}\)).

Last, we consider adding the heteroscedastic components for both cluster types, and the model is given as:

$$\begin{array}{l}RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot })+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}\\ \quad \ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{1j}+{u}_{1k}){x}_{\cdot \cdot k}+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}\\ \quad \ \ ={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{1j}+{u}_{1k}+{u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}\end{array}$$
(54)

Again, this re-expression clarifies how each specification is related to the general heteroscedastic random effect model, that is:

$$\begin{array}{l}{u}_{wj}\,={u}_{1j}\\ {u}_{bj}\,\,={u}_{1j}+{u}_{2j}\\ {u}_{hj}\,={u}_{3j}\\ {u}_{wk}={u}_{1k}\\ {u}_{bk}\,={u}_{2k}\\ {u}_{hk}\,={u}_{1k}+{u}_{3k} \end{array}$$
(55)

There is no implicit constraint involving any of the random slope terms for \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), \({x}_{\cdot j\cdot }\), or \({x}_{\cdot \cdot k}\), and thus \({u}_{w}\), \({u}_{b}\) and \({u}_{h}\) are all recovered from this model. The \({u}_{2j}\) represents the unconflated “contextual effect” error term, i.e., the underlying difference in the J-cluster random effects for the between-K-cluster and within-cluster effects (i.e., \({u}_{2j}={u}_{bj}-{u}_{wj}\)). The \({u}_{3k}\) represents a different type of unconflated “contextual effect,” namely, the difference in the K-cluster random effects for the within-cluster effects and the K-cluster heteroscedastic component (i.e., \({u}_{3k}={u}_{hk}-{u}_{1k}\)). The random effect covariance matrix is given as:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{3j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\\ {u}_{3k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccccc}{\tau }_{00(J)}& & & & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & & & \\ {\tau }_{03(J)}& {\tau }_{13(J)}& {\tau }_{23(J)}& {\tau }_{33(J)}& & & & \\ 0& 0& 0& 0& {\tau }_{00(K)}& & & \\ 0& 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& & \\ 0& 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}& \\ 0& 0& 0& 0& {\tau }_{03(K)}& {\tau }_{13(K)}& {\tau }_{23(K)}& {\tau }_{33(K)}\end{array}\right]\right)$$
(56)

This specification is likelihood-equivalent to the general heteroscedastic specification in Eq. (35). In terms of interpretation, both of the random slope variances for \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), \({\tau }_{11(J)}\) and \({\tau }_{11(K)}\), and both random slope variances for \({x}_{\cdot j\cdot }\), \({\tau }_{22(K)}\) and \({\tau }_{33(J)}\), have the same interpretation as in the general heteroscedastic specification. The random slope variances for \({x}_{\cdot \cdot k}\), however, have unique interpretations, as \({\tau }_{22(J)}\) and \({\tau }_{33(K)}\) represent the variances in the aforementioned contextual effect residuals.
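As an illustrative restatement (not an additional result), the relations in Eq. (55) can be written in matrix form, separately for each cluster type:

$$\begin{array}{c}\left[\begin{array}{c}{u}_{0j}\\ {u}_{wj}\\ {u}_{bj}\\ {u}_{hj}\end{array}\right]=\left[\begin{array}{cccc}1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 1& 1& 0\\ 0& 0& 0& 1\end{array}\right]\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{3j}\end{array}\right]\\ \left[\begin{array}{c}{u}_{0k}\\ {u}_{wk}\\ {u}_{bk}\\ {u}_{hk}\end{array}\right]=\left[\begin{array}{cccc}1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 0& 1& 0\\ 0& 1& 0& 1\end{array}\right]\left[\begin{array}{c}{u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\\ {u}_{3k}\end{array}\right]\end{array}$$

Both transformation matrices are lower triangular with ones on the diagonal and are therefore invertible, which is consistent with the likelihood equivalence noted above. The only variances that differ between the two parameterizations are those involving a sum, for example \(\mathrm{var}({u}_{bj})={\tau }_{11(J)}+2{\tau }_{12(J)}+{\tau }_{22(J)}\) and \(\mathrm{var}({u}_{hk})={\tau }_{11(K)}+2{\tau }_{13(K)}+{\tau }_{33(K)}\).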

Fully cluster-mean-centered models

Last, we consider fully cluster-mean-centered models. The most basic of these models includes only the fully cluster-mean-centered x as the predictor:

$$RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})$$
(57)

Here, each \({u}_{w}\) is represented by \({u}_{1}\), so that the interpretation of \({u}_{1}\) is the same as that of the general heteroscedastic specification in Eq. (35); however, both of the between-cluster random effects and the level 2 heteroscedasticity components are constrained to be zero. The random effect covariance matrix is then:

$$\left(\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{0k}\\ {u}_{1k}\end{array}\right)\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccc}{\tau }_{00(J)}& & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & \\ 0& 0& {\tau }_{00(K)}& \\ 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}\end{array}\right]\right)$$
(58)

Here, each variance has the same interpretation as that from the general heteroscedastic expression in Eq. (35). In relation to the general heteroscedastic expression, this model places a total of 14 constraints, i.e., that all variances and covariances shown in Eq. (35) but not in Eq. (58) are equal to zero.

Expanding upon this model, one could also add a J-cluster random component for \({x}_{\cdot \cdot k}\) and a K-cluster random component for \({x}_{\cdot j\cdot }\), which would yield the following random effect specification:

$$RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+{u}_{2k}{x}_{\cdot j\cdot }+{u}_{2j}{x}_{\cdot \cdot k}$$
(59)

In this model, \({u}_{1}\) still represents \({u}_{w}\) as specified in the general heteroscedastic model in Eq. (35), whereas \({u}_{2k}\) represents \({u}_{bk}\) and \({u}_{2j}\) represents \({u}_{bj}\). The random effect covariance matrix is then as follows:

$$\left(\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\end{array}\right)\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccc}{\tau }_{00(J)}& & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & \\ 0& 0& 0& {\tau }_{00(K)}& & \\ 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& \\ 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}\end{array}\right]\right)$$
(60)

Each variance again has the same interpretation as the corresponding variance from the general heteroscedastic expression in Eq. (35), but this model constrains all other terms from Eq. (35) to zero.
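The constraint counts for these two models can be checked with a short tally. This is an illustrative calculation rather than part of the original derivation, and it assumes, as in the covariance matrices shown here, that random effects are uncorrelated across the J and K cluster types and that the general heteroscedastic model has four random effects per cluster type. With \(q\) random effects per cluster type, each diagonal block has \(q(q+1)/2\) unique elements, so the full matrix has \(q(q+1)\) free variances and covariances:

$$\begin{array}{lcc}\text{Model}& q& q(q+1)\\ \text{General heteroscedastic, Eq. (35)}& 4& 20\\ \text{Eq. (58)}& 2& 6\\ \text{Eq. (60)}& 3& 12\end{array}$$

$$20-6=14,\qquad 20-12=8$$

Under these assumptions, the model in Eq. (58) imposes 14 constraints relative to Eq. (35), matching the count given above, and the model in Eq. (60) imposes 8.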

Last, if one were to additionally add the level 2 heteroscedasticity component for both cluster types, the random effects would be given as:

$$RE={u}_{0j}+{u}_{0k}+({u}_{1j}+{u}_{1k})({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k})+({u}_{2k}+{u}_{3j}){x}_{\cdot j\cdot }+({u}_{2j}+{u}_{3k}){x}_{\cdot \cdot k}$$
(61)

And the random effect covariance matrix would be:

$$\left[\begin{array}{c}{u}_{0j}\\ {u}_{1j}\\ {u}_{2j}\\ {u}_{3j}\\ {u}_{0k}\\ {u}_{1k}\\ {u}_{2k}\\ {u}_{3k}\end{array}\right]\sim MVN\left(\left[\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\\ 0\end{array}\right],\left[\begin{array}{cccccccc}{\tau }_{00(J)}& & & & & & & \\ {\tau }_{01(J)}& {\tau }_{11(J)}& & & & & & \\ {\tau }_{02(J)}& {\tau }_{12(J)}& {\tau }_{22(J)}& & & & & \\ {\tau }_{03(J)}& {\tau }_{13(J)}& {\tau }_{23(J)}& {\tau }_{33(J)}& & & & \\ 0& 0& 0& 0& {\tau }_{00(K)}& & & \\ 0& 0& 0& 0& {\tau }_{01(K)}& {\tau }_{11(K)}& & \\ 0& 0& 0& 0& {\tau }_{02(K)}& {\tau }_{12(K)}& {\tau }_{22(K)}& \\ 0& 0& 0& 0& {\tau }_{03(K)}& {\tau }_{13(K)}& {\tau }_{23(K)}& {\tau }_{33(K)}\end{array}\right]\right)$$
(62)

This is equal to the general heteroscedastic model given in Eq. (35).
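The nesting of Eqs. (57), (59), and (61) can be illustrated numerically. The following is a minimal Python sketch, not code from the paper: the function name `random_part` and all numeric values are hypothetical, chosen only to show that zeroing selected random effects in Eq. (61) reproduces the simpler specifications.

```python
def random_part(u_j, u_k, x, xbar_j, xbar_k):
    """Random-effect part of Eq. (61) for a single observation.

    u_j, u_k: tuples (u0, u1, u2, u3) for the observation's J and K cluster.
    x, xbar_j, xbar_k: the predictor value and its two cluster means.
    """
    u0j, u1j, u2j, u3j = u_j
    u0k, u1k, u2k, u3k = u_k
    within = x - xbar_j - xbar_k  # fully cluster-mean-centered predictor
    return (u0j + u0k
            + (u1j + u1k) * within
            + (u2k + u3j) * xbar_j
            + (u2j + u3k) * xbar_k)


# Hypothetical random-effect draws and predictor values.
u_j = (1.0, 0.5, 0.25, 2.0)
u_k = (-1.0, 0.5, 0.75, -0.5)
x, xbar_j, xbar_k = 4.0, 3.0, 2.0

# Eq. (61): all components present.
re_61 = random_part(u_j, u_k, x, xbar_j, xbar_k)

# Eq. (59): level 2 heteroscedasticity components (u3j, u3k) set to zero.
re_59 = random_part(u_j[:3] + (0.0,), u_k[:3] + (0.0,), x, xbar_j, xbar_k)

# Eq. (57): u2 and u3 both set to zero, leaving intercepts and within slopes.
re_57 = random_part(u_j[:2] + (0.0, 0.0), u_k[:2] + (0.0, 0.0), x, xbar_j, xbar_k)

print(re_61, re_59, re_57)
```

Setting a random effect to zero for every cluster is equivalent to fixing its variance and covariances at zero, which is how the simpler models arise as constrained versions of Eq. (61).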

Appendix C: R function to fully or partially cluster-mean-center level 1 variables and compute cluster means of level 1 variables for cross-classified models

cc.gmc R function Description:

This function reads in a dataset containing level 1 predictors and cluster identifying variables, and outputs the fully cluster-mean-centered level 1 predictor along with each type of cluster mean. It can accommodate any number of cross-classifications and can center multiple level 1 predictors simultaneously. Partially cluster-mean-centered level 1 predictors can be computed by specifying a subset of cluster types, as described below. Using this function, researchers can utilize any of the centering options discussed throughout this paper (see, e.g., Table 1).

This function is a modified version of the gmc function from the rockchalk package (Johnson, 2019).

cc.gmc R function Input:

dframe – Dataset with rows denoting observations and columns denoting variables

x – List of level 1 predictors to be centered (e.g., c("V1","V2"))

by – List of variable names indicating higher-level clusters (e.g., c("Cluster1","Cluster2")). To create partially cluster-mean-centered variables, enter only a subset of clusters here.

fulldataframe – Logical argument indicating whether or not to return the full data frame. If set to TRUE, the full data frame with newly added columns of cluster-mean-centered level 1 predictors and cluster means is returned. If set to FALSE, only cluster-mean-centered level 1 predictors, cluster means, and cluster identification variables will be returned.
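To make the centering operations concrete, the following is a minimal Python sketch of the computation described above. It is not the cc.gmc function and does not reproduce its interface: the function name `cluster_mean_center`, the `_mn_` and `_dev` column suffixes, and the toy data are all assumptions made for illustration. The sketch always returns the full set of columns, so it has no counterpart to the fulldataframe argument.

```python
from collections import defaultdict


def cluster_mean_center(rows, x, by):
    """Add cluster means and a cluster-mean-centered copy of each predictor.

    rows: list of dicts, one per observation.
    x:    names of level 1 predictors to center.
    by:   names of cluster ID variables; a subset gives partial centering.
    """
    out = [dict(r) for r in rows]  # copy so the input is left untouched
    for var in x:
        dev = f"{var}_dev"
        for r in out:
            r[dev] = float(r[var])
        for cl in by:
            total = defaultdict(float)
            count = defaultdict(int)
            for r in rows:
                total[r[cl]] += r[var]
                count[r[cl]] += 1
            for r in out:
                mean = total[r[cl]] / count[r[cl]]
                r[f"{var}_mn_{cl}"] = mean  # cluster mean for this cluster type
                r[dev] -= mean              # subtract each requested cluster mean
    return out


# Toy data: two clusters of each type, fully crossed (hypothetical values).
rows = [
    {"Cluster1": "a", "Cluster2": "s", "V1": 1.0},
    {"Cluster1": "a", "Cluster2": "t", "V1": 3.0},
    {"Cluster1": "b", "Cluster2": "s", "V1": 5.0},
    {"Cluster1": "b", "Cluster2": "t", "V1": 9.0},
]

# Full centering subtracts both cluster means; partial centering only one.
full = cluster_mean_center(rows, x=["V1"], by=["Cluster1", "Cluster2"])
partial = cluster_mean_center(rows, x=["V1"], by=["Cluster1"])

print([r["V1_dev"] for r in full])     # [-4.0, -5.0, -5.0, -4.0]
print([r["V1_dev"] for r in partial])  # [-1.0, 1.0, -2.0, 2.0]
```

In the fully centered case, `V1_dev` corresponds to \({x}_{ijk}-{x}_{\cdot j\cdot }-{x}_{\cdot \cdot k}\), and the two `V1_mn_` columns correspond to \({x}_{\cdot j\cdot }\) and \({x}_{\cdot \cdot k}\).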

cc.gmc R function Code:

[Figure a: R code for the cc.gmc function]

cc.gmc R function simulated example:

[Figure b: R code for the cc.gmc simulated example]

Code used for models in empirical example section:

[Figure c: R code for the models in the empirical example section]

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Y., Dhaliwal, J. & Rights, J.D. Disaggregating level-specific effects in cross-classified multilevel models. Behav Res (2023). https://doi.org/10.3758/s13428-023-02238-7

  • DOI: https://doi.org/10.3758/s13428-023-02238-7
