Level-specific residuals and diagnostic measures, plots, and tests for random effects selection in multilevel and mixed models


Multilevel data structures are common in many substantive research areas, and multilevel models (MLMs) are widely used to account for such structures. One important step in applying MLMs is selecting an optimal set of random effects to account for variability and heteroscedasticity in multilevel data. Literature reviews of current practice in applying MLMs show that diagnostic plots are rarely used for model selection or model checking. In this study, we describe the possible random effects and give a generic description of them to guide researchers in selecting the necessary random effects. In addition, based on an extensive literature review, we present level-specific diagnostic plots built from various kinds of level-specific residuals, and we suggest diagnostic measures and statistical tests for selecting a set of random effects. Existing and newly proposed methods are illustrated with two data sets: a cross-sectional data set and a longitudinal data set. Along with the illustration, we discuss the methods and provide guidelines for selecting the necessary random effects during model building. R code is provided for the analyses.
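
The distinction between level-specific residuals can be made concrete with a minimal sketch. It assumes a random-intercept null model $y_{ij} = \gamma + u_j + \epsilon_{ij}$ whose parameters $\gamma$, $\tau^2 = \mathrm{Var}(u_j)$ and $\sigma^2 = \mathrm{Var}(\epsilon_{ij})$ are treated as known; in a real analysis they would be estimates from a fitted model. The marginal residual is $y_{ij} - \gamma$, the level-2 residual is the empirical Bayes prediction $\hat u_j$, and the conditional residual is $y_{ij} - \gamma - \hat u_j$. The function name and toy data are hypothetical, not taken from the article's R code.

```python
from collections import defaultdict


def level_specific_residuals(y, cluster, gamma, tau2, sigma2):
    """Residuals for the random-intercept null model y_ij = gamma + u_j + e_ij.

    gamma, tau2 = Var(u_j), and sigma2 = Var(e_ij) are treated as known here;
    in practice they are estimates from a fitted model.
    """
    groups = defaultdict(list)
    for yi, j in zip(y, cluster):
        groups[j].append(yi)

    # Level-2 residuals: empirical Bayes predictions of u_j, shrunk toward 0
    # by the reliability tau2 / (tau2 + sigma2 / n_j).
    u_hat = {}
    for j, ys in groups.items():
        n_j = len(ys)
        reliability = tau2 / (tau2 + sigma2 / n_j)
        u_hat[j] = reliability * (sum(ys) / n_j - gamma)

    # Marginal residuals ignore the random effect; conditional residuals remove it.
    marginal = [yi - gamma for yi in y]
    conditional = [yi - gamma - u_hat[j] for yi, j in zip(y, cluster)]
    return marginal, conditional, u_hat


y = [5.0, 7.0, 6.0, 2.0, 4.0]
cluster = ["a", "a", "a", "b", "b"]
marginal, conditional, u_hat = level_specific_residuals(
    y, cluster, gamma=5.0, tau2=1.0, sigma2=1.0
)
print(marginal)  # [0.0, 2.0, 1.0, -3.0, -1.0]
```

By construction, each marginal residual equals the conditional residual plus its cluster's $\hat u_j$. The smaller cluster "b" is shrunk more strongly toward zero (reliability 2/3 versus 3/4 for "a").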

Fig. 1
Fig. 2
Fig. 3
Fig. 4


  1. To review current practices in the use of diagnostic plots and model selection methods for random effects, we randomly selected 72 papers from nine APA journals through the PsycINFO database. Random effects were selected based on the likelihood ratio test (LRT; 33%), the Wald test (26%), goodness-of-fit statistics (13%), information criteria (2%), and pseudo R-square (2%). Twelve papers (17%) did not consider model selection for random effects.

  2. Of the 72 papers we reviewed, only one presented a diagnostic plot, which was used to investigate autoregressive effects.

  3. An exception is O’Connell, Yeomans-Maldonado, and McCoach (2016), who listed conditional and marginal residuals for education researchers. In this study, we add a third kind of residual, called independent residuals, based on an extensive review of the statistics literature.

  4. In the statistics literature, independent residuals are also known as transformed or normalized residuals (e.g., Galecki and Burzykowski, 2013).

  5. Although we mainly use the term MLM rather than linear mixed model (LMM) throughout this paper, we divide the literature on inspecting residuals into MLM and LMM literatures: the MLM literature presents practices for inspecting level-specific residuals in the context of the social and behavioral sciences, whereas the LMM literature presents them in the context of statistics.

  6. Because there are 23 schools in the Math data, there are 23 SIQR scores. The points represent the math scores of individual students from the cluster whose SIQR value is indicated on the x-axis.

  7. O’Connell et al., (2016) did not mention whether standardized or unstandardized residuals were used.

  8. Pinheiro and Bates (2000, p. 245) also presented an autocorrelation function of the conditional independent residuals to assess the adequacy of a model with the level-1 error.

  9. The SIQR values for each school did not change much from Model 2 to Model 3, but they changed enough to alter the ranking of the schools' SIQRs between models. Specifically, the 6th-smallest SIQR (the value on which the first quartile for 22 schools largely depends) changed from 0.5461 (schid = 25456) in Model 2 to 0.4958 (schid = 68493) in Model 3, whereas the median and third quartile of the SIQRs stayed largely consistent between the two models. Because the first quartile of the SIQRs was smaller in Model 3, SIQR(SIQR) increased.

  10. $\epsilon_{ij}$ is referred to as “random error” in the null models (without covariates) and as “random residual” after covariates are added.

  11. Conditional independent residuals were plotted instead of marginal standardized residuals (which were plotted for Model 1a) because errors are now allowed to correlate.

  12. Model 1a with ARMA(1,0) errors was also selected by both AIC and BIC among three candidate error structures for Model 1a: unstructured (AIC = 2244.172, BIC = 2338.290), compound symmetry (AIC = 2286.203, BIC = 2325.419), and ARMA(1,0) (AIC = 2242.170, BIC = 2281.386).

  13. Although Model 2 was selected instead of Model 3 in Step 4, Model 3 is included in this table for comparison.
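
Notes 6 and 9 rely on one SIQR per school and on the spread of those SIQRs, SIQR(SIQR). The following sketch assumes that SIQR denotes the semi-interquartile range, (Q3 − Q1)/2, and uses R's default (type 7) quantile rule, which Python's `statistics.quantiles(..., method="inclusive")` reproduces. The notes do not state which quantile rule the authors used or whether the within-school values are residuals or raw scores, so both are assumptions here. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def siqr(values):
    """Semi-interquartile range (Q3 - Q1) / 2.

    method="inclusive" matches R's default (type 7) quantiles;
    whether the authors used this rule is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


def siqr_by_cluster(values, cluster):
    """One SIQR per cluster (e.g., per school) from its within-cluster values."""
    groups = defaultdict(list)
    for v, j in zip(values, cluster):
        groups[j].append(v)
    return {j: siqr(vs) for j, vs in groups.items()}


def siqr_of_siqr(values, cluster):
    """Spread of the cluster-level SIQRs, i.e., SIQR(SIQR)."""
    return siqr(list(siqr_by_cluster(values, cluster).values()))


# Toy data (hypothetical): two clusters with very different within-cluster spread.
vals = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
schools = ["a"] * 5 + ["b"] * 5
print(siqr_by_cluster(vals, schools))  # {'a': 1.0, 'b': 10.0}
print(siqr_of_siqr(vals, schools))     # 2.25
```

Applied to the 23 schools in the Math data, `siqr_by_cluster` would return 23 values, as in note 6. As note 9 illustrates, a smaller first quartile of these values, with the third quartile unchanged, raises SIQR(SIQR).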

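Notes 4, 8, and 11 refer to independent (transformed or normalized) residuals and to their autocorrelation. As a minimal sketch, assume that a cluster's conditional residual vector r has a modeled level-1 covariance V = LLᵀ (Cholesky factor L). The independent residuals are then L⁻¹r. If the error structure is adequate, they should show little remaining autocorrelation. The AR(1) = ARMA(1,0) covariance, its parameter values, and all function names below are illustrative assumptions. `lag1_autocorrelation` checks a single lag of a single series, whereas a full autocorrelation function would pool lags across clusters.

```python
import math


def ar1_covariance(n, sigma2, rho):
    """Within-cluster covariance under AR(1), i.e., ARMA(1,0): sigma2 * rho**|t - s|."""
    return [[sigma2 * rho ** abs(t - s) for s in range(n)] for t in range(n)]


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def independent_residuals(r, cov):
    """Decorrelate one cluster's residuals: solve L z = r by forward substitution."""
    L = cholesky(cov)
    z = []
    for i, ri in enumerate(r):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((ri - s) / L[i][i])
    return z


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation of a single residual series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Hypothetical check: build correlated residuals r = L z from known z, then recover z.
cov = ar1_covariance(4, sigma2=2.0, rho=0.6)
L = cholesky(cov)
z_true = [0.5, -1.0, 1.5, 0.2]
r = [sum(L[i][k] * z_true[k] for k in range(4)) for i in range(4)]
z = independent_residuals(r, cov)
print([round(v, 6) for v in z])  # [0.5, -1.0, 1.5, 0.2]
```

The check runs in the reverse direction: it correlates known independent values through L and confirms that the transformation recovers them. With real data, one would apply `independent_residuals` to each cluster's conditional residuals and inspect their autocorrelation.
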

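The comparison in note 12 amounts to choosing the smallest value of each information criterion. The sketch below reproduces it using only the AIC and BIC values reported in that note. The dictionary layout and the name `select_by` are hypothetical conveniences, not part of the article's code.

```python
# AIC and BIC for Model 1a under three level-1 error structures, as reported in note 12.
fits = {
    "unstructured": {"AIC": 2244.172, "BIC": 2338.290},
    "compound symmetry": {"AIC": 2286.203, "BIC": 2325.419},
    "ARMA(1,0)": {"AIC": 2242.170, "BIC": 2281.386},
}


def select_by(fits, criterion):
    """Return the model with the smallest criterion value and each model's distance from it."""
    best = min(fits, key=lambda name: fits[name][criterion])
    deltas = {name: round(ic[criterion] - fits[best][criterion], 3) for name, ic in fits.items()}
    return best, deltas


for criterion in ("AIC", "BIC"):
    print(criterion, select_by(fits, criterion))
```

Both criteria favor ARMA(1,0). By AIC, the unstructured model is only 2.002 units behind. BIC penalizes its additional covariance parameters more heavily and places it 56.904 units behind.
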


Funding was provided in part by the National Science Foundation (SES:1851690) to Sun-Joo Cho, Sarah Brown-Schmidt, and Paul De Boeck. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We are grateful to Sonya Sterba (Vanderbilt University) for helpful comments on earlier versions of this article.

Author information



Corresponding author

Correspondence to Sun-Joo Cho.

Electronic supplementary material

(PDF 29.5 KB)

About this article


Cite this article

Cho, SJ., De Boeck, P., Naveiras, M. et al. Level-specific residuals and diagnostic measures, plots, and tests for random effects selection in multilevel and mixed models. Behav Res (2022).


  • Diagnostic plots
  • Level-specific residuals
  • Mixed-effects model
  • Multilevel model
  • Random effect selection