Modeling item-level heterogeneous treatment effects: A tutorial with the glmer function from the lme4 package in R

  • Original Manuscript
Behavior Research Methods

Abstract

Recent advancements in education scholarship have introduced Item Response Theory (IRT) models to address treatment heterogeneity at the assessment item level. These models for item-level heterogeneous treatment effects (IL-HTE) enable detailed analyses of treatments whose impacts may vary across the individual items of an assessment. This article offers a comprehensive tutorial for applied researchers interested in implementing IL-HTE analysis in R with the glmer function from the lme4 package. Using empirical data from a second-grade reading comprehension assessment as a running example, the tutorial emphasizes model-building strategies, interpretation techniques, visualization methods, and extensions. By following it, researchers will gain practical insight into using IL-HTE analysis to better understand and interpret treatment effects at the item level.
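To make the idea of an item-level heterogeneous treatment effect concrete, here is a minimal Python sketch. It is not the article's R code. It simulates data under one common way of writing such a model, assuming a Rasch-type specification in which the log-odds that person \(j\) answers item \(i\) correctly is \(\theta_j + b_i + (\beta + \zeta_i) T_j\). In this expression, \(\theta_j\) is person ability, \(b_i\) is item easiness, \(T_j\) is a treatment indicator, \(\beta\) is the average treatment effect, and \(\zeta_i\) is item \(i\)'s deviation from that average. The normal distributions, default parameter values, balanced assignment rule, and function names are all hypothetical illustration choices.

```python
import math
import random


def logistic(x: float) -> float:
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_il_hte(n_persons=2000, n_items=10, beta=0.3,
                    sd_theta=1.0, sd_item=1.0, sd_zeta=0.3, seed=1):
    """Simulate long-format item responses under an assumed IL-HTE model.

    logit P(y_ij = 1) = theta_j + b_i + (beta + zeta_i) * T_j
    All distributions and default values are hypothetical choices.
    """
    rng = random.Random(seed)
    easiness = [rng.gauss(0.0, sd_item) for _ in range(n_items)]  # b_i
    zeta = [rng.gauss(0.0, sd_zeta) for _ in range(n_items)]      # zeta_i
    rows = []  # one (person, item, treat, y) tuple per response
    for j in range(n_persons):
        theta = rng.gauss(0.0, sd_theta)  # person ability theta_j
        treat = j % 2                     # balanced assignment: odd-indexed persons treated
        for i in range(n_items):
            eta = theta + easiness[i] + (beta + zeta[i]) * treat
            y = 1 if rng.random() < logistic(eta) else 0
            rows.append((j, i, treat, y))
    return rows, easiness, zeta


def item_treatment_differences(rows, n_items):
    """Per-item difference in proportion correct, treated minus control.

    This is a crude descriptive contrast on the probability scale, not a
    model-based estimate of beta + zeta_i on the logit scale.
    """
    counts = {}  # (item, treat) -> [number correct, number of responses]
    for _, i, t, y in rows:
        c = counts.setdefault((i, t), [0, 0])
        c[0] += y
        c[1] += 1
    return [counts[(i, 1)][0] / counts[(i, 1)][1]
            - counts[(i, 0)][0] / counts[(i, 0)][1]
            for i in range(n_items)]


if __name__ == "__main__":
    beta = 0.3
    rows, b, zeta = simulate_il_hte(beta=beta)
    diffs = item_treatment_differences(rows, n_items=len(b))
    for i, d in enumerate(diffs):
        print(f"item {i}: easiness {b[i]:+.2f}, true logit effect "
              f"{beta + zeta[i]:+.2f}, observed difference {d:+.3f}")
```

The raw differences vary across items partly because of \(\zeta_i\) and partly because the logistic link compresses effects on very easy or very hard items. This is why a model-based approach is needed to separate the two. If the simulated rows were moved to R with hypothetical columns `y`, `treat`, `person`, and `item`, a specification of roughly the form `glmer(y ~ treat + (1 | person) + (treat | item), family = binomial)` would estimate \(\beta\) as the fixed `treat` effect and the variance of \(\zeta_i\) as the random `treat` slope across items. This is offered as a plausible sketch, not necessarily the exact model the article fits.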

Figs. 1–4

Availability of data and materials

The code for this article (which provides access to the already public data) is available as an online supplemental file.

Code availability

The code for this article is available as an online supplemental file (https://researchbox.org/2054).

Notes

  1. On an educational test, \(b_i\) is most often interpreted as item easiness. Item easiness is the negative of what the IRT literature usually calls item difficulty. We use the item easiness terminology throughout for two reasons. First, lme4 formula syntax most naturally estimates item easiness parameters, because item intercepts enter the linear predictor with a positive sign. Second, our examples are drawn from education research.
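The sign convention can be checked with a minimal Python sketch. It assumes a Rasch-type model in which the log-odds of a correct response is \(\theta_j + b_i\) when \(b_i\) is item easiness. Written with item difficulty \(d_i = -b_i\), the same model has log-odds \(\theta_j - d_i\). The function names and numeric values are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct_easiness(theta: float, b: float) -> float:
    """Probability correct with item easiness b: logit = theta + b."""
    return logistic(theta + b)


def p_correct_difficulty(theta: float, d: float) -> float:
    """Same model written with item difficulty d: logit = theta - d."""
    return logistic(theta - d)


def easiness_to_difficulty(easiness):
    """Flip the sign of each easiness value to report it as a difficulty."""
    return [-b for b in easiness]


if __name__ == "__main__":
    theta = 0.5                   # hypothetical person ability
    easiness = [1.2, 0.4, -0.8]   # hypothetical item easiness values
    difficulty = easiness_to_difficulty(easiness)
    for b, d in zip(easiness, difficulty):
        # Both parameterizations give the same probability for each item.
        print(f"b = {b:+.1f}, d = {d:+.1f}, "
              f"P(correct) = {p_correct_easiness(theta, b):.3f} "
              f"= {p_correct_difficulty(theta, d):.3f}")
```

In the same way, item intercepts from a glmer fit with hypothetical columns, such as `glmer(y ~ 0 + item + (1 | person), family = binomial)`, are on the easiness scale. Negating them gives difficulties if a reader prefers that convention.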

Funding

No financial support was received for the writing of this article.

Author information

Corresponding author

Correspondence to Joshua B. Gilbert.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflicts of interest

The author reports no conflicts of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gilbert, J.B. Modeling item-level heterogeneous treatment effects: A tutorial with the glmer function from the lme4 package in R. Behav Res (2023). https://doi.org/10.3758/s13428-023-02245-8

  • DOI: https://doi.org/10.3758/s13428-023-02245-8