Abstract
Recent advances in education research have introduced item response theory (IRT) models that capture treatment heterogeneity at the level of the assessment item. These models for item-level heterogeneous treatment effects (IL-HTE) allow researchers to analyze treatments whose impact varies across the individual items of an assessment. This article offers a tutorial for applied researchers who want to implement IL-HTE analysis in R using the lme4 package. Using empirical data from a second-grade reading comprehension assessment as a running example, the tutorial covers model-building strategies, interpretation techniques, visualization methods, and extensions. Researchers who follow it will gain practical guidance on using IL-HTE analysis to understand and interpret treatment effects at the item level.
Availability of data and materials
The data are already public; the code for this article, available as an online supplemental file, provides access to them.
Code availability
The code for this article is available as an online supplemental file (https://researchbox.org/2054).
Notes
On an educational test, \(b_i\) is most often interpreted as item easiness. Item easiness is the negative of what the IRT literature usually calls item difficulty. Because lmer syntax most readily yields item easiness parameters, and our examples are drawn from education research, we use the item easiness terminology throughout.
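As a brief illustration of this sign convention, consider a Rasch-type model for the response \(y_{ij}\) of person \(j\) to item \(i\). The person ability \(\theta_j\) and the item difficulty \(\delta_i\) are notation assumed here for illustration only; \(b_i\) is the item easiness referred to in this note.

\[
\mathrm{logit}\,\Pr(y_{ij} = 1) = \theta_j + b_i = \theta_j - \delta_i, \qquad b_i = -\delta_i .
\]

Under this parameterization, a larger \(b_i\) corresponds to a higher probability of a correct response. For example, an item with an easiness of 1.2 logits has a difficulty of \(-1.2\) logits.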
Funding
No financial support was received for the writing of this article.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflicts of interest
The author reports no conflicts of interest.
About this article
Cite this article
Gilbert, J.B. Modeling item-level heterogeneous treatment effects: A tutorial with the glmer function from the lme4 package in R. Behav Res 56, 5055–5067 (2024). https://doi.org/10.3758/s13428-023-02245-8