Addressing model uncertainty in item response theory person scores through model averaging
Item banks are often created in large-scale research and testing settings in the social sciences to predict individuals’ latent trait scores. A common procedure is to fit multiple candidate item response theory (IRT) models to a calibration sample and select a single best-fitting IRT model. The parameter estimates from this model are then used to obtain trait scores for subsequent respondents. However, this model selection procedure ignores model uncertainty stemming from the fact that the model ranking in the calibration phase is subject to sampling variability. Consequently, the standard errors of trait scores obtained from subsequent respondents do not reflect such uncertainty. Ignoring such sources of uncertainty contributes to the current replication crisis in the social sciences. In this article, we propose and demonstrate an alternative procedure to account for model uncertainty in this context—model averaging of IRT trait scores and their standard errors. We outline the general procedure step-by-step and provide software to aid researchers in implementation, both for large-scale research settings with item banks and for smaller research settings involving IRT scoring. We then demonstrate the procedure with a simulated item-banking illustration, comparing model selection and model averaging within sample in terms of predictive coverage. We conclude by discussing ways that model averaging and IRT scoring can be used and investigated in future research.
Keywords: Item response theory; Model uncertainty; Model averaging; Item banks
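The averaging step can be illustrated with a minimal sketch. It assumes information-criterion weights (Akaike-type weights computed from AIC or BIC values of the candidate models in the calibration sample). It also assumes the common "within-model plus between-model" variance formula for the averaged standard error. These are assumptions for illustration: the authors' procedure and software may weight models or combine standard errors differently. The model-specific trait scores and standard errors would come from IRT scoring software. Here they are plain hypothetical numbers, and all function names (`ic_weights`, `average_score`, `coverage`) are hypothetical.

```python
import math


def ic_weights(ic_values):
    """Turn information-criterion values (lower is better, e.g. AIC or BIC)
    into model weights: w_k = exp(-delta_k / 2) / sum_j exp(-delta_j / 2),
    where delta_k = IC_k - min(IC). Assumed weighting scheme, not the authors'."""
    best = min(ic_values)
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def average_score(thetas, ses, weights):
    """Model-averaged trait score and standard error for one respondent.

    theta_bar = sum_k w_k * theta_k
    var       = sum_k w_k * (se_k**2 + (theta_k - theta_bar)**2)
    The second term adds between-model spread, which is exactly the
    uncertainty a single selected model ignores.
    """
    if not (len(thetas) == len(ses) == len(weights)):
        raise ValueError("thetas, ses and weights must have equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    theta_bar = sum(w * t for w, t in zip(weights, thetas))
    var = sum(w * (se ** 2 + (t - theta_bar) ** 2)
              for w, t, se in zip(weights, thetas, ses))
    return theta_bar, math.sqrt(var)


def coverage(true_values, estimates, ses, z=1.96):
    """Share of respondents whose true trait value falls inside the
    interval estimate +/- z * se (e.g. for a simulation study)."""
    hits = sum(abs(t - e) <= z * s for t, e, s in zip(true_values, estimates, ses))
    return hits / len(true_values)


if __name__ == "__main__":
    # Hypothetical BIC values for three candidate models (e.g. 1PL, 2PL, 3PL)
    # fitted to a calibration sample; illustrative numbers only.
    bic = [10412.3, 10405.1, 10407.8]
    weights = ic_weights(bic)
    # Hypothetical scores and standard errors for one new respondent,
    # scored under each calibrated model.
    thetas = [0.42, 0.55, 0.61]
    ses = [0.31, 0.29, 0.30]
    theta_bar, se_bar = average_score(thetas, ses, weights)
    print("weights:", [round(w, 3) for w in weights])
    print(f"averaged theta = {theta_bar:.3f}, SE = {se_bar:.3f}")
```

Because the variance sums a nonnegative between-model term onto the weighted within-model variances, the averaged standard error is at least as large as the square root of the weighted within-model variance alone. Dropping that term reproduces the understatement of uncertainty that arises when a single selected model is treated as correct.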
Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.