A Bayesian random effects model for testlets
 Eric T. Bradlow,
 Howard Wainer,
 Xiaohui Wang
Abstract
Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences, such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design.
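The abstract does not state the functional form of the modified model, so the following is only a minimal sketch under stated assumptions: a probit two-parameter response function in which a normally distributed examinee-by-testlet effect (here called gamma) is subtracted from proficiency. The function names, the sign convention, and all parameter values are illustrative choices, not the authors' implementation. The sketch shows why a shared testlet effect breaks conditional independence: for a fixed proficiency, two items in the same testlet become positively correlated once the testlet effect has nonzero variance.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def response_prob(theta, a, b, gamma=0.0):
    """P(correct) under an ASSUMED probit model with a testlet effect.

    theta: examinee proficiency, a: item discrimination, b: item difficulty,
    gamma: examinee-specific effect shared by all items in one testlet.
    """
    return normal_cdf(a * (theta - b - gamma))


def within_testlet_covariance(sigma_gamma, n_examinees=20000,
                              theta=0.0, a=1.0, b=0.0, seed=0):
    """Monte Carlo covariance of two same-testlet items at a fixed theta.

    Every simulated examinee has the same proficiency, so any covariance
    left over is a violation of conditional independence.
    """
    rng = random.Random(seed)
    n1 = n2 = n12 = 0
    for _ in range(n_examinees):
        # One testlet effect per examinee, shared by both items.
        gamma = rng.gauss(0.0, sigma_gamma)
        p = response_prob(theta, a, b, gamma)
        y1 = rng.random() < p
        y2 = rng.random() < p
        n1 += y1
        n2 += y2
        n12 += (y1 and y2)
    n = float(n_examinees)
    return n12 / n - (n1 / n) * (n2 / n)


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.5):
        cov = within_testlet_covariance(sigma)
        print(f"sigma_gamma={sigma}: conditional covariance ~ {cov:.4f}")
```

With `sigma_gamma=0.0` the covariance should be close to zero, which is the conditional independence case of a standard IRT model; it grows as the testlet variance increases, which is consistent with the abstract's statement that precision bias depends on the variability of the testlet effects.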
 Albert, J. H. (1992) Bayesian estimation of normal ogive response curves using Gibbs sampling. Journal of Educational Statistics 17: pp. 251-269
 Albert, J. H., Chib, S. (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88: pp. 669-679
 Bradlow, E. T., Zaslavsky, A. M. (1997) Case influence analysis in Bayesian inference. Journal of Computational and Graphical Statistics 6: pp. 314-331
 Bradlow, E. T., Zaslavsky, A. M. (1999) A hierarchical latent variable model for ordinal customer satisfaction survey data with “no answer” responses. Journal of the American Statistical Association 94: pp. 43-52
 Gelfand, A. E., Smith, A. F. M. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85: pp. 398-409
 Gelman, A., Rubin, D. B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science 7: pp. 457-511
 Hulin, C. L., Drasgow, F., Parsons, L. K. (1983) Item response theory. Dow Jones-Irwin, Homewood, IL
 Lord, F. M., Novick, M. R. (1968) Statistical theories of mental test scores. Addison-Wesley, Reading, MA
 McDonald, R. P. (1981) The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology 34: pp. 100-117
 McDonald, R. P. (1982) Linear versus nonlinear models in item response theory. Applied Psychological Measurement 6: pp. 379-396
 Mislevy, R. J., Bock, R. D. (1983) BILOG: Item and test scoring with binary logistic models. Scientific Software, Mooresville, IN
 Rosenbaum, P. R. (1988) Item bundles. Psychometrika 53: pp. 349-359
 Sireci, S. G., Wainer, H., Thissen, D. (1991) On the reliability of testlet-based tests. Journal of Educational Measurement 28: pp. 237-247
 Stout, W. F. (1987) A nonparametric approach for assessing latent trait dimensionality. Psychometrika 52: pp. 589-617
 Stout, W. F. (1990) A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika 55: pp. 293-326
 Stout, W., Habing, B., Douglas, J., Kim, H. R., Roussos, L., Zhang, J. (1996) Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement 20: pp. 331-354
 Tanner, M. A., Wong, W. H. (1987) The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82: pp. 528-540
 Wainer, H. (1995) Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education 8: pp. 157-187
 Wainer, H., Kiely, G. (1987) Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement 24: pp. 185-202
 Wainer, H., Thissen, D. (1996) How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice 15: pp. 22-29
 Yen, W. (1993) Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement 30: pp. 187-213
 Zhang, J. (1996) Some fundamental issues in item response theory with applications. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign
 Zhang, J., Stout, W. F. (1999) Conditional covariance structure of generalized compensatory multidimensional items. Psychometrika 64: pp. 129-152
 Title
 A Bayesian random effects model for testlets
 Journal

Psychometrika
Volume 64, Issue 2, pp 153-168
 Cover Date
 1999-06-01
 DOI
 10.1007/BF02294533
 Print ISSN
 0033-3123
 Online ISSN
 1860-0980
 Publisher
 Springer-Verlag
 Keywords

 Gibbs sampler
 Data augmentation
 Testlets
 Authors

 Eric T. Bradlow ^{(1)}
 Howard Wainer ^{(2)}
 Xiaohui Wang ^{(2)}
 Author Affiliations

 1. Marketing and Statistics The Wharton School, The University of Pennsylvania, USA
 2. Educational Testing Service, USA