Abstract
This paper demonstrates how statistical inference from multistage test designs can be based on the conditional likelihood. Special attention is given to parameter estimation and to the evaluation of model fit. Two reasons are provided why simple measurement models are expected to fit better in adaptive designs than in linear designs: more parameters are available for the same number of observations; and undesirable response behavior, such as slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with both simulated and real data.
Notes
- 1. We use a superscript [m] to denote random variables and parameters that relate to the mth module. Multiple modules, e.g., modules 1 and 2, are denoted by the superscript [1,2].
- 2. Whenever possible without introducing ambiguity, we ignore the distinction between random variables and their realizations in our formulae.
- 3. A sample had to be drawn because the OPLM software package limits the maximum number of observations.
References
Andersen, E.B. (1973a). Conditional inference and models for measuring. Mentalhygiejnisk Forskningsinstitut.
Andersen, E.B. (1973b). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters. Psychometrika, 46, 443–460.
Cronbach, L.J., & Gleser, G.C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana: University of Illinois Press.
Eggen, T.J.H.M., & Verhelst, N.D. (2011). Item calibration in incomplete designs. Psychológica, 32, 107–132.
Glas, C.A.W. (1988). The Rasch model and multistage testing. Journal of Educational Statistics, 13, 45–52.
Glas, C.A.W. (1989). Contributions to estimating and testing Rasch models. Unpublished doctoral dissertation, Arnhem: Cito.
Glas, C.A.W. (2000). Item calibration and parameter drift. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 183–199). Dordrecht: Kluwer Academic Publishers.
Glas, C.A.W. (2010). Item parameter estimation and item fit analysis. In W.J. Van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 269–288). Berlin: Springer.
Glas, C.A.W., Wainer, H., & Bradlow, E. (2000). MML and EAP estimation in testlet-based adaptive testing. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 271–287). Dordrecht: Kluwer Academic Publishers.
Kubinger, K.D., Steinfeld, J., Reif, M., & Yanagida, T. (2012). Biased (conditional) parameter estimation of a Rasch model calibrated item pool administered according to a branched testing design. Psychological Test and Assessment Modeling, 52(4), 450–460.
Lord, F.M. (1971a). The self-scoring flexilevel test. Journal of Educational Measurement, 8(3), 147–151.
Lord, F.M. (1971b). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Neyman, J., & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: The Danish Institute of Educational Research. (Expanded edition, 1980, Chicago, The University of Chicago Press).
Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592.
Van der Linden, W.J., & Glas, C.A.W. (Eds.) (2010). Elements of adaptive testing. New York: Springer.
Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model: OPLM. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: foundations, recent developments and applications (pp. 215–238). New York: Springer.
Verhelst, N.D., Glas, C.A.W., & Verstralen, H.H.F.M. (1993). OPLM: one parameter logistic model. Arnhem: Cito. Computer program and manual.
Wainer, H., Bradlow, E., & Du, Z. (2000). Testlet response theory: an analog for the 3PL model useful in testlet-based adaptive testing. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 245–269). Dordrecht: Kluwer Academic Publishers.
Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Weiss, D.J. (Ed.) (1983). New horizons in testing: latent trait test theory and computerized adaptive testing. New York: Academic Press.
Zenisky, A., Hambleton, R.K., & Luecht, R. (2010). Multistage testing: issues, designs and research. In W.J. Van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 355–372). Berlin: Springer.
Cite this article
Zwitser, R.J., Maris, G. Conditional Statistical Inference with Multistage Testing Designs. Psychometrika 80, 65–84 (2015). https://doi.org/10.1007/s11336-013-9369-6
Key words
- multistage testing
- adaptive testing
- item response theory
- parameter estimation
- conditional maximum likelihood