Conditional Statistical Inference with Multistage Testing Designs

Abstract

This paper demonstrates how statistical inference from multistage test designs can be based on the conditional likelihood. Special attention is given to parameter estimation and to the evaluation of model fit. Two reasons are given why simple measurement models are expected to fit better in adaptive designs than in linear designs: more parameters are available for the same number of observations, and undesirable response behavior, such as slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data as well as with real data.
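A minimal sketch of why conditional inference remains possible in a multistage design follows. It assumes the Rasch model and a hypothetical two-stage design (not necessarily the one analysed in the paper): every examinee takes module 1, and examinees whose module-1 score is at most c are routed to module 2. With ε_i = exp(−b_i), the proficiency θ cancels from P(x^[1], x^[2] | X+ = s, X^[1]+ ≤ c), leaving ∏ ε_i^{x_i} divided by the sum of γ_j(ε^[1]) γ_{s−j}(ε^[2]) over only those splits j ≤ c that the routing rule allows, where γ_r is the elementary symmetric function of order r. The difficulties, the cut score `CUT` and all function names are illustrative assumptions, and the code is not the OPLM implementation used in the paper.

```python
import itertools
import math

# Hypothetical two-stage design (illustrative values, not from the paper):
# everyone takes module 1; examinees with a module-1 score <= CUT
# are routed to the easier module 2.
B1 = [-0.5, 0.0, 0.5]    # module-1 item difficulties
B2 = [-1.5, -1.0, -0.5]  # module-2 item difficulties
CUT = 1


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_n] of eps (eps_i = exp(-b_i) for Rasch items)."""
    gamma = [1.0]
    for e in eps:
        nxt = gamma + [0.0]
        for r in range(1, len(nxt)):
            nxt[r] += e * gamma[r - 1]  # gamma_r(new) = gamma_r(old) + e * gamma_{r-1}(old)
        gamma = nxt
    return gamma


def mst_conditional_loglik(x1, x2, b1, b2, cut):
    """log P(x1, x2 | total score s, module-1 score <= cut) under the Rasch model.

    theta cancels; the normalizer sums gamma_j(eps1) * gamma_{s-j}(eps2)
    only over splits j that are consistent with the routing rule.
    """
    if sum(x1) > cut:
        raise ValueError("pattern is inconsistent with routing to module 2")
    s = sum(x1) + sum(x2)
    g1 = elementary_symmetric([math.exp(-b) for b in b1])
    g2 = elementary_symmetric([math.exp(-b) for b in b2])
    lo, hi = max(0, s - len(b2)), min(cut, len(b1), s)
    denom = sum(g1[j] * g2[s - j] for j in range(lo, hi + 1))
    numer = -sum(xi * bi for xi, bi in zip(x1, b1)) - sum(xi * bi for xi, bi in zip(x2, b2))
    return numer - math.log(denom)


def naive_conditional_loglik(x1, x2, b1, b2, cut=None):
    """Ordinary CML term that ignores routing: normalizes by gamma_s of all items."""
    x, b = list(x1) + list(x2), list(b1) + list(b2)
    g = elementary_symmetric([math.exp(-bi) for bi in b])
    return -sum(xi * bi for xi, bi in zip(x, b)) - math.log(g[sum(x)])


def total_prob(s, b1, b2, cut, loglik):
    """Sum conditional probabilities over all observable patterns with total score s."""
    total = 0.0
    for x1 in itertools.product((0, 1), repeat=len(b1)):
        if sum(x1) > cut:
            continue  # these examinees would have been routed elsewhere
        for x2 in itertools.product((0, 1), repeat=len(b2)):
            if sum(x1) + sum(x2) == s:
                total += math.exp(loglik(x1, x2, b1, b2, cut))
    return total


if __name__ == "__main__":
    # Attainable total scores run from 0 to CUT + len(B2) in this design.
    for s in range(CUT + len(B2) + 1):
        print(s,
              round(total_prob(s, B1, B2, CUT, mst_conditional_loglik), 6),
              round(total_prob(s, B1, B2, CUT, naive_conditional_loglik), 6))
```

For every attainable score the design-aware probabilities should sum to 1 over the patterns that can actually be observed. The naive version, which ignores the routing rule, also assigns mass to patterns the design excludes, so in this example its total falls below 1 once s reaches 2 and it is not a proper conditional distribution for the observed data.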


Figures 1–8.

Notes

  1. We use a superscript [m] to denote random variables and parameters that relate to the m-th module. Multiple modules, e.g., modules 1 and 2, are denoted by the superscript [1,2].

  2. Whenever possible without introducing ambiguity, we ignore the distinction between random variables and their realizations in our formulae.

  3. A sample had to be drawn because of limitations of the OPLM software package with respect to the maximum number of observations.

References

  1. Andersen, E.B. (1973a). Conditional inference and models for measuring. Mentalhygiejnisk Forskningsinstitut.

  2. Andersen, E.B. (1973b). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

  3. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.

  4. Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters. Psychometrika, 46, 443–460.

  5. Cronbach, L.J., & Gleser, G.C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana: University of Illinois Press.

  6. Eggen, T.J.H.M., & Verhelst, N.D. (2011). Item calibration in incomplete designs. Psicológica, 32, 107–132.

  7. Glas, C.A.W. (1988). The Rasch model and multistage testing. Journal of Educational Statistics, 13, 45–52.

  8. Glas, C.A.W. (1989). Contributions to estimating and testing Rasch models. Unpublished doctoral dissertation, Arnhem: Cito.

  9. Glas, C.A.W. (2000). Item calibration and parameter drift. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 183–199). Dordrecht: Kluwer Academic Publishers.

  10. Glas, C.A.W. (2010). Item parameter estimation and item fit analysis. In W.J. Van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 269–288). Berlin: Springer.

  11. Glas, C.A.W., Wainer, H., & Bradlow, E. (2000). MML and EAP estimation in testlet-based adaptive testing. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 271–287). Dordrecht: Kluwer Academic Publishers.

  12. Kubinger, K.D., Steinfeld, J., Reif, M., & Yanagida, T. (2012). Biased (conditional) parameter estimation of a Rasch model calibrated item pool administered according to a branched testing design. Psychological Test and Assessment Modeling, 52(4), 450–460.

  13. Lord, F.M. (1971a). The self-scoring flexilevel test. Journal of Educational Measurement, 8(3), 147–151.

  14. Lord, F.M. (1971b). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.

  15. Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

  16. Neyman, J., & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.

  17. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: The Danish Institute of Educational Research. (Expanded edition, 1980, Chicago, The University of Chicago Press).

  18. Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592.

  19. Van der Linden, W.J., & Glas, C.A.W. (Eds.) (2010). Elements of adaptive testing. New York: Springer.

  20. Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model: OPLM. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: foundations, recent developments and applications (pp. 215–238). New York: Springer.

  21. Verhelst, N.D., Glas, C.A.W., & Verstralen, H.H.F.M. (1993). OPLM: one parameter logistic model. Arnhem: Cito. Computer program and manual.

  22. Wainer, H., Bradlow, E., & Du, Z. (2000). Testlet response theory: an analog for the 3PL model useful in testlet-based adaptive testing. In W.J. Van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: theory and practice (pp. 245–269). Dordrecht: Kluwer Academic Publishers.

  23. Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.

  24. Weiss, D.J. (Ed.) (1983). New horizons in testing: latent trait test theory and computerized adaptive testing. New York: Academic Press.

  25. Zenisky, A., Hambleton, R.K., & Luecht, R. (2010). Multistage testing: issues, designs and research. In W.J. Van der Linden & C.A.W. Glas (Eds.), Elements of adaptive testing (pp. 355–372). Berlin: Springer.

Author information

Correspondence to Robert J. Zwitser.

About this article

Cite this article

Zwitser, R.J., Maris, G. Conditional Statistical Inference with Multistage Testing Designs. Psychometrika 80, 65–84 (2015). https://doi.org/10.1007/s11336-013-9369-6

Key words

  • multistage testing
  • adaptive testing
  • item response theory
  • parameter estimation
  • conditional maximum likelihood