Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing

Psychometrika

Abstract

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed for unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an “absolute change in theta” (CT) rule, which is useful when an item bank has been exhausted of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1–17, 2010) also argued that a CAT should stop when the standard error no longer changes, which implies that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and there is therefore no clear guideline on when to use which rule. This paper presents analytic results that show the connections among various stopping rules within both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, the CT-rule can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.
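To make the rules named in the abstract concrete, the following is a minimal unidimensional sketch under the 2PL model, with a standard error (SE) rule as the primary criterion and the CT and SE-change rules as secondary checks. It is not the authors' R code: the function names, the rule labels, and the thresholds (`se_max`, `ct_min`, `se_change_min`) are hypothetical choices made only for illustration.

```python
import math

def item_information_2pl(theta, a, b):
    """Fisher information of one 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def standard_error(theta, administered):
    """Asymptotic SE of theta: inverse square root of the summed item information."""
    info = sum(item_information_2pl(theta, a, b) for a, b in administered)
    return 1.0 / math.sqrt(info)

def should_stop(theta_history, se_history,
                se_max=0.3, ct_min=0.02, se_change_min=0.01):
    """Return the label of the first rule that fires, or None to keep testing.

    theta_history and se_history hold the interim estimates after each item.
    All three thresholds are illustrative assumptions, not recommended values.
    """
    # Primary rule: the precision target has been reached.
    if se_history[-1] <= se_max:
        return "SE"
    # Secondary rule: absolute change in theta (CT) between successive items.
    if len(theta_history) >= 2 and abs(theta_history[-1] - theta_history[-2]) < ct_min:
        return "CT"
    # Secondary rule: the SE has stopped shrinking, suggesting an exhausted bank.
    if len(se_history) >= 2 and abs(se_history[-1] - se_history[-2]) < se_change_min:
        return "SE-change"
    return None
```

For example, `should_stop([0.5, 0.51], [0.5, 0.45])` returns `"CT"` because the estimate moved by only about 0.01 while the SE target was still unmet. Checking the SE rule first reflects the abstract's point that the CT-rule is better used as a secondary rule than on its own.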

Notes

  1. In a preliminary simulation study, D-optimal item selection (i.e., maximizing the determinant of the Fisher test information matrix) was compared with A-optimal item selection (i.e., minimizing the trace of the inverse Fisher information matrix), both with and without content balancing and under different ways of selecting content weights. Because comparing item selection methods was not the focus of this study, the method that performed best in the preliminary results was used.
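The two criteria in this note can be sketched for a two-dimensional compensatory 2PL item, whose information matrix at theta is p(1 - p) a aᵀ. This is a hedged illustration only: the helper names, the `(a, d)` item parameterization, and the tiny pool in the usage note are assumptions, and content balancing is not modeled.

```python
import math

def m2pl_item_info(theta, a, d):
    """2x2 Fisher information of a two-dimensional compensatory 2PL item."""
    z = a[0] * theta[0] + a[1] * theta[1] + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * a[0] * a[0], w * a[0] * a[1]],
            [w * a[1] * a[0], w * a[1] * a[1]]]

def add2(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix (the D-optimality criterion)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_inv2(m):
    """Trace of the inverse of a 2x2 matrix (the A-optimality criterion)."""
    return (m[0][0] + m[1][1]) / det2(m)

def select_item(current_info, theta, pool, criterion="D"):
    """Index of the pool item that is best under the D or A criterion.

    current_info must be nonsingular (for example, prior plus administered items).
    """
    best_index, best_value = None, None
    for i, (a, d) in enumerate(pool):
        total = add2(current_info, m2pl_item_info(theta, a, d))
        # D-optimal maximizes det; A-optimal minimizes trace of the inverse.
        value = det2(total) if criterion == "D" else -trace_inv2(total)
        if best_value is None or value > best_value:
            best_index, best_value = i, value
    return best_index
```

With `current_info = [[4.0, 0.0], [0.0, 1.0]]`, `theta = (0.0, 0.0)`, and a pool of two items loading on one dimension each, `[((1.5, 0.0), 0.0), ((0.0, 1.5), 0.0)]`, both criteria pick index 1, the item that measures the less precisely estimated second dimension.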

References

  • Anderson, T. W. (1984). An introduction to multivariate statistical analysis (2nd ed.). New York: Wiley.

  • Babcock, B., & Weiss, D. (2012). Termination criteria in computerized adaptive tests: Do variable-length CATs provide efficient and effective measurement? Journal of Computerized Adaptive Testing. https://doi.org/10.7333/1212-0101001

  • Boyd, A. M., Dodd, B. G., & Choi, S. W. (2010). Polytomous models in computerized adaptive testing. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 229–255). New York, NY: Routledge.

  • Cai, L. (2015). flexMIRT version 3: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

  • Chang, H. H., & Ying, Z. L. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441–450.

  • Cheng, Y., Guo, F., Chang, H., & Douglas, J. (2009). Constraint weighted a-stratification for computerized adaptive testing with nonstatistical constraints: Balancing measurement efficiency and exposure control. Educational and Psychological Measurement, 69, 35–49.

  • Choi, S. W., Grady, M. W., & Dodd, B. G. (2010). A new stopping rule for computerized adaptive testing. Educational and Psychological Measurement, 70, 1–17.

  • Daniel, M. H. (1999). Behind the scenes: Using new measurement methods on DAS and KAIT. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 37–63). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1989). Operational characteristics of adaptive testing procedures using the graded response model. Applied Psychological Measurement, 13, 129–143.

  • Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1993). Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules. Educational and Psychological Measurement, 53, 61–77.

  • Fayers, P. M. (2007). Applying item response theory and computer adaptive testing: The challenges for health outcomes assessment. Quality of Life Research, 16, 187–194.

  • Gardner, W., Shear, K., Kelleher, K., Pajer, K., Mammen, O., Buysse, D., et al. (2004). Computerized adaptive measurement of depression: A simulation study. BMC Psychiatry, 4(13), 1–11.

  • Gershon, R. C. (2017). FastCAT—Customizing CAT administration rules to increase response efficiency. Paper presented at the 6th international conference on computerized adaptive testing, Niigata, Japan.

  • Gibbons, R. D., Weiss, D. J., Kupfer, D. J., Frank, E., Fagiolini, A., Grochocinski, V. J., et al. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59, 49–58.

  • Hart, D. L., Cook, K. F., Mioduski, J. E., Teal, C. R., & Crane, P. K. (2006). Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. Journal of Clinical Epidemiology, 59, 290–298.

  • Hart, D. L., Mioduski, J. E., & Stratford, P. W. (2005). Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. Journal of Clinical Epidemiology, 58, 629–638.

  • Hsieh, C.-A., von Eye, A. A., & Maier, K. S. (2010). Using a multivariate multilevel polytomous item response theory model to study parallel processes of change: The dynamic association between adolescents’ social isolation and engagement with delinquent peers in the National Youth Survey. Multivariate Behavioral Research, 45(3), 508–552.

  • Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology (Quantitative Psychology and Measurement). https://doi.org/10.3389/fpsyg.2016.00109

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Makransky, G., & Glas, C. A. W. (2013). The applicability of multidimensional computerized adaptive testing for cognitive ability measurement in organizational assessment. International Journal of Testing, 13, 123–139.

  • Maurelli, V., & Weiss, D. J. (1981). Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries (Research Rep. No. 81-4). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. Retrieved from https://eric.ed.gov/?id=ED212676.

  • Michel, P., Baumstarck, K., Ghattas, B., Pelletier, J., Loundou, A., Boucekine, M., et al. (2016). A multidimensional computerized adaptive short-form quality of life questionnaire developed and validated for multiple sclerosis: The MusiQoL-MCAT. Medicine, 95(14), e3068.

  • Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296.

  • Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Taylor and Francis.

  • Nikolaus, S., Bode, C., Taal, E., Vonkeman, H. E., Glas, C. A. W., & van der Laar, M. A. F. J. (2015). Working mechanism of a multidimensional computerized adaptive test for fatigue in rheumatoid arthritis. Health and Quality of Life Outcomes, 13, 23.

  • Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.

  • Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354.

  • Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 101–133). Hillsdale, NJ: Lawrence Erlbaum.

  • Veldkamp, B. P., & van der Linden, W. J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67(4), 575–588.

  • Wang, C. (2014). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.

  • Wang, C. (2015). On latent trait estimation in multidimensional compensatory item response models. Psychometrika, 80, 428–449.

  • Wang, C., & Chang, H. (2011). Item selection in multidimensional computerized adaptive tests: Gaining information from different angles. Psychometrika, 76, 363–384.

  • Wang, C., Chang, H., & Boughton, K. (2011). Kullback–Leibler information and its applications in multidimensional adaptive tests. Psychometrika, 76, 13–39.

  • Wang, C., Chang, H., & Boughton, K. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.

  • Wang, C., Chang, H., & Douglas, J. (2012). Combining CAT with cognitive diagnosis: A weighted item selection approach. Behavior Research Methods, 44, 95–109.

  • Wang, C., Su, S., & Weiss, D. J. (2018). Robustness of parameter estimation to assumptions of normality in the multidimensional graded response model. Multivariate Behavioral Research, 53(3), 403–418.

  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.

  • Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2, 1–27.

Acknowledgements

This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number R01HD079439 to the Mayo Clinic in Rochester, Minnesota, through a subcontract to the University of Minnesota, and by IES R305D160010 awarded to the first author.

Author information

Corresponding author

Correspondence to Chun Wang.

Additional information

The R code and the real MGRM item parameters used in this paper are available online.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (csv 8 KB)

Supplementary material 2 (R 25 KB)

About this article

Cite this article

Wang, C., Weiss, D.J. & Shang, Z. Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing. Psychometrika 84, 749–771 (2019). https://doi.org/10.1007/s11336-018-9644-7

  • DOI: https://doi.org/10.1007/s11336-018-9644-7
