Abstract
Computerized classification tests (CCTs) are used to classify examinees into categories in the context of professional certification testing. The term “variable-length” refers to CCTs that terminate (i.e., cease administering items to the examinee) when a classification can be made with a prespecified level of certainty. The sequential probability ratio test (SPRT) is a common criterion for terminating variable-length CCTs, but recent research has proposed more efficient methods. Specifically, the stochastically curtailed SPRT (SCSPRT) and the generalized likelihood ratio criterion (GLR) have been shown to classify examinees with accuracy similar to that of the SPRT while using fewer items. This article shows that the GLR criterion itself may be stochastically curtailed, resulting in a new termination criterion, the stochastically curtailed GLR (SCGLR). All four criteria (the SPRT, SCSPRT, GLR, and the new SCGLR) were compared using a simulation study. In this study, we examined the criteria in testing conditions that varied several CCT design features, including item bank characteristics, pass/fail threshold, and examinee ability distribution. In each condition, the termination criteria were evaluated according to their accuracy (proportion of examinees classified correctly), efficiency (test length), and loss (a single statistic combining both accuracy and efficiency). The simulation results showed that, across a wide variety of CCT designs, the SCGLR can yield increased efficiency without sacrificing accuracy relative to the SPRT, SCSPRT, and GLR.
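To make the difference between the SPRT and GLR termination rules concrete, the following Python sketch evaluates both after a given set of responses. It is a minimal illustration under stated assumptions, not the authors' implementation: items follow a 3PL model with scaling constant D = 1.7, θ− and θ+ bound an indifference region around the cut score, the stopping bounds are Wald's (1947) bounds for hypothetical nominal error rates α and β, and the GLR maximizes each likelihood over its region (θ ≥ θ+ or θ ≤ θ−) on a finite grid. All function names are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant (Camilli, 1994)


def p3pl(theta, a, b, c):
    """Assumed 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_lik(theta, items, responses):
    """Log-likelihood of 0/1 responses at ability theta (local independence)."""
    total = 0.0
    for (a, b, c), x in zip(items, responses):
        p = p3pl(theta, a, b, c)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def decide(log_ratio, alpha, beta):
    """Compare a log-likelihood ratio with Wald's (1947) log-scale bounds."""
    if log_ratio >= math.log((1.0 - beta) / alpha):
        return "mastery"
    if log_ratio <= math.log(beta / (1.0 - alpha)):
        return "nonmastery"
    return "continue"


def sprt_decision(items, responses, theta_minus, theta_plus, alpha=0.05, beta=0.05):
    """SPRT: likelihood ratio evaluated at the two fixed points theta+ and theta-."""
    log_ratio = (log_lik(theta_plus, items, responses)
                 - log_lik(theta_minus, items, responses))
    return decide(log_ratio, alpha, beta)


def glr_decision(items, responses, theta_minus, theta_plus, alpha=0.05, beta=0.05,
                 lo=-4.0, hi=4.0, step=0.01):
    """GLR: each likelihood is maximized over its region (theta >= theta+ or
    theta <= theta-); the maxima are approximated on a grid of width `step`."""
    grid = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    upper = [theta_plus] + [t for t in grid if t > theta_plus]
    lower = [theta_minus] + [t for t in grid if t < theta_minus]
    log_ratio = (max(log_lik(t, items, responses) for t in upper)
                 - max(log_lik(t, items, responses) for t in lower))
    return decide(log_ratio, alpha, beta)


item = (1.0, 0.0, 0.2)  # hypothetical item: a = 1, b = 0, c = 0.2
print(sprt_decision([item] * 4, [1] * 4, -0.5, 0.5))  # continue
print(glr_decision([item] * 4, [1] * 4, -0.5, 0.5))   # mastery
```

In this toy case, four correct responses are not enough for the SPRT to stop, but the GLR, whose numerator can move above θ+, already crosses the upper bound. This mirrors, in a single hypothetical example, why the GLR tends to use fewer items.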
References
Bartroff, J., Finkelman, M., & Lai, T. L. (2008). Modern sequential analysis and its applications to computerized adaptive testing. Psychometrika, 73, 473–486.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.
Camilli, G. (1994). Origin of the scaling constant d = 1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19, 293–295.
Finkelman, M. (2008). On using stochastic curtailment to shorten the SPRT in sequential mastery testing. Journal of Educational and Behavioral Statistics, 33, 442–463.
Finkelman, M. (2010). Variations on stochastic curtailment in sequential mastery testing. Applied Psychological Measurement, 34, 27–45.
Finkelman, M. D., He, Y., Kim, W., & Lai, A. M. (2011). Stochastic curtailment of health questionnaires: A method to reduce respondent burden. Statistics in Medicine, 30, 1989–2004.
Finkelman, M. D., Smits, N., Kim, W., & Riley, B. (2012). Curtailment and stochastic curtailment to shorten the CES-D. Applied Psychological Measurement, 36, 632–658.
Huang, W. (2004). Stepwise likelihood ratio statistics in sequential studies. Journal of the Royal Statistical Society, 66, 401–409.
Huebner, A. (2012). Item overexposure in computerized classification tests using sequential item selection. Practical Assessment, Research, & Evaluation, 17(12). Retrieved from http://pareonline.net/getvn.asp?v=17&n=12
Lin, C.-J. (2011). Item selection criteria with practical constraints for computerized classification testing. Educational and Psychological Measurement, 71, 20–36.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
R Development Core Team. (2011). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Available from www.R-project.org/
Rulison, K. L., & Loken, E. (2009). I’ve fallen and I can’t get up: Can high ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33, 83–101.
Smits, N., & Finkelman, M. D. (2013). A comparison of computerized classification testing and computerized adaptive testing in clinical psychology. Journal of Computerized Adaptive Testing, 1, 19–37. doi:10.7333/1302-0102019
Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized adaptive test. Journal of Educational and Behavioral Statistics, 21, 405–414.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item exposure rates in computerized adaptive testing. In Proceedings of the 27th Annual Meeting of the Military Testing Association (pp. 973–977). San Diego, CA: Navy Personnel Research and Development Center.
Thompson, N. A. (2007). A practitioner’s guide for variable-length computerized classification testing. Practical Assessment, Research, & Evaluation, 12(1). Retrieved from http://pareonline.net/getvn.asp?v=12&n=1
Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69, 778–793.
Thompson, N. A. (2010, June). Nominal error rates in computerized classification testing. Paper presented at the first annual conference of the International Association for Computerized Adaptive Testing, Arnhem, Netherlands.
Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research, & Evaluation, 16(4). Retrieved from http://pareonline.net/pdf/v16n4.pdf
Vos, H. J. (2000). A Bayesian procedure in the context of sequential mastery testing. Psicológica, 21, 191–211.
Wald, A. (1947). Sequential analysis. New York, NY: Wiley.
Appendixes
Appendix 1
We now review the computation of \(P(D_J = D_{j'})\), originally given by Finkelman (2008).
Finkelman (2008) proposed two methods of computation: computing the probability exactly, and approximating it by using the central limit theorem. Both methods rely on knowledge of the remaining items \(j'+1, \ldots, J\) that may potentially be administered to the examinee after stage \(j'\) if the test is not terminated before the maximum number of items \(J\) is reached. Finkelman (2008) notes that the remaining potential items can be known even if an item exposure control method is used, provided the item selection method is nonadaptive.
When the number of remaining items, \(J - j'\), is relatively small, \(P(D_J = D_{j'})\) may be computed exactly. For example, suppose an examinee is at stage \(j' = 97\) of a test with a maximum of \(J = 100\) items. Then \(J - j' = 3\) items remain, so there are \(2^3 = 8\) possible response patterns for those items. Given the parameters of those items and a θ value, two pieces of information may be determined: (1) the probability that each pattern occurs, calculated using Eq. 1, and (2) which response patterns lead to the interim decision matching the final decision, that is, the patterns for which \(D_J = D_{j'}\). The specific θ value used depends on \(D_{j'}\), as described by Finkelman (2008): \(\theta^+\) is used if the interim classification decision at stage \(j'\) is nonmastery, and \(\theta^-\) is used if the interim decision is mastery. Once this information is obtained, the probabilities of the response patterns for which \(D_J = D_{j'}\) are summed, yielding the exact value of \(P(D_J = D_{j'})\).
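The exact method described above can be sketched in Python by enumerating every response pattern for the remaining items. This is an illustration under assumptions, not a transcription of Finkelman's (2008) code: items follow a 3PL model with D = 1.7 (standing in for Eq. 1), the likelihood ratio is \(\lambda = L(\theta^+)/L(\theta^-)\), and both the interim and the final decision are taken to be mastery when \(\log \lambda\) is at least a hypothetical cutoff `log_cut` (default 0). The function names are likewise hypothetical.

```python
import itertools
import math

D = 1.7  # assumed scaling constant for the assumed 3PL model


def p3pl(theta, a, b, c):
    """Assumed 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_ratio_increment(item, x, theta_minus, theta_plus):
    """Change in log lambda = log L(theta+) - log L(theta-) from one response x."""
    p_plus, p_minus = p3pl(theta_plus, *item), p3pl(theta_minus, *item)
    if x == 1:
        return math.log(p_plus / p_minus)
    return math.log((1.0 - p_plus) / (1.0 - p_minus))


def exact_prob_same_decision(log_lambda_now, remaining_items,
                             theta_minus, theta_plus, log_cut=0.0):
    """Exact P(D_J = D_j') by enumerating all 2**(J - j') response patterns.

    Assumption: a decision is 'mastery' when log lambda >= log_cut."""
    interim_mastery = log_lambda_now >= log_cut
    # theta- if the interim decision is mastery, theta+ if it is nonmastery
    theta = theta_minus if interim_mastery else theta_plus
    total = 0.0
    for pattern in itertools.product((0, 1), repeat=len(remaining_items)):
        prob, log_lambda_final = 1.0, log_lambda_now
        for item, x in zip(remaining_items, pattern):
            p = p3pl(theta, *item)
            prob *= p if x == 1 else 1.0 - p
            log_lambda_final += log_ratio_increment(item, x, theta_minus, theta_plus)
        if (log_lambda_final >= log_cut) == interim_mastery:
            total += prob  # this pattern leaves the decision unchanged
    return total


remaining = [(1.0, 0.0, 0.2)] * 3  # hypothetical items 98-100 when J = 100
print(exact_prob_same_decision(-1.0, remaining, -0.5, 0.5))  # about 0.56
```

Under these assumptions, a stochastically curtailed rule such as the SCSPRT would stop at stage \(j'\) once this probability reaches a prespecified threshold. Because the loop visits \(2^{J - j'}\) patterns, the cost doubles with each additional remaining item, which is why exact computation is practical only when few items remain.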
\(P(D_J = D_{j'})\) may also be approximated using the following formulas, given by Finkelman (2008). For the case in which the interim decision is nonmastery, \(P(D_J = D_{j'})\) is given by
where Φ is the cumulative distribution function of the standard normal distribution; \(E_\theta(\log \lambda_J \mid \lambda_{j'})\), the conditional expectation of \(\log \lambda_J\) given \(\lambda_{j'}\), is given by
and the conditional variance is
Then, for the case in which the interim decision is mastery, \(P(D_J = D_{j'})\) is given by
Note that Eqs. A2 and A3 use the notation \(X_j\), which represents the response to a single item. This is in contrast to the boldface notation used in Eq. 4, which denotes a vector of responses.
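The normal approximation can be illustrated with the following Python sketch. It treats \(\log \lambda_J\) as \(\log \lambda_{j'}\) plus a sum of independent item-level increments, one for each remaining response \(X_j\), and computes the sum's mean and variance under the θ value chosen as in the exact method (\(\theta^+\) after an interim nonmastery decision, \(\theta^-\) after an interim mastery decision). The formulas in the code follow this standard derivation and are not a transcription of Finkelman's (2008) equations. The 3PL model with D = 1.7, the cutoff `log_cut` for the final decision, and the function names are hypothetical assumptions.

```python
import math
from statistics import NormalDist

D = 1.7  # assumed scaling constant for the assumed 3PL model


def p3pl(theta, a, b, c):
    """Assumed 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def clt_prob_same_decision(log_lambda_now, remaining_items,
                           theta_minus, theta_plus, log_cut=0.0):
    """Normal (CLT) approximation to P(D_J = D_j'), assuming theta_plus > theta_minus.

    Assumption: a decision is 'mastery' when log lambda >= log_cut."""
    if not remaining_items:
        return 1.0  # no items left, so the decision cannot change
    interim_mastery = log_lambda_now >= log_cut
    theta = theta_minus if interim_mastery else theta_plus
    mean, var = log_lambda_now, 0.0
    for item in remaining_items:
        p_plus, p_minus = p3pl(theta_plus, *item), p3pl(theta_minus, *item)
        inc_right = math.log(p_plus / p_minus)                  # X_j = 1
        inc_wrong = math.log((1.0 - p_plus) / (1.0 - p_minus))  # X_j = 0
        p = p3pl(theta, *item)
        mean += p * inc_right + (1.0 - p) * inc_wrong
        var += p * (1.0 - p) * (inc_right - inc_wrong) ** 2
    # Approximate P(log lambda_J < log_cut), i.e., a final nonmastery decision
    prob_nonmastery = NormalDist(mean, math.sqrt(var)).cdf(log_cut)
    return 1.0 - prob_nonmastery if interim_mastery else prob_nonmastery


remaining = [(1.0, 0.0, 0.2)] * 3  # hypothetical remaining items
print(clt_prob_same_decision(-2.0, remaining, -0.5, 0.5))
```

With only three remaining items the normal approximation is crude; it becomes more reasonable as \(J - j'\) grows, which is exactly the regime where enumerating all response patterns becomes too expensive.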
Appendix 2
Cite this article
Huebner, A.R., Fina, A.D. The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests. Behav Res 47, 549–561 (2015). https://doi.org/10.3758/s13428-014-0490-y
DOI: https://doi.org/10.3758/s13428-014-0490-y