The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests

Behavior Research Methods

Abstract

Computerized classification tests (CCTs) are used to classify examinees into categories in the context of professional certification testing. The term “variable-length” refers to CCTs that terminate (i.e., cease administering items to the examinee) when a classification can be made with a prespecified level of certainty. The sequential probability ratio test (SPRT) is a common criterion for terminating variable-length CCTs, but recent research has proposed more efficient methods. Specifically, the stochastically curtailed SPRT (SCSPRT) and the generalized likelihood ratio criterion (GLR) have been shown to classify examinees with accuracy similar to the SPRT while using fewer items. This article shows that the GLR criterion itself may be stochastically curtailed, resulting in a new termination criterion, the stochastically curtailed GLR (SCGLR). All four criteria (the SPRT, SCSPRT, GLR, and the new SCGLR) were compared in a simulation study. In this study, we examined the criteria in testing conditions that varied several CCT design features, including item bank characteristics, pass/fail threshold, and examinee ability distribution. In each condition, the termination criteria were evaluated according to their accuracy (proportion of examinees classified correctly), efficiency (test length), and loss (a single statistic combining both accuracy and efficiency). The simulation results showed that the SCGLR can yield increased efficiency without sacrificing accuracy, relative to the SPRT, SCSPRT, and GLR, in a wide variety of CCT designs.

References

  • Bartroff, J., Finkelman, M., & Lai, T. L. (2008). Modern sequential analysis and its applications to computerized adaptive testing. Psychometrika, 73, 473–486.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.

  • Camilli, G. (1994). Origin of the scaling constant d = 1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19, 293–295.

  • Finkelman, M. (2008). On using stochastic curtailment to shorten the SPRT in sequential mastery testing. Journal of Educational and Behavioral Statistics, 33, 442–463.

  • Finkelman, M. (2010). Variations on stochastic curtailment in sequential mastery testing. Applied Psychological Measurement, 34, 27–45.

  • Finkelman, M. D., He, Y., Kim, W., & Lai, A. M. (2011). Stochastic curtailment of health questionnaires: A method to reduce respondent burden. Statistics in Medicine, 30, 1989–2004.

  • Finkelman, M. D., Smits, N., Kim, W., & Riley, B. (2012). Curtailment and stochastic curtailment to shorten the CES-D. Applied Psychological Measurement, 36, 632–658.

  • Huang, W. (2004). Stepwise likelihood ratio statistics in sequential studies. Journal of the Royal Statistical Society, 66, 401–409.

  • Huebner, A. (2012). Item overexposure in computerized classification tests using sequential item selection. Practical Assessment, Research, & Evaluation, 17(12). Retrieved from http://pareonline.net/getvn.asp?v=17&n=12

  • Lin, C.-J. (2011). Item selection criteria with practical constraints for computerized classification testing. Educational and Psychological Measurement, 71, 20–36.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • R Development Core Team. (2011). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Available from www.R-project.org/

  • Rulison, K. L., & Loken, E. (2009). I’ve fallen and I can’t get up: Can high ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33, 83–101.

  • Smits, N., & Finkelman, M. D. (2013). A comparison of computerized classification testing and computerized adaptive testing in clinical psychology. Journal of Computerized Adaptive Testing, 1, 19–37. doi:10.7333/1302-0102019

  • Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized adaptive test. Journal of Educational & Behavioral Statistics, 21, 405–414.

  • Sympson, J. B., & Hetter, R. D. (1985). Controlling item exposure rates in computerized adaptive testing. In Proceedings of the 27th Annual Meeting of the Military Testing Association (pp. 937–977). San Diego, CA: Navy Personnel Research and Development Center.

  • Thompson, N. A. (2007). A practitioner’s guide for variable-length computerized classification testing. Practical Assessment Research & Evaluation, 12(1). Retrieved from http://pareonline.net/getvn.asp?v=12&n=1

  • Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69, 778–793.

  • Thompson, N. A. (2010, June). Nominal error rates in computerized classification testing. Article presented at the first annual conference of the International Association for Computerized Adaptive Testing, Arnhem, Netherlands.

  • Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research, & Evaluation, 16(4). Retrieved from http://pareonline.net/pdf/v16n4.pdf

  • Vos, H. J. (2000). A Bayesian procedure in the context of sequential mastery testing. Psicológica, 21, 191–211.

  • Wald, A. (1947). Sequential analysis. New York, NY: Wiley.

Author information

Correspondence to Alan R. Huebner.

Appendixes

Appendix 1

We now review the computation of \( P(D_J = D_{j'}) \) originally given by Finkelman (2008).

Two methods of computation were proposed in that article: computing the probability exactly and approximating it using the central limit theorem. Both methods rely on knowledge of the remaining items \( j' + 1, \ldots, J \) that may be administered to an examinee after stage \( j' \) if the test is not terminated before the maximum number of items \( J \) is reached. Finkelman (2008) notes that the remaining potential items can be known even if an item exposure control method is used, provided that the item selection method is nonadaptive.

When the number of remaining items \( J - j' \) is relatively small, \( P(D_J = D_{j'}) \) may be computed exactly. For example, suppose an examinee is at stage \( j' = 97 \) of a test with maximum number of items \( J = 100 \). Then \( J - j' = 3 \) items remain, and thus there are \( 2^3 = 8 \) possible response patterns for those items. Given the parameters of those items and a θ value, two pieces of information may be determined: (1) the probability that each pattern occurs, calculated using Eq. 1, and (2) the response patterns that lead to the interim decision matching the final decision, that is, the patterns for which \( D_J = D_{j'} \). The specific θ value used depends on \( D_{j'} \), as described by Finkelman (2008): \( \theta_{+} \) is used if the interim classification decision at stage \( j' \) is nonmastery, and \( \theta_{-} \) is used if the interim decision is mastery. Once this information is obtained, the probabilities of the response patterns for which \( D_J = D_{j'} \) are summed, resulting in an exact calculation of \( P(D_J = D_{j'}) \).
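The enumeration described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: Eq. 1 is not reproduced in this appendix, so a three-parameter logistic model with scaling constant 1.7 is assumed; the function names, the item tuples `(a, b, c)`, the default threshold `log_c = 0.0`, and the rule that the final decision is nonmastery when \( \log \lambda_J \le \log C \) are all hypothetical choices made for the example.

```python
from itertools import product
from math import exp, log

D = 1.7  # assumed scaling constant; Eq. 1 is not shown in this appendix


def p_correct(theta, a, b, c):
    """Assumed 3PL probability of a correct response (stand-in for Eq. 1)."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def log_lr_increment(x, item, theta_plus, theta_minus):
    """log[L(theta_plus; x) / L(theta_minus; x)] for one response x in {0, 1}."""
    p_hi = p_correct(theta_plus, *item)
    p_lo = p_correct(theta_minus, *item)
    if x == 1:
        return log(p_hi / p_lo)
    return log((1.0 - p_hi) / (1.0 - p_lo))


def exact_prob_same_decision(log_lambda_now, interim, remaining_items,
                             theta_plus, theta_minus, log_c=0.0):
    """Sum the probabilities of all response patterns with D_J == D_j'."""
    # theta_plus is used for an interim nonmastery decision, theta_minus otherwise.
    theta = theta_plus if interim == "nonmastery" else theta_minus
    total = 0.0
    for pattern in product((0, 1), repeat=len(remaining_items)):
        prob = 1.0
        log_lambda_final = log_lambda_now
        for x, item in zip(pattern, remaining_items):
            p = p_correct(theta, *item)
            prob *= p if x == 1 else 1.0 - p
            log_lambda_final += log_lr_increment(x, item, theta_plus, theta_minus)
        # Assumed forced decision at stage J (tie handling is a choice made here).
        final = "nonmastery" if log_lambda_final <= log_c else "mastery"
        if final == interim:
            total += prob
    return total


# Three remaining items (hypothetical parameters), giving 2**3 = 8 patterns.
items = [(1.0, 0.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.5, 0.2)]
print(exact_prob_same_decision(-0.4, "nonmastery", items, 0.3, -0.3))
```

The loop visits all \( 2^{J - j'} \) patterns, so this approach is practical only when few items remain, which is the situation the exact method is intended for.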

\( P(D_J = D_{j'}) \) may also be approximated using the following formulas, given by Finkelman (2008). For the case in which the interim decision is nonmastery, \( P(D_J = D_{j'}) \) is given by

$$ P\left(D_J = D_{j'}\right) = P\left(D_J = \mathrm{nonmastery}\right) \approx \Phi\left(\frac{\log C - E_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right)}{\left\{\mathrm{Var}_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right)\right\}^{1/2}}\right), $$
(A1)

where Φ is the cumulative distribution function of the standard normal distribution; \( E_{\theta}(\log \lambda_J \mid \lambda_{j'}) \), the conditional expectation of \( \log \lambda_J \) given \( \lambda_{j'} \), is given by

$$ E_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right) = \log \lambda_{j'} + \sum_{j=j'+1}^{J} E_{\theta}\left(\log \frac{L\left(\theta_{+}; X_j\right)}{L\left(\theta_{-}; X_j\right)}\right); $$
(A2)

and the conditional variance is

$$ \mathrm{Var}_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right) = \sum_{j=j'+1}^{J} \mathrm{Var}_{\theta}\left(\log \frac{L\left(\theta_{+}; X_j\right)}{L\left(\theta_{-}; X_j\right)}\right). $$
(A3)

Then, for the case in which the interim decision is mastery, \( P(D_J = D_{j'}) \) is given by

$$ P\left({D}_J=\mathrm{mastery}\right)=1-P\left({D}_J=\mathrm{nonmastery}\right). $$

Note that Eqs. A2 and A3 use the notation \( X_j \), which represents the response to a single item. This is in contrast to the boldface notation used in Eq. 4, denoting a vector of responses.
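As a rough illustration of Eqs. A1 to A3, the following Python sketch accumulates the per-item mean and variance of the log-likelihood-ratio increment and applies the normal approximation. It is a sketch under assumptions, not the authors' code: Eq. 1 is not shown in this appendix, so a three-parameter logistic model with scaling constant 1.7 is assumed, and the function names, item tuples `(a, b, c)`, and default `log_c = 0.0` are hypothetical. Each increment takes one of two values (correct or incorrect response), so its mean and variance under θ follow from the two-point distribution.

```python
from math import exp, log, sqrt
from statistics import NormalDist

D = 1.7  # assumed scaling constant; Eq. 1 is not shown in this appendix


def p_correct(theta, a, b, c):
    """Assumed 3PL probability of a correct response (stand-in for Eq. 1)."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def approx_prob_same_decision(log_lambda_now, interim, remaining_items,
                              theta_plus, theta_minus, log_c=0.0):
    """Normal approximation to P(D_J = D_j') following Eqs. A1 to A3."""
    if not remaining_items:
        raise ValueError("at least one remaining item is required")
    # theta_plus is used for an interim nonmastery decision, theta_minus otherwise.
    theta = theta_plus if interim == "nonmastery" else theta_minus
    mean = log_lambda_now  # Eq. A2 starts from log(lambda_j')
    var = 0.0              # Eq. A3 sums the per-item variances
    for a, b, c in remaining_items:
        p = p_correct(theta, a, b, c)
        p_hi = p_correct(theta_plus, a, b, c)
        p_lo = p_correct(theta_minus, a, b, c)
        w1 = log(p_hi / p_lo)                  # increment after a correct response
        w0 = log((1.0 - p_hi) / (1.0 - p_lo))  # increment after an incorrect response
        mean += p * w1 + (1.0 - p) * w0
        var += p * (1.0 - p) * (w1 - w0) ** 2
    p_nonmastery = NormalDist().cdf((log_c - mean) / sqrt(var))  # Eq. A1
    return p_nonmastery if interim == "nonmastery" else 1.0 - p_nonmastery


# Thirty identical remaining items with hypothetical parameters.
items = [(1.0, 0.0, 0.2)] * 30
print(approx_prob_same_decision(-0.5, "nonmastery", items, 0.3, -0.3))
```

Unlike the exact enumeration, the cost of this calculation grows linearly with the number of remaining items, which is why the approximation is attractive when \( J - j' \) is large.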

Appendix 2

Table 6. The 48 simulation conditions, resulting from three levels of cut point (–1, 0, 1), two levels of bank shape (broad and peaked), two levels of δ (.10 and .20), two levels of γ (.80 and .99), and two levels of θ distribution [Normal(0, 1) and Normal(0.25, 1.25)]
Table 7. All simulation results yielded by the 48 conditions listed in Table 6
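The count of 48 conditions follows from fully crossing the five design factors named in the Table 6 caption (3 × 2 × 2 × 2 × 2). A minimal Python sketch of that crossing is given below; the variable names are hypothetical, the distribution labels are copied from the caption as plain strings, and the ordering produced here is not claimed to match the row order of Table 6.

```python
from itertools import product

# Factor levels as listed in the Table 6 caption.
cut_points = (-1, 0, 1)
bank_shapes = ("broad", "peaked")
deltas = (0.10, 0.20)
gammas = (0.80, 0.99)
theta_distributions = ("Normal(0, 1)", "Normal(0.25, 1.25)")

# Full factorial crossing of the five factors.
conditions = list(product(cut_points, bank_shapes, deltas, gammas,
                          theta_distributions))

print(len(conditions))
```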

Cite this article

Huebner, A.R., Fina, A.D. The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests. Behav Res 47, 549–561 (2015). https://doi.org/10.3758/s13428-014-0490-y

  • DOI: https://doi.org/10.3758/s13428-014-0490-y
