The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests

Huebner, Alan R.; Fina, Anthony D.

doi:10.3758/s13428-014-0490-y

Published: 07 June 2014

Behavior Research Methods, Volume 47, pages 549–561 (2015)

Abstract

Computerized classification tests (CCTs) are used to classify examinees into categories in the context of professional certification testing. The term “variable-length” refers to CCTs that terminate (i.e., cease administering items to the examinee) when a classification can be made with a prespecified level of certainty. The sequential probability ratio test (SPRT) is a common criterion for terminating variable-length CCTs, but recent research has proposed more efficient methods. Specifically, the stochastically curtailed SPRT (SCSPRT) and the generalized likelihood ratio criterion (GLR) have been shown to classify examinees with accuracy similar to the SPRT while using fewer items. This article shows that the GLR criterion itself may be stochastically curtailed, resulting in a new termination criterion, the stochastically curtailed GLR (SCGLR). All four criteria (the SPRT, SCSPRT, GLR, and the new SCGLR) were compared using a simulation study. In this study, we examined the criteria in testing conditions that varied several CCT design features, including item bank characteristics, pass/fail threshold, and examinee ability distribution. In each condition, the termination criteria were evaluated according to their accuracy (proportion of examinees classified correctly), efficiency (test length), and loss (a single statistic combining both accuracy and efficiency). The simulation results showed that the SCGLR can yield increased efficiency without sacrificing accuracy, relative to the SPRT, SCSPRT, and GLR, in a wide variety of CCT designs.

References

Bartroff, J., Finkelman, M., & Lai, T. L. (2008). Modern sequential analysis and its applications to computerized adaptive testing. Psychometrika, 73, 473–486.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.
Camilli, G. (1994). Origin of the scaling constant d = 1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19, 293–295.
Finkelman, M. (2008). On using stochastic curtailment to shorten the SPRT in sequential mastery testing. Journal of Educational and Behavioral Statistics, 33, 442–463.
Finkelman, M. (2010). Variations on stochastic curtailment in sequential mastery testing. Applied Psychological Measurement, 34, 27–45.
Finkelman, M. D., He, Y., Kim, W., & Lai, A. M. (2011). Stochastic curtailment of health questionnaires: A method to reduce respondent burden. Statistics in Medicine, 30, 1989–2004.
Finkelman, M. D., Smits, N., Kim, W., & Riley, B. (2012). Curtailment and stochastic curtailment to shorten the CES-D. Applied Psychological Measurement, 36, 632–658.
Huang, W. (2004). Stepwise likelihood ratio statistics in sequential studies. Journal of the Royal Statistical Society, 66, 401–409.
Huebner, A. (2012). Item overexposure in computerized classification tests using sequential item selection. Practical Assessment, Research, & Evaluation, 17(12). Retrieved from http://pareonline.net/getvn.asp?v=17&n=12
Lin, C.-J. (2011). Item selection criteria with practical constraints for computerized classification testing. Educational and Psychological Measurement, 71, 20–36.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
R Development Core Team. (2011). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing, Available from www.R-project.org/
Rulison, K. L., & Loken, E. (2009). I’ve fallen and I can’t get up: Can high ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33, 83–101.
Smits, N., & Finkelman, M. D. (2013). A comparison of computerized classification testing and computerized adaptive testing in clinical psychology. Journal of Computerized Adaptive Testing, 1, 19–37. doi:10.7333/1302-0102019
Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized adaptive test. Journal of Educational & Behavioral Statistics, 21, 405–414.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item exposure rates in computerized adaptive testing. In Proceedings of the 27th Annual Meeting of the Military Testing Association (pp. 937–977). San Diego, CA: Navy Personnel Research and Development Center.
Thompson, N. A. (2007). A practitioner’s guide for variable-length computerized classification testing. Practical Assessment Research & Evaluation, 12(1). Retrieved from http://pareonline.net/getvn.asp?v=12&n=1
Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69, 778–793.
Thompson, N. A. (2010, June). Nominal error rates in computerized classification testing. Article presented at the first annual conference of the International Association for Computerized Adaptive Testing, Arnhem, Netherlands.
Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research, & Evaluation, 16(4). Retrieved from http://pareonline.net/pdf/v16n4.pdf
Vos, H. J. (2000). A Bayesian procedure in the context of sequential mastery testing. Psicológica, 21, 191–211.
Wald, A. (1947). Sequential analysis. New York, NY: Wiley.

Author information

Authors and Affiliations

University of Notre Dame, 153 Hurley Hall, Notre Dame, IN, 46556, USA
Alan R. Huebner
University of Iowa, 340 Lindquist Center S, Iowa City, IA, 52242, USA
Anthony D. Fina

Corresponding author

Correspondence to Alan R. Huebner.

Appendixes

Appendix 1

We now review the computation of $P(D_J = D_{j'})$ originally given by Finkelman (2008).

Two methods of computation were proposed in that article: computing the probability exactly, and approximating it using the central limit theorem. Both methods rely on knowledge of the remaining items $j' + 1, \ldots, J$ that may potentially be administered to an examinee after stage $j'$ if the test is not terminated before the maximum number of items $J$ is reached. Finkelman (2008) notes that the remaining potential items can be known even if an item exposure control method is used, provided that the item selection method is nonadaptive.

When the number of remaining items $(J - j')$ is relatively small, $P(D_J = D_{j'})$ may be computed exactly. For example, suppose an examinee is at stage $j' = 97$ of a test with maximum number of items $J = 100$. Then $(J - j') = 3$ items remain, and thus there are $2^3 = 8$ possible response patterns for those items. Given the parameters of those items and a $\theta$ value, two pieces of information may be determined: (1) the probability that each pattern occurs, calculated using Eq. 1, and (2) the response patterns that lead to the interim decision matching the final decision, that is, the patterns for which $D_J = D_{j'}$. The specific $\theta$ value used depends on $D_{j'}$, as described by Finkelman (2008): $\theta_{+}$ is used if the interim classification decision at stage $j'$ is nonmastery, and $\theta_{-}$ is used if the interim decision is mastery. Once this information is obtained, the probabilities of the response patterns for which $D_J = D_{j'}$ are summed, resulting in an exact calculation of $P(D_J = D_{j'})$.
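The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Eq. 1 is not reproduced in this section, so a three-parameter logistic item model with scaling constant 1.7 is assumed; the final decision at stage $J$ is assumed to be mastery when $\log \lambda_J > \log C$ and nonmastery otherwise; and the function names, argument names, and item parameters are hypothetical.

```python
from itertools import product
from math import exp, log


def p_correct(theta, a, b, c, D=1.7):
    """Probability of a correct response (ASSUMED 3PL form standing in for Eq. 1)."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def exact_agreement_prob(log_lambda_now, items, theta, theta_minus, theta_plus,
                         log_C, interim_decision):
    """Exact P(D_J = D_j') by enumerating all 2^(J - j') response patterns.

    log_lambda_now   : log likelihood ratio at the current stage j'
    items            : (a, b, c) tuples for the remaining items (hypothetical values)
    theta            : ability value under which pattern probabilities are computed
    interim_decision : "mastery" or "nonmastery"
    """
    total = 0.0
    for pattern in product((0, 1), repeat=len(items)):
        prob = 1.0                   # probability of this response pattern
        log_lambda = log_lambda_now  # log likelihood ratio if this pattern occurs
        for x, (a, b, c) in zip(pattern, items):
            p = p_correct(theta, a, b, c)
            p_plus = p_correct(theta_plus, a, b, c)
            p_minus = p_correct(theta_minus, a, b, c)
            if x == 1:
                prob *= p
                log_lambda += log(p_plus / p_minus)
            else:
                prob *= 1.0 - p
                log_lambda += log((1.0 - p_plus) / (1.0 - p_minus))
        # ASSUMED final decision rule at stage J.
        final = "mastery" if log_lambda > log_C else "nonmastery"
        if final == interim_decision:
            total += prob
    return total


# Three remaining items, as in the j' = 97, J = 100 example (parameters are made up).
remaining = [(1.0, 0.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.5, 0.2)]
# Interim decision is mastery, so theta is set to theta_minus, as the text describes.
print(exact_agreement_prob(0.3, remaining, -0.1, -0.1, 0.1, 0.0, "mastery"))
```

Because the loop visits $2^{J - j'}$ patterns, this approach is practical only when few items remain, which is the situation the text describes.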

$P(D_J = D_{j'})$ may also be approximated using the following formulas, given by Finkelman (2008). For the case in which the interim decision is nonmastery, $P(D_J = D_{j'})$ is given by

$$ P\left(D_J = D_{j'}\right) = P\left(D_J = \mathrm{nonmastery}\right) \approx \Phi\left(\frac{\log C - E_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right)}{\left\{\mathrm{Var}_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right)\right\}^{1/2}}\right), \tag{A1} $$

where $\Phi$ is the cumulative distribution function of the standard normal distribution; $E_{\theta}(\log \lambda_J \mid \lambda_{j'})$, the conditional expectation of $\log \lambda_J$ given $\lambda_{j'}$, is given by

$$ E_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right) = \log \lambda_{j'} + \sum_{j=j'+1}^{J} E_{\theta}\left(\log \frac{L\left(\theta_{+}; X_j\right)}{L\left(\theta_{-}; X_j\right)}\right); \tag{A2} $$

and the conditional variance is

$$ \mathrm{Var}_{\theta}\left(\log \lambda_J \mid \lambda_{j'}\right) = \sum_{j=j'+1}^{J} \mathrm{Var}_{\theta}\left(\log \frac{L\left(\theta_{+}; X_j\right)}{L\left(\theta_{-}; X_j\right)}\right). \tag{A3} $$

Then, for the case in which the interim decision is mastery, $P(D_J = D_{j'})$ is given by

$$ P\left(D_J = \mathrm{mastery}\right) = 1 - P\left(D_J = \mathrm{nonmastery}\right). $$

Note that Eqs. A2 and A3 use the notation $X_j$, which represents the response to a single item. This is in contrast to the boldface notation used in Eq. 4, denoting a vector of responses.
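The normal approximation in Eqs. A1 to A3 can be sketched as follows. This is a rough illustration rather than the authors' code. It assumes dichotomous items following a three-parameter logistic model (the item model of Eq. 1 is not shown in this appendix), in which case the per-item log likelihood ratio takes only two values, so its mean and variance under $\theta$ have simple closed forms. The function and argument names are hypothetical, and the item parameters are made up.

```python
from math import erf, exp, log, sqrt


def p_correct(theta, a, b, c, D=1.7):
    """Probability of a correct response (ASSUMED 3PL form standing in for Eq. 1)."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_agreement_prob(log_lambda_now, items, theta, theta_minus, theta_plus,
                          log_C, interim_decision):
    """Approximate P(D_J = D_j') via Eqs. A1-A3.

    Requires at least one remaining item and theta_plus != theta_minus,
    so that the conditional variance is positive.
    """
    mean = log_lambda_now  # Eq. A2 starts from log(lambda_j')
    var = 0.0              # Eq. A3 sums the per-item variances
    for a, b, c in items:
        p = p_correct(theta, a, b, c)
        p_plus = p_correct(theta_plus, a, b, c)
        p_minus = p_correct(theta_minus, a, b, c)
        lr_right = log(p_plus / p_minus)                  # contribution if X_j = 1
        lr_wrong = log((1.0 - p_plus) / (1.0 - p_minus))  # contribution if X_j = 0
        mean += p * lr_right + (1.0 - p) * lr_wrong
        var += p * (1.0 - p) * (lr_right - lr_wrong) ** 2
    p_nonmastery = normal_cdf((log_C - mean) / sqrt(var))  # Eq. A1
    if interim_decision == "nonmastery":
        return p_nonmastery
    return 1.0 - p_nonmastery  # complement, for an interim decision of mastery


# Hypothetical remaining items; interim decision is nonmastery, so theta = theta_plus.
remaining = [(1.0, 0.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.5, 0.2)]
print(approx_agreement_prob(-0.4, remaining, 0.1, -0.1, 0.1, 0.0, "nonmastery"))
```

The cost of this version grows linearly in the number of remaining items, in contrast to exact enumeration of all response patterns, though the central limit theorem approximation may be rough when only a few items remain.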

Appendix 2

Table 6. The 48 simulation conditions, resulting from three levels of cut point (–1, 0, 1), two levels of bank shape (broad and peaked), two levels of δ (.10 and .20), two levels of γ (.80 and .99), and two levels of θ distribution [Normal(0, 1) and Normal(0.25, 1.25)]
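As a quick check on the design size, the fully crossed factors in the Table 6 caption can be enumerated programmatically. The snippet below is a minimal sketch; the variable names are hypothetical, and the order in which conditions are generated is not meant to match the row order of Table 6.

```python
from itertools import product

# Factor levels as listed in the Table 6 caption (names are illustrative only).
cut_points = (-1, 0, 1)
bank_shapes = ("broad", "peaked")
deltas = (0.10, 0.20)
gammas = (0.80, 0.99)
theta_distributions = ("Normal(0, 1)", "Normal(0.25, 1.25)")

# Fully crossed design: one dictionary per simulation condition.
conditions = [
    {"cut_point": cut, "bank_shape": shape, "delta": d, "gamma": g, "theta_dist": dist}
    for cut, shape, d, g, dist in product(
        cut_points, bank_shapes, deltas, gammas, theta_distributions
    )
]

print(len(conditions))  # 3 * 2 * 2 * 2 * 2 = 48
```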

Table 7. All simulation results yielded by the 48 conditions listed in Table 6

About this article

Cite this article

Huebner, A.R., Fina, A.D. The stochastically curtailed generalized likelihood ratio: A new termination criterion for variable-length computerized classification tests. Behav Res 47, 549–561 (2015). https://doi.org/10.3758/s13428-014-0490-y

Published: 07 June 2014
Issue Date: June 2015
DOI: https://doi.org/10.3758/s13428-014-0490-y
