Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing

Choe, Edison M.; Zhang, Jinming; Chang, Hua-Hua

doi:10.1007/s11336-017-9596-3

Published: 22 November 2017

Psychometrika, volume 83, pages 650–673 (2018)

Abstract

Item compromise persists in undermining the integrity of testing, even in secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, a couple of recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees who have preknowledge of an item would likely respond to it more quickly than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and the average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
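
The general idea of monitoring both the proportion correct and the average response time with a moving average can be sketched as follows. This is only an illustrative sketch, not the procedure proposed in the article: the class name `ItemMonitor`, the fixed reference values `p_ref`, `mu_ref`, and `sigma`, the window size, the critical value `z_crit`, and the rule of flagging only when both statistics are significant are all assumptions made here for illustration.

```python
import math
from collections import deque


class ItemMonitor:
    """Moving-average monitor for one item (illustrative, not the paper's method)."""

    def __init__(self, window, p_ref, mu_ref, sigma, z_crit=3.0):
        self.window = window      # moving-sample size m (assumed)
        self.p_ref = p_ref        # reference proportion correct (assumed known)
        self.mu_ref = mu_ref      # reference mean log RT (assumed known)
        self.sigma = sigma        # SD of log RT for this item (assumed known)
        self.z_crit = z_crit      # hypothetical critical value
        self.correct = deque(maxlen=window)
        self.log_rt = deque(maxlen=window)

    def update(self, is_correct, rt_seconds):
        """Record one response; return True if the item is flagged."""
        self.correct.append(1 if is_correct else 0)
        self.log_rt.append(math.log(rt_seconds))
        if len(self.correct) < self.window:
            return False  # wait until the moving window is full
        m = self.window
        p_hat = sum(self.correct) / m
        mu_hat = sum(self.log_rt) / m
        z_p = (p_hat - self.p_ref) / math.sqrt(self.p_ref * (1 - self.p_ref) / m)
        z_t = (mu_hat - self.mu_ref) / (self.sigma / math.sqrt(m))
        # Hypothetical rule: flag only when accuracy rises AND responses get faster.
        return z_p > self.z_crit and z_t < -self.z_crit


monitor = ItemMonitor(window=20, p_ref=0.5, mu_ref=math.log(60), sigma=0.5)
# 20 ordinary responses: half correct, 60 seconds each.
before = [monitor.update(i % 2 == 0, 60.0) for i in range(20)]
# 20 suspicious responses: all correct, 20 seconds each.
after = [monitor.update(True, 20.0) for _ in range(20)]
print(any(before), after[-1])  # False True
```

In an adaptive test the expected proportion correct depends on the abilities of the examinees who receive the item, so a fixed `p_ref` is a deliberate simplification of this sketch.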

Author information

Authors and Affiliations

Graduate Management Admission Council® (GMAC®), 11921 Freedom Drive, Suite 300, Reston, VA, 20190, USA
Edison M. Choe
University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
Jinming Zhang & Hua-Hua Chang

Corresponding author

Correspondence to Edison M. Choe.

Appendix

1.1 Application of Lyapunov’s Central Limit Theorem

Assume that the log response time (RT) of examinee $i$ on item $j$ is normally distributed: $\log T_{ij} \sim \mathcal{N}(\mu_{ij},\sigma_j^2)$, where $\mu_{ij}=\beta_j-\tau_i$ and $\sigma_j^2=1/\alpha_j^2$. The mean log RT of the moving sample for item $j$, taken over the $m$ most recent of its $n$ administrations, is then $\hat{\mu}_j^{(m)}=\dfrac{1}{m}\sum\nolimits_{i=n-m+1}^n \log T_{ij}$. Also define $s_m^2=\sum\nolimits_{i=n-m+1}^n \sigma_j^2=m\sigma_j^2$. In this context, Lyapunov’s CLT states that

$$\begin{aligned} \dfrac{1}{s_m}\sum \limits _{i=n-m+1}^n(\log T_{ij}-\mu _{ij}) = \dfrac{\hat{\mu }_j^{(m)}-\sum \nolimits _{i=n-m+1}^n\mu _{ij}/m}{\sigma _j/\sqrt{m}} \; {\mathop {\longrightarrow }\limits ^{\text{ d }}} \; \mathcal {N}(0,1) \end{aligned}$$

(A.1)

if, for some $\delta >0$, the following condition is met:

$$\begin{aligned} \lim _{m\rightarrow \infty }\dfrac{1}{s_m^{2+\delta }}\sum \limits _{i=n-m+1}^nE\left( |\log T_{ij}-\mu _{ij}|^{2+\delta }\right) =0. \end{aligned}$$

(A.2)

Recognizing that the expectation term is the central absolute moment of order $2+\delta$ of the normally distributed $\log T_{ij}$, for integer $\delta$,

$$\begin{aligned} E\left( |\log T_{ij}-\mu _{ij}|^{2+\delta }\right) = \sigma _j^{2+\delta }(1+\delta )!! \cdot {\left\{ \begin{array}{ll} \sqrt{2/\pi } &{} \mathrm {if} \; 2+\delta \; \mathrm {is \; odd} \\ \;\;\;\; 1 &{} \mathrm {if} \; 2+\delta \; \mathrm {is \; even} \end{array}\right. }. \end{aligned}$$

(A.3)
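
As a quick numerical check of (A.3), the double-factorial expression can be compared with the general gamma-function form of the absolute moment of a normal variable, $E|X-\mu|^p = \sigma^p\, 2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)/\sqrt{\pi}$, for integer orders $p = 2+\delta$. The function names below are hypothetical and exist only for this sketch.

```python
import math


def double_factorial(k):
    """Return k!! for a non-negative integer k (with 0!! = 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def abs_moment_formula(p, sigma=1.0):
    """Central absolute moment of order p as written in (A.3), with p = 2 + delta."""
    factor = math.sqrt(2 / math.pi) if p % 2 == 1 else 1.0
    return sigma ** p * double_factorial(p - 1) * factor


def abs_moment_gamma(p, sigma=1.0):
    """Same moment via the gamma-function expression, used as a cross-check."""
    return sigma ** p * 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


for p in range(1, 7):
    print(p, abs_moment_formula(p), abs_moment_gamma(p))
```

For $p = 4$ (that is, $\delta = 2$) both expressions give $3\sigma_j^4$, the fourth central moment of a normal distribution.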

Therefore, using $\delta =2$ for simplicity,

$$\begin{aligned} \lim _{m\rightarrow \infty }\dfrac{1}{s_m^{4}}\sum \limits _{i=n-m+1}^nE\left( |\log T_{ij}-\mu _{ij}|^{4}\right) =&\lim _{m\rightarrow \infty }\dfrac{1}{m^2\sigma _j^4}\sum \limits _{i=n-m+1}^n3\sigma _j^{4} \\ =&\lim _{m\rightarrow \infty }\dfrac{m\left( 3\sigma _j^{4}\right) }{m^2\sigma _j^4} \\ =&\lim _{m\rightarrow \infty }\dfrac{3}{m} \\ =&\; 0, \end{aligned}$$

thereby meeting Lyapunov’s condition for the asymptotic normality of the test statistic.
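
The result can also be illustrated by simulation. The sketch below draws log RTs with examinee-specific means $\mu_{ij}=\beta_j-\tau_i$, forms the standardized statistic in (A.1), and checks that its sample mean and variance are close to 0 and 1; it also evaluates the Lyapunov ratio for $\delta=2$, which equals $3/m$. The parameter values (`beta`, `alpha`, the speed distribution, the window size, and the number of replications) are arbitrary choices made for illustration, not values from the article.

```python
import math
import random
import statistics


def simulate_z(m, n_reps, beta=4.0, alpha=2.0, tau_sd=0.3, seed=0):
    """Simulate the standardized moving-mean statistic of (A.1) n_reps times."""
    rng = random.Random(seed)
    sigma = 1.0 / alpha                      # sigma_j = 1 / alpha_j
    z_values = []
    for _ in range(n_reps):
        taus = [rng.gauss(0.0, tau_sd) for _ in range(m)]   # hypothetical speeds
        mus = [beta - tau for tau in taus]                  # mu_ij = beta_j - tau_i
        log_ts = [rng.gauss(mu, sigma) for mu in mus]       # log T_ij
        mu_hat = sum(log_ts) / m
        mu_bar = sum(mus) / m
        z_values.append((mu_hat - mu_bar) / (sigma / math.sqrt(m)))
    return z_values


def lyapunov_ratio(m, sigma):
    """Left-hand side of (A.2) for delta = 2, before taking the limit."""
    return m * 3 * sigma ** 4 / (m * sigma ** 2) ** 2


z = simulate_z(m=30, n_reps=5000)
print(statistics.mean(z), statistics.pvariance(z))
print([lyapunov_ratio(m, 0.5) for m in (10, 100, 1000)])
```

The ratio shrinks in proportion to $1/m$ regardless of $\sigma_j$, matching the last step of the derivation.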

About this article

Cite this article

Choe, E. M., Zhang, J., & Chang, H.-H. Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing. Psychometrika 83, 650–673 (2018). https://doi.org/10.1007/s11336-017-9596-3

Received: 11 January 2017
Published: 22 November 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11336-017-9596-3
