Psychometrics Behind Computerized Adaptive Testing

Chang, Hua-Hua

doi:10.1007/s11336-014-9401-5

Published: 06 February 2014

Psychometrika, Volume 80, pages 1–20 (2015)

Abstract

The paper provides a survey of 18 years’ progress that my colleagues, students (both former and current), and I have made in a prominent research area in psychometrics: Computerized Adaptive Testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived under the framework of martingale theory, a highly theoretical branch of probability theory that may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful in supporting individualized instruction on a mass scale. We show that even paper-and-pencil tests can be made adaptive to support classroom teaching.

Notes

The hazard function is the instantaneous rate at which events occur. In psychological terms, the hazard rate describes the conditional probability of finishing the task in the next moment, given that it has not yet been finished, and is therefore also viewed as the processing capacity of an individual.
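
For reference, the definition in this note can be written out in standard survival-analysis notation. The symbols below are assumptions introduced for illustration and are not taken from the article: T is the time needed to finish the task, f and F are its density and distribution function, and S(t) = 1 - F(t) is the probability that the task is still unfinished at time t.

```latex
h(t) \;=\; \lim_{\Delta t \to 0^{+}}
      \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
     \;=\; \frac{f(t)}{1 - F(t)}
     \;=\; \frac{f(t)}{S(t)}
```

Under this reading, h(t)Δt approximates the probability of finishing within the next short interval Δt among those who have not finished by time t, so a larger hazard corresponds to a higher processing capacity.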

Acknowledgements

I wish to thank Ying Cheng, Edison Choe, Rui Guo, Hyeon-Ah Kang, Justin Kern, Ya-Hui Su, Poh Hua Tay, Chun Wang, Shiyu Wang, Wen Zeng, Changjin Zheng, and Yi Zheng for their suggestions and comments, which led to numerous improvements.

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, 430 Psychology Building, 630 E. Daniel Street, M/C 716, Champaign, IL, 61820, USA
Hua-Hua Chang

Corresponding author

Correspondence to Hua-Hua Chang.

Additional information

This article is based on the Presidential Address Hua-Hua Chang gave on June 25, 2013 at the 78th Annual Meeting of the Psychometric Society held in Arnhem, the Netherlands.

About this article

Cite this article

Chang, HH. Psychometrics Behind Computerized Adaptive Testing. Psychometrika 80, 1–20 (2015). https://doi.org/10.1007/s11336-014-9401-5

Received: 27 October 2013
Published: 06 February 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11336-014-9401-5
