Psychometrika, Volume 80, Issue 1, pp 1–20

Psychometrics Behind Computerized Adaptive Testing

  • Hua-Hua Chang
Article

Abstract

This paper surveys 18 years of progress that my colleagues, my students (both former and current), and I have made in a prominent research area in psychometrics: computerized adaptive testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived under the framework of martingale theory, a highly theoretical area of probability theory that may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful in supporting individualized instruction on a mass scale. We show that even paper-and-pencil tests can be made adaptive to support classroom teaching.

Key words

computerized adaptive testing; multidimensional CAT; sequential design; martingale theory; a-stratified item selection; response time; constraint management; CD-CAT

Notes

Acknowledgements

I wish to thank Ying Cheng, Edison Choe, Rui Guo, Hyeon-Ah Kang, Justin Kern, Ya-Hui Su, Poh Hua Tay, Chun Wang, Shiyu Wang, Wen Zeng, Changjin Zheng, and Yi Zheng for their suggestions and comments, which led to numerous improvements.

Copyright information

© The Psychometric Society 2014

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Champaign, USA
