Heuristic Assembly of a Classification Multistage Test with Testlets

  • Zhuoran Wang
  • Ying Li
  • Werner Wothke
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 265)


In addition to the advantages of shortening tests and balancing item bank usage, multistage testing (MST) has the unique merit of accommodating testlets. A testlet is a group of items that share a common stimulus. Because MST can place an entire testlet in one module, fewer stimuli are required than items. Computerized adaptive testing (CAT), by contrast, selects items one at a time, which precludes several items sharing the same stimulus. Testlets in MST therefore save stimulus-processing time and facilitate ability estimation. To exploit the advantages brought by testlets, a classification MST was designed to upgrade an operational listening test. A heuristic top-down module assembly procedure incorporating testlets was developed based on the modified normalized weighted absolute deviation heuristic (NWADH). A three-stage classification MST with a 1-3-5 panel design was assembled to classify examinees into six levels. A simulation study based on real data was conducted to compare the performance of the classification MST and the operational linear test in terms of ability recovery and classification accuracy. The bi-factor model was used for item parameter calibration and examinee scoring. Results show that the 30-item MST performed similarly to the 44-item linear test when prior knowledge of examinee ability was available, and outperformed the 44-item linear test without prior information, in both ability recovery and classification accuracy. In conclusion, the classification MST can shorten the test while maintaining good accuracy.


Keywords: Multistage testing · Classification · Testlets



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Minnesota Twin Cities, Minneapolis, USA
  2. American Councils for International Education, Washington, USA
  3. Werner Wothke Consulting, Washington, USA
