Multistage Testing: Issues, Designs, and Research

A chapter in Elements of Adaptive Testing

Abstract

Just as traditional computerized adaptive testing (CAT) involves the adaptive selection of individual items for sequential administration to examinees while a test is in progress, multistage testing (MST) is an analogous approach that uses sets of items as the building blocks of a test. In MST terminology, these sets of items have come to be termed modules (Luecht & Nungester, 1998) or testlets (Wainer & Kiely, 1987). They can be characterized as short versions of linear test forms in which a specified number of items are administered together to meet particular test specifications and to provide a certain proportion of the total test information. The items in a module may all relate to one or more common stems (such as passages or graphics) or may be discrete from one another, depending on the content specifications of the testing program. Each of these self-contained, carefully constructed, fixed sets of items is identical for every examinee to whom it is administered, but any two examinees may be presented with different modules, and in different sequences.
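To make the module-and-routing structure concrete, below is a minimal sketch (not from the chapter) of a hypothetical two-stage "1-3" MST panel in Python: every examinee takes the same routing module, and a simple number-correct score on it routes each examinee to an easy, medium, or hard second-stage module. All module names, item identifiers, and cutoffs here are invented for illustration; operational MSTs more often route on IRT-based ability estimates and use professionally assembled modules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Module:
    """A fixed, preassembled set of items (a testlet): identical for
    every examinee to whom it is administered."""
    name: str
    items: List[str]   # item identifiers
    difficulty: str    # "easy", "medium", or "hard"


def route(number_correct: int, cutoffs: Dict[str, int]) -> str:
    """Pick the next module from a number-correct routing score.

    Hypothetical number-correct rule for illustration only; operational
    MSTs typically route on IRT ability estimates instead.
    """
    if number_correct >= cutoffs["hard"]:
        return "hard"
    if number_correct >= cutoffs["medium"]:
        return "medium"
    return "easy"


# A toy 1-3 panel: one routing module, three second-stage modules.
stage1 = Module("R1", [f"item_{i}" for i in range(1, 11)], "medium")
stage2 = {
    "easy":   Module("E2", [f"item_{i}" for i in range(11, 21)], "easy"),
    "medium": Module("M2", [f"item_{i}" for i in range(21, 31)], "medium"),
    "hard":   Module("H2", [f"item_{i}" for i in range(31, 41)], "hard"),
}

# An examinee who answers 8 of the 10 routing items correctly is sent
# to the hard second-stage module; one who answers 3 correctly would
# be routed to the easy module instead.
next_module = stage2[route(number_correct=8, cutoffs={"medium": 4, "hard": 7})]
print(next_module.name)  # -> H2
```

Note how the sketch reflects the defining property described above: each module is a fixed set of items, the same for everyone who receives it, while the sequence of modules an examinee sees depends on that examinee's performance.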


References

  • Adema, J. J. (1990). The construction of customized two-stage tests. Journal of Educational Measurement, 27, 241–253.

  • Angoff, W. & Huddleston, E. (1958). The multi-level experiment: A study of a two-level testing system for the College Board Scholastic Aptitude Test (Statistical Report No. SR-58-21). Princeton, NJ: Educational Testing Service.

  • Armstrong, R., Jones, D., Koppel, N. & Pashley, P. (2000, April). Computerized adaptive testing with multiple forms structures. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.

  • Berger, M. P. F. (1994). A general approach to algorithmic design of fixed-form tests, adaptive tests, and testlets. Applied Psychological Measurement, 18, 141–153.

  • Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.

  • Breithaupt, K., Ariel, A. & Veldkamp, B. P. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5, 319–330.

  • Breithaupt, K. & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67, 5–20.

  • Cronbach, L. J. & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana, IL: University of Illinois Press.

  • Dodd, B. G. & Fitzpatrick, S. J. (2002). Alternatives for scoring CBTs. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 215–236). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Folk, V. G. & Smith, R. L. (2002). Models for delivery of computer-based tests. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 41–66). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Glas, C. A. W., Wainer, H. & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer-Nijhoff Publishing.

  • Green, B. F., Jr., Bock, R. D., Humphreys, L. G., Linn, R. L. & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21, 347–360.

  • Hadadi, A., Luecht, R. M., Swanson, D. B. & Case, S. M. (1998, April). Study 1: Effects of modular subtest structure and item review on examinee performance, perceptions and pacing. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.

  • Hambleton, R. K. & Xing, D. (2006). Optimal and non-optimal computer-based test designs for making pass-fail decisions. Applied Measurement in Education, 19, 221–239.

  • Jodoin, M. G., Zenisky, A. L. & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams. Applied Measurement in Education, 19, 203–220.

  • Kim, H. (1993). Monte Carlo simulation comparison of two-stage testing and computer adaptive testing. Unpublished doctoral dissertation, University of Nebraska, Lincoln.

  • Kim, H. & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the meeting of the National Council on Measurement in Education, Atlanta.

  • Lee, G. & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores composed of testlets. Applied Measurement in Education, 12, 237–255.

  • Linn, R., Rock, D. & Cleary, T. (1969). The development and evaluation of several programmed testing methods. Educational and Psychological Measurement, 29, 129–146.

  • Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.

  • Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14, 227–238.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Luecht, R. M. (1997, March). An adaptive sequential paradigm for managing multidimensional content. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.

  • Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224–236.

  • Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.

  • Luecht, R. M. (2003, April). Exposure control using adaptive multistage item bundles. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.

  • Luecht, R. M. (2006). Designing tests for pass-fail decisions using item response theory. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 575–596). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Luecht, R. M., Brumfield, T. & Breithaupt, K. (2006). A testlet-assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.

  • Luecht, R. M. & Burgin, W. (2003, April). Test information targeting strategies for adaptive multistage testing designs. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.

  • Luecht, R. M. & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.

  • Luecht, R. M., Nungester, R. J. & Hadadi, A. (1996, April). Heuristic-based CAT: Balancing item information, content and exposure. Paper presented at the meeting of the National Council on Measurement in Education, New York.

  • Mead, A. (2006). An introduction to multistage testing [Special issue]. Applied Measurement in Education, 19, 185–260.

  • Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.

  • Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing. Unpublished doctoral dissertation, University of Massachusetts, Amherst.

  • Reese, L. M. & Schnipke, D. L. (1999). An evaluation of a two-stage testlet design for computerized testing (Computerized Testing Report 96-04). Newtown, PA: Law School Admission Council.

  • Reese, L. M., Schnipke, D. L. & Luebke, S. W. (1999). Incorporating content constraints into a multi-stage adaptive testlet design (Law School Admission Council Computerized Testing Report 97-02). Newtown, PA: Law School Admission Council.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, Monograph Supplement No. 17.

  • Schnipke, D. L. & Reese, L. M. (1999). A comparison of testlet-based test designs for computerized adaptive testing (Law School Admission Council Computerized Testing Report 97-01). Newtown, PA: Law School Admission Council.

  • Sireci, S. G., Thissen, D. & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.

  • Thissen, D. (1998, April). Scaled scores for CATs based on linear combinations of testlet scores. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.

  • Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260.

  • van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225–240.

  • van der Linden, W. J. (2005). Models for optimal test design. New York: Springer-Verlag.

  • van der Linden, W. J. & Adema, J. J. (1998). Simultaneous assembly of multiple test forms. Journal of Educational Measurement, 35, 185–198.

  • van der Linden, W. J., Ariel, A. & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31, 81–99.

  • Vos, H. J. (2000, April). Adaptive mastery testing using a multidimensional IRT model and Bayesian sequential decision theory. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans.

  • Vos, H. J. & Glas, C. A. W. (2001, April). Multidimensional IRT-based adaptive sequential mastery testing. Paper presented at the meeting of the National Council on Measurement in Education, Seattle.

  • Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15–20.

  • Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–270). Boston: Kluwer-Nijhoff Publishing.

  • Wainer, H. & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201.

  • Wainer, H. & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1–14.

  • Wainer, H., Sireci, S. & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197–219.

  • Wald, A. (1947). Sequential analysis. New York: Wiley.

  • Wise, S. L. & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.

  • Xing, D. (2001). Impact of several computer-based testing variables on the psychometric properties of credentialing examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.

  • Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing exams. Educational and Psychological Measurement, 64, 5–21.

  • Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment. Unpublished doctoral dissertation, University of Massachusetts, Amherst.

  • Zenisky, A. L., Hambleton, R. K. & Sireci, S. G. (2002). Identification and evaluation of local item dependencies in the Medical College Admissions Test. Journal of Educational Measurement, 39, 1–16.

Copyright information

© 2009 Springer Science+Business Media, LLC

Cite this chapter

Zenisky, A., Hambleton, R.K., Luecht, R.M. (2009). Multistage Testing: Issues, Designs, and Research. In: van der Linden, W., Glas, C. (eds) Elements of Adaptive Testing. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-85461-8_18
