Abstract
Just as traditional computerized adaptive testing (CAT) adaptively selects individual items for sequential administration to an examinee while the test is in progress, multistage testing (MST) is an analogous approach that uses preassembled sets of items as the building blocks of a test. In MST terminology, these sets of items have come to be termed modules (Luecht & Nungester, 1998) or testlets (Wainer & Kiely, 1987). They can be characterized as short versions of linear test forms in which a specified number of individual items are administered together to meet particular test specifications and to provide a certain proportion of the total test information. The individual items in a module may all relate to one or more common stems (such as passages or graphics) or may be discrete from one another, per the content specifications of the testing program for the test in question. These self-contained, carefully constructed, fixed sets of items are identical for every examinee to whom they are administered, but two examinees may be presented with different modules, or with the same modules in a different sequence.
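The module-based structure described above can be illustrated with a minimal sketch of a two-stage MST: a common routing module followed by one of several second-stage modules chosen by performance. The number-correct routing rule, module sizes, and cut points below are hypothetical illustrations, not designs taken from the chapter.

```python
# Minimal two-stage MST sketch. All module contents and routing cut
# points here are hypothetical, for illustration only.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """A fixed, preassembled set of items (a module/testlet)."""
    name: str
    items: List[str] = field(default_factory=list)


def route(stage1_score: int, n_stage1_items: int) -> str:
    """Route to a second-stage module by number-correct band."""
    fraction = stage1_score / n_stage1_items
    if fraction < 0.4:
        return "easy"
    elif fraction < 0.7:
        return "medium"
    return "hard"


# Stage 1: every examinee receives the same routing module.
routing_module = Module("routing", [f"R{i}" for i in range(1, 11)])

# Stage 2: parallel modules targeted at different ability bands.
stage2 = {
    "easy":   Module("easy",   [f"E{i}" for i in range(1, 11)]),
    "medium": Module("medium", [f"M{i}" for i in range(1, 11)]),
    "hard":   Module("hard",   [f"H{i}" for i in range(1, 11)]),
}

# An examinee answering 8 of 10 routing items correctly is routed
# to the hard second-stage module.
chosen = stage2[route(8, len(routing_module.items))]
print(chosen.name)  # -> hard
```

In operational MST programs the routing decision is usually based on IRT scoring rather than raw number-correct, but the structural point is the same: adaptation happens between fixed modules, not between individual items.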
References
Adema, J. J. (1990). The construction of customized two-stage tests. Journal of Educational Measurement, 27, 241–253.
Angoff, W. & Huddleston, E. (1958). The multi-level experiment: A study of a two-level testing system for the College Board Scholastic Aptitude Test (Statistical Report No. SR-58-21). Princeton, NJ: Educational Testing Service.
Armstrong, R., Jones, D., Koppel, N. & Pashley, P. (2000, April). Computerized adaptive testing with multiple forms structures. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Berger, M. P. F. (1994). A general approach to algorithmic design of fixed-form tests, adaptive tests, and testlets. Applied Psychological Measurement, 18, 141–153.
Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Breithaupt, K., Ariel, A. & Veldkamp, B. P. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5, 319–330.
Breithaupt, K. & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67, 5–20.
Cronbach, L. J. & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana, IL: University of Illinois Press.
Dodd, B. G. & Fitzpatrick, S. J. (2002). Alternatives for scoring CBTs. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 215–236). Mahwah, NJ: Lawrence Erlbaum Associates.
Folk, V. G. & Smith, R. L. (2002). Models for delivery of computer-based tests. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 41–66). Mahwah, NJ: Lawrence Erlbaum Associates.
Glas, C. A. W., Wainer, H. & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer-Nijhoff Publishing.
Green, B. F., Jr., Bock, R. D., Humphreys, L. G., Linn, R. L. & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21, 347–360.
Hadadi, A., Luecht, R. M., Swanson, D. B. & Case, S. M. (1998, April). Study 1: Effects of modular subtest structure and item review on examinee performance, perceptions and pacing. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
Hambleton, R. K. & Xing, D. (2006). Optimal and non-optimal computer-based test designs for making pass-fail decisions. Applied Measurement in Education, 19, 221–239.
Jodoin, M. G., Zenisky, A. L. & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams. Applied Measurement in Education, 19, 203–220.
Kim, H. (1993). Monte Carlo simulation comparison of two-stage testing and computer adaptive testing. Unpublished doctoral dissertation, University of Nebraska, Lincoln.
Kim, H. & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the meeting of the National Council on Measurement in Education, Atlanta.
Lee, G. & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores composed of testlets. Applied Measurement in Education, 12, 237–255.
Linn, R., Rock, D. & Cleary, T. (1969). The development and evaluation of several programmed testing methods. Educational and Psychological Measurement, 29, 129–146.
Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.
Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14, 227–238.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates.
Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luecht, R. M. (1997, March). An adaptive sequential paradigm for managing multidimensional content. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224–236.
Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Luecht, R. M. (2003, April). Exposure control using adaptive multistage item bundles. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. (2006). Designing tests for pass-fail decisions using item response theory. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 575–596). Mahwah, NJ: Lawrence Erlbaum Associates.
Luecht, R. M., Brumfield, T. & Breithaupt, K. (2006). A testlet-assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
Luecht, R. M. & Burgin, W. (2003, April). Test information targeting strategies for adaptive multistage testing designs. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
Luecht, R. M., Nungester, R. J. & Hadadi, A. (1996, April). Heuristic-based CAT: Balancing item information, content and exposure. Paper presented at the meeting of the National Council on Measurement in Education, New York.
Mead, A. (2006). An introduction to multistage testing [Special Issue]. Applied Measurement in Education, 19, 185–260.
Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.
Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Reese, L. M. & Schnipke, D. L. (1999). An evaluation of a two-stage testlet design for computerized testing (Computerized Testing Report 96-04). Newtown, PA: Law School Admission Council.
Reese, L. M., Schnipke, D. L. & Luebke, S. W. (1999). Incorporating content constraints into a multi-stage adaptive testlet design (Law School Admission Council Computerized Testing Report 97-02). Newtown, PA: Law School Admission Council.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, Monograph Supplement, No. 17.
Schnipke, D. L. & Reese, L. M. (1999). A comparison of testlet-based test designs for computerized adaptive testing (Law School Admission Council Computerized Testing Report 97-01). Newtown, PA: Law School Admission Council.
Sireci, S. G., Thissen, D. & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
Thissen, D. (1998, April). Scaled scores for CATs based on linear combinations of testlet scores. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260.
van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225–240.
van der Linden, W. J. (2005). Models for optimal test design. New York: Springer-Verlag.
van der Linden, W. J. & Adema, J. J. (1998). Simultaneous assembly of multiple test forms. Journal of Educational Measurement, 35, 185–198.
van der Linden, W. J., Ariel, A. & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31, 81–99.
Vos, H. J. (2000, April). Adaptive mastery testing using a Multidimensional IRT Model and Bayesian sequential decision theory. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans.
Vos, H. J. & Glas, C. A. W. (2001, April). Multidimensional IRT based adaptive sequential mastery testing. Paper presented at the meeting of the National Council on Measurement in Education, Seattle.
Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15–20.
Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–270). Boston: Kluwer-Nijhoff Publishing.
Wainer, H. & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201.
Wainer, H. & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1–14.
Wainer, H., Sireci, S. & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197–219.
Wald, A. (1947). Sequential analysis. New York: Wiley.
Wise, S. L. & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.
Xing, D. (2001). Impact of several computer-based testing variables on the psychometric properties of credentialing examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing exams. Educational and Psychological Measurement, 64, 5–21.
Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Zenisky, A. L., Hambleton, R. K. & Sireci, S. G. (2002). Identification and evaluation of local item dependencies in the Medical College Admissions Test. Journal of Educational Measurement, 39, 1–16.
© 2009 Springer Science+Business Media, LLC
Cite this chapter
Zenisky, A., Hambleton, R.K., Luecht, R.M. (2009). Multistage Testing: Issues, Designs, and Research. In: van der Linden, W., Glas, C. (eds) Elements of Adaptive Testing. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-85461-8_18
Print ISBN: 978-0-387-85459-5
Online ISBN: 978-0-387-85461-8
eBook Packages: Mathematics and Statistics