Abstract
Just as traditional computerized adaptive testing (CAT) adaptively selects individual items for sequential administration to an examinee while the test is in progress, multistage testing (MST) is an analogous approach that uses preassembled sets of items as the building blocks of a test. In MST terminology, these sets of items have come to be termed modules (Luecht & Nungester, 1998) or testlets (Wainer & Kiely, 1987). They can be characterized as short versions of linear test forms in which a specified number of individual items are administered together to meet particular test specifications and to provide a certain proportion of the total test information. The individual items in a module may all relate to one or more common stems (such as passages or graphics) or may be discrete from one another, per the content specifications of the testing program for the test in question. These self-contained, carefully constructed, fixed sets of items are identical for every examinee to whom they are administered, but two examinees may be presented with different modules, or with the same modules in a different sequence.
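The module-based structure described above can be illustrated with a minimal sketch of a two-stage MST: a common routing module followed by one of several second-stage modules chosen by performance. The number-correct routing rule, module sizes, and cut points below are hypothetical illustrations, not designs taken from the chapter.

```python
# Minimal two-stage MST sketch. All module contents and routing cut
# points here are hypothetical, for illustration only.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """A fixed, preassembled set of items (a module/testlet)."""
    name: str
    items: List[str] = field(default_factory=list)


def route(stage1_score: int, n_stage1_items: int) -> str:
    """Route to a second-stage module by number-correct band."""
    fraction = stage1_score / n_stage1_items
    if fraction < 0.4:
        return "easy"
    elif fraction < 0.7:
        return "medium"
    return "hard"


# Stage 1: every examinee receives the same routing module.
routing_module = Module("routing", [f"R{i}" for i in range(1, 11)])

# Stage 2: parallel modules targeted at different ability bands.
stage2 = {
    "easy":   Module("easy",   [f"E{i}" for i in range(1, 11)]),
    "medium": Module("medium", [f"M{i}" for i in range(1, 11)]),
    "hard":   Module("hard",   [f"H{i}" for i in range(1, 11)]),
}

# An examinee answering 8 of 10 routing items correctly is routed
# to the hard second-stage module.
chosen = stage2[route(8, len(routing_module.items))]
print(chosen.name)  # -> hard
```

In operational MST programs the routing decision is usually based on IRT scoring rather than raw number-correct, but the structural point is the same: adaptation happens between fixed modules, not between individual items.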
References
Adema, J. J. (1990). The construction of customized two-stage tests. Journal of Educational Measurement, 27, 241–253.
Angoff, W. & Huddleston, E. (1958). The multi-level experiment: A study of a two-level testing system for the College Board Scholastic Aptitude Test (Statistical Report No. SR-58-21). Princeton, NJ: Educational Testing Service.
Armstrong, R., Jones, D., Koppel, N. & Pashley, P. (2000, April). Computerized adaptive testing with multiple forms structures. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Berger, M. P. F. (1994). A general approach to algorithmic design of fixed-form tests, adaptive tests, and testlets. Applied Psychological Measurement, 18, 141–153.
Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Breithaupt, K., Ariel, A. & Veldkamp, B. P. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5, 319–330.
Breithaupt, K. & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67, 5–20.
Cronbach, L. J. & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana, IL: University of Illinois Press.
Dodd, B. G. & Fitzpatrick, S. J. (2002). Alternatives for scoring CBTs. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 215–236). Mahwah, NJ: Lawrence Erlbaum Associates.
Folk, V. G. & Smith, R. L. (2002). Models for delivery of computer-based tests. In C. N. Mills, M. T. Potenza, J. J. Fremer & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 41–66). Mahwah, NJ: Lawrence Erlbaum Associates.
Glas, C. A. W., Wainer, H. & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer-Nijhoff Publishing.
Green, B. F., Jr., Bock, R. D., Humphreys, L. G., Linn, R. L. & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21, 347–360.
Hadadi, A., Luecht, R. M., Swanson, D. B. & Case, S. M. (1998, April). Study 1: Effects of modular subtest structure and item review on examinee performance, perceptions and pacing. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
Hambleton, R. K. & Xing, D. (2006). Optimal and non-optimal computer-based test designs for making pass-fail decisions. Applied Measurement in Education, 19, 221–239.
Jodoin, M. G., Zenisky, A. L. & Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams. Applied Measurement in Education, 19, 203–220.
Kim, H. (1993). Monte Carlo simulation comparison of two-stage testing and computer adaptive testing. Unpublished doctoral dissertation, University of Nebraska, Lincoln.
Kim, H. & Plake, B. (1993, April). Monte Carlo simulation comparison of two-stage testing and computerized adaptive testing. Paper presented at the meeting of the National Council on Measurement in Education, Atlanta.
Lee, G. & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores composed of testlets. Applied Measurement in Education, 12, 237–255.
Linn, R., Rock, D. & Cleary, T. (1969). The development and evaluation of several programmed testing methods. Educational and Psychological Measurement, 29, 129–146.
Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227–242.
Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14, 227–238.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates.
Lord, F. M. & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luecht, R. M. (1997, March). An adaptive sequential paradigm for managing multidimensional content. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224–236.
Luecht, R. M. (2000, April). Implementing the computer-adaptive sequential testing (CAST) framework to mass produce high quality computer-adaptive and mastery tests. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Luecht, R. M. (2003, April). Exposure control using adaptive multistage item bundles. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. (2006). Designing tests for pass-fail decisions using item response theory. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 575–596). Mahwah, NJ: Lawrence Erlbaum Associates.
Luecht, R. M., Brumfield, T. & Breithaupt, K. (2006). A testlet-assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
Luecht, R. M. & Burgin, W. (2003, April). Test information targeting strategies for adaptive multistage testing designs. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
Luecht, R. M. & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
Luecht, R. M., Nungester, R. J. & Hadadi, A. (1996, April). Heuristic-based CAT: Balancing item information, content and exposure. Paper presented at the meeting of the National Council on Measurement in Education, New York.
Mead, A. (2006). An introduction to multistage testing [Special Issue]. Applied Measurement in Education, 19, 185–260.
Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287–304.
Patsula, L. N. (1999). A comparison of computerized-adaptive testing and multi-stage testing. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Reese, L. M. & Schnipke, D. L. (1999). An evaluation of a two-stage testlet design for computerized testing (Computerized Testing Report 96-04). Newtown, PA: Law School Admission Council.
Reese, L. M., Schnipke, D. L. & Luebke, S. W. (1999). Incorporating content constraints into a multi-stage adaptive testlet design (Law School Admission Council Computerized Testing Report 97-02). Newtown, PA: Law School Admission Council.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, Monograph Supplement, No. 17.
Schnipke, D. L. & Reese, L. M. (1999). A comparison of testlet-based test designs for computerized adaptive testing (Law School Admission Council Computerized Testing Report 97-01). Newtown, PA: Law School Admission Council.
Sireci, S. G., Thissen, D. & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
Thissen, D. (1998, April). Scaled scores for CATs based on linear combinations of testlet scores. Paper presented at the meeting of the National Council on Measurement in Education, San Diego.
Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260.
van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225–240.
van der Linden, W. J. (2005). Models for optimal test design. New York: Springer-Verlag.
van der Linden, W. J. & Adema, J. J. (1998). Simultaneous assembly of multiple test forms. Journal of Educational Measurement, 35, 185–198.
van der Linden, W. J., Ariel, A. & Veldkamp, B. P. (2006). Assembling a computerized adaptive testing item pool as a set of linear tests. Journal of Educational and Behavioral Statistics, 31, 81–99.
Vos, H. J. (2000, April). Adaptive mastery testing using a Multidimensional IRT Model and Bayesian sequential decision theory. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans.
Vos, H. J. & Glas, C. A. W. (2001, April). Multidimensional IRT based adaptive sequential mastery testing. Paper presented at the meeting of the National Council on Measurement in Education, Seattle.
Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15–20.
Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–270). Boston: Kluwer-Nijhoff Publishing.
Wainer, H. & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201.
Wainer, H. & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1–14.
Wainer, H., Sireci, S. & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197–219.
Wald, A. (1947). Sequential analysis. New York: Wiley.
Wise, S. L. & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.
Xing, D. (2001). Impact of several computer-based testing variables on the psychometric properties of credentialing examinations. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Xing, D. & Hambleton, R. K. (2004). Impact of test design, item quality, and item bank size on the psychometric properties of computer-based credentialing exams. Educational and Psychological Measurement, 64, 5–21.
Zenisky, A. L. (2004). Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Zenisky, A. L., Hambleton, R. K. & Sireci, S. G. (2002). Identification and evaluation of local item dependencies in the Medical College Admissions Test. Journal of Educational Measurement, 39, 1–16.
© 2009 Springer Science+Business Media, LLC
Cite this chapter
Zenisky, A., Hambleton, R.K., Luecht, R.M. (2009). Multistage Testing: Issues, Designs, and Research. In: van der Linden, W., Glas, C. (eds) Elements of Adaptive Testing. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-85461-8_18
Print ISBN: 978-0-387-85459-5
Online ISBN: 978-0-387-85461-8
eBook Packages: Mathematics and Statistics