Effects of Discontinue Rules on Psychometric Properties of Test Scores

von Davier, Matthias; Cho, Youngmi; Pan, Tianshu

doi:10.1007/s11336-018-09652-3

Published: 03 January 2019

Volume 84, pages 147–163 (2019)

Psychometrika

Abstract

This paper provides results on a form of adaptive testing that is frequently used in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation is adaptive in the sense that a session is discontinued once a test taker produces a certain number of consecutive incorrect responses, and the subsequent (unobserved) responses are commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company 2003), the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman 2014), and the Universal Nonverbal Intelligence Test (2nd ed.; Bracken and McCallum 2015) are some of the many tests that use this rule. He and Wolfe (Educ Psychol Meas 72(5):808–826, 2012. https://doi.org/10.1177/0013164412441937) compared different ability estimation methods in a simulation study of this discontinue-rule adaptation of test length. However, to our knowledge, no study has used analytic arguments drawn from probability theory to examine the underlying distributional properties of what these authors call stochastic censoring of responses. The results obtained by He and Wolfe (Educ Psychol Meas 72(5):808–826, 2012. https://doi.org/10.1177/0013164412441937) agree with those presented by DeAyala et al. (J Educ Meas 38:213–234, 2001) as well as Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010) and Rose et al. (Psychometrika 82:795–819, 2017. https://doi.org/10.1007/s11336-016-9544-7) in that ability estimates are most biased when the unobserved responses are scored as wrong. Because this scoring is used operationally, more research is needed to improve practice in this field.
This paper extends existing research on discontinue-rule adaptivity in intelligence tests in two ways. First, it presents an analytical study of the distributional properties of items scored under a discontinue rule. Second, it presents a simulation that includes additional scoring rules and uses ability estimators that may reduce bias in intelligence tests scored under a discontinue rule.
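
The discontinue rule and the two treatments of unadministered items can be sketched in Python. This is an illustration written for this summary, not the authors' simulation: it assumes a Rasch model, a hypothetical bank of 30 items with difficulties evenly spaced from -3 to 3, a discontinue threshold of k = 3, and clipping of infinite ML estimates to ±6; all function names are hypothetical. Whether an item is administered depends only on earlier, observed responses, so a likelihood built from the observed responses alone is a legitimate basis for estimation. Scoring the unadministered items as wrong instead adds a term -P_i(θ) to the score equation for each of them, so that estimate can never exceed the observed-only estimate for the same test taker.

```python
import math
import random
from typing import List, Optional, Tuple

def apply_discontinue(responses: List[int], k: int) -> List[Optional[int]]:
    """Hypothetical helper: given 0/1 responses to items ordered by increasing
    difficulty, mark every item after the k-th consecutive error as None."""
    observed: List[Optional[int]] = []
    run = 0  # length of the current streak of incorrect responses
    for x in responses:
        if run >= k:
            observed.append(None)  # session already discontinued
            continue
        observed.append(x)
        run = run + 1 if x == 0 else 0
    return observed

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(xs: List[int], bs: List[float],
              lo: float = -6.0, hi: float = 6.0, iters: int = 60) -> float:
    """Rasch ML estimate by bisection on the score sum(x - P), which is
    strictly decreasing in theta. Zero and perfect scores have no finite
    MLE and are clipped to the search bounds."""
    def score(t: float) -> float:
        return sum(x - p_correct(t, b) for x, b in zip(xs, bs))
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compare(theta: float, n_items: int = 30, k: int = 3,
            reps: int = 500, seed: int = 1) -> Tuple[float, float]:
    """Mean ML estimate (observed only, unobserved scored as wrong)."""
    rng = random.Random(seed)
    # Hypothetical item bank: difficulties evenly spaced from -3 to 3.
    bs = [-3.0 + 6.0 * i / (n_items - 1) for i in range(n_items)]
    est_obs: List[float] = []
    est_wrong: List[float] = []
    for _ in range(reps):
        # Responses the test taker would give to every item ...
        full = [1 if rng.random() < p_correct(theta, b) else 0 for b in bs]
        # ... of which only those before discontinuation are observed.
        obs = apply_discontinue(full, k)
        kept = [(b, x) for b, x in zip(bs, obs) if x is not None]
        est_obs.append(mle_theta([x for _, x in kept], [b for b, _ in kept]))
        est_wrong.append(mle_theta([0 if x is None else x for x in obs], bs))
    return sum(est_obs) / reps, sum(est_wrong) / reps

print(apply_discontinue([1, 0, 1, 0, 0, 0, 1, 1], k=3))
# [1, 0, 1, 0, 0, 0, None, None]
obs_mean, wrong_mean = compare(theta=0.0)
print(f"observed only: {obs_mean:.3f}  scored as wrong: {wrong_mean:.3f}")
```

Calling `compare` for several values of `theta` is one way to examine how the downward shift caused by scoring unobserved responses as wrong depends on ability. The observed-only ML estimate is not claimed to be unbiased here; ML estimates carry finite-sample bias of their own.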

Notes

We thank the associate editor and reviewer #1 for pointing out that ignorability follows directly from the fact that missingness under the discontinue rule is completely determined by the observed data. We agree that this should be sufficient to lay the discussion to rest and accept this as a fact. However, it appears that the existence of peer-reviewed articles on the topic that assume non-ignorability requires some more elaboration.
Preferably, this should be a bias-corrected estimator such as the ones proposed by Warm (1989) and Firth (1993) so that conditional probabilities add up to the expected scores for all ability levels.
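
Warm's (1989) weighted likelihood estimator (WLE) is one such bias-corrected estimator. The following minimal sketch assumes the Rasch model (the paper's estimators may be applied to other models) and solves the weighted score equation Σ(x_i - P_i) + I′(θ)/(2I(θ)) = 0, where I(θ) = Σ P_i(1 - P_i) is the test information. For the Rasch model this is equivalent to maximizing log L(θ) + ½ log I(θ), which is also what Firth's (1993) correction amounts to for this canonical exponential-family likelihood. Unlike the MLE, the estimate is finite for zero and perfect scores. The function names and search bounds are illustrative assumptions.

```python
import math
from typing import List

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def wle_theta(xs: List[int], bs: List[float],
              lo: float = -8.0, hi: float = 8.0, iters: int = 80) -> float:
    """Hypothetical helper: Warm's WLE for the Rasch model, found by
    bisection on the weighted score sum(x - P) + I'(theta) / (2 I(theta))."""
    def weighted_score(t: float) -> float:
        ps = [p_correct(t, b) for b in bs]
        info = sum(p * (1.0 - p) for p in ps)                     # I(theta)
        dinfo = sum(p * (1.0 - p) * (1.0 - 2.0 * p) for p in ps)  # I'(theta)
        return sum(x - p for x, p in zip(xs, ps)) + dinfo / (2.0 * info)
    # The weighted score tends to r + 1/2 far below and r - n - 1/2 far
    # above the item difficulties, so for difficulties well inside
    # [lo, hi] a sign change is bracketed for every raw score r.
    if weighted_score(lo) <= 0.0:
        return lo
    if weighted_score(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_score(mid) > 0.0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

bs = [-1.0, 0.0, 1.0]
for r in range(len(bs) + 1):
    xs = [1] * r + [0] * (len(bs) - r)  # r correct answers on the easiest items
    print(r, round(wle_theta(xs, bs), 3))
```

Because the Rasch raw score is sufficient, the estimate depends only on r, and for item difficulties symmetric around zero the estimates for r and n - r mirror each other.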

References

Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39, 331–348.
Bracken, B. A., & McCallum, R. S. (2015). Universal nonverbal intelligence test (2nd ed.). Itasca, IL: Riverside Publishers.
Chen, H., Yamamoto, K., & von Davier, M. (2014). Controlling multistage testing exposure rates in international large-scale assessments. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications. New York: CRC Press.
DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213–234.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38.
Glas, C. A. W. (2010). Item parameter estimation and item fit analysis. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 269–288). New York: Springer.
He, W., & Wolfe, E. W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72(5), 808–826. https://doi.org/10.1177/0013164412441937.
Holland, P. W., & Rosenbaum, P. R. (1986). Conditional association and unidimensionality in monotone latent variable models. The Annals of Statistics, 14(4), 1523–1543.
Holland, P. W., & Thayer, D. T. (1986). Differential item functioning and the Mantel–Haenszel procedure. ETS Research Report Series. https://doi.org/10.1002/j.2330-8516.1986.tb00186.x.
Homack, S. R., & Reynolds, C. R. (2007). Essentials of assessment with brief intelligence tests. Hoboken: Wiley. ISBN: 978-0-471-26412-5.
Kaufman, A. S., & Kaufman, N. L. (2004). Manual: Kaufman assessment battery for children (2nd ed.). Circle Pines, MN: AGS Publishing.
Kaufman, A. S., & Kaufman, N. L. (2014). Kaufman adolescent and adult intelligence test. Encyclopedia of Special Education. https://doi.org/10.1002/9781118660584.ese1323.
Little, R. J. A. (1988). Missing-data adjustments in large surveys. Journal of Business and Economic Statistics, 6, 287–296.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
Little, R. J., & Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. Journal of the Royal Statistical Society: Series C: Applied Statistics, 60(4), 591–605. https://doi.org/10.1111/j.1467-9876.2011.00763.x.
Little, R. J., Rubin, D. B., & Zangeneh, S. Z. (2017). Conditions for ignoring the missing-data mechanism in likelihood inferences for parameter subsets. Journal of the American Statistical Association, 112(517), 314–320.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
Mislevy, R. J., & Wu, P.-K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing. ETS Research Report Series, 1996, i–36. https://doi.org/10.1002/j.2333-8504.1996.tb01708.x.
Morris, T. P., White, I. R., & Royston, P. (2014). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Medical Research Methodology, 14, 75–87.
Reichenbach, H. (1956). The direction of time. Berkeley, LA: University of California Press.
Riverside Publishing Company. (2003). Stanford-Binet intelligence scales (SB5) (5th ed.). Itasca, IL.
Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.
Rose, N., von Davier, M., & Nagengast, B. (2017). Modeling omitted and not-reached items in IRT models. Psychometrika, 82, 795–819. https://doi.org/10.1007/s11336-016-9544-7.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business and Economic Statistics, 4, 87–94.
Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland Publishing Company.
Suppes, P., & Zanotti, M. (1981). When are probabilistic explanations possible? Synthese, 48, 191–199.
van der Linden, W. (Ed.). (2016). Handbook of item response theory (Vol. 1, 2nd ed.). Boca Raton: CRC Press.
Verhelst, N. D., & Glas, C. A. W. (1995). The one parameter logistic model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models. New York, NY: Springer. https://doi.org/10.1007/978-1-4612-4230-7_12.
von Davier, M. (2005). A general diagnostic model applied to language testing data. In Research report RR-05-16. Princeton, NJ: ETS.
von Davier, M. (2016a). The Rasch model. Chapter 3. In W. van der Linden (Ed.), Handbook of item response theory (2nd ed., Vol. 1, pp. 31–48). Boca Raton: CRC Press. https://doi.org/10.1201/9781315374512-4.
von Davier, M. (2016b). CTT and No-DIF and ? = (almost) Rasch model. Chapter 14. In M. Rosen, K. Y. Hansen, & U. Wolff (Eds.), Cognitive abilities and educational outcomes: A festschrift in honour of Jan-Eric Gustafsson (pp. 249–272). Springer Book Series: Methodology of Educational Measurement and Assessment.
von Davier, M., & Rost, J. (1995). Polytomous mixed Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 371–379). New York: Springer.
Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.
Yamamoto, K., & Everson, H. (1997). Modeling the effects of test length and test time on parameter estimation using the HYBRID model. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 89–98). New York: Waxmann.

Author information

Authors and Affiliations

National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104-3102, USA
Matthias von Davier
American Institutes for Research, 1000 Thomas Jefferson Street, NW, Washington D.C., 20007, USA
Youngmi Cho
Pearson, 19500 Bulverde Rd, San Antonio, TX, 78259, USA
Tianshu Pan

Corresponding author

Correspondence to Matthias von Davier.

About this article

Cite this article

von Davier, M., Cho, Y. & Pan, T. Effects of Discontinue Rules on Psychometric Properties of Test Scores. Psychometrika 84, 147–163 (2019). https://doi.org/10.1007/s11336-018-09652-3

Received: 16 August 2017
Published: 03 January 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s11336-018-09652-3