Abstract
Multidimensional-Method A (M-Method A) has been proposed as an efficient and effective online calibration method for multidimensional computerized adaptive testing (MCAT) (Chen & Xin, Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, The Netherlands, 2013). However, a key assumption of M-Method A is that it treats person parameter estimates as their true values; consequently, the method might yield erroneous item calibration when the person parameter estimates contain non-ignorable measurement error. To improve the performance of M-Method A, this paper proposes a new MCAT online calibration method, namely, the full functional MLE-M-Method A (FFMLE-M-Method A). This new method combines the full functional MLE (Jones & Jin in Psychometrika 59:59–75, 1994; Stefanski & Carroll in Annals of Statistics 13:1335–1351, 1985) with the original M-Method A in an effort to correct for the estimation error of the ability vector that might otherwise adversely affect the precision of item calibration. Two correction schemes are also proposed for implementing the new method. A simulation study showed that the new method yielded more accurate item parameter estimates than the original M-Method A in almost all conditions.
Notes
Note that M-Method A, M-OEM, and M-MEM are multidimensional generalizations of the original Method A, OEM and MEM methods in UCAT, respectively.
Because the measurement error model assumes that \({\varvec{\upvarepsilon }}_i \) is independent of \(y_{ij} \), \({{\varvec{\uptheta }} }_i^O \) is independent of \(y_{ij} \).
In classical functional models, the unobserved true values \({{\varvec{\uptheta }} }_i \)’s are regarded as unknown fixed constants or parameters (Carroll et al., 2006, p. 25).
The rationale for selecting these two sample sizes is as follows. Consistent with Stefanski and Carroll (1985), the number of examinees who answer each new item is varied at two levels (\(n_{j}\) = 300 and 600). Because we simulate m = 30 new items (see Section 4.2) and assume that each examinee responds to only D = 6 new items (see Section 4.3), the resulting sample sizes equal \(n_{j}\times \)(m/D) (i.e., N = 1500 and 3000).
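The sample-size arithmetic above can be checked with a short snippet (illustrative only; the variable names are ours, not from the paper):

```python
# Sample-size arithmetic from the design described above:
# each new item is answered by n_j examinees, m = 30 new items are
# simulated, and each examinee answers D = 6 new items,
# so the total number of examinees is N = n_j * (m / D).
m, D = 30, 6

for n_j in (300, 600):
    N = n_j * (m // D)
    print(f"n_j = {n_j} -> N = {N}")
```

Running this prints N = 1500 and N = 3000, matching the two conditions in the simulation design.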
When the expected a posteriori (EAP) method is employed to update the ability vector estimates, the Bayesian version of the D-optimality strategy is used here, in which the prior covariance matrix \({\varvec{\upvarphi }}\) is set to \({\varvec{\Omega }}_\theta \).
For the grid search method, 41 points are taken evenly from [-4, 4] on each coordinate dimension; thus the step size equals 0.2, and a total of 41\(^{3}\) = 68921 ability points are considered.
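The grid described above can be sketched as follows (a minimal illustration of the point count only, not the authors' implementation):

```python
from itertools import product

# 41 evenly spaced points on [-4, 4] in each of the three coordinate
# dimensions, giving a step size of 0.2 and 41**3 candidate ability points.
points = [-4.0 + 0.2 * k for k in range(41)]   # -4.0, -3.8, ..., 4.0
grid = list(product(points, repeat=3))         # all 41**3 = 68921 points

print(len(points))  # 41
print(len(grid))    # 68921
```

A grid search then evaluates the likelihood at each of these 68921 points and returns the maximizer.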
Because each examinee answers six new items and his/her MLE of \({\varvec{\uptheta }}\) is updated once via Equation (15) or Equation (16) for each new item answered, six FFMLE_Individual or FFMLE_Mean estimates are obtained for each examinee; in addition, there are 100 replications. Thus, to assess the \({\varvec{\uptheta }}\) recovery of the two proposed estimators, we compute for each examinee an average \({\varvec{\uptheta }}\) taken over the 100 replications and 6 estimates as his/her new FFMLE estimate before computing the evaluation indicators.
References
Adams, R., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Dekker.
Ban, J.-C., Hanson, B. H., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.
Ban, J.-C., Hanson, B. H., Yi, Q., & Harris, D. J. (2002). Data sparseness and online pretest item calibration/scaling methods in CAT (ACT Research Report 02-01). Iowa City, IA: ACT, Inc. Available at http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/19/da/e9.pdf
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 379–479). Reading, MA: Addison-Wesley.
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective (2nd ed.). London: Chapman and Hall.
Chang, H. H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58, 37–52.
Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.
Chen, P., & Xin, T. (2013). Developing online calibration methods for multidimensional computerized adaptive testing. Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, the Netherlands, July.
Cheng, Y., & Yuan, K. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.
Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment. Journal of Educational and Behavioral Statistics, 39, 502–523.
Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185.
Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicologica, 32, 107–132.
Folk, V. G., & Golub-Smith, M. (1996). Calibration of on-line pretest data using BILOG. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, April.
Haberman, S. J., & von Davier, A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 229–248). Boca Raton, FL: CRC Press.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations (Research Report RR-09-40). Princeton, NJ: Educational Testing Service.
Hartig, J., & Höhler, J. (2008). Representation of competencies in multidimensional IRT models with within-item and between-item multidimensionality. Journal of Psychology, 216, 89–101.
Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Modeling booklet effects for nonequivalent group designs in large-scale assessment. Educational and Psychological Measurement,. doi:10.1177/0013164414554219.
Hsu, Y., Thompson, T. D., & Chen, W. (1998). CAT item calibration. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April.
Jones, D. H., & Jin, Z. Y. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
Lehmann, E. L., & Casella, G. C. (1998). Theory of point estimation (2nd ed.). New York: Springer.
Lien, D.-H. D. (1985). Moments of truncated bivariate log-normal distributions. Economics Letters, 19, 243–247.
Lord, F. M. (1971). Tailored testing, an application of stochastic approximation. Journal of the American Statistical Association, 66, 707–711.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Mislevy, R. J., & Chang, H. (2000). Does adaptive testing violate local independence? Psychometrika, 65, 149–156.
Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74, 273–296.
Newman, M. E. J., & Barkema, G. T. (1999). Monte Carlo methods in statistical physics. Oxford: Clarendon Press.
Parshall, C. G. (1998). Item development and pretesting in a computer-based testing environment. Paper presented at the colloquium Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia, PA, September.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd ed.). New York: Cambridge University Press.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.
Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79–97.
Segall, D. O. (2003). Calibrating CAT pools and online pretest items using MCMC methods. Paper presented at the annual meeting of National Council on Measurement in Education, Chicago, IL, April.
Stefanski, L. A., & Carroll, R. J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335–1351.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.
van der Linden, W. J., & Ren, H. (2014). Optimal Bayesian adaptive design for test-item calibration. Psychometrika. doi:10.1007/s11336-013-9391-8.
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65–102). Hillsdale, NJ: Erlbaum.
Wang, C. (2014a). On latent trait estimation in multidimensional compensatory item response models. Psychometrika. doi:10.1007/s11336-013-9399-0.
Wang, C. (2014b). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.
Wang, C., & Chang, H. H. (2011). Item selection in multidimensional computerized adaptive testing—gaining information from different angles. Psychometrika, 76, 363–384.
Wang, C., & Chang, H. H. (2012). Reducing bias in MIRT trait estimation. Paper presented at the annual meeting of National Council on Measurement in Education, Vancouver, Canada, April.
Wang, C., Chang, H. H., & Boughton, K. A. (2011). Kullback-Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76, 13–39.
Wang, C., Chang, H. H., & Boughton, K. A. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.
Yao, L. H. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37, 3–23.
Yao, L. H., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38, 614–631.
Acknowledgments
This study was partially supported by the National Natural Science Foundation of China (Grant No. 31300862), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130003120002), the Fundamental Research Funds for the Central Universities (Grant No. 2013YB26), the National Academy of Education/Spencer Fellowship (Grant No. 792269), and KLAS (Grant No. 130026509). Part of the paper was originally presented at the 2014 annual meeting of the National Council on Measurement in Education, Philadelphia, Pennsylvania. The authors are indebted to the editor, associate editor, and four anonymous reviewers for their suggestions and comments on an earlier version of the manuscript.
Additional information
Both authors made equal contributions to the paper, and the order of authorship is alphabetical.
Cite this article
Chen, P., Wang, C. A New Online Calibration Method for Multidimensional Computerized Adaptive Testing. Psychometrika 81, 674–701 (2016). https://doi.org/10.1007/s11336-015-9482-9