A New Online Calibration Method for Multidimensional Computerized Adaptive Testing

Abstract

Multidimensional-Method A (M-Method A) has been proposed as an efficient and effective online calibration method for multidimensional computerized adaptive testing (MCAT) (Chen & Xin, Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, The Netherlands, 2013). However, M-Method A treats person parameter estimates as if they were the true values, so it might yield erroneous item calibration when those estimates contain non-ignorable measurement error. To improve the performance of M-Method A, this paper proposes a new MCAT online calibration method, the full functional MLE-M-Method A (FFMLE-M-Method A). The new method combines the full functional MLE (Jones & Jin in Psychometrika 59:59–75, 1994; Stefanski & Carroll in Annals of Statistics 13:1335–1351, 1985) with the original M-Method A to correct for the estimation error in the ability vector that might otherwise adversely affect the precision of item calibration. Two correction schemes are also proposed for implementing the new method. A simulation study showed that the new method yielded more accurate item parameter estimates than the original M-Method A under almost all conditions.

Notes

  1. Note that M-Method A, M-OEM, and M-MEM are multidimensional generalizations of the original Method A, OEM and MEM methods in UCAT, respectively.

  2. Because the measurement error model assumes that \(\boldsymbol{\varepsilon}_i\) is independent of \(y_{ij}\), \(\boldsymbol{\theta}_i^O\) is also independent of \(y_{ij}\) (the model is written out after these notes).

  3. In classical functional models, the unobserved true values \(\boldsymbol{\theta}_i\) are regarded as unknown fixed constants or parameters (Carroll et al., 2006, p. 25).

  4. The rationale for selecting these two sample sizes is as follows. Consistent with Stefanski and Carroll (1985), the number of examinees who answer each new item is varied at two levels (\(n_j = 300\) and \(600\)). Because we simulate \(m = 30\) new items (see Section 4.2) and assume that each examinee responds to only \(D = 6\) new items (see Section 4.3), the resulting sample sizes are \(N = n_j \times (m/D)\), that is, \(N = 300 \times 5 = 1500\) and \(N = 600 \times 5 = 3000\).

  5. When the expected a posteriori (EAP) method is employed to update the ability vector estimates, the Bayesian version of the D-optimality strategy is used here, in which the prior covariance matrix \(\boldsymbol{\varphi}\) is set to \(\boldsymbol{\Omega}_\theta\) (a minimal sketch of this selection criterion appears after these notes).

  6. For the grid search method, 41 points are taken evenly from \([-4, 4]\) on each coordinate dimension; thus the step size is 0.2, and a total of \(41^3 = 68{,}921\) ability points are considered (see the grid-search sketch after these notes).

  7. Because each examinee answers six new items and his/her MLE of \(\boldsymbol{\theta}\) is updated once via Equation (15) or Equation (16) for each new item answered, six FFMLE_Individual or FFMLE_Mean estimates can be obtained per examinee; in addition, there are 100 replications. Thus, to report the \(\boldsymbol{\theta}\) recovery of the two proposed estimators, we compute for each examinee an average \(\boldsymbol{\theta}\) taken over the 100 replications and the 6 estimates and use it as his/her FFMLE estimate before computing the evaluation indicators (see the averaging sketch after these notes).
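
For readers who want the measurement-error model referred to in Notes 2 and 3 written out, a compact restatement (our notation, reconstructed from the surrounding text rather than quoted from the paper) is

\[
\boldsymbol{\theta}_i^{O} = \boldsymbol{\theta}_i + \boldsymbol{\varepsilon}_i, \qquad E(\boldsymbol{\varepsilon}_i) = \mathbf{0}, \qquad \boldsymbol{\varepsilon}_i \ \text{independent of}\ y_{ij},
\]

where \(\boldsymbol{\theta}_i\) is the unknown fixed true ability vector of examinee \(i\), \(\boldsymbol{\theta}_i^O\) is its observed (estimated) counterpart, and \(y_{ij}\) is the response of examinee \(i\) to new item \(j\). Independence of \(\boldsymbol{\varepsilon}_i\) and \(y_{ij}\) then carries over to \(\boldsymbol{\theta}_i^O\) and \(y_{ij}\).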
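
Note 5 invokes the Bayesian version of the D-optimality item-selection strategy. The sketch below illustrates the general form of such a criterion for a compensatory multidimensional 2PL item (slope vector a, intercept d): choose the candidate item that maximizes the determinant of the accumulated Fisher information plus the inverse prior covariance. The function names and the (a, d) parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def m2pl_prob(a, d, theta):
    """Probability of a correct response for a compensatory M2PL item (assumed form)."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

def item_info(a, d, theta):
    """Fisher information matrix of one M2PL item at theta: P(1 - P) * a a'."""
    p = m2pl_prob(a, d, theta)
    return p * (1.0 - p) * np.outer(a, a)

def select_item_bayesian_d(theta_hat, answered, candidates, prior_cov):
    """Bayesian D-optimality: pick the candidate item maximizing
    det(answered-item information + candidate information + inverse prior covariance)."""
    accumulated = np.linalg.inv(prior_cov)  # prior precision; Note 5 sets the prior covariance to Omega_theta
    for a, d in answered:
        accumulated = accumulated + item_info(a, d, theta_hat)
    dets = [np.linalg.det(accumulated + item_info(a, d, theta_hat)) for a, d in candidates]
    return int(np.argmax(dets))
```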
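
Note 6 describes a brute-force grid search over the three-dimensional ability space. A minimal sketch of such a search, assuming the log-likelihood is supplied as a callable (all names here are illustrative, not taken from the authors' code):

```python
import numpy as np
from itertools import product

# 41 evenly spaced points per coordinate on [-4, 4] gives a step of 0.2 and
# 41**3 = 68,921 candidate ability vectors in three dimensions.
GRID_1D = np.linspace(-4.0, 4.0, 41)

def grid_search_mle(log_likelihood, n_dim=3):
    """Return the grid point with the largest log-likelihood value."""
    best_point, best_value = None, -np.inf
    for point in product(GRID_1D, repeat=n_dim):
        value = log_likelihood(np.asarray(point))
        if value > best_value:
            best_point, best_value = np.asarray(point), value
    return best_point
```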
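
Finally, the averaging described in Note 7 amounts to a mean over replications and per-item updates. A sketch, assuming the FFMLE estimates are stored in an array of shape (replications, new items answered, examinees, dimensions):

```python
import numpy as np

def average_ffmle(theta_ffmle):
    """theta_ffmle has assumed shape (100, 6, n_examinees, 3): one updated estimate
    per new item answered, per replication. Averaging over the first two axes yields
    each examinee's summary FFMLE estimate used for the recovery indicators."""
    return theta_ffmle.mean(axis=(0, 1))  # shape (n_examinees, 3)
```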

References

  • Adams, R., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.

  • Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd edn). New York: Dekker.

  • Ban, J.-C., Hanson, B. H., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.

  • Ban, J.-C., Hanson, B. H., Yi, Q., & Harris, D. J. (2002). Data sparseness and online pretest item calibration/scaling methods in CAT (ACT Research Report 02-01). Iowa City, IA: ACT, Inc. Available at http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/19/da/e9.pdf

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 379–479). Reading, MA: Addison-Wesley.

  • Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective (2nd edn). London: Chapman and Hall.

  • Chang, H. H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58, 37–52.

  • Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.

  • Chen, P., & Xin, T. (2013). Developing online calibration methods for multidimensional computerized adaptive testing. Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, the Netherlands, July.

  • Cheng, Y., & Yuan, K. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.

  • Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment. Journal of Educational and Behavioral Statistics, 39, 502–523.

  • Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185.

  • Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicologica, 32, 107–132.

  • Folk, V. G., & Golub-Smith, M. (1996). Calibration of on-line pretest data using BILOG. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, April.

  • Haberman, S. J., & von Davier, A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 229–248). Boca Raton, FL: CRC Press.

  • Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations. Research Report RR-09-40. Princeton, NJ: Educational Testing Service.

  • Hartig, J., & Höhler, J. (2008). Representation of competencies in multidimensional IRT models with within-item and between-item multidimensionality. Journal of Psychology, 216, 89–101.

  • Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Modeling booklet effects for nonequivalent group designs in large-scale assessment. Educational and Psychological Measurement. doi:10.1177/0013164414554219.

  • Hsu, Y., Thompson, T. D., & Chen, W. (1998). CAT item calibration. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April.

  • Jones, D. H., & Jin, Z. Y. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.

  • Lehmann, E. L., & Casella, G. C. (1998). Theory of point estimation (2nd edn). New York: Springer.

  • Lien, D.-H. D. (1985). Moments of truncated bivariate log-normal distributions. Economics Letters, 19, 243–247.

  • Lord, F. M. (1971). Tailored testing, an application of stochastic approximation. Journal of the American Statistical Association, 66, 707–711.

  • Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.

  • Mislevy, R. J., & Chang, H. (2000). Does adaptive testing violate local independence? Psychometrika, 65, 149–156.

  • Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74, 273–296.

  • Newman, M. E. J., & Barkema, G. T. (1999). Monte Carlo methods in statistical physics. Oxford: Clarendon Press.

  • Parshall, C. G. (1998). Item development and pretesting in a computer-based testing environment. Paper presented at the colloquium Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia, PA, September.

  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: the art of scientific computing (3rd edn.). New York: Cambridge University Press.

  • Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.

  • Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.

  • Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79–97.

  • Segall, D. O. (2003). Calibrating CAT pools and online pretest items using MCMC methods. Paper presented at the annual meeting of National Council on Measurement in Education, Chicago, IL, April.

  • Stefanski, L. A., & Carroll, R. J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335–1351.

  • Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.

  • van der Linden, W. J., & Ren, H. (2014). Optimal Bayesian adaptive design for test-item calibration. Psychometrika. doi:10.1007/s11336-013-9391-8.

  • Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65–102). Hillsdale, NJ: Erlbaum.

  • Wang, C. (2014a). On latent trait estimation in multidimensional compensatory item response models. Psychometrika. doi:10.1007/s11336-013-9399-0.

  • Wang, C. (2014b). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.

  • Wang, C., & Chang, H. H. (2011). Item selection in multidimensional computerized adaptive testing—gaining information from different angles. Psychometrika, 76, 363–384.

  • Wang, C., & Chang, H. H. (2012). Reducing bias in MIRT trait estimation. Paper presented at the annual meeting of National Council on Measurement in Education, Vancouver, Canada, April.

  • Wang, C., Chang, H. H., & Boughton, K. A. (2011). Kullback-Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76, 13–39.

  • Wang, C., Chang, H. H., & Boughton, K. A. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.

  • Yao, L. H. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37, 3–23.

  • Yao, L. H., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38, 614–631.

Acknowledgments

This study was partially supported by the National Natural Science Foundation of China (Grant No. 31300862), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130003120002), the Fundamental Research Funds for the Central Universities (Grant No. 2013YB26), the National Academy of Education/Spencer Fellowship (Grant No. 792269), and KLAS (Grant No. 130026509). Part of the paper was originally presented at the 2014 annual meeting of the National Council on Measurement in Education, Philadelphia, Pennsylvania. The authors are indebted to the editor, associate editor, and four anonymous reviewers for their suggestions and comments on an earlier version of the manuscript.

Author information

Corresponding author

Correspondence to Ping Chen.

Additional information

Both authors made equal contributions to the paper, and the order of authorship is alphabetical.

Cite this article

Chen, P., Wang, C. A New Online Calibration Method for Multidimensional Computerized Adaptive Testing. Psychometrika 81, 674–701 (2016). https://doi.org/10.1007/s11336-015-9482-9
