Abstract
Item response theory (IRT) observed-score kernel equating is introduced for the non-equivalent groups with anchor test equating design using either chain equating or post-stratification equating. The equating function is treated in a multivariate setting and the asymptotic covariance matrices of IRT observed-score kernel equating functions are derived. Equating is conducted using the two-parameter and three-parameter logistic models with simulated data and data from a standardized achievement test. The results show that IRT observed-score kernel equating offers small standard errors and low equating bias under most settings considered.
References
Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25.
Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
Dorans, N., & Feigenbaum, M. (1994). Equating issues engendered by changes to the SAT and PSAT/NMSQT. In I. Lawrence, N. Dorans, M. Feigenbaum, N. Feryok, A. Schmitt, & N. Wright (Eds.), Technical issues related to the introduction of the new SAT and PSAT/NMSQT (pp. 91–122). Princeton, NJ: Educational Testing Service.
Ferguson, T. (1996). A course in large sample theory. London: Chapman & Hall.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.
Holland, P. W., & Thayer, D. T. (1989). The kernel method of equating score distributions (Technical Report No. 89-84). Princeton, NJ: Educational Testing Service.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.
Kolen, M. J., & Brennan, R. L. (2014). Test equating: Methods and practices (3rd ed.). New York: Springer.
Lee, Y.-H., & von Davier, A. A. (2011). Equating through alternative kernels. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking. New York: Springer.
Li, Y. H., & Lissitz, R. W. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of Educational Measurement, 41, 85–117.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 452–461.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44, 226–233.
Loyd, B. H., & Hoover, H. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.
Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville, IN: Scientific Software.
Moses, T., & Holland, P. W. (2010). A comparison of statistical selection strategies for univariate and bivariate log-linear models. British Journal of Mathematical and Statistical Psychology, 63, 557–574.
Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51, 1–23.
Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53–67.
Ogasawara, H. (2003). Asymptotic standard errors of IRT observed-score equating methods. Psychometrika, 68, 193–211.
Ogasawara, H. (2009). Asymptotic cumulants of the parameter estimators in item response theory. Computational Statistics, 24, 313–331.
R Development Core Team. (2013). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rijmen, F., Qu, Y., & von Davier, A. A. (2011). Hypothesis testing of equating differences in the kernel equating framework. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
van der Linden, W. J. (2011). Local observed-score equating. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.
von Davier, A. A. (2010, July). Equating observed-scores: The percentile rank, Gaussian kernel, and IRT observed-score equating methods. Workshop given at the International Meeting of the Psychometric Society, Athens, GA.
Wiberg, M., van der Linden, W. J., & von Davier, A. A. (2014). Local observed-score kernel equating. Journal of Educational Measurement, 51, 57–74.
Yuan, K.-H., Cheng, Y., & Patton, J. (2013). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79, 232–254.
Additional information
The first author acknowledges the financial support from the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University. The research in this article by the second author was funded by the Swedish Research Council Grant 2014-578.
Appendix
1.1 The Partial Derivatives For the Score Probabilities From IRT Models
The \([(k_X + 1)+ (k_Y + 1)] \times [2(k_X+1)+2(k_Y+1)]\) matrix \(\frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }\) is
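As a sketch only, suppose the synthetic-population score probabilities are the weighted combinations \(\varvec{r}_S = w\varvec{r}_P + (1-w)\varvec{r}_Q\) and \(\varvec{s}_S = w\varvec{s}_P + (1-w)\varvec{s}_Q\). The weight \(w \in [0,1]\) is not defined in this appendix and is introduced here as an assumption. Under that assumption, the matrix would take the block form

\[
\frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q)} =
\begin{pmatrix}
w\,\mathbf {I}_{k_X+1} & (1-w)\,\mathbf {I}_{k_X+1} & \mathbf {0} & \mathbf {0} \\
\mathbf {0} & \mathbf {0} & w\,\mathbf {I}_{k_Y+1} & (1-w)\,\mathbf {I}_{k_Y+1}
\end{pmatrix},
\]

where \(\mathbf {I}_m\) denotes the \(m \times m\) identity matrix. The row and column counts of this form agree with the stated dimensions \([(k_X + 1)+ (k_Y + 1)] \times [2(k_X+1)+2(k_Y+1)]\).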
The \([2(k_X+1)+2(k_Y+1)] \times (3k_X+3k_Y+2)\) matrix \(\frac{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }{ \partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2)}\) is
where \(\frac{\partial \varvec{r}_P}{\partial \varvec{\alpha }_X}, \frac{\partial \varvec{s}_Q}{\partial \varvec{\alpha }_Y}, \frac{\partial \varvec{r}_Q}{\partial \varvec{\alpha }_X}, \frac{\partial \varvec{s}_P}{\partial \varvec{\alpha }_Y}, \frac{\partial \varvec{s}_P }{\partial \mathbf {v}(\beta _1, \beta _2) }\) and \(\frac{\partial \varvec{r}_Q }{\partial \mathbf {v}(\beta _1, \beta _2) }\) are partial derivative matrices with entries given in Ogasawara (2003). Lastly, the \((3k_X+3k_Y+2) \times [3(k_X+k_A)+3(k_Y+k_A)]\) matrix \(\frac{\partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2) }{\partial \mathbf {v}(\varvec{\alpha }_P, \varvec{\alpha }_Q) }\) is
where the partial derivative vectors in the last two rows depend on the method of estimating the equating coefficients. See Ogasawara (2000) for equating coefficients using moments and Ogasawara (2001) for equating coefficients using the Haebara and Stocking-Lord methods.
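The role of these derivative matrices can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It computes score probabilities under a two-parameter logistic model with the Lord and Wingersky (1984) recursion, approximates their Jacobian with respect to the item parameters by central differences (standing in for the analytic entries of Ogasawara, 2003), and propagates a covariance matrix by the delta method. Several things are assumptions made for illustration: the item parameter values, the standard normal ability grid, the diagonal covariance matrix `Sigma`, and all function names. The sketch also ignores the anchor items and the equating coefficients \(\beta _1, \beta _2\).

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response for each item at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lord_wingersky(p):
    """Sum-score distribution for independent binary items at a fixed ability.

    p holds the item success probabilities; the result covers scores 0..len(p).
    """
    dist = np.array([1.0])
    for pj in p:
        new = np.zeros(dist.size + 1)
        new[:-1] += dist * (1.0 - pj)  # item answered incorrectly
        new[1:] += dist * pj           # item answered correctly
        dist = new
    return dist

def score_probs(params, nodes, weights):
    """Marginal score probabilities for params = (a_1..a_k, b_1..b_k)."""
    k = params.size // 2
    a, b = params[:k], params[k:]
    cond = np.array([lord_wingersky(p_2pl(t, a, b)) for t in nodes])
    return weights @ cond  # average over the ability grid

def numeric_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian; rows index outputs, columns index inputs."""
    f0 = f(x)
    jac = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return jac

# Hypothetical 3-item test: discriminations first, then difficulties.
alpha = np.array([1.0, 1.2, 0.8, -0.5, 0.0, 0.7])

# Assumed standard normal ability distribution on an equally spaced grid.
nodes = np.linspace(-4.0, 4.0, 41)
weights = np.exp(-0.5 * nodes**2)
weights /= weights.sum()

r = score_probs(alpha, nodes, weights)
J = numeric_jacobian(lambda x: score_probs(x, nodes, weights), alpha)

# Delta method: Cov(r_hat) is approximated by J Sigma J^T.
# Sigma is a made-up covariance matrix for the item parameter estimates.
Sigma = 0.01 * np.eye(alpha.size)
cov_r = J @ Sigma @ J.T
```

Because the score probabilities sum to one for any parameter values, each column of `J` should sum to approximately zero, which gives a quick sanity check on a derivative implementation. In the full chain, a Jacobian of this kind would be multiplied by the other derivative matrices of this appendix before the delta method is applied.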
About this article
Cite this article
Andersson, B., Wiberg, M. Item Response Theory Observed-Score Kernel Equating. Psychometrika 82, 48–66 (2017). https://doi.org/10.1007/s11336-016-9528-7
DOI: https://doi.org/10.1007/s11336-016-9528-7