Item Response Theory Observed-Score Kernel Equating

Andersson, Björn; Wiberg, Marie

doi:10.1007/s11336-016-9528-7

Published: 14 October 2016

Psychometrika, Volume 82, pages 48–66 (2017)

Björn Andersson^1,2 & Marie Wiberg^3

Abstract

Item response theory (IRT) observed-score kernel equating is introduced for the non-equivalent groups with anchor test equating design using either chain equating or post-stratification equating. The equating function is treated in a multivariate setting and the asymptotic covariance matrices of IRT observed-score kernel equating functions are derived. Equating is conducted using the two-parameter and three-parameter logistic models with simulated data and data from a standardized achievement test. The results show that IRT observed-score kernel equating offers small standard errors and low equating bias under most settings considered.

References

Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25.
Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
Dorans, N., & Feigenbaum, M. (1994). Equating issues engendered by changes to the SAT and PSAT/NMSQT. In I. Lawrence, N. Dorans, M. Feigenbaum, N. Feryok, A. Schmitt, & N. Wright (Eds.), Technical issues related to the introduction of the new SAT and PSAT/NMSQT (pp. 91–122). Princeton, NJ: Educational Testing Service.
Ferguson, T. (1996). A course in large sample theory. London: Chapman & Hall.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.
Holland, P. W., & Thayer, D. T. (1989). The kernel method of equating score distributions (Technical Report No. 89-84). Princeton, NJ: Educational Testing Service.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.
Kolen, M. J., & Brennan, R. L. (2014). Test equating: Methods and practices (3rd ed.). New York: Springer.
Lee, Y.-H., & von Davier, A. A. (2011). Equating through alternative kernels. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking. New York: Springer.
Li, Y. H., & Lissitz, R. W. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of Educational Measurement, 41, 85–117.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 452–461.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44, 226–233.
Loyd, B. H., & Hoover, H. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.
Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville, IN: Scientific Software.
Moses, T., & Holland, P. W. (2010). A comparison of statistical selection strategies for univariate and bivariate log-linear models. British Journal of Mathematical and Statistical Psychology, 63, 557–574.
Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51, 1–23.
Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53–67.
Ogasawara, H. (2003). Asymptotic standard errors of IRT observed-score equating methods. Psychometrika, 68, 193–211.
Ogasawara, H. (2009). Asymptotic cumulants of the parameter estimators in item response theory. Computational Statistics, 24, 313–331.
R Development Core Team. (2013). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rijmen, F., Qu, Y., & von Davier, A. A. (2011). Hypothesis testing of equating differences in the kernel equating framework. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
van der Linden, W. J. (2011). Local observed-score equating. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.
von Davier, A. A. (2010, July). Equating observed-scores: The percentile rank, Gaussian kernel, and IRT observed-score equating methods. Workshop given at the International Meeting of the Psychometric Society, Athens, GA.
Wiberg, M., van der Linden, W. J., & von Davier, A. A. (2014). Local observed-score kernel equating. Journal of Educational Measurement, 51, 57–74.
Yuan, K.-H., Cheng, Y., & Patton, J. (2013). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79, 232–254.

Author information

Authors and Affiliations

Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, No. 19 Xinjiekou Wai Street, Haidian District, 100875, Beijing, China
Björn Andersson
Uppsala University, Uppsala, Sweden
Björn Andersson
Department of Statistics USBE, Umeå University, 901 87, Umeå, Sweden
Marie Wiberg

Corresponding author

Correspondence to Björn Andersson.

Additional information

The first author acknowledges the financial support from the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University. The research in this article by the second author was funded by the Swedish Research Council Grant 2014-578.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 280 KB)

Appendix

1.1 The Partial Derivatives For the Score Probabilities From IRT Models

The $[(k_X + 1)+ (k_Y + 1)] \times [2(k_X+1)+2(k_Y+1)]$ matrix $\frac{\partial \mathbf{v}(\boldsymbol{r}_S, \boldsymbol{s}_S)}{\partial \mathbf{v}(\boldsymbol{r}_P, \boldsymbol{r}_Q, \boldsymbol{s}_P, \boldsymbol{s}_Q)}$ is

$$\frac{\partial \mathbf{v}(\boldsymbol{r}_S, \boldsymbol{s}_S)}{\partial \mathbf{v}(\boldsymbol{r}_P, \boldsymbol{r}_Q, \boldsymbol{s}_P, \boldsymbol{s}_Q)} = \begin{pmatrix} w_S \mathrm{diag}(\mathbf{1}_{k_X+1}) & (1-w_S) \mathrm{diag}(\mathbf{1}_{k_X+1}) & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & w_S \mathrm{diag}(\mathbf{1}_{k_Y+1}) & (1-w_S) \mathrm{diag}(\mathbf{1}_{k_Y+1}) \end{pmatrix}. \tag{24}$$
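The block structure of Eq. (24) is what one would obtain if the synthetic-population score probabilities were the weighted mixtures $w_S \boldsymbol{r}_P + (1-w_S)\boldsymbol{r}_Q$ and $w_S \boldsymbol{s}_P + (1-w_S)\boldsymbol{s}_Q$; that reading is inferred from the matrix itself and is an assumption here. A minimal NumPy sketch under that assumption builds the Jacobian and checks that it reproduces the mixtures. The function name, test lengths, weight and probability vectors are all hypothetical.

```python
import numpy as np

def jacobian_eq24(k_x, k_y, w_s):
    """Jacobian of v(r_S, s_S) with respect to v(r_P, r_Q, s_P, s_Q), as in Eq. (24)."""
    i_x = np.eye(k_x + 1)                  # diag(1_{k_X+1})
    i_y = np.eye(k_y + 1)                  # diag(1_{k_Y+1})
    zero = np.zeros((k_x + 1, k_y + 1))    # off-diagonal zero block
    return np.block([
        [w_s * i_x, (1 - w_s) * i_x, zero, zero],
        [zero.T, zero.T, w_s * i_y, (1 - w_s) * i_y],
    ])

# Hypothetical sizes and weight, chosen only for illustration.
k_x, k_y, w_s = 3, 2, 0.4
rng = np.random.default_rng(1)

# Arbitrary probability vectors standing in for the score probabilities.
r_p, r_q = rng.dirichlet(np.ones(k_x + 1), size=2)
s_p, s_q = rng.dirichlet(np.ones(k_y + 1), size=2)

jac = jacobian_eq24(k_x, k_y, w_s)
stacked = np.concatenate([r_p, r_q, s_p, s_q])

# Because the assumed map is linear, applying the Jacobian gives (r_S, s_S) directly.
mixed = jac @ stacked
```

Since the assumed map is linear, the Jacobian does not depend on the score probabilities, only on $w_S$ and the test lengths.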

The $[2(k_X+1)+2(k_Y+1)] \times (3k_X+3k_Y+2)$ matrix $\frac{\partial \mathbf{v}(\boldsymbol{r}_P, \boldsymbol{r}_Q, \boldsymbol{s}_P, \boldsymbol{s}_Q)}{\partial \mathbf{v}(\boldsymbol{\alpha}_X, \boldsymbol{\alpha}_Y, \beta_1, \beta_2)}$ is

$$\frac{\partial \mathbf{v}(\boldsymbol{r}_P, \boldsymbol{r}_Q, \boldsymbol{s}_P, \boldsymbol{s}_Q)}{\partial \mathbf{v}(\boldsymbol{\alpha}_X, \boldsymbol{\alpha}_Y, \beta_1, \beta_2)} = \begin{pmatrix} \frac{\partial \boldsymbol{r}_P}{\partial \boldsymbol{\alpha}_X} & \mathbf{0} & \mathbf{0} \\ \frac{\partial \boldsymbol{r}_Q}{\partial \boldsymbol{\alpha}_X} & \mathbf{0} & \frac{\partial \boldsymbol{r}_Q}{\partial \mathbf{v}(\beta_1, \beta_2)} \\ \mathbf{0} & \frac{\partial \boldsymbol{s}_P}{\partial \boldsymbol{\alpha}_Y} & \frac{\partial \boldsymbol{s}_P}{\partial \mathbf{v}(\beta_1, \beta_2)} \\ \mathbf{0} & \frac{\partial \boldsymbol{s}_Q}{\partial \boldsymbol{\alpha}_Y} & \mathbf{0} \end{pmatrix}, \tag{25}$$

where $\frac{\partial \boldsymbol{r}_P}{\partial \boldsymbol{\alpha}_X}, \frac{\partial \boldsymbol{s}_Q}{\partial \boldsymbol{\alpha}_Y}, \frac{\partial \boldsymbol{r}_Q}{\partial \boldsymbol{\alpha}_X}, \frac{\partial \boldsymbol{s}_P}{\partial \boldsymbol{\alpha}_Y}, \frac{\partial \boldsymbol{s}_P}{\partial \mathbf{v}(\beta_1, \beta_2)}$ and $\frac{\partial \boldsymbol{r}_Q}{\partial \mathbf{v}(\beta_1, \beta_2)}$ are partial derivative matrices with entries given in Ogasawara (2003). Lastly, the $(3k_X+3k_Y+2) \times [3(k_X+k_A)+3(k_Y+k_A)]$ matrix $\frac{\partial \mathbf{v}(\boldsymbol{\alpha}_X, \boldsymbol{\alpha}_Y, \beta_1, \beta_2)}{\partial \mathbf{v}(\boldsymbol{\alpha}_P, \boldsymbol{\alpha}_Q)}$ is

$$\frac{\partial \mathbf{v}(\boldsymbol{\alpha}_X, \boldsymbol{\alpha}_Y, \beta_1, \beta_2)}{\partial \mathbf{v}(\boldsymbol{\alpha}_P, \boldsymbol{\alpha}_Q)} = \begin{pmatrix} \mathrm{diag}(\mathbf{1}_{3k_X}) & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathrm{diag}(\mathbf{1}_{3k_Y}) & \mathbf{0} \\ \frac{\partial \beta_1}{\partial \boldsymbol{\alpha}_X} & \frac{\partial \beta_1}{\partial \boldsymbol{\alpha}_{A_P}} & \frac{\partial \beta_1}{\partial \boldsymbol{\alpha}_Y} & \frac{\partial \beta_1}{\partial \boldsymbol{\alpha}_{A_Q}} \\ \frac{\partial \beta_2}{\partial \boldsymbol{\alpha}_X} & \frac{\partial \beta_2}{\partial \boldsymbol{\alpha}_{A_P}} & \frac{\partial \beta_2}{\partial \boldsymbol{\alpha}_Y} & \frac{\partial \beta_2}{\partial \boldsymbol{\alpha}_{A_Q}} \end{pmatrix}, \tag{26}$$

where the partial derivative vectors in the last two rows depend on the method of estimating the equating coefficients. See Ogasawara (2000) for equating coefficients using moments and Ogasawara (2001) for equating coefficients using the Haebara and Stocking-Lord methods.
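The stated dimensions of the matrices in Eqs. (24) to (26) chain together, which suggests they are multiplied in that order and used in a delta-method calculation of the covariance of the score probabilities. The sketch below assumes exactly that; it is not taken from the article. Every derivative block and the item-parameter covariance matrix are random placeholders (they are not the Ogasawara formulas), and the test lengths, weight and variable names are hypothetical. The columns of the last matrix are assumed to be ordered as $(\boldsymbol{\alpha}_X, \boldsymbol{\alpha}_{A_P}, \boldsymbol{\alpha}_Y, \boldsymbol{\alpha}_{A_Q})$, following Eq. (26).

```python
import numpy as np

rng = np.random.default_rng(0)
k_x, k_y, k_a, w_s = 4, 3, 2, 0.5          # hypothetical test lengths and weight
n_x, n_y = k_x + 1, k_y + 1                # number of score points on X and Y
p_x, p_y, p_a = 3 * k_x, 3 * k_y, 3 * k_a  # three parameters per item (3PL)

def placeholder(rows, cols):
    """Random stand-in for a derivative block; NOT the analytic derivatives."""
    return rng.normal(size=(rows, cols))

# Eq. (24): synthetic-population weights.
j24 = np.block([
    [w_s * np.eye(n_x), (1 - w_s) * np.eye(n_x), np.zeros((n_x, 2 * n_y))],
    [np.zeros((n_y, 2 * n_x)), w_s * np.eye(n_y), (1 - w_s) * np.eye(n_y)],
])

# Eq. (25): rows r_P, r_Q, s_P, s_Q; columns alpha_X, alpha_Y, (beta_1, beta_2).
j25 = np.block([
    [placeholder(n_x, p_x), np.zeros((n_x, p_y)), np.zeros((n_x, 2))],
    [placeholder(n_x, p_x), np.zeros((n_x, p_y)), placeholder(n_x, 2)],
    [np.zeros((n_y, p_x)), placeholder(n_y, p_y), placeholder(n_y, 2)],
    [np.zeros((n_y, p_x)), placeholder(n_y, p_y), np.zeros((n_y, 2))],
])

# Eq. (26): rows alpha_X, alpha_Y, (beta_1, beta_2);
# columns alpha_X, alpha_{A_P}, alpha_Y, alpha_{A_Q}.
j26 = np.block([
    [np.eye(p_x), np.zeros((p_x, p_a)), np.zeros((p_x, p_y)), np.zeros((p_x, p_a))],
    [np.zeros((p_y, p_x)), np.zeros((p_y, p_a)), np.eye(p_y), np.zeros((p_y, p_a))],
    [placeholder(2, p_x), placeholder(2, p_a), placeholder(2, p_y), placeholder(2, p_a)],
])

# Placeholder positive semi-definite covariance of v(alpha_P, alpha_Q).
n_par = 3 * (k_x + k_a) + 3 * (k_y + k_a)
root = rng.normal(size=(n_par, n_par))
cov_alpha = root @ root.T / n_par

# Chain rule, then the delta-method sandwich (assumed use of the three matrices).
jac = j24 @ j25 @ j26
cov_rs = jac @ cov_alpha @ jac.T
```

With real inputs, the placeholder blocks would be replaced by the analytic derivatives from the cited sources and `cov_alpha` by the estimated covariance matrix of the item parameter estimates.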

About this article

Cite this article

Andersson, B., Wiberg, M. Item Response Theory Observed-Score Kernel Equating. Psychometrika 82, 48–66 (2017). https://doi.org/10.1007/s11336-016-9528-7

Received: 05 June 2014
Revised: 22 June 2016
Published: 14 October 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11336-016-9528-7
