Item Response Theory Observed-Score Kernel Equating

Psychometrika

Abstract

Item response theory (IRT) observed-score kernel equating is introduced for the non-equivalent groups with anchor test equating design using either chain equating or post-stratification equating. The equating function is treated in a multivariate setting and the asymptotic covariance matrices of IRT observed-score kernel equating functions are derived. Equating is conducted using the two-parameter and three-parameter logistic models with simulated data and data from a standardized achievement test. The results show that IRT observed-score kernel equating offers small standard errors and low equating bias under most settings considered.

References

  • Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25.

  • Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.

  • Dorans, N., & Feigenbaum, M. (1994). Equating issues engendered by changes to the SAT and PSAT/NMSQT. In I. Lawrence, N. Dorans, M. Feigenbaum, N. Feryok, A. Schmitt, & N. Wright (Eds.), Technical issues related to the introduction of the new SAT and PSAT/NMSQT (pp. 91–122). Princeton, NJ: Educational Testing Service.

  • Ferguson, T. (1996). A course in large sample theory. London: Chapman & Hall.

  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.

  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer.

  • Holland, P. W., & Thayer, D. T. (1989). The kernel method of equating score distributions (Technical Report No. 89-84). Princeton, NJ: Educational Testing Service.

  • Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.

  • Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York: Springer.

  • Lee, Y.-H., & von Davier, A. A. (2011). Equating through alternative kernels. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking. New York: Springer.

  • Li, Y. H., & Lissitz, R. W. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of Educational Measurement, 41, 85–117.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 452–461.

  • Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44, 226–233.

  • Loyd, B. H., & Hoover, H. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193.

  • Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.

  • Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville, IN: Scientific Software.

  • Moses, T., & Holland, P. W. (2010). A comparison of statistical selection strategies for univariate and bivariate log-linear models. British Journal of Mathematical and Statistical Psychology, 63, 557–574.

  • Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51, 1–23.

  • Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53–67.

  • Ogasawara, H. (2003). Asymptotic standard errors of IRT observed-score equating methods. Psychometrika, 68, 193–211.

  • Ogasawara, H. (2009). Asymptotic cumulants of the parameter estimators in item response theory. Computational Statistics, 24, 313–331.

  • R Development Core Team. (2013). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  • Rijmen, F., Qu, Y., & von Davier, A. A. (2011). Hypothesis testing of equating differences in the kernel equating framework. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.

  • Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

  • van der Linden, W. J. (2011). Local observed-score equating. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 317–326). New York: Springer.

  • von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.

  • von Davier, A. A. (2010, July). Equating observed-scores: The percentile rank, Gaussian kernel, and IRT observed-score equating methods. Workshop given at the International Meeting of the Psychometric Society, Athens, GA.

  • Wiberg, M., van der Linden, W. J., & von Davier, A. A. (2014). Local observed-score kernel equating. Journal of Educational Measurement, 51, 57–74.

  • Yuan, K.-H., Cheng, Y., & Patton, J. (2013). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79, 232–254.

Author information

Corresponding author

Correspondence to Björn Andersson.

Additional information

The first author acknowledges the financial support from the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University. The research in this article by the second author was funded by the Swedish Research Council Grant 2014-578.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 280 KB)

Appendix

1.1 The Partial Derivatives For the Score Probabilities From IRT Models

The \([(k_X + 1)+ (k_Y + 1)] \times [2(k_X+1)+2(k_Y+1)]\) matrix \(\frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }\) is

$$\begin{aligned}&\frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }= \nonumber \\&\quad \begin{pmatrix} w_S \mathrm {diag}(\mathbf {1}_{k_X+1}) &{} (1-w_S) \mathrm {diag}(\mathbf {1}_{k_X+1}) &{} \mathbf {0} &{} \mathbf {0}\\ \mathbf {0} &{} \mathbf {0} &{} w_S \mathrm {diag}(\mathbf {1}_{k_Y+1}) &{} (1-w_S) \mathrm {diag}(\mathbf {1}_{k_Y+1}) \end{pmatrix}. \end{aligned}$$
(24)
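
Because Eq. 24 is the Jacobian of a linear map, it can be checked numerically. The following is a minimal Python sketch, assuming the synthetic-population score probabilities are the weighted averages \(\varvec{r}_S = w_S \varvec{r}_P + (1-w_S) \varvec{r}_Q\) and \(\varvec{s}_S = w_S \varvec{s}_P + (1-w_S) \varvec{s}_Q\) that Eq. 24 implies. The function names and all numeric values are hypothetical and serve only as an illustration.

```python
def identity(n):
    """Return the n x n identity matrix, i.e. diag(1_n) in the notation of Eq. 24."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def synthetic_jacobian(k_x, k_y, w_s):
    """Jacobian of v(r_S, s_S) with respect to v(r_P, r_Q, s_P, s_Q) as in Eq. 24."""
    n_x, n_y = k_x + 1, k_y + 1  # number of score points on X and on Y
    eye_x, eye_y = identity(n_x), identity(n_y)
    rows = []
    for i in range(n_x):  # block row for r_S
        rows.append([w_s * v for v in eye_x[i]]
                    + [(1.0 - w_s) * v for v in eye_x[i]]
                    + [0.0] * (2 * n_y))
    for i in range(n_y):  # block row for s_S
        rows.append([0.0] * (2 * n_x)
                    + [w_s * v for v in eye_y[i]]
                    + [(1.0 - w_s) * v for v in eye_y[i]])
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical score probabilities with k_X = 2 and k_Y = 1 (illustrative values only).
r_p, r_q = [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]
s_p, s_q = [0.4, 0.6], [0.7, 0.3]
w_s = 0.25

jac = synthetic_jacobian(k_x=2, k_y=1, w_s=w_s)
stacked = r_p + r_q + s_p + s_q    # v(r_P, r_Q, s_P, s_Q)
synthetic = matvec(jac, stacked)   # v(r_S, s_S), since the map is linear
```

With \(k_X = 2\) and \(k_Y = 1\) the matrix has \((k_X+1)+(k_Y+1) = 5\) rows and \(2(k_X+1)+2(k_Y+1) = 10\) columns, matching the dimensions stated for Eq. 24.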

The \([2(k_X+1)+2(k_Y+1)] \times (3k_X+3k_Y+2)\) matrix \(\frac{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }{ \partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2)}\) is

$$\begin{aligned} \frac{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q) }{ \partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2)}=\begin{pmatrix} \frac{\partial \varvec{r}_P}{\partial \varvec{\alpha }_X} &{} \mathbf {0} &{} \mathbf {0} \\ \frac{\partial \varvec{r}_Q}{\partial \varvec{\alpha }_X} &{} \mathbf {0} &{} \frac{\partial \varvec{r}_Q }{\partial \mathbf {v} (\beta _1, \beta _2) }\\ \mathbf {0} &{} \frac{\partial \varvec{s}_P}{\partial \varvec{\alpha }_Y} &{} \frac{\partial \varvec{s}_P}{\partial \mathbf {v} (\beta _1, \beta _2)}\\ \mathbf {0} &{} \frac{\partial \varvec{s}_Q}{\partial \varvec{\alpha }_Y} &{} \mathbf {0} \end{pmatrix}, \end{aligned}$$
(25)

where \(\frac{\partial \varvec{r}_P}{\partial \varvec{\alpha }_X}, \frac{\partial \varvec{s}_Q}{\partial \varvec{\alpha }_Y}, \frac{\partial \varvec{r}_Q}{\partial \varvec{\alpha }_X}, \frac{\partial \varvec{s}_P}{\partial \varvec{\alpha }_Y}, \frac{\partial \varvec{s}_P }{\partial \mathbf {v}(\beta _1, \beta _2) }\) and \(\frac{\partial \varvec{r}_Q }{\partial \mathbf {v}(\beta _1, \beta _2) }\) are partial derivative matrices with entries given in Ogasawara (2003). Lastly, the \((3k_X+3k_Y+2) \times [3(k_X+k_A)+3(k_Y+k_A)]\) matrix \(\frac{\partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2) }{\partial \mathbf {v}(\varvec{\alpha }_P, \varvec{\alpha }_Q) }\) is

$$\begin{aligned} \frac{\partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2) }{\partial \mathbf {v} (\varvec{\alpha }_P, \varvec{\alpha }_Q) }=\begin{pmatrix} \mathrm {diag}(\mathbf {1}_{3k_X}) &{} \mathbf {0} &{} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} &{} \mathrm {diag}(\mathbf {1}_{3k_Y}) &{} \mathbf {0}\\ \frac{\partial \beta _1}{\partial \varvec{\alpha }_X} &{} \frac{\partial \beta _1}{\partial \varvec{\alpha }_{A_P}} &{} \frac{\partial \beta _1}{\partial \varvec{\alpha }_Y} &{} \frac{\partial \beta _1}{\partial \varvec{\alpha }_{A_Q}} \\ \frac{\partial \beta _2}{\partial \varvec{\alpha }_X} &{} \frac{\partial \beta _2}{\partial \varvec{\alpha }_{A_P}} &{} \frac{\partial \beta _2}{\partial \varvec{\alpha }_Y} &{} \frac{\partial \beta _2}{\partial \varvec{\alpha }_{A_Q}} \end{pmatrix}, \end{aligned}$$
(26)

where the partial derivative vectors in the last two rows depend on the method used to estimate the equating coefficients. See Ogasawara (2000) for equating coefficients using moments and Ogasawara (2001) for equating coefficients using the Haebara (1980) and Stocking–Lord (Stocking & Lord, 1983) methods.
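
The three matrices in Eqs. 24 to 26 have conformable dimensions, so they compose by the chain rule. As a sketch, writing \(\mathbf {J}\) for the product (a label introduced here, not notation from the appendix):

$$\begin{aligned} \mathbf {J} = \frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{\alpha }_P, \varvec{\alpha }_Q)} = \frac{\partial \mathbf {v}(\varvec{r}_S, \varvec{s}_S)}{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q)} \, \frac{\partial \mathbf {v}(\varvec{r}_P, \varvec{r}_Q, \varvec{s}_P, \varvec{s}_Q)}{\partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2)} \, \frac{\partial \mathbf {v}(\varvec{\alpha }_X, \varvec{\alpha }_Y, \beta _1, \beta _2)}{\partial \mathbf {v}(\varvec{\alpha }_P, \varvec{\alpha }_Q)}. \end{aligned}$$

The inner dimensions \(2(k_X+1)+2(k_Y+1)\) and \(3k_X+3k_Y+2\) cancel, so \(\mathbf {J}\) is \([(k_X + 1)+ (k_Y + 1)] \times [3(k_X+k_A)+3(k_Y+k_A)]\). Assuming \(\varvec{\Sigma }\) denotes the asymptotic covariance matrix of the estimated item parameters \(\mathbf {v}(\varvec{\alpha }_P, \varvec{\alpha }_Q)\) (again a label introduced here), the standard delta method would then give \(\mathbf {J} \varvec{\Sigma } \mathbf {J}^{\top }\) as the asymptotic covariance matrix of the estimated \(\mathbf {v}(\varvec{r}_S, \varvec{s}_S)\). The appendix does not state this step explicitly, so it should be read as an interpretation of how the three matrices are meant to be combined.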

Cite this article

Andersson, B., Wiberg, M. Item Response Theory Observed-Score Kernel Equating. Psychometrika 82, 48–66 (2017). https://doi.org/10.1007/s11336-016-9528-7
