Evaluating Equating Transformations from Different Frameworks

  • Waldir Leôncio
  • Marie Wiberg
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 233)


Test equating is used to ensure that scores from different test forms can be used interchangeably. This paper compares the statistical and computational properties of three equating frameworks: item response theory observed-score equating (IRTOSE), kernel equating, and kernel IRTOSE. Applications to real data suggest that the IRT-based frameworks tend to provide more stable and accurate results than kernel equating. Nonetheless, kernel equating can provide satisfactory results if a good model for the data can be found, and it is much faster than the IRT-based frameworks. Our general recommendation is to try all methods and examine how much the equated scores change, always ensuring that the assumptions are met and that a good model for the data can be found.
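As a rough illustration of the kernel equating framework, the Python sketch below implements Gaussian kernel equating under an equivalent-groups design, following the continuization formulas of von Davier, Holland and Thayer (2004). It is not the authors' code. It assumes the score probabilities r (form X) and s (form Y) are already given; in practice they would come from presmoothed observed frequencies or, in kernel IRTOSE, from an IRT model. It also uses fixed, hand-picked bandwidths instead of the penalty-based bandwidth selection used in practice. All function names and the toy data are hypothetical. Software such as the R package kequate additionally handles presmoothing, bandwidth selection and standard errors.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(scores, probs))
    return mu, var


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF evaluated at x (bandwidth h > 0).

    The factor a keeps the mean and variance of the continuized
    distribution equal to those of the discrete score distribution.
    """
    mu, var = moments(scores, probs)
    a = math.sqrt(var / (var + h ** 2))
    return sum(
        p * normal_cdf((x - a * xj - (1 - a) * mu) / (a * h))
        for xj, p in zip(scores, probs)
    )


def kernel_inverse(u, scores, probs, h, tol=1e-10):
    """Invert the continuized CDF by bisection (it is strictly increasing)."""
    lo = min(scores) - 10 * h - 10
    hi = max(scores) + 10 * h + 10
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, scores_x, r, scores_y, s, h_x, h_y):
    """Map form X score x onto the form Y scale: e_Y(x) = G^{-1}(F(x)),
    where F and G are the continuized CDFs of forms X and Y."""
    u = kernel_cdf(x, scores_x, r, h_x)
    return kernel_inverse(u, scores_y, s, h_y)


if __name__ == "__main__":
    # Hypothetical score probabilities on a 0-4 scale (not data from the paper).
    scores = [0, 1, 2, 3, 4]
    r = [0.10, 0.20, 0.40, 0.20, 0.10]  # form X
    s = [0.05, 0.15, 0.30, 0.30, 0.20]  # form Y
    for x in scores:
        small_h = kernel_equate(x, scores, r, scores, s, 0.6, 0.6)
        # A very large bandwidth makes kernel equating approach linear equating.
        large_h = kernel_equate(x, scores, r, scores, s, 1000, 1000)
        print(f"x={x}: h=0.6 -> {small_h:.3f}, h=1000 -> {large_h:.3f}")
```

Because a very large bandwidth turns kernel equating into (approximately) linear equating, comparing the two printed columns is a small-scale version of the advice to examine how much the equated scores change when the method changes.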


Keywords: Test equating; Item response theory; Kernel equating; Observed-score equating



The research in this article was funded by the Swedish Research Council grant 2014-578 and by the Fondazione Cassa di Risparmio di Padova e Rovigo.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Dipartimento di Scienze Statistiche, University of Padua, Padua, Italy
  2. Department of Statistics, USBE, Umeå University, Umeå, Sweden
