Item Response Theory Equating

  • Jorge González
  • Marie Wiberg
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


This chapter discusses different methods of Item Response Theory (IRT) linking and equating and illustrates them using the SNSequate (González, J Stat Softw 59(7):1–30, 2014) and equateIRT (Battauz, J Stat Softw 68(7):1–22, 2015) packages. Other useful packages include ltm (Rizopoulos, J Stat Softw 17(5):1–25, 2006) and mirt (Chalmers, J Stat Softw 48(6):1–29, 2012), which allow the user to fit different IRT models to response data. IRT objects obtained from ltm and mirt can also be read into equateIRT and kequate (Andersson et al., J Stat Softw 55(6):1–25, 2013) to perform IRT equating and linking.


Keywords: Item Response Theory; Test Form; Item Parameter; Test Taker; Item Response Theory Model


  1. Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25.
  2. Andersson, B., & Wiberg, M. (2016). Item response theory observed-score kernel equating. Psychometrika. doi:10.1007/s11336-016-9528-7.
  3. Baker, F., & Kim, S. (2004). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.
  4. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
  5. Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22.
  6. Bechger, T., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340.
  7. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.
  8. Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
  9. Chen, M. (2004). Skewed link models for categorical response data. In M. Genton (Ed.), Skew-elliptical distributions and their applications: A journey beyond normality (Vol. 1, pp. 131–152). Boca Raton: Chapman & Hall/CRC.
  10. Chen, M.-H., Dey, D. K., & Shao, Q.-M. (1999). A new skewed link model for dichotomous quantal response data. Journal of the American Statistical Association, 94(448), 1172–1186.
  11. Cook, L. L., & Eignor, D. (1991). IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 37–45.
  12. De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28.
  13. De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
  14. DeMars, C. (2002). Incomplete data and item parameter estimates under JMLE and MML estimation. Applied Measurement in Education, 15(1), 15–31.
  15. Estay, G. (2012). Characteristic curves scale transformation methods using asymmetric ICCs for IRT equating. Unpublished master’s thesis, Department of Statistics, Pontificia Universidad Católica de Chile.
  16. Fischer, G., & Molenaar, I. (1995). Rasch models: Foundations and recent developments. New York: Springer.
  17. González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59(7), 1–30.
  18. González, J., Wiberg, M., & von Davier, A. A. (2016). A note on the Poisson’s binomial distribution in item response theory. Applied Psychological Measurement, 40(4), 302–310.
  19. Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
  20. Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Dordrecht: Kluwer Nijhoff Publishing.
  21. Kiefer, T., Robitzsch, A., & Wu, M. (2016). TAM: Test analysis modules. R Package Version 1.995-0.
  22. Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355–381.
  23. Kolen, M., & Brennan, R. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York: Springer.
  24. Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.
  25. Lord, F., & Novick, M. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
  26. Lord, F., & Wingersky, M. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461.
  27. Loyd, B. H., & Hoover, H. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179–193.
  28. Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 1–20.
  29. Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139–160.
  30. Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville: Scientific Software International.
  31. Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1–23.
  32. Partchev, I. (2014). Irtoys: Simple interface to the estimation and plotting of IRT models. R Package Version 0.1.7.
  33. Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
  34. Robitzsch, A. (2016). sirt: Supplementary item response theory models. R Package Version 1.12.2.
  35. San Martín, E., González, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80(2), 450–467.
  36. Skaggs, G., & Lissitz, R. (1986). An exploration of the robustness of four test equating models. Applied Psychological Measurement, 10(3), 303.
  37. Stocking, M., & Lord, F. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.
  38. Tuerlinckx, F., Rijmen, F., Molenberghs, G., Verbeke, G., Briggs, D., van den Noortgate, W., Meulders, M., & De Boeck, P. (2004). Estimation and software. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (Vol. 1, pp. 343–373). New York: Springer.
  39. van der Linden, W. J. (Ed.) (2016). Handbook of item response theory. Three volume set. Boca Raton: Chapman and Hall/CRC.
  40. van der Linden, W. J., & Barrett, M. (2016). Linking item response model parameters. Psychometrika, 81(3), 650–673.
  41. von Davier, M., & von Davier, A. (2011). A general model for IRT scale linking and scale transformations. In A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (Vol. 1, pp. 225–242). New York: Springer.
  42. Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33.
  43. Wiberg, M., van der Linden, W. J., & von Davier, A. A. (2014). Local observed-score kernel equating. Journal of Educational Measurement, 51, 57–74.
  44. Wingersky, M. S., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8(3), 347–364.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jorge González (1)
  • Marie Wiberg (2)
  1. Faculty of Mathematics, Pontificia Universidad Católica de Chile, Santiago, Chile
  2. Department of Statistics, Umeå School of Business and Economics, Umeå University, Umeå, Sweden
