Psychometrika, Volume 82, Issue 3, pp 610–636

Multiple Equating of Separate IRT Calibrations

  • Michela Battauz


When test forms are calibrated separately, item response theory (IRT) parameter estimates are not comparable because they are expressed on different measurement scales. The equating process involves converting the item parameter estimates to a common scale and determining comparable test scores. Various statistical methods have been proposed to equate two test forms. This paper generalizes the mean-geometric mean, mean-mean, Haebara, and Stocking–Lord methods to multiple test forms. The proposed methods simultaneously estimate the equating coefficients that transform the parameters of all forms to the scale of the base form. Asymptotic standard errors of the equating coefficients are derived, and a simulation study illustrates the performance of the methods.
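Under the usual IRT scale transformation θ* = A_t θ_t + B_t from form t to the base scale, item parameters convert as a*_j = a_jt / A_t and b*_j = A_t b_jt + B_t. The Python sketch below illustrates one way to extend the mean-geometric mean idea to several forms at once: it fits log a_jt = log A_t + log a*_j by least squares with the base form fixed at A = 1, then fits A_t b_jt = b*_j − B_t with B = 0 for the base form. This joint least-squares formulation is an assumption made for illustration and is not necessarily the paper's exact estimator; the function name `multiple_mgm` and the dictionary input layout are hypothetical, and no standard errors are computed.

```python
import numpy as np


def multiple_mgm(a, b, n_forms, base=0):
    """Jointly estimate slopes A_t and intercepts B_t for all forms.

    a, b: dicts mapping (form, item) -> estimated discrimination / difficulty
    on that form's own scale (hypothetical input layout; both dicts must
    share the same keys). Forms must be connected through common items.
    Returns arrays A, B with A[base] = 1 and B[base] = 0.
    """
    keys = sorted(a)
    items = sorted({j for _, j in keys})
    item_col = {j: k for k, j in enumerate(items)}
    others = [t for t in range(n_forms) if t != base]
    form_col = {t: len(items) + k for k, t in enumerate(others)}
    X = np.zeros((len(keys), len(items) + len(others)))
    y = np.empty(len(keys))

    # Step 1: log a_jt = log a*_j + log A_t, with log A_base fixed at 0.
    for r, (t, j) in enumerate(keys):
        X[r, item_col[j]] = 1.0
        if t != base:
            X[r, form_col[t]] = 1.0
        y[r] = np.log(a[(t, j)])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    A = np.ones(n_forms)
    for t in others:
        A[t] = np.exp(coef[form_col[t]])

    # Step 2: A_t * b_jt = b*_j - B_t, with B_base fixed at 0.
    X[:] = 0.0
    for r, (t, j) in enumerate(keys):
        X[r, item_col[j]] = 1.0
        if t != base:
            X[r, form_col[t]] = -1.0
        y[r] = A[t] * b[(t, j)]
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    B = np.zeros(n_forms)
    for t in others:
        B[t] = coef[form_col[t]]
    return A, B


# Noise-free check: three forms linked in a chain. Form 0 (base) and form 1
# share items 3-4; forms 1 and 2 share items 6-7; form 2 shares nothing
# with the base form directly.
true_A = np.array([1.0, 1.2, 0.8])
true_B = np.array([0.0, -0.3, 0.5])
rng = np.random.default_rng(1)
a_star = rng.uniform(0.5, 2.0, size=10)  # base-scale discriminations
b_star = rng.normal(size=10)             # base-scale difficulties
forms = {0: range(0, 5), 1: range(3, 8), 2: range(6, 10)}
a_hat, b_hat = {}, {}
for t, its in forms.items():
    for j in its:
        a_hat[(t, j)] = true_A[t] * a_star[j]                # a_jt = A_t a*_j
        b_hat[(t, j)] = (b_star[j] - true_B[t]) / true_A[t]  # b_jt = (b*_j - B_t) / A_t
A, B = multiple_mgm(a_hat, b_hat, n_forms=3)
print(np.round(A, 3), np.round(B, 3))  # should match true_A and true_B
```

With only two forms sharing a set of common items, the first fit reduces to A = (geometric mean of the form's a estimates) / (geometric mean of the base form's a estimates), and the second to B = mean(b_base) − A · mean(b_form), which are the familiar two-form mean-geometric mean coefficients. Estimating all forms in one system lets a form with no items in common with the base form (form 2 above) be placed on the base scale through intermediate forms.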


Keywords

equating coefficients; Haebara; item response theory; linking; mean-geometric mean; mean-mean; standard errors; Stocking–Lord



The author wishes to thank the Editor, the Associate Editor, and two anonymous reviewers for their helpful comments and suggestions that greatly improved this work. The author is grateful to Professor R. Bellio for his suggestions.


Copyright information

© The Psychometric Society 2016

Authors and Affiliations

  1. Department of Economics and Statistics, University of Udine, Udine, Italy
