Item Response Theory Equating

Applying Test Equating Methods

Part of the book series: Methodology of Educational Measurement and Assessment (MEMA)

Abstract

In this chapter, different methods of Item Response Theory (IRT) linking and equating will be discussed and illustrated using the SNSequate (González, J Stat Softw 59(7):1–30, 2014) and equateIRT (Battauz, J Stat Softw 68(7):1–22, 2015) packages. Other useful packages include ltm (Rizopoulos, J Stat Softw 17(5):1–25, 2006) and mirt (Chalmers, J Stat Softw 48(6):1–29, 2012), which allow the user to model response data using different IRT models. IRT objects obtained from the latter packages can also be read into equateIRT and kequate (Andersson et al., J Stat Softw 55(6):1–25, 2013) to perform IRT equating and linking.

Notes

  1. The model shown in Example 1.2 corresponds to the fixed-effects version of IRT models. For more details on the difference between the fixed-effects version and the random-effects version of the model that is presented here, see San Martín et al. (2015).

  2. When multiple test forms are to be linked, the argument coef needs a list of matrices containing the item parameter estimates corresponding to each test form.

  3. In this case, an internal call to irt.link() is made.

  4. Note that the item parameter estimates shown in Table 6.10 in Kolen and Brennan (2014) are already rescaled. We have therefore set the equating coefficients to A=1 and B=0, so that the results are comparable with those obtained in Kolen and Brennan (2014).

  5. Figure 6.6 in Kolen and Brennan (2014) also shows the curve for frequency estimation equating. This curve can easily be obtained and added using the equate package, as illustrated in Chap. 3.

  6. Because a Rasch model is used to fit the 0/1 data, item discrimination parameters are fixed to 1 and guessing parameters are fixed to 0.

  7. Some columns in the output are omitted.

  8. The mirt() function implements a general four-parameter model of which the 1PL, 2PL and 3PL models are particular cases. The discrimination, difficulty and guessing parameters are denoted by a1, d, and g, respectively, whereas a fourth upper asymptote parameter is denoted by u. In the case of the Rasch model, a1=u=1 and g=0.
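
The nesting described in Note 8, and the identity rescaling used in Note 4, can be illustrated with a minimal Python sketch. It is not the mirt implementation: it assumes the slope-intercept form a1·θ + d for the logit (so that d = −a1·b for a difficulty b), and the function names irf_4pl, rasch and rescale_item are hypothetical. The rescaling assumes the usual linear scale transformation a* = a/A, b* = A·b + B.

```python
import math


def irf_4pl(theta, a1=1.0, d=0.0, g=0.0, u=1.0):
    """Four-parameter logistic response probability (assumed slope-intercept form).

    a1: discrimination (slope), d: intercept, g: lower asymptote (guessing),
    u: upper asymptote. Fixing u=1 gives the 3PL; additionally g=0 gives the
    2PL; additionally a1=1 gives the Rasch/1PL case.
    """
    return g + (u - g) / (1.0 + math.exp(-(a1 * theta + d)))


def rasch(theta, b):
    """Rasch model written in the usual difficulty form."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rescale_item(a, b, A=1.0, B=0.0):
    """Linear scale transformation of item parameters (assumed form)."""
    return a / A, A * b + B


theta, b = 0.5, -0.3

# Rasch as a special case: a1 = u = 1, g = 0, and d = -b.
p_general = irf_4pl(theta, a1=1.0, d=-b, g=0.0, u=1.0)
p_rasch = rasch(theta, b)

# With A = 1 and B = 0 the item parameters are left unchanged.
unchanged = rescale_item(1.2, 0.4, A=1.0, B=0.0)
```

Here p_general and p_rasch agree up to floating-point error, and unchanged equals the input pair, which is why setting A=1 and B=0 leaves already rescaled estimates as they are.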

References

  • Andersson, B., Bränberg, K., & Wiberg, M. (2013). Performing the kernel method of test equating with the package kequate. Journal of Statistical Software, 55(6), 1–25.

  • Andersson, B., & Wiberg, M. (2016). Item response theory observed-score kernel equating. Psychometrika. doi: 10.1007/s11336-016-9528-7.

  • Baker, F., & Kim, S. (2004). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.

  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.

  • Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68(7), 1–22.

  • Bechger, T., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring any examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading: Addison-Wesley.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.

  • Chen, M. (2004). Skewed link models for categorical response data. In M. Genton (Ed.), Skew-elliptical distributions and their applications: A journey beyond normality (Vol. 1, pp. 131–152). Boca Raton: Chapman & Hall/CRC.

  • Chen, M.-H., Dey, D. K., & Shao, Q.-M. (1999). A new skewed link model for dichotomous quantal response data. Journal of the American Statistical Association, 94(448), 1172–1186.

  • Cook, L. L., & Eignor, D. (1991). IRT equating methods. Educational Measurement: Issues and Practice, 10(3), 37–45.

  • De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39(12), 1–28.

  • De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.

  • DeMars, C. (2002). Incomplete data and item parameter estimates under JMLE and MML estimation. Applied Measurement in Education, 15(1), 15–31.

  • Estay, G. (2012). Characteristic curves scale transformation methods using asymmetric ICCs for IRT equating. Unpublished master’s thesis, Department of Statistics, Pontificia Universidad Catolica de Chile.

  • Fischer, G., & Molenaar, I. (1995). Rasch models: Foundations and recent developments. New York: Springer.

  • González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59(7), 1–30.

  • González, J., Wiberg, M., & von Davier, A. A. (2016). A note on the Poisson’s binomial distribution in item response theory. Applied Psychological Measurement, 40(4), 302–310.

  • Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.

  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Dordrecht: Kluwer Nijhoff Publishing.

  • Kiefer, T., Robitzsch, A., & Wu, M. (2016). TAM: Test analysis modules. R Package Version 1.995-0.

  • Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355–381.

  • Kolen, M., & Brennan, R. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York: Springer.

  • Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale: Lawrence Erlbaum Associates.

  • Lord, F., & Novick, M. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.

  • Lord, F., & Wingersky, M. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461.

  • Loyd, B. H., & Hoover, H. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17(3), 179–193.

  • Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 1–20.

  • Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14(2), 139–160.

  • Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models. Mooresville: Scientific Software International.

  • Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51(1), 1–23.

  • Partchev, I. (2014). Irtoys: Simple interface to the estimation and plotting of IRT models. R Package Version 0.1.7.

  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.

  • Robitzsch, A. (2016). sirt: Supplementary item response theory models. R Package Version 1.12.2.

  • San Martín, E., González, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80(2), 450–467.

  • Skaggs, G., & Lissitz, R. (1986). An exploration of the robustness of four test equating models. Applied Psychological Measurement, 10(3), 303.

  • Stocking, M., & Lord, F. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.

  • Tuerlinckx, F., Rijmen, F., Molenberghs, G., Verbeke, G., Briggs, D., van den Noortgate, W., Meulders, M., & De Boeck, P. (2004). Estimation and software. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (Vol. 1, pp. 343–373). New York: Springer.

  • van der Linden, W. J. (Ed.) (2016). Handbook of item response theory. Three volume set. Boca Raton: Chapman and Hall/CRC.

  • van der Linden, W. J., & Barrett, M. (2016). Linking item response model parameters. Psychometrika, 81(3), 650–673.

  • von Davier, M., & von Davier, A. (2011). A general model for IRT scale linking and scale transformations. In A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (Vol. 1, pp. 225–242). New York: Springer.

  • Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33.

  • Wiberg, M., van der Linden, W. J., & von Davier, A. A. (2014). Local observed-score kernel equating. Journal of Educational Measurement, 51, 57–74.

  • Wingersky, M. S., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8(3), 347–364.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

González, J., Wiberg, M. (2017). Item Response Theory Equating. In: Applying Test Equating Methods. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-319-51824-4_5

  • DOI: https://doi.org/10.1007/978-3-319-51824-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51822-0

  • Online ISBN: 978-3-319-51824-4

  • eBook Packages: Education (R0)
