
A Comparison of Equating Methods and Linking Designs for Developing an Item Pool under Item Response Theory

Behaviormetrika

Abstract

An item pool brings out many of the merits of item response theory (IRT). This study considers the case where an item pool is still under development. Using simulated data, we examined the robustness of four calibration methods under three linking designs. The data were generated on the assumption that a small item pool had already been developed and that new items were to be added to it. The results suggested that the item characteristic curve method generally performed well. The performance of the fixed common item parameter calibration method and the concurrent calibration method deteriorated in one of the linking designs where the number of common items was small. The results also suggested that performance was better when the sample size per form and the number of common items were larger.
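
As an illustration of what an item characteristic curve method does, the following is a minimal sketch and not the authors' implementation. It assumes a two-parameter logistic model on the logistic metric (no scaling constant), a Haebara-type squared-difference criterion evaluated in one direction only, and a simple compass search in place of a proper optimizer. The function names (`p_2pl`, `haebara_loss`, `estimate_linking`), the item parameters, and the "true" linking coefficients are all hypothetical values chosen for the example.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_loss(A, B, base_items, new_items, thetas):
    """Sum of squared ICC differences over common items and ability points.

    New-scale parameters are moved to the base scale with
    a* = a / A and b* = A * b + B before the curves are compared.
    """
    if A <= 0:
        return math.inf  # the slope of the scale transformation must be positive
    total = 0.0
    for (a_base, b_base), (a_new, b_new) in zip(base_items, new_items):
        a_star, b_star = a_new / A, A * b_new + B
        for t in thetas:
            diff = p_2pl(t, a_base, b_base) - p_2pl(t, a_star, b_star)
            total += diff * diff
    return total

def estimate_linking(base_items, new_items, thetas, step=0.5, tol=1e-7):
    """Compass search for (A, B); a stand-in for a real optimizer."""
    A, B = 1.0, 0.0
    best = haebara_loss(A, B, base_items, new_items, thetas)
    while step > tol:
        improved = False
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = haebara_loss(A + dA, B + dB, base_items, new_items, thetas)
            if cand < best:
                A, B, best = A + dA, B + dB, cand
                improved = True
        if not improved:
            step /= 2.0  # shrink the search pattern when no direction helps
    return A, B

# Hypothetical common items (discrimination a, difficulty b) on the base scale.
base_items = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (1.2, 1.2)]

# The same items as they would appear on a new form's scale if
# theta_base = 1.2 * theta_new + 0.5 (illustrative values, error-free).
TRUE_A, TRUE_B = 1.2, 0.5
new_items = [(a * TRUE_A, (b - TRUE_B) / TRUE_A) for a, b in base_items]

# Equally spaced ability points from -4 to 4 on the base scale.
thetas = [-4.0 + 0.5 * k for k in range(17)]

A_hat, B_hat = estimate_linking(base_items, new_items, thetas)
print(A_hat, B_hat)
```

Because the new-form parameters here are an exact transformation of the base parameters, the search should return coefficients close to the generating values. With estimated parameters from real or simulated response data, the criterion would not reach zero, and the quality of the link would depend on the number of common items and the sample size per form, which is the dependence the study examines.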


Author information

Correspondence to Sayaka Arai.

About this article

Cite this article

Arai, S., Mayekawa, Si. A Comparison of Equating Methods and Linking Designs for Developing an Item Pool under Item Response Theory. Behaviormetrika 38, 1–16 (2011). https://doi.org/10.2333/bhmk.38.1

  • DOI: https://doi.org/10.2333/bhmk.38.1
