Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations

Abstract

The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, with mixed results regarding the preference for some estimators over others. This study analyzes the performance of five CTT item discrimination estimators: the point-biserial correlation, the point-biserial correlation with the item excluded from the test total score, the biserial correlation, the phi coefficient based on a median split of the total score, and the discrimination index. Data were generated from unidimensional one- and two-parameter logistic item response theory (IRT) models. The factors manipulated in the study were test length, the intervals from which the item difficulty and item discrimination parameters were drawn, and whether the examinees formed one or two groups with specified ability distribution parameters. Results indicate that the biserial correlation was the estimator most highly correlated with the IRT discrimination parameter across simulation conditions, while the degree of comparability among estimators and their invariance varied across conditions.
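
As a concrete illustration of the design described above, the following is a minimal sketch in R of the core simulation step, not the authors' code: binary responses are generated from a two-parameter logistic (2PL) model, two of the CTT estimators compared in the study (the point-biserial and biserial correlations) are computed, and each is correlated with the generating discrimination parameters. The test length, parameter intervals, and single-group ability distribution used here are illustrative assumptions, not the study's actual conditions.

```r
# Illustrative sketch only: the values below are assumptions, not the study's conditions.
set.seed(2017)

n_items   <- 20                        # test length (assumed)
n_persons <- 1000                      # sample size (assumed)
a     <- runif(n_items, 0.5, 2.0)      # discrimination interval (assumed)
b     <- runif(n_items, -2.0, 2.0)     # difficulty interval (assumed)
theta <- rnorm(n_persons)              # single-group N(0, 1) ability (assumed)

# 2PL response probabilities: P(X = 1 | theta) = logistic(a * (theta - b))
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
x <- matrix(rbinom(n_persons * n_items, 1, p), n_persons, n_items)

total <- rowSums(x)

# Point-biserial: Pearson correlation of the item score with the total score;
# the corrected version removes the item from its own total.
pbis      <- apply(x, 2, function(item) cor(item, total))
pbis_corr <- apply(x, 2, function(item) cor(item, total - item))

# Biserial via the standard conversion r_bis = r_pb * sqrt(p * q) / phi(z_p),
# where phi(z_p) is the normal ordinate at the item's proportion-correct quantile.
pval <- colMeans(x)
bis  <- pbis * sqrt(pval * (1 - pval)) / dnorm(qnorm(pval))

# Criterion used in the study: correlation of each estimator with the true a's
c(pbis = cor(a, pbis), pbis_corr = cor(a, pbis_corr), bis = cor(a, bis))
```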

Author information

Corresponding author

Correspondence to Diego A. Luna Bazaldua.

About this article

Cite this article

Bazaldua, D. A. L., Lee, Y.-S., Keller, B., et al. Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations. Asia Pacific Education Review, 18, 585–598 (2017). https://doi.org/10.1007/s12564-017-9507-4

Keywords

  • Item discrimination
  • Binary item scores
  • Classical test theory
  • Item response theory
  • Monte Carlo