
Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations

Asia Pacific Education Review

Abstract

The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, with mixed findings on whether any estimator should be preferred over the others. This study analyzes the performance of five CTT item discrimination estimators: the point-biserial correlation, the point-biserial correlation with the item excluded from the test total score, the biserial correlation, the phi coefficient with the total score split at its median, and the discrimination index. Data were generated from unidimensional one- and two-parameter logistic item response theory (IRT) models. The factors manipulated were test length, the intervals from which item difficulty and item discrimination parameters were drawn, and whether examinees came from one or two groups with specified ability distributions. Results indicate that the biserial correlation was the estimator most highly correlated with the IRT discrimination parameter across simulation conditions, while the degree of comparability among estimators and the invariance of the estimators varied across conditions.
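
The estimators compared in the study can be illustrated with a short simulation. The sketch below is hypothetical and not the authors' code: it generates dichotomous responses from a two-parameter logistic (2PL) IRT model, computes the five CTT discrimination estimators named above for each item, and correlates each estimator with the true discrimination parameter a. The sample size, test length, and parameter ranges are illustrative assumptions, not the study's actual simulation conditions.

```python
# Hypothetical sketch of the simulation design summarized in the abstract;
# not the authors' code. Sample size, test length, and parameter ranges
# below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_persons, n_items = 1000, 40

a = rng.uniform(0.5, 2.0, n_items)        # true discrimination (assumed range)
b = rng.uniform(-2.0, 2.0, n_items)       # true difficulty (assumed range)
theta = rng.normal(0.0, 1.0, n_persons)   # abilities from a single N(0, 1) group

# 2PL model: P(X = 1 | theta) = 1 / (1 + exp(-a * (theta - b)))
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n_persons, n_items)) < p).astype(float)
total = x.sum(axis=1)

def point_biserial(item, score):
    # Pearson correlation between a 0/1 item and a score.
    return np.corrcoef(item, score)[0, 1]

def biserial(item, score):
    # r_bis = r_pb * sqrt(p * q) / phi(Phi^{-1}(p)), where p is the
    # item proportion correct and phi is the standard normal density.
    prop = item.mean()
    return point_biserial(item, score) * np.sqrt(prop * (1 - prop)) / norm.pdf(norm.ppf(prop))

median_split = (total > np.median(total)).astype(float)  # upper vs. lower half
cut = int(0.27 * n_persons)                              # conventional 27% tail groups
order = np.argsort(total)

rows = []
for j in range(n_items):
    item = x[:, j]
    rows.append((
        point_biserial(item, total),          # point-biserial
        point_biserial(item, total - item),   # point-biserial, item excluded
        biserial(item, total),                # biserial
        point_biserial(item, median_split),   # phi coefficient, median split
        item[order[-cut:]].mean() - item[order[:cut]].mean(),  # discrimination index D
    ))
rows = np.array(rows)

for name, col in zip(["r_pb", "r_pb (excluded)", "r_bis", "phi (median)", "D"], rows.T):
    print(f"corr(a, {name}) = {np.corrcoef(a, col)[0, 1]:.3f}")
```

Under a setup like this, the biserial correlation typically tracks the a parameter most closely, consistent with the pattern the abstract reports.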



Author information

Corresponding author

Correspondence to Diego A. Luna Bazaldua.

About this article


Cite this article

Bazaldua, D.A.L., Lee, YS., Keller, B. et al. Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations. Asia Pacific Educ. Rev. 18, 585–598 (2017). https://doi.org/10.1007/s12564-017-9507-4

