The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, with mixed results about which estimators are preferable. This study examines the performance of five CTT item discrimination estimators: the point-biserial correlation, the corrected point-biserial correlation (item excluded from the total score), the biserial correlation, the phi coefficient based on a median split of the total score, and the discrimination index. Data were generated from unidimensional one- and two-parameter logistic item response theory (IRT) models. The simulation factors were test length, the intervals from which item difficulty and item discrimination parameters were drawn, and whether examinees formed one or two groups with specified ability distribution parameters. Across the simulation conditions, the biserial correlation was most highly correlated with the IRT discrimination parameter. The degree of comparability among estimators, and the invariance of the estimators, varied across conditions.
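To make the compared quantities concrete, the following is a minimal sketch (not the authors' code, which used R) of how such a simulation might be set up: binary responses are generated from a two-parameter logistic IRT model, and three of the CTT estimators named above (point-biserial, corrected point-biserial, and the upper/lower-27% discrimination index) are computed and correlated with the generating discrimination parameters. All sample sizes, parameter intervals, and the 27% cut are illustrative assumptions, not the study's design values.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Generate binary responses from a 2PL IRT model (illustrative settings) ---
n_persons, n_items = 1000, 20
theta = rng.normal(0.0, 1.0, n_persons)            # examinee abilities
a = rng.uniform(0.5, 2.0, n_items)                 # discrimination parameters
b = rng.uniform(-2.0, 2.0, n_items)                # difficulty parameters
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # P(correct) per person x item
X = (rng.uniform(size=p.shape) < p).astype(int)      # 0/1 response matrix

total = X.sum(axis=1)                              # total test score

def point_biserial(item, score):
    """Pearson correlation between a binary item and a score variable."""
    return np.corrcoef(item, score)[0, 1]

# Point-biserial, and the corrected version with the item excluded from the total
r_pb  = np.array([point_biserial(X[:, j], total) for j in range(n_items)])
r_pbc = np.array([point_biserial(X[:, j], total - X[:, j]) for j in range(n_items)])

# Discrimination index: proportion correct in the upper group minus the lower
# group, using upper/lower 27% of total scores (a common convention)
cut_lo, cut_hi = np.quantile(total, [0.27, 0.73])
upper, lower = total >= cut_hi, total <= cut_lo
D = X[upper].mean(axis=0) - X[lower].mean(axis=0)

# How closely does each estimator track the generating a-parameters?
for name, est in [("point-biserial", r_pb), ("corrected pb", r_pbc), ("D index", D)]:
    print(f"{name:15s} correlation with a = {np.corrcoef(est, a)[0, 1]:.3f}")
```

The biserial correlation and the phi coefficient from a median split could be added analogously; the study's criterion, as in the abstract, is how highly each estimator correlates with the true IRT discrimination parameter across conditions.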
Cite this article
Bazaldua, D.A.L., Lee, YS., Keller, B. et al. Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations. Asia Pacific Educ. Rev. 18, 585–598 (2017). https://doi.org/10.1007/s12564-017-9507-4
- Item discrimination
- Binary item scores
- Classical test theory
- Item response theory
- Monte Carlo