Abstract
The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρXX′ as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two classical test theory (CTT) reliability statistics (Cronbach’s alpha and Feldt–Gilmer congeneric coefficients) under various testing conditions that result from manipulating large-scale real data. For the first purpose, two alternative ways of exactly quantifying ρXX′ are compared in terms of computational efficiency and statistical usefulness. In addition, the lower and upper bounds for ρXX′ are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. Empirical studies conducted for the second purpose showed across all testing conditions that (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt–Gilmer coefficient than to the Cronbach’s alpha coefficient; and (3) the alpha coefficient was close to the lower bound of IRT reliability. Some advantages of the IRT approach to estimating test-score reliability over the CTT approaches are discussed in the end.
Similar content being viewed by others
References
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Dimitrov, D. M. (2003). Marginal true-score measures and reliability for binary items as a function of their IRT parameters. Applied Psychological Measurement, 27, 440–458.
Feldt, L. S. (2002). Estimating the internal consistency reliability of tests composed of testlets varying in length. Applied Measurement in Education, 15, 33–48.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: Macmillan.
Gilmer, J. S., & Feldt, L. S. (1983). Reliability estimation for a test with parts of unknown lengths. Psychometrika, 48, 99–111.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education and Praeger.
Harwell, M. R., & Baker, F. B. (1991). The use of prior distributions in marginalized Bayesian item parameter estimation: A didactic. Applied Psychological Measurement, 15, 375–389.
Harwell, M. R., Baker, F. B., & Zwarts, M. (1988). Item parameter estimation via marginal maximum likelihood and an EM algorithm: A didactic. Journal of Educational Statistics, 13, 243–271.
Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika, 68, 123–149.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33, 129–140.
Lord, F. M. (1980). Applications of item response theory to practical testing applications. Hillsdale, NJ: Lawrence Erlbaum.
May, K., & Nicewander, W. A. (1994). Reliability and information functions for percentile ranks. Journal of Educational Measurement, 31, 313–325.
Meredith, W. (1965). Some results based on a general stochastic model for mental tests. Psychometrika, 30, 419–440.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359–381.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models (2nd ed.). Mooresville, IN: Scientific Software International.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data. Lincolnwood, IL: Scientific Software International.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1–13.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). New York, NY: Cambridge University Press.
Shojima, K., & Toyoda, H. (2002). Estimation of Cronbach’s alpha coefficient in the context of item response theory. The Japanese Journal of Psychology, 73, 227–233. (In Japanese).
Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175–186.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39–49.
Tsutakawa, R. K., & Lin, H. Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51, 251–267.
Woodruff, D. J., & Hanson, B. A. (1996). Estimation of item response models using the EM algorithm for finite mixtures. Iowa City, IA: ACT, Inc. (ACT Research Report 96–6).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kim, S., Feldt, L.S. The estimation of the IRT reliability coefficient and its lower and upper bounds, with comparisons to CTT reliability statistics. Asia Pacific Educ. Rev. 11, 179–188 (2010). https://doi.org/10.1007/s12564-009-9062-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12564-009-9062-8