The estimation of the IRT reliability coefficient and its lower and upper bounds, with comparisons to CTT reliability statistics

Kim, Seonghoon; Feldt, Leonard S.

doi:10.1007/s12564-009-9062-8

Published: 28 January 2010

Volume 11, pages 179–188 (2010)

Asia Pacific Education Review

Abstract

The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρ_XX′ as a function of item response theory (IRT) parameters and to present lower and upper bounds for the coefficient. A second purpose is to examine the relative performance of the IRT reliability statistics and two classical test theory (CTT) reliability statistics (Cronbach’s alpha and the Feldt–Gilmer congeneric coefficient) under various testing conditions created by manipulating large-scale real data. For the first purpose, two alternative ways of computing ρ_XX′ exactly are compared in terms of computational efficiency and statistical usefulness. In addition, lower and upper bounds for ρ_XX′ are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. The empirical studies conducted for the second purpose showed, across all testing conditions, that (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt–Gilmer coefficient than to Cronbach’s alpha; and (3) the alpha coefficient was close to the lower bound of IRT reliability. The article closes by discussing some advantages of the IRT approach to estimating test-score reliability over the CTT approaches.
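
The Python sketch below is not the authors' implementation. It illustrates how a model-implied reliability coefficient and a model-implied Cronbach's alpha can both be computed from IRT item parameters. It assumes dichotomous items under the three-parameter logistic model with scaling constant D = 1.7, a standard normal ability distribution approximated on an equally spaced grid, and number-correct scoring. All function names (`normal_quadrature`, `p_3pl`, `irt_reliability_and_alpha`, `observed_score_variance`) are hypothetical. Under local independence, Var(X) = Var(τ) + E[Σ P_i(θ)(1 − P_i(θ))] with τ(θ) = Σ P_i(θ), so the reliability is Var(τ)/Var(X). As a cross-check, Var(X) is also obtained from the marginal number-correct distribution built with the Lord–Wingersky recursion. Whether either route corresponds to the two methods compared in the article is an assumption.

```python
import math

# Hypothetical sketch (not the authors' code). Assumptions: dichotomous items,
# 3PL model with D = 1.7, theta ~ N(0, 1) on a grid, number-correct score X.


def normal_quadrature(n_points=81, lo=-6.0, hi=6.0):
    """Equally spaced nodes with normalized N(0, 1) density weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + q * step for q in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def irt_reliability_and_alpha(items, n_points=81):
    """Return (rho, alpha, var_x) implied by the item parameters.

    items: list of (a, b, c) tuples; alpha needs at least two items.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    nodes, weights = normal_quadrature(n_points)
    # probs[q][i] is P_i(theta_q)
    probs = [[p_3pl(t, *item) for item in items] for t in nodes]

    # True score tau(theta) = sum_i P_i(theta) and its variance over theta
    tau = [sum(row) for row in probs]
    mean_tau = sum(w * t for w, t in zip(weights, tau))
    var_tau = sum(w * (t - mean_tau) ** 2 for w, t in zip(weights, tau))

    # Expected conditional error variance E[sum_i P_i (1 - P_i)] (local independence)
    err_var = sum(w * sum(p * (1.0 - p) for p in row) for w, row in zip(weights, probs))
    var_x = var_tau + err_var
    rho = var_tau / var_x

    # Model-implied alpha: each binary item has variance pi_i (1 - pi_i)
    item_means = [sum(w * row[i] for w, row in zip(weights, probs)) for i in range(k)]
    sum_item_var = sum(m * (1.0 - m) for m in item_means)
    alpha = k / (k - 1) * (1.0 - sum_item_var / var_x)
    return rho, alpha, var_x


def observed_score_variance(items, n_points=81):
    """Var(X) from the marginal number-correct distribution (Lord-Wingersky recursion)."""
    nodes, weights = normal_quadrature(n_points)
    marginal = [0.0] * (len(items) + 1)
    for t, w in zip(nodes, weights):
        dist = [1.0]  # P(X = 0 | theta) before any item is added
        for item in items:
            p = p_3pl(t, *item)
            new = [0.0] * (len(dist) + 1)
            for x, f in enumerate(dist):
                new[x] += f * (1.0 - p)  # incorrect response: score stays at x
                new[x + 1] += f * p      # correct response: score moves to x + 1
            dist = new
        for x, f in enumerate(dist):
            marginal[x] += w * f
    mean = sum(x * f for x, f in enumerate(marginal))
    return sum((x - mean) ** 2 * f for x, f in enumerate(marginal))


if __name__ == "__main__":
    # Illustrative, made-up item parameters (a, b, c)
    items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.15), (1.5, 1.0, 0.25)]
    rho, alpha, var_x = irt_reliability_and_alpha(items)
    print(f"rho = {rho:.3f}, alpha = {alpha:.3f}")
    print(f"Var(X): {var_x:.6f} vs {observed_score_variance(items):.6f}")
```

Because the item-level errors U_i − P_i(θ) are uncorrelated under local independence, the model-implied alpha in this sketch cannot exceed ρ_XX′. The two coincide when all items share identical parameters, so that their true scores are tau-equivalent.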

Similar content being viewed by others

The Performance of Five Reliability Estimates in Multidimensional Test Situations

A comparison of reliability coefficients for psychometric tests that consist of two parts

Article Open access 08 February 2015

Some Relationships Between Cronbach’s Alpha and the Spearman-Brown Formula

Article 21 March 2015

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Dimitrov, D. M. (2003). Marginal true-score measures and reliability for binary items as a function of their IRT parameters. Applied Psychological Measurement, 27, 440–458.
Feldt, L. S. (2002). Estimating the internal consistency reliability of tests composed of testlets varying in length. Applied Measurement in Education, 15, 33–48.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: Macmillan.
Gilmer, J. S., & Feldt, L. S. (1983). Reliability estimation for a test with parts of unknown lengths. Psychometrika, 48, 99–111.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: American Council on Education and Praeger.
Harwell, M. R., & Baker, F. B. (1991). The use of prior distributions in marginalized Bayesian item parameter estimation: A didactic. Applied Psychological Measurement, 15, 375–389.
Harwell, M. R., Baker, F. B., & Zwarts, M. (1988). Item parameter estimation via marginal maximum likelihood and an EM algorithm: A didactic. Journal of Educational Statistics, 13, 243–271.
Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika, 68, 123–149.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York: Springer.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33, 129–140.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
May, K., & Nicewander, W. A. (1994). Reliability and information functions for percentile ranks. Journal of Educational Measurement, 31, 313–325.
Meredith, W. (1965). Some results based on a general stochastic model for mental tests. Psychometrika, 30, 419–440.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359–381.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models (2nd ed.). Mooresville, IN: Scientific Software International.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data. Lincolnwood, IL: Scientific Software International.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1–13.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing (2nd ed.). New York, NY: Cambridge University Press.
Shojima, K., & Toyoda, H. (2002). Estimation of Cronbach’s alpha coefficient in the context of item response theory. The Japanese Journal of Psychology, 73, 227–233. (In Japanese).
Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175–186.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39–49.
Tsutakawa, R. K., & Lin, H. Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51, 251–267.
Woodruff, D. J., & Hanson, B. A. (1996). Estimation of item response models using the EM algorithm for finite mixtures. Iowa City, IA: ACT, Inc. (ACT Research Report 96–6).

Author information

Authors and Affiliations

Department of Education, Keimyung University, 2800 Dalgubeoldaero, Dalseo-Gu, Daegu, 704-701, South Korea
Seonghoon Kim
The University of Iowa, Iowa City, IA, USA
Leonard S. Feldt

Corresponding author

Correspondence to Seonghoon Kim.

About this article

Cite this article

Kim, S., Feldt, L.S. The estimation of the IRT reliability coefficient and its lower and upper bounds, with comparisons to CTT reliability statistics. Asia Pacific Educ. Rev. 11, 179–188 (2010). https://doi.org/10.1007/s12564-009-9062-8

Received: 16 February 2009
Revised: 10 October 2009
Accepted: 30 November 2009
Published: 28 January 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s12564-009-9062-8
