Coefficient alpha and the internal structure of tests

Abstract

A general formula (α) of which a special case is the Kuder-Richardson coefficient of equivalence is shown to be the mean of all split-half coefficients resulting from different splittings of a test. α is therefore an estimate of the correlation between two random samples of items from a universe of items like those in the test. α is found to be an appropriate index of equivalence and, except for very short tests, of the first-factor concentration in the test. Tests divisible into distinct subtests should be so divided before using the formula. The index \(\bar r_{ij}\), derived from α, is shown to be an index of inter-item homogeneity. Comparison is made to the Guttman and Loevinger approaches. Parallel split coefficients are shown to be unnecessary for tests of common types. In designing tests, maximum interpretability of scores is obtained by increasing the first-factor concentration in any separately-scored subtest and avoiding substantial group-factor clusters within a subtest. Scalability is not a requisite.
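
The claim that α is the mean of all split-half coefficients can be checked numerically. The sketch below is an illustration only, not the paper's own computation. It assumes the usual variance form of α, assumes each split-half coefficient is computed from the variances of the two half-test scores (the Rulon/Guttman form), and assumes \(\bar r_{ij}\) is obtained by applying the Spearman-Brown formula to α in reverse. The function names and the simulated score matrix are hypothetical.

```python
from itertools import combinations

import numpy as np


def coefficient_alpha(scores):
    """alpha = n/(n-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_variances.sum() / total_variance)


def split_half_coefficients(scores):
    """Variance-based split-half coefficient for every split into equal halves."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    if n_items % 2:
        raise ValueError("an even number of items is needed for equal halves")
    total_variance = scores.sum(axis=1).var(ddof=1)
    coefficients = []
    for half_a in combinations(range(n_items), n_items // 2):
        if 0 not in half_a:
            continue  # count each split once: the half that contains item 0
        half_b = [j for j in range(n_items) if j not in half_a]
        var_a = scores[:, list(half_a)].sum(axis=1).var(ddof=1)
        var_b = scores[:, half_b].sum(axis=1).var(ddof=1)
        coefficients.append(2 * (1 - (var_a + var_b) / total_variance))
    return coefficients


def mean_inter_item_index(alpha, n_items):
    """Spearman-Brown in reverse: the one-item value implied by alpha (assumed form)."""
    return alpha / (n_items + (1 - n_items) * alpha)


# Hypothetical data: 200 persons, 6 items sharing one common factor plus noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 1)) + rng.normal(size=(200, 6))

alpha = coefficient_alpha(scores)
splits = split_half_coefficients(scores)
agrees = bool(np.isclose(alpha, np.mean(splits)))
print(len(splits), agrees)
```

With six items there are ten distinct splits into halves of three, and the mean of the ten coefficients matches α up to floating-point error. The agreement is algebraic rather than a property of the simulated data: each pair of items falls in opposite halves in the same fraction of splits, so averaging the between-half covariance recovers the sum of inter-item covariances that α is built from.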

References

  1. Brogden, H. E. Variation in test validity with variation in the distribution of item difficulties, number of items, and degree of their intercorrelation. Psychometrika, 1946, 11, 197–214.
  2. Brown, W. Some experimental results in the correlation of mental abilities. Brit. J. Psychol., 1910, 3, 296–322.
  3. Brownell, W. A. On the accuracy with which reliability may be measured by correlating test halves. J. exper. Educ., 1933, 1, 204–215.
  4. Burt, C. The influence of differential weighting. Brit. J. Psychol., Stat. Sect., 1950, 3, 105–128.
  5. Clark, E. L. Methods of splitting vs. samples as sources of instability in test-reliability coefficients. Harvard educ. Rev., 1949, 19, 178–182.
  6. Coombs, C. H. The concepts of reliability and homogeneity. Educ. psychol. Meas., 1950, 10, 43–56.
  7. Cronbach, L. J. On estimates of test reliability. J. educ. Psychol., 1943, 34, 485–494.
  8. Cronbach, L. J. A case study of the split-half reliability coefficient. J. educ. Psychol., 1946, 37, 473–480.
  9. Cronbach, L. J. Test “reliability”: its meaning and determination. Psychometrika, 1947, 12, 1–16.
  10. Dressel, P. L. Some remarks on the Kuder-Richardson reliability coefficient. Psychometrika, 1940, 5, 305–310.
  11. Ferguson, G. The factorial interpretation of test difficulty. Psychometrika, 1941, 6, 323–329.
  12. Ferguson, G. The reliability of mental tests. London: Univ. of London Press, 1941.
  13. Festinger, L. The treatment of qualitative data by “scale analysis.” Psychol. Bull., 1947, 44, 149–161.
  14. Goodenough, F. L. A critical note on the use of the term “reliability” in mental measurement. J. educ. Psychol., 1936, 27, 173–178.
  15. Guilford, J. P., ed. Printed classification tests. Report No. 5, Army Air Forces Aviation Psychology Program. Washington: U. S. Govt. Print. Off., 1947.
  16. Guilford, J. P. Fundamental statistics in psychology and education. Second ed. New York: McGraw-Hill, 1950.
  17. Guilford, J. P., and Michael, W. B. Changes in common-factor loadings as tests are altered homogeneously in length. Psychometrika, 1950, 15, 237–249.
  18. Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
  19. Guttman, L. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 255–282.
  20. Hoyt, C. Test reliability estimated by analysis of variance. Psychometrika, 1941, 6, 153–160.
  21. Humphreys, L. G. Test homogeneity and its measurement. Amer. Psychologist, 1949, 4, 245.
  22. Jackson, R. W., and Ferguson, G. A. Studies on the reliability of tests. Bull. No. 12, Dept. of Educ. Res., University of Toronto, 1941.
  23. Kelley, T. L. Note on the reliability of a test: a reply to Dr. Crum's criticism. J. educ. Psychol., 1924, 15, 193–204.
  24. Kelley, T. L. Statistical method. New York: Macmillan, 1924.
  25. Kelley, T. L. The reliability coefficient. Psychometrika, 1942, 7, 75–83.
  26. Kuder, G. F., and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151–160.
  27. Loevinger, J. A systematic approach to the construction and evaluation of tests of ability. Psychol. Monogr., 1947, 61, No. 4.
  28. Loevinger, J. The technic of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychol. Bull., 1948, 45, 507–529.
  29. Mosier, C. I. A short cut in the estimation of split-halves coefficients. Educ. psychol. Meas., 1941, 1, 407–408.
  30. Richardson, M. Combination of measures, pp. 379–401 in Horst, P. (Ed.) The prediction of personal adjustment. New York: Social Science Res. Council, 1941.
  31. Rulon, P. J. A simplified procedure for determining the reliability of a test by split-halves. Harvard educ. Rev., 1939, 9, 99–103.
  32. Shannon, C. E. The mathematical theory of communication. Urbana: Univ. of Ill. Press, 1949.
  33. Spearman, C. Correlation calculated with faulty data. Brit. J. Psychol., 1910, 3, 271–295.
  34. Stouffer, S. A., et al. Measurement and prediction. Princeton: Princeton Univ. Press, 1950.
  35. Thurstone, L. L., and Thurstone, T. G. Factorial studies of intelligence, p. 37. Chicago: Univ. of Chicago Press, 1941.
  36. Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika, 1946, 11, 1–13.
  37. Vernon, P. E. An application of factorial analysis to the study of test items. Brit. J. Psychol., Stat. Sect., 1950, 3, 1–15.
  38. Wherry, R. J., and Gaylord, R. H. The concept of test and item reliability in relation to factor pattern. Psychometrika, 1943, 8, 247–264.
  39. Woodbury, M. A. On the standard length of a test. Res. Bull. 50–53, Educ. Test. Service, 1950.

Additional information

The assistance of Dora Damrin and Willard Warrington is gratefully acknowledged. Miss Damrin took major responsibility for the empirical studies reported. This research was supported by the Bureau of Research and Service, College of Education.

About this article

Cite this article

Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334 (1951). https://doi.org/10.1007/BF02310555

Keywords

  • Random Sample
  • Public Policy
  • Internal Structure
  • Statistical Theory
  • Common Type