Volume 8, Issue 4, pp 223–245

A course in the theory of mental tests

  • Harold Gulliksen


An outline for a course in test theory is presented, together with a list of assignments, problems, and a bibliography. The course has been given in the Psychology Department of the University of Chicago. The material is presented in outline form at the present time because of the increased need for training in test theory due to the increase in the use of psychological tests for classification of military personnel, and because much of the material in such a course must be selected from a wide array of articles in the literature. This material is presented in order that an organized body of material for instructional purposes may be readily available to those interested.



  1. Adkins, Dorothy C. A comparative study of methods of selecting items. Dissertation on file in library of Ohio State University. Abstract, Psychology Library.
  2. Adkins, Dorothy C., and Toops, Herbert A. 1937. Simplified formulas for item selection and construction. Psychometrika, 2, 165–171.
  3. Ayres, Leonard P. 1911. A scale for measuring the quality of handwriting of school children. New York: Publication on Measurement in Education, Division of Education, Russell Sage Fund Bulletin No. 113.
  4. Babitz, Milton, and Keys, Noel. 1940. A method for approximating the average intercorrelation coefficient by correlating the parts with the sum of the parts. Psychometrika, 5, 283–288.
  5. Board of Examinations, The University of Chicago. 1937. Manual of Examination Methods. Second Edition. Chicago: Univ. Chicago Bookstore. Pp. 177.
  6. Boring, E. G. 1919. Mathematical vs. scientific significance. Psychol. Bull., 16, 335–338.
  7. ——. 1920. The logic of the normal law of error in mental measurement. Amer. J. Psychol., 31, 1–33.
  8. Bradford, Leland P. 1940. The effect of practice upon standard errors of estimate. Psychol. Monogr., 52, No. 3, 56–71.
  9. Brown, William, and Thomson, Godfrey. 1921. Essentials of mental measurement. Cambridge Univ. Press. Pp. viii + 216.
  10. Buros, Oscar K. 1936. Educational, Psychological, and Personality Tests of 1933, 1934, and 1935. New Brunswick, New Jersey: School of Education, Rutgers University. Pp. 83. Reviews 1–503.
  11. ——. 1937. Educational, Psychological, and Personality Tests of 1936. New Brunswick, New Jersey: School of Education, Rutgers University. Pp. 141. Reviews 504–868.
  12. ——. 1938. The 1938 Mental Measurements Yearbook. New Brunswick, New Jersey: School of Education, Rutgers University. Pp. xiv + 415. Reviews 869–1181.
  13. ——. 1941. The 1940 Mental Measurements Yearbook. New Brunswick, New Jersey: School of Education, Rutgers University. Pp. xxi + 674. Reviews 1182–1684.
  14. Burt, Cyril. 1936. Supplement. In “The Marks of Examiners” by Hartog, P. J., and Rhodes, E. C. London: Macmillan and Company. Pp. xix + 344.
  15. Douglass, H. R. 1934. Some observations and data on certain methods of measuring the predictive significance of the Pearson product-moment coefficient of correlation. J. educ. Psychol., 25, 225–232.
  16. Dressel, Paul L. 1940. Some remarks on the Kuder-Richardson reliability coefficient. Psychometrika, 5, 305–310.
  17. Dunlap, Jack W. 1936 (a). Note on the computation of bi-serial correlation in item evaluation. Psychometrika, 1, 51–58.
  18. ——. 1936 (b). Nomograph for computing bi-serial correlations. Psychometrika, 1, 59–60.
  19. Dunlap, Jack and Kurtz, A. K. 1932. Handbook of statistical nomographs and formulas. New York: World Book Company. Pp. vii + 163.
  20. Edgerton, H. A. and Toops, H. A. 1928. A formula for finding the average inter-correlation coefficient for unranked raw scores without solving any of the individual intercorrelations. J. educ. Psychol., 19, 131–138.
  21. Edgerton, H. A. and Kolbe, Laverne E. 1936. The method of minimum variation for the combination of criteria. Psychometrika, 1, 183–187.
  22. Englehart, Max D. 1942. Unique types of achievement test exercises. Psychometrika, 7, 103–115.
  23. Flanagan, John C. 1936. A short method for selecting the best combination of test items for a particular purpose. Psychol. Bull., 33, 603–604.
  24. ——. 1939. Scaled scores. New York: Cooperative Test Service.
  25. Freeman, Frank N. 1917. A critique of the Yerkes-Bridges-Hardwick comparison of the Binet-Simon and point scales. Psychol. Rev., 24, 484.
  26. ——. 1939. Mental tests: Their history, principles, and applications. Cambridge, Mass.: The Riverside Press, Rev.
  27. Frisch, Ragnar. 1934. Statistical confluence analysis by means of complete regression systems. Oslo.
  28. Garrett, Henry E. 1943. The discriminant function and its use in psychology. Psychometrika, 8, 65–79.
  29. Guilford, J. P. 1936 (a). The determination of item difficulty when chance success is a factor. Psychometrika, 1, 259–264.
  30. ——. 1936 (b). Psychometric methods. New York: McGraw-Hill.
  31. ——. 1937. The psychophysics of mental test difficulty. Psychometrika, 2, 121–133.
  32. Gulliksen, Harold. 1936. The content reliability of a test. Psychometrika, 1, 189–194.
  33. Hawkes, H. E., Lindquist, E. F., and Mann, C. R. 1936. The construction and use of achievement examinations. Boston: Houghton-Mifflin Company.
  34. Hildreth, G. H. 1939. A bibliography of mental tests and rating scales. 2nd Ed. New York: The Psychological Corporation. Pp. xxiv + 295.
  35. Holmes, Henry W. 1917. A descriptive bibliography of measurement in elementary subjects. Cambridge, Mass.: Harvard Univ. Press.
  36. Holzinger, Karl J., and Clayton, Blythe. 1925. Further experiments in the application of Spearman's prophecy formula. J. educ. Psychol., 16, 289–299.
  37. Horst, Paul. 1934 (a). Item selection by the method of successive residuals. J. exper. Educ., 2, 254–263.
  38. ——. 1934 (b). Increasing the efficiency of selection tests. The Personnel Journal, 12, 254–259.
  39. ——. 1936 (a). Obtaining a composite measure from different measures of the same attributes. Psychometrika, 1, 53–60.
  40. ——. 1936 (b). Item selection by means of a maximizing function. Psychometrika, 1, 229–244.
  41. Hull, Clark L. 1928. Aptitude testing. New York: World Book Company. Pp. xiv + 535.
  42. Kelley, Truman L. 1927. Interpretation of educational measurements. New York: World Book Company.
  43. ——. 1924. Statistical methods. New York: Macmillan Company. Pp. xi + 389.
  44. ——. 1923. The principles and techniques of mental measurement. Amer. J. Psychol., 34, 408–432.
  45. Kuder, G. F. 1937. Nomograph for point biserial r, biserial r, and fourfold correlations. Psychometrika, 2, 135–138.
  46. Kuder, G. F., and Richardson, M. W. 1937. The theory of the estimation of test reliability. Psychometrika, 2, 151–160.
  47. Lee, J. M., and Symonds, P. M. 1934. New type or objective tests: a summary of investigations (Oct. 1931-Oct. 1933). J. educ. Psychol., 25, 161–184.
  48. Lentz, T. F., Hirshstein, Bertha, and Finch, J. H. 1932. Evaluation of methods of evaluating test items. J. educ. Psychol., 23, 344–350.
  49. Lindquist, E. F. 1940. Statistical analysis in educational research. New York: Houghton Mifflin Co.
  50. Long, John A., Sandiford, Peter, et al. 1935. The validation of test items. Bull. Dept. Educ. Res., Ontario Coll. Educ., No. 3, 126 pages.
  51. McCall, W. A. 1922. How to measure in education. New York: The Macmillan Company. Pp. xii + 416.
  52. Merrill, Walter W., Jr. 1937. Sampling theory in item analysis. Psychometrika, 2, 215–224.
  53. Monroe, Paul (Editor). 1939. Conference on examinations at Dinard, France, Sept. 16–19, 1938. New York: Bureau of Publications, Teachers College, Columbia University. Pp. xiii + 330.
  54. Monroe, Walter S. 1923. The theory of educational measurements. New York: Houghton-Mifflin Company.
  55. Monroe, Walter S. 1934. A note on efficiency of prediction. J. educ. Psychol., 25, 547–548.
  56. Moore, Clarence Carl. 1940. The rights-minus wrongs method of correcting chance factors in the T-F examination. J. genet. Psychol., 57, 317–326.
  57. Monroe, Walter S. and Englehart, Max D. 1936. Scientific study of educational problems. New York: The Macmillan Company.
  58. Mosier, Charles I. 1936. A note on item analysis and the criterion of internal consistency. Psychometrika, 1, 275–282.
  59. ——. 1940. Psychophysics and mental test theory: fundamental postulates and elementary theorems. Psychol. Rev., 47, 355–366.
  60. National Society for the Study of Education. 1918. 17th Yearbook, Part II. Bloomington, Ill.: Public School Publishing Company.
  61. Orleans, Jacob S. 1937. Measurement in education. New York: Thomas Nelson and Sons. Pp. xvi + 461.
  62. Otis, A. S. 1922 (a). The method for finding the correspondence between scores in two tests. J. educ. Psychol., 13, 529–45.
  63. ——. 1922 (b). A method of inferring the change in a coefficient of correlation resulting from a change in the heterogeneity of the group. J. educ. Psychol., 13, 293–294.
  64. Otis, A. S., and Knollin, H. E. 1921. The reliability of the Binet scale and of pedagogical scales. J. educ. Research, 4, 121–142.
  65. Richardson, M. W. 1935. Abac for computing tetrachoric coefficients in item analysis. Chicago: Univ. Chicago Board of Examination.
  66. ——. 1936 (a). Notes on the rationale of item analysis. Psychometrika, 1, 69–76.
  67. ——. 1936 (b). The relation of difficulty to the differential validity of a test. Psychometrika, 1, 33–49.
  68. Richardson, M. W., and Adkins, Dorothy C. 1938. A rapid method of selecting test items. J. educ. Psychol., 29, 547–552.
  69. Richardson, M. W., and Kuder, G. F. 1939. The computation of test reliability by the method of rational equivalence. J. educ. Psychol., 30, 681–687.
  70. Ruch, G. M. and Stoddard, G. P. 1927. Test and measurements in high-school instruction. New York: World Book Company.
  71. Ruch, G. M., Ackerson, L., and Jackson, J. P. 1926. An empirical study of the Spearman-Brown formula as applied to educational test material. J. educ. Psychol., 17, 309–313.
  72. Ruger, Georgie J. 1918. Bibliography of psychological tests. New York: Bureau of Educational Measurements.
  73. Rulon, Phillip J. 1939. A simplified procedure for determining the reliability of a test by split halves. Harvard educ. Rev., 9, 99–103.
  74. Segel, David. 1933. A note of an error made in investigations of homogeneous grouping. J. educ. Psychol., 24, 64–66.
  75. Smith, B. O. 1938. Logical aspects of educational measurement. New York: Columbia Univ. Press.
  76. Spearman, Charles. 1910. Correlation from faulty data. Brit. J. Psychol., 3, 271–295.
  77. ——. 1907. Demonstration of formulae for true measurement of correlation. Amer. J. Psychol., 18, 161–169.
  78. Stalnaker, J. M. 1938. Weighting questions in the essay-type examination. J. educ. Psychol., 29, 481–490.
  79. Stalnaker, J. M. and Richardson, M. W. 1933. A note concerning the combination of test scores. J. gen. Psychol., 8, 460–463.
  80. Starch, D., and Elliot, E. C. 1912. Reliability of grading high-school work in English. School Review, September, 442–457.
  81. ——. 1913. Reliability of grading high-school work in mathematics. School Review, April, 254–259.
  82. ——. 1913. Reliability of grading high-school work in history. School Review, December, 676–681.
  83. Stern, William. 1914. The psychological methods of testing intelligence. Baltimore: Warwick and York.
  84. Swineford, F. 1936. Validity of test items. J. educ. Psychol., 27, 68–78.
  85. Thorndike, E. L. 1922. On finding equivalent scores in tests of intelligence. J. appl. Psychol., 6, 29–33.
  86. Thurstone, L. L. 1919. A method for scoring tests. Psychol. Bull., 16, 235–240.
  87. ——. 1925. A method of scaling psychological and educational tests. J. educ. Psychol., 16, 433–451.
  88. ——. 1926. The mental age concept. Psychol. Rev., 33, 268–278.
  89. ——. 1928. The absolute zero in intelligence measurement. Psychol. Rev., 35, 175–197.
  90. ——. 1931. The reliability and validity of tests. Ann Arbor, Mich.: Edwards Brothers. Planographed.
  91. ——. 1924. Fundamentals of statistics. New York: The Macmillan Company. Pp. xvi + 237.
  92. Thurstone, T. G. 1932. The difficulty of a test and its diagnostic value. J. educ. Psychol., 23, 335–343.
  93. Toops, H. A., and Symonds, P. M. 1922. What shall we expect of the A. Q.? J. educ. Psychol., 13, 513–528.
  94. Travers, R. M. W. 1939. The use of a discriminant function in the treatment of psychological group differences. Psychometrika, 4, 25–32.
  95. Walker, Helen M. 1929. Studies in the history of statistical method. Baltimore: The Williams and Wilkins Company. Pp. 186.
  96. Whipple, Guy M. 1910. Manual of mental and physical tests. Baltimore: Warwick and York. Vols. I and II.
  97. Wilks, S. S. 1938. Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, 3, 23–40.
  98. Yerkes, R. M., Bridges, J. W., and Hardwick, R. S. 1915. A point scale for measuring mental ability. Baltimore: Warwick and York.

Copyright information

© Psychometric Society 1943

Authors and Affiliations

  • Harold Gulliksen, Psychology Department, The University of Chicago, USA
