
An approximation of the K out N reliability of a test, and a scoring procedure for determining which items an examinee knows

Psychometrika

Abstract

Consider any scoring procedure for determining whether an examinee knows the answer to a test item. Let $x_i = 1$ if a correct decision is made about whether the examinee knows the $i$th item; otherwise $x_i = 0$. The $k$ out of $n$ reliability of a test is $\rho_k = \Pr\left(\sum_{i=1}^{n} x_i \ge k\right)$. That is, $\rho_k$ is the probability of making at least $k$ correct decisions for a typical (randomly sampled) examinee. This paper proposes an approximation of $\rho_k$ that can be estimated with an answer-until-correct test. The paper also suggests a scoring procedure that might be used when $\rho_k$ is judged to be too small under a conventional scoring rule, where it is decided that an examinee knows an item if and only if the correct response is given.
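To make the definition of $\rho_k$ concrete, here is a minimal sketch in Python. It is not the paper's approximation (the abstract does not give its form), and every function name is hypothetical. `rho_k_empirical` evaluates $\Pr(\sum x_i \ge k)$ directly, assuming the correct-decision indicators $x_i$ were known for each examinee, as they would be in a simulation. `rho_k_independent` computes $\rho_k$ exactly under the simplifying, assumed condition that the $x_i$ are independent. `conventional_rule_accuracy` gives $\Pr(x_i = 1)$ under the conventional rule for an assumed knowledge-or-random-guessing model, in which a decision errs only when an examinee who does not know an item guesses it correctly.

```python
from typing import Sequence


def rho_k_empirical(decisions: Sequence[Sequence[int]], k: int) -> float:
    """Fraction of examinees with at least k correct decisions.

    decisions[j][i] is x_i for examinee j (1 = correct decision about item i).
    Assumes the x_i are known, e.g. from a simulation; in real testing they
    are not directly observable.
    """
    if not decisions:
        raise ValueError("need at least one examinee")
    hits = sum(1 for row in decisions if sum(row) >= k)
    return hits / len(decisions)


def rho_k_independent(q: Sequence[float], k: int) -> float:
    """Exact Pr(sum x_i >= k) if the x_i are independent with Pr(x_i = 1) = q[i].

    Independence is an illustrative assumption, not a claim about real tests.
    Uses the standard dynamic program for a Poisson-binomial distribution.
    """
    dist = [1.0]  # dist[m] = Pr(exactly m correct decisions among items seen so far)
    for qi in q:
        new = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * (1.0 - qi)  # decision about this item is wrong
            new[m + 1] += prob * qi      # decision about this item is correct
        dist = new
    return sum(dist[max(k, 0):])  # empty slice (0.0) when k > n


def conventional_rule_accuracy(p_know: float, t: int) -> float:
    """Pr(x_i = 1) under the rule "knows iff the correct response is given".

    Hypothetical model: an examinee who knows the item always answers it
    correctly; one who does not guesses at random among t alternatives.
    The decision is wrong only when a non-knower guesses correctly.
    """
    return p_know + (1.0 - p_know) * (1.0 - 1.0 / t)


if __name__ == "__main__":
    # Hypothetical 10-item test: each item known with probability 0.7,
    # four alternatives per item.
    q = [conventional_rule_accuracy(0.7, 4)] * 10
    for k in (8, 9, 10):
        print(k, round(rho_k_independent(q, k), 4))
```

Computing $\rho_k$ for several values of $k$ in this way shows how quickly the probability of a nearly error-free set of decisions falls as $k$ approaches $n$, which is the situation in which the abstract suggests a scoring procedure other than the conventional rule.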




About this article

Cite this article

Wilcox, R. R. An approximation of the K out N reliability of a test, and a scoring procedure for determining which items an examinee knows. Psychometrika 48, 211–222 (1983). https://doi.org/10.1007/BF02294016



  • DOI: https://doi.org/10.1007/BF02294016
