Abstract
Consider any scoring procedure for determining whether an examinee knows the answer to a test item. Let $x_i = 1$ if a correct decision is made about whether the examinee knows the $i$th item; otherwise $x_i = 0$. The $k$ out of $n$ reliability of a test is $\rho_k = \Pr\left(\sum_{i=1}^{n} x_i \geq k\right)$. That is, $\rho_k$ is the probability of making at least $k$ correct decisions for a typical (randomly sampled) examinee. This paper proposes an approximation of $\rho_k$ that can be estimated with an answer-until-correct test. The paper also suggests a scoring procedure that might be used when $\rho_k$ is judged to be too small under a conventional scoring rule, in which an examinee is judged to know an item if and only if the correct response is given.
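To make the definition of the k out of n reliability concrete, the following minimal sketch evaluates Pr(Σ x_i ≥ k) under the simplifying assumption that the decisions x_i are independent across items, each with a known probability of being correct. This independence assumption, the function name `k_out_of_n_reliability`, and the input probabilities are illustrative choices made here; they are not the approximation or the estimation procedure proposed in the paper.

```python
from typing import Sequence


def k_out_of_n_reliability(p: Sequence[float], k: int) -> float:
    """Return Pr(sum of x_i >= k) for independent 0/1 decisions x_i.

    p[i] is the (assumed known) probability that the decision about
    item i is correct. Independence across items is an assumption of
    this sketch only.
    """
    n = len(p)
    # dist[j] = probability of exactly j correct decisions so far
    dist = [1.0] + [0.0] * n
    for p_i in p:
        # Update from the top down so dist[j - 1] still holds the
        # value from before the current item was added.
        for j in range(n, 0, -1):
            dist[j] = dist[j] * (1.0 - p_i) + dist[j - 1] * p_i
        dist[0] *= 1.0 - p_i
    # Sum the upper tail: at least k correct decisions.
    return sum(dist[max(k, 0):])


if __name__ == "__main__":
    # Hypothetical per-item probabilities of a correct decision.
    probs = [0.9, 0.8, 0.7]
    for k in range(len(probs) + 1):
        print(k, k_out_of_n_reliability(probs, k))
```

When the items are not independent, this calculation no longer applies directly, which is one reason an approximation that can be estimated from answer-until-correct data is of interest.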
Cite this article
Wilcox, R.R. An approximation of the K out N reliability of a test, and a scoring procedure for determining which items an examinee knows. Psychometrika 48, 211–222 (1983). https://doi.org/10.1007/BF02294016