Statistical inference for multiple choice tests

Hsu, John S. J.; Leonard, Tom; Tsui, Kam-Wah

doi:10.1007/BF02294466

Published: June 1991

Volume 56, pages 327–348, (1991)

Psychometrika

Abstract

Finite sample inference procedures are considered for analyzing the observed scores on a multiple choice test with several items, where, for example, the items are dissimilar, or the item responses are correlated. A discrete p-parameter exponential family model leads to a generalized linear model framework and, in a special case, a convenient regression of true score upon observed score. Techniques based upon the likelihood function, Akaike's information criteria (AIC), an approximate Bayesian marginalization procedure based on conditional maximization (BCM), and simulations for exact posterior densities (importance sampling) are used to facilitate finite sample investigations of the average true score, individual true scores, and various probabilities of interest. A simulation study suggests that, when the examinees come from two different populations, the exponential family can adequately generalize Duncan's beta-binomial model. Extensions to regression models, the classical test theory model, and empirical Bayes estimation problems are mentioned. The Duncan, Keats, and Matsumura data sets are used to illustrate potential advantages and flexibility of the exponential family model, and the BCM technique.
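As a rough illustration of the kind of comparison the abstract describes, the sketch below fits a binomial and a beta-binomial distribution to a set of test scores and compares them by AIC. It does not reproduce the paper's exponential family model or its BCM procedure. The scores are hypothetical, the helper names are invented for this example, and the coarse grid search is only a crude stand-in for a proper maximum likelihood fit.

```python
from math import exp, lgamma, log


def log_choose(n, y):
    """Log of the binomial coefficient C(n, y)."""
    return lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_logpmf(y, n, p):
    return log_choose(n, y) + y * log(p) + (n - y) * log(1 - p)


def beta_binomial_logpmf(y, n, a, b):
    # P(y) = C(n, y) * B(y + a, n - y + b) / B(a, b)
    return log_choose(n, y) + log_beta(y + a, n - y + b) - log_beta(a, b)


def aic(loglik, k):
    """Akaike's criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * k - 2 * loglik


# Hypothetical observed scores on a 10-item test (not from the paper).
scores = [2, 5, 9, 10, 3, 8, 1, 10, 6, 0]
n_items = 10

# Binomial model: the MLE of the success probability is the mean proportion.
p_hat = sum(scores) / (len(scores) * n_items)
loglik_bin = sum(binomial_logpmf(y, n_items, p_hat) for y in scores)


def loglik_bb(a, b):
    return sum(beta_binomial_logpmf(y, n_items, a, b) for y in scores)


# Beta-binomial model: approximate MLE by a coarse grid search (assumption:
# a grid over (0, 5] in steps of 0.1 is adequate for illustration only).
grid = [i / 10 for i in range(1, 51)]
a_hat, b_hat = max(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: loglik_bb(ab[0], ab[1]),
)

aic_bin = aic(loglik_bin, k=1)
aic_bb = aic(loglik_bb(a_hat, b_hat), k=2)

# Sanity check: the fitted beta-binomial pmf should sum to 1 up to rounding.
pmf_total = sum(
    exp(beta_binomial_logpmf(y, n_items, a_hat, b_hat))
    for y in range(n_items + 1)
)

print(f"binomial      AIC = {aic_bin:.2f} (p = {p_hat:.2f})")
print(f"beta-binomial AIC = {aic_bb:.2f} (a = {a_hat}, b = {b_hat})")
```

A lower AIC favors the corresponding model. Because the grid search only approximates the maximum, the beta-binomial AIC reported here is, if anything, slightly too large.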

References

Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 30(A), 9–14.
Altham, P. M. E. (1978). Two generalizations of the binomial distribution. Applied Statistics, 27, 162–167.
Anderson, D. A., & Aitkin, M. (1985). Marginal maximum likelihood estimation of item parameters: Application of an algorithm. Journal of the Royal Statistical Society, Series B, 26, 203–210.
Atilgan, T. (1983). Parameter parsimony, model selection, and smooth density estimation. Unpublished doctoral dissertation, University of Wisconsin-Madison.
Atilgan, T., & Leonard, T. (1988). On the application of AIC to bivariate density estimation, non-parametric regression, and discrimination. In H. Bozdogan & A. K. Gupta (Eds.), Multivariate statistical modeling and data analysis (pp. 1–16). Dordrecht, Holland: Reidel.
Bell, S. S. (1990). Empirical Bayes alternatives to the beta-binomial model. Unpublished doctoral dissertation, Columbia University.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–454.
Carter, M. C., & Williford, W. O. (1975). Estimation in a modified binomial distribution. Applied Statistics, 24, 319–328.
Consul, P. C. (1974). A simple urn model dependent upon predetermined strategy. Sankhya, Series B, 36, 391–399.
Consul, P. C. (1975). On a characterization of Lagrangian Poisson and quasi-binomial distributions. Communications in Statistics, 4, 555–563.
Dalal, S. R., & Hall, W. J. (1983). Approximating priors by mixtures of natural conjugate priors. Journal of the Royal Statistical Society, Series B, 45, 278–286.
Duncan, G. T. (1974). An empirical Bayes approach to scoring multiple-choice tests in the misinformation model. Journal of the American Statistical Association, 69, 50–57.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Geweke, J. (1988). Antithetic acceleration of Monte-Carlo integration in Bayesian inference. Journal of Econometrics, 38, 73–89.
Geweke, J. (1989). Exact predictive density for linear models with arch distributions. Journal of Econometrics, 40, 63–86.
Hsu, J. S. J. (1990). Bayesian inference and marginalization. Unpublished doctoral dissertation, University of Wisconsin-Madison.
Keats, J. A. (1964). Some generalizations of a theoretical distribution of mental test scores. Psychometrika, 29, 215–231.
Lehmann, E. L. (1983). Theory of point estimation. New York: John Wiley & Sons.
Leonard, T. (1972). Bayesian methods for binomial data. Biometrika, 59, 581–589.
Leonard, T. (1973). A Bayesian method for histograms. Biometrika, 60, 297–308.
Leonard, T. (1982). Comment on the paper by Lejeune and Faulkenberry. Journal of the American Statistical Association, 77, 657–658.
Leonard, T. (1984). Some data-analytic modifications to Bayes-Stein estimation. Annals of the Institute of Statistical Mathematics, 36, 11–21.
Leonard, T., Hsu, J. S. J., & Tsui, K. (1989). Bayesian marginal inference. Journal of the American Statistical Association, 84, 1051–1058.
Leonard, T., & Novick, J. B. (1986). Bayesian full rank marginalization for two-way contingency tables. Journal of Educational Statistics, 11, 33–56.
Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika, 30, 239–270.
Lord, F. M. (1969). Estimating true-score distributions in psychological testing: An empirical Bayes estimation problem. Psychometrika, 34, 259–299.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores (with contributions by Allen Birnbaum). Reading, MA: Addison-Wesley.
Lord, F. M., & Stocking, M. L. (1976). An interval estimate for making statistical inference about true scores. Psychometrika, 41, 79–87.
McCullagh, P., & Nelder, J. A. (1985). Generalized linear models. New York: Chapman and Hall.
Mislevy, R. J. (1986). Bayes modal estimation in item response. Psychometrika, 51, 177–195.
Morrison, D. G., & Brockway, G. (1979). A modified beta-binomial model with applications to multiple choice and taste tests. Psychometrika, 44, 427–442.
Prentice, R. L., & Barlow, W. E. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033–1048.
Rubinstein, R. Y. (1981). Simulation and the Monte Carlo method. New York: John Wiley & Sons.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Takane, Y., Bozdogan, H., & Shibayama, T. (1987). Ideal point discriminant analysis. Psychometrika, 52, 371–392.
Wilcox, R. R. (1981a). A review of the beta-binomial model and its extensions. Journal of Educational Statistics, 6, 3–32.
Wilcox, R. R. (1981b). A cautionary note on estimating the reliability of a mastery test with the beta-binomial model. Applied Psychological Measurement, 5, 531–537.
Young, A. S. (1977). A Bayesian approach to prediction using polynomials. Biometrika, 64, 309–318.

Author information

Authors and Affiliations

John S. J. Hsu: Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106
Tom Leonard & Kam-Wah Tsui: Department of Statistics, The University of Wisconsin, Madison

Additional information

The authors wish to thank Ella Mae Matsumura for her data set and helpful comments, Frank Baker for his advice on item response theory, Hirotugu Akaike and Taskin Atilgan for helpful discussions regarding AIC, Graham Wood for his advice concerning the class of all binomial mixture models, Yiu Ming Chiu for providing useful references and information on tetrachoric models, and the Editor and two referees for suggesting several references and alternative approaches.

About this article

Cite this article

Hsu, J.S.J., Leonard, T. & Tsui, KW. Statistical inference for multiple choice tests. Psychometrika 56, 327–348 (1991). https://doi.org/10.1007/BF02294466

Received: 06 November 1989
Revised: 06 June 1990
Issue Date: June 1991
DOI: https://doi.org/10.1007/BF02294466
