Psychometrika, Volume 68, Issue 2, pp 169–191

Bayesian modeling of measurement error in predictor variables using item response theory

Abstract

It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It is shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. Handling measurement error via the normal ogive model is compared with alternative approaches based on the classical true score model. Examples using real data are given.
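The abstract describes the model in words only. As an illustration (a minimal sketch, not the authors' code), the Python below simulates a latent predictor θ measured by dichotomous items under the two-parameter normal ogive model, P(X_ik = 1 | θ_i) = Φ(a_k θ_i − b_k), with an outcome y_i = β0 + β1 θ_i + e_i. It then runs a Gibbs sampler that augments each response with a latent normal variable Z_ik and cycles through Z, θ, β and σ². To keep it short, it assumes a single-level regression rather than the article's multilevel model, treats the item parameters a and b as known, fixes the prior θ ~ N(0, 1), and uses a flat prior on β and p(σ²) ∝ 1/σ². The function names (simulate, sample_z, gibbs, naive_slope) and all numeric settings are hypothetical.

```python
import numpy as np
from scipy.special import ndtr, ndtri  # standard normal CDF and its inverse


def simulate(n_persons=1000, n_items=10, beta=(0.5, 1.0), sigma=0.5, seed=1):
    """Simulate dichotomous responses X and an outcome y driven by latent theta."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.8, 1.5, n_items)   # hypothetical discriminations
    b = rng.normal(0.0, 0.8, n_items)    # hypothetical difficulties
    theta = rng.normal(0.0, 1.0, n_persons)
    # Normal ogive via a latent response: X = 1 iff a*theta - b + N(0, 1) > 0
    z = np.outer(theta, a) - b + rng.normal(size=(n_persons, n_items))
    x = (z > 0).astype(int)
    y = beta[0] + beta[1] * theta + rng.normal(0.0, sigma, n_persons)
    return x, y, a, b


def sample_z(x, mean, rng):
    """Draw Z ~ N(mean, 1) truncated to Z > 0 where x == 1 and Z <= 0 where x == 0."""
    # Lower-tail inverse-CDF draw: e ~ N(0, 1) truncated to e < mean (x == 1)
    # or e < -mean (x == 0), then mapped back so Z lands on the correct side of 0.
    bound = np.where(x == 1, mean, -mean)
    u = 1.0 - rng.random(mean.shape)     # uniform on (0, 1]
    e = ndtri(u * ndtr(bound))
    return np.where(x == 1, mean - e, mean + e)


def gibbs(x, y, a, b, n_iter=600, burn_in=200, seed=2):
    """Gibbs sampler for y = b0 + b1*theta + e with theta ~ N(0, 1).
    Assumes known item parameters and a single level (a simplification of
    the article's multilevel setup). Returns the posterior mean of b1."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    theta, beta, sigma2 = np.zeros(n), np.zeros(2), 1.0
    slope_draws = []
    for it in range(n_iter):
        # 1) Latent responses Z | theta, X (truncated normals)
        z = sample_z(x, np.outer(theta, a) - b, rng)
        # 2) theta_i | Z_i, y_i, beta, sigma2: combine the N(0, 1) prior, the
        #    item kernel (Z + b = a*theta + noise) and the outcome kernel
        prec = 1.0 + a @ a + beta[1] ** 2 / sigma2
        mean_t = ((z + b) @ a + beta[1] * (y - beta[0]) / sigma2) / prec
        theta = mean_t + rng.standard_normal(n) / np.sqrt(prec)
        # 3) beta | theta, y, sigma2 (flat prior): normal around the OLS estimate
        design = np.column_stack([np.ones(n), theta])
        xtx_inv = np.linalg.inv(design.T @ design)
        beta = rng.multivariate_normal(xtx_inv @ design.T @ y, sigma2 * xtx_inv)
        # 4) sigma2 | beta, theta, y (prior proportional to 1/sigma2):
        #    scaled inverse chi-square with n degrees of freedom
        resid = y - design @ beta
        sigma2 = resid @ resid / rng.chisquare(n)
        if it >= burn_in:
            slope_draws.append(beta[1])
    return float(np.mean(slope_draws))


def naive_slope(x, y):
    """OLS slope of y on the standardized sum score, ignoring measurement error."""
    s = x.sum(axis=1).astype(float)
    s = (s - s.mean()) / s.std()
    return float(np.polyfit(s, y, 1)[0])


if __name__ == "__main__":
    x, y, a, b = simulate()
    print("Gibbs slope:", round(gibbs(x, y, a, b), 3))
    print("Naive slope:", round(naive_slope(x, y), 3))
```

Run as a script, it prints the posterior mean of β1 from the sampler next to the slope from an ordinary regression on the standardized sum score. With the simulated slope set to 1.0, the sampler's estimate should land close to that value, whereas the sum-score regression is attenuated toward zero because the sum score measures θ with error; that gap is the kind of bias that motivates modeling the measurement error explicitly. The truncated normal draws use inverse-CDF sampling in the lower tail (scipy.special.ndtr and ndtri), which stays numerically stable when the linear predictor lies far from zero.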

Key words

classical test theory; Gibbs sampler; item response theory; hierarchical linear models; Markov chain Monte Carlo; measurement error; multilevel model; multilevel IRT; two-parameter normal ogive model

Copyright information

© The Psychometric Society 2003

Authors and Affiliations

  1. Department of Educational Measurement and Data Analysis, University of Twente, Enschede, The Netherlands