A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

Psychometrika

Abstract

Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of Markov chain Monte Carlo (MCMC) estimation routines for estimating the parameters of a wide variety of confirmatory item factor analysis models. Further, I show that these methods can be implemented in a flexible way that requires minimal technical sophistication on the part of the end user. After an overview of item factor analysis and MCMC, results from several examples (simulated and real) are discussed. Most of these examples focus on models that are problematic for current “gold-standard” estimators. The results demonstrate that it is possible to obtain accurate parameter estimates using MCMC in a relatively user-friendly package.

Author information

Corresponding author

Correspondence to Michael C. Edwards.

Additional information

I would like to thank Li Cai, David Thissen, and R.J. Wirth for comments on earlier versions of this draft. I would like to thank Roger Millsap and the reviewers for their guidance on revisions. The resulting paper is better for all of your efforts. Any remaining faults are my own.

About this article

Cite this article

Edwards, M.C. A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis. Psychometrika 75, 474–497 (2010). https://doi.org/10.1007/s11336-010-9161-9

  • DOI: https://doi.org/10.1007/s11336-010-9161-9
