Model Evaluation in the Presence of Categorical Data: Bayesian Model Checking as an Alternative to Traditional Methods

Abstract

Statistical analysis of categorical data often relies on multiway contingency tables; yet, as the number of categories and/or variables increases, the number of table cells with few (or zero) observations also increases. Unfortunately, sparse contingency tables invalidate the use of standard goodness-of-fit statistics. Limited-information fit statistics and bootstrapping procedures offer valuable solutions to this problem, but they present an additional concern in their strict reliance on the (potentially misleading) observed data. To address both of these issues, we demonstrate the Bayesian model checking technique, which yields insightful, useful, and comprehensive evaluations of specific properties of a given model. We illustrate this technique using item response data from a patient-reported psychopathology screening questionnaire, and we provide annotated R code to promote dissemination of this informative method in other prevention science modeling scenarios.
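
The annotated R code referenced above appears in the full article rather than on this page. As a rough orientation only, the following is a minimal, self-contained sketch of a posterior predictive model check for binary item response data, in the spirit of Gelman, Meng, and Stern (1996); the data, the "posterior draws," and the item-pair log odds ratio discrepancy (cf. Chen & Thissen, 1997) are hypothetical stand-ins, not the authors' analysis of the screening questionnaire.

## Posterior predictive check sketch in base R -- illustrative only
set.seed(1)
n_persons <- 500
n_items   <- 5

## "Observed" data: fabricated 0/1 responses from a hypothetical 2PL model
theta_true <- rnorm(n_persons)            # latent traits
a_true     <- runif(n_items, 0.8, 2.0)    # hypothetical discriminations
b_true     <- rnorm(n_items)              # hypothetical difficulties
p_true <- plogis(sweep(outer(theta_true, b_true, "-"), 2, a_true, "*"))
y_obs  <- matrix(rbinom(n_persons * n_items, 1, p_true), n_persons, n_items)

## Stand-ins for posterior draws of the item parameters; in a real analysis
## these would come from an MCMC fit of the Bayesian IRT model
n_draws <- 200
post_a <- matrix(rnorm(n_draws * n_items, rep(a_true, each = n_draws), 0.1), n_draws, n_items)
post_b <- matrix(rnorm(n_draws * n_items, rep(b_true, each = n_draws), 0.1), n_draws, n_items)

## Discrepancy measure: log odds ratio for one item pair (sensitive to local dependence)
log_odds_ratio <- function(y, i, j) {
  tab <- table(factor(y[, i], levels = 0:1), factor(y[, j], levels = 0:1)) + 0.5
  log((tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
}
obs_stat <- log_odds_ratio(y_obs, 1, 2)

## For each posterior draw, simulate a replicated data set under the model and
## recompute the discrepancy
rep_stat <- numeric(n_draws)
for (s in seq_len(n_draws)) {
  theta_rep <- rnorm(n_persons)   # latent traits from the assumed N(0, 1) population
  p_rep <- plogis(sweep(outer(theta_rep, post_b[s, ], "-"), 2, post_a[s, ], "*"))
  y_rep <- matrix(rbinom(n_persons * n_items, 1, p_rep), n_persons, n_items)
  rep_stat[s] <- log_odds_ratio(y_rep, 1, 2)
}

## Posterior predictive p-value for this feature of the data; values near 0 or 1
## flag misfit, while values near 0.5 indicate the model reproduces the feature well
ppp <- mean(rep_stat >= obs_stat)
ppp

The same loop can be run over other discrepancy measures (item proportions, summed-score distributions, other pairwise associations), each targeting a specific property of the fitted model.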

Notes

  1. Although we largely use default settings here, we also note that subjective prior settings are a helpful way of incorporating specific opinions, theory, or knowledge into the estimation process. We describe this element of subjectivity in more detail in the “Discussion” section.

  2. Thank you to Dr. Waguih IsHak of the Geffen School of Medicine at UCLA for providing this data set.

  3. One important consideration when specifying priors is the possibility of prior-data conflict. If the priors are misaligned with the evidence in the data, then both the posterior distributions and the resulting goodness-of-fit assessments can be distorted. This problem is especially likely when informative, but “inaccurate,” priors are used; we therefore employed non-informative priors to avoid it. For more on prior-data conflict, please see Evans and Moshonov (2006) and the toy sketch below.
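
To make the prior-data conflict idea concrete, here is a toy prior predictive check in base R, loosely in the spirit of Evans and Moshonov (2006); it is not part of the article's analysis, and the sample size, observed count, and Beta prior below are all hypothetical.

## Toy prior-data conflict check -- illustrative only
set.seed(2)
n     <- 100    # hypothetical number of respondents
y_obs <- 78     # hypothetical observed count of symptom endorsements

## A deliberately "inaccurate" informative prior: Beta(2, 20) concentrates the
## endorsement rate near 0.09, far from the roughly 0.78 rate suggested by the data
n_sim <- 10000
p_sim <- rbeta(n_sim, 2, 20)        # draws from the prior
y_sim <- rbinom(n_sim, n, p_sim)    # prior predictive draws of the count

## Upper-tail prior predictive probability of a count at least as large as the
## one observed; a value near zero signals prior-data conflict
mean(y_sim >= y_obs)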

References

  • Ackerman, T. A. (1991). The use of unidimensional parameter estimates of multidimensional items in adaptive testing. Applied Psychological Measurement, 15, 13–24.

  • Ansley, T. N., & Forsyth, R. A. (1985). An examination of the characteristics of unidimensional IRT parameter estimates derived from two-dimensional data. Applied Psychological Measurement, 9, 37–48.

  • Bartholomew, D. J., & Tzamourani, P. (1999). The goodness of fit of latent trait models in attitude measurement. Sociological Methods & Research, 27, 525–546.

  • Bayes, T. (1764). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370–418.

  • Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  • Bolt, D. M. (1999). Evaluating the effects of multidimensionality on IRT true-score equating. Applied Measurement in Education, 12, 383–407.

  • Bonifay, W. (2015). An illustration of the two-tier item factor analysis model. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling (pp. 207–225). Routledge.

  • Bonifay, W., & Cai, L. (2017). On the complexity of item response theory models. Multivariate Behavioral Research, 52, 465–484.

  • Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.

  • Cai, L. (2010). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35, 307–335.

  • Cai, L. (2020). flexMIRT R version 3.6: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

  • Cai, L., Chung, S. W., & Lee, T. (in press). Incremental model fit assessment in the case of categorical data: Tucker-Lewis Index for item response theory. Prevention Science.

  • Cai, L., & Hansen, M. (2013). Limited‐information goodness‐of‐fit testing of hierarchical item factor models. British Journal of Mathematical and Statistical Psychology, 66, 245–276.

  • Castel, S., Rush, B., Kennedy, S., Fulton, K., & Toneatto, T. (2007). Screening for mental health problems among patients with substance use disorders: Preliminary findings on the validation of a self-assessment instrument. The Canadian Journal of Psychiatry, 52, 22–27.

  • Chen, W. H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265–289.

  • Depaoli, S., Yang, Y., & Felt, J. (2017). Using Bayesian statistics to model uncertainty in mixture models: A sensitivity analysis of priors. Structural Equation Modeling: A Multidisciplinary Journal, 24, 198–215.

  • Depaoli, S., & Van de Schoot, R. (2017). Improving transparency and replication in Bayesian statistics: The WAMBS-Checklist. Psychological Methods, 22, 240.

  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.

  • Evans, M., & Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Analysis, 1, 893–914. https://doi.org/10.1214/06-BA129

  • Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. Springer Science & Business Media.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.

  • Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–760.

  • Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66, 8–38.

  • Gibbons, R. D., Rush, A. J., & Immekus, J. C. (2009). On the psychometric validity of the domains of the PDSQ: An illustration of the bi-factor item response theory model. Journal of Psychiatric Research, 43, 401–410.

  • Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society: Series B (Methodological), 29, 83–100.

  • Hayduk, L., Cummings, G., Boadu, K., Pazderka-Robinson, H., & Boulianne, S. (2007). Testing! testing! one, two, three–Testing the theory in structural equation models! Personality and Individual Differences, 42, 841–850.

  • Hoff, P. D. (2009). A first course in Bayesian statistical methods (Vol. 580). Springer.

  • Houben, M., Claes, L., Vansteelandt, K., Berens, A., Sleuwaegen, E., & Kuppens, P. (2017). The emotion regulation function of nonsuicidal self-injury: A momentary assessment study in inpatients with borderline personality disorder features. Journal of Abnormal Psychology, 126, 89–95.

  • Kadane, J. B. (2015). Bayesian methods for prevention research. Prevention Science, 16, 1017–1025.

  • Kaplan, D. (2014). Bayesian statistics for the social sciences. Guilford Press.

  • Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.

  • Langeheine, R., Pannekoek, J., & Van de Pol, F. (1996). Bootstrapping goodness-of-fit measures in categorical data analysis. Sociological Methods & Research, 24, 492–516.

  • Levy, R. (2011). Posterior predictive model checking for conjunctive multidimensionality in item response theory. Journal of Educational and Behavioral Statistics, 36, 672–694.

  • Li, Z., & Cai, L. (2018). Summed score likelihood–based indices for testing latent variable distribution fit in item response theory. Educational and Psychological Measurement, 78(5), 857–886.

  • Lim, H., & Wells, C. S. (2020). irtplay: An R package for online item calibration, scoring, evaluation of model fit, and useful functions for unidimensional IRT. Applied Psychological Measurement. https://doi.org/10.1177/0146621620921247

  • MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.

  • Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103(3), 391–410.

  • Marsh, H. W., & Balla, J. (1994). Goodness of fit in confirmatory factor analysis: The effects of sample size and model parsimony. Quality and Quantity, 28, 185–217.

  • Maydeu-Olivares, A. (2013). Goodness-of-fit assessment of item response theory models. Measurement, 11, 71–101.

  • McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23, 412–433.

  • McNeish, D., & Wolf, M. G. (2020). Thinking twice about sum scores. Behavior Research Methods, 1–19.

  • Monroe, S. (2021). Testing latent variable distribution fit in IRT using posterior residuals. Journal of Educational and Behavioral Statistics, 46(3), 374–398.

  • Orlando, M., & Thissen, D. (2000). New item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.

  • Ory, D. T., & Mokhtarian, P. L. (2010). The impact of non-normality, sample size and estimation technique on goodness-of-fit measures in structural equation modeling: Evidence from ten empirical models of travel behavior. Quality & Quantity, 44, 427–445.

  • R Core Team (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

  • Reise, S. R., Cook, K. F., & Moore, T. M. (2015). Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling (pp. 13–40). Routledge.

  • Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107, 358–367.

  • Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9, 130–134.

  • Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12, 1151–1172.

  • Rush, A. J., Fava, M., Wisniewski, S. R., Lavori, P. W., Trivedi, M. H., Sackeim, H. A., & Niederehe, G. (2004). Sequenced treatment alternatives to relieve depression (STAR*D): Rationale and design. Controlled Clinical Trials, 25, 119–142.

  • Sinharay, S. (2006). Bayesian item fit analysis for unidimensional item response theory models. British Journal of Mathematical and Statistical Psychology, 59(2), 429–449.

  • Stone, C. A., & Zhu, X. (2015). Bayesian analysis of item response theory models using SAS®. SAS Institute Inc.

  • van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Van Aken, M. A. (2014). A gentle introduction to Bayesian analysis: Applications to developmental research. Child Development, 85, 842–860.

  • van Erp, S., Mulder, J., & Oberski, D. (2018). Prior sensitivity analysis in default Bayesian structural equation modeling. Psychological Methods, 23, 363–388.

  • Way, W. D., Ansley, T. N., & Forsyth, R. A. (1988). The comparative effects of compensatory and noncompensatory two-dimensional data on unidimensional IRT estimates. Applied Psychological Measurement, 12, 239–252.

  • Zhu, X., & Stone, C. A. (2012). Bayesian comparison of alternative graded response models for performance assessment applications. Educational and Psychological Measurement, 72(5), 774–799.

  • Zimmerman, M., & Mattia, J. I. (2001). A self-report scale to help make psychiatric diagnoses: The Psychiatric Diagnostic Screening Questionnaire. Archives of General Psychiatry, 58, 787–794.

Funding

The research reported here was supported by the Institute of Education Sciences, US Department of Education, through Grant R305D210032.

Author information

Corresponding author

Correspondence to Wes Bonifay.

Ethics declarations

Ethics Approval

All procedures performed in the STAR*D trial were approved by the institutional review board of the STAR*D National Coordinating Center at the University of Texas Southwestern Medical Center, in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Disclaimer

The opinions expressed are those of the authors and do not represent views of the Institute or the US Department of Education.

Consent to Participate

Informed consent was obtained from all the participants.

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bonifay, W., Depaoli, S. Model Evaluation in the Presence of Categorical Data: Bayesian Model Checking as an Alternative to Traditional Methods. Prev Sci 24, 467–479 (2023). https://doi.org/10.1007/s11121-021-01293-w
