
A non-parametric Bayesian diagnostic for detecting differential item functioning in IRT models

Health Services and Outcomes Research Methodology

Abstract

Differential item functioning (DIF) in tests and multi-item surveys occurs when, at equal levels of proficiency, the response to one or more items is not conditionally independent of membership in a particular group. We develop an approach to detecting DIF in the context of item response theory (IRT) models based on a diagnostic that is the posterior mean of a p-value. IRT models are fit in a Bayesian framework, and simulated proficiency parameters from the posterior distribution are retained. Monte Carlo estimates of the p-value diagnostic are then computed by comparing the fit of nonparametric regressions of item responses on the simulated proficiency parameters and group membership. Some properties of our approach are examined through a simulation experiment. We apply our method to responses from two separate studies using the BASIS-24, a widely used self-report mental health assessment instrument, to examine DIF between the English and Spanish-translated versions of the survey.
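The pipeline described above (retain posterior draws of proficiency θ, compute a DIF p-value for each draw, then average) can be sketched as follows. With M retained draws θ^(1), …, θ^(M), the diagnostic is estimated as p̄ = (1/M) Σ_m p(θ^(m)), a Monte Carlo approximation of the posterior mean E[p(θ) | data]. This is a minimal illustration under stated assumptions, not the authors' procedure: the item is assumed binary, the nonparametric regression is a stand-in piecewise-constant fit over equal-count bins of θ (the abstract does not say which smoother is used), and the fits with and without group are compared by a likelihood-ratio chi-square test. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from df = 2 (exponential tail) or df = 1 (erfc), then raise df by 2
    # each step using Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1), s = df / 2.
    if df % 2 == 0:
        sf, s = math.exp(-y), 1.0
    else:
        sf, s = math.erfc(math.sqrt(y)), 0.5
    while 2 * s < df:
        sf += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return min(sf, 1.0)


def quantile_bins(theta, n_bins):
    """Assign each respondent to one of n_bins equal-count bins of proficiency."""
    order = sorted(range(len(theta)), key=theta.__getitem__)
    bins = [0] * len(theta)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(theta)
    return bins


def bernoulli_loglik(cells):
    """Maximized Bernoulli log-likelihood with a separate success rate per cell."""
    ll = 0.0
    for successes, total in cells.values():
        for k in (successes, total - successes):
            if k > 0:
                ll += k * math.log(k / total)
    return ll


def dif_pvalue(y, group, theta, n_bins=5):
    """Likelihood-ratio p-value: does group explain responses beyond binned theta?"""
    bins = quantile_bins(theta, n_bins)
    null_cells = defaultdict(lambda: [0, 0])  # keyed by proficiency bin
    alt_cells = defaultdict(lambda: [0, 0])   # keyed by (bin, group)
    for yi, gi, bi in zip(y, group, bins):
        for cells, key in ((null_cells, bi), (alt_cells, (bi, gi))):
            cells[key][0] += yi
            cells[key][1] += 1
    g2 = 2.0 * (bernoulli_loglik(alt_cells) - bernoulli_loglik(null_cells))
    df = len(alt_cells) - len(null_cells)  # extra parameters in the group model
    return chi2_sf(g2, df) if df > 0 else 1.0


def posterior_mean_pvalue(y, group, theta_draws, n_bins=5):
    """Monte Carlo estimate of the posterior mean of the DIF p-value."""
    pvals = [dif_pvalue(y, group, theta, n_bins) for theta in theta_draws]
    return sum(pvals) / len(pvals)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    group = [i % 2 for i in range(n)]
    theta_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Simulated item with built-in DIF: group 1 endorses more often at equal theta.
    y = [int(rng.random() < 1.0 / (1.0 + math.exp(-(t + 1.5 * g))))
         for t, g in zip(theta_true, group)]
    # Hypothetical stand-in for retained MCMC draws: true theta plus noise.
    draws = [[t + rng.gauss(0.0, 0.3) for t in theta_true] for _ in range(20)]
    print(posterior_mean_pvalue(y, group, draws))
```

A small averaged p-value flags the item for possible DIF. The binned fit is used here only because it needs nothing beyond the Python standard library; a smoothing spline or generalized additive model would be closer to the nonparametric regressions the abstract describes, and polytomous items would need a multinomial or ordinal response model instead of the Bernoulli one.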


Fig. 1



Acknowledgments

This research was supported by Grant R01 MH58240 from the National Institute of Mental Health and by the Veterans Administration Health Services Research and Development program.

Author information

Correspondence to Mark E. Glickman.

Additional information

The views expressed in this article are those of the authors and do not necessarily reflect the views of the Department of Veterans Affairs.


About this article

Cite this article

Glickman, M.E., Seal, P. & Eisen, S.V. A non-parametric Bayesian diagnostic for detecting differential item functioning in IRT models. Health Serv Outcomes Res Method 9, 145–161 (2009). https://doi.org/10.1007/s10742-009-0052-4


  • DOI: https://doi.org/10.1007/s10742-009-0052-4
