Abstract
Differential item functioning (DIF) in tests and multi-item surveys occurs when, among respondents with equal levels of proficiency, the responses to one or more items are not conditionally independent of membership in a particular group. We develop an approach to detecting DIF in the context of item response theory (IRT) models based on a diagnostic that is the posterior mean of a p-value. IRT models are fit in a Bayesian framework, and simulated proficiency parameters from the posterior distribution are retained. Monte Carlo estimates of the p-value diagnostic are then computed by comparing the fit of nonparametric regressions of item responses on simulated proficiency parameters and group membership. Some properties of our approach are examined through a simulation experiment. We apply our method to responses from two separate studies to the BASIS-24, a widely used self-report mental health assessment instrument, to examine DIF between the English and Spanish-translated versions of the survey.
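The abstract describes the diagnostic only at a high level. The following is a minimal sketch of the general idea under simplifying assumptions, not the authors' procedure. It assumes dichotomous (0/1) item responses and takes posterior draws of proficiency as given. Here those draws are faked by adding noise to the true values as a stand-in for MCMC output. It also swaps the nonparametric smoothers for a piecewise-constant regression on quantile bins of proficiency. For each draw, a likelihood ratio chi-square test compares "response rate depends on the proficiency bin" against "response rate depends on the bin and the group", and the resulting p-values are averaged across draws. All function names (`chi2_sf`, `dif_pvalue`, `posterior_mean_pvalue`, `simulate_demo`, etc.) and the binning scheme are hypothetical illustrations.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0.0:
        return 1.0
    # Start from df = 1 (erfc term) or df = 0 (zero), then step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = 0.0, 0
    while k < df:
        q += math.exp((k / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


def binomial_loglik(cells):
    """Maximized Bernoulli log-likelihood when each cell has its own success rate.

    `cells` maps a cell key to (number of 1 responses, number of responses).
    """
    ll = 0.0
    for ones, total in cells.values():
        for count in (ones, total - ones):
            if count > 0:
                ll += count * math.log(count / total)
    return ll


def quantile_bins(theta, n_bins):
    """Assign each respondent to one of n_bins roughly equal-sized proficiency bins."""
    order = sorted(range(len(theta)), key=theta.__getitem__)
    bins = [0] * len(theta)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(theta)
    return bins


def dif_pvalue(y, group, theta, n_bins=5):
    """LR test of 'rate depends on proficiency bin' vs 'rate depends on bin and group'."""
    bins = quantile_bins(theta, n_bins)
    null_cells, alt_cells = {}, {}
    for yi, gi, bi in zip(y, group, bins):
        for cells, key in ((null_cells, bi), (alt_cells, (bi, gi))):
            ones, total = cells.get(key, (0, 0))
            cells[key] = (ones + yi, total + 1)
    lr = 2.0 * (binomial_loglik(alt_cells) - binomial_loglik(null_cells))
    df = len(alt_cells) - len(null_cells)  # number of extra group-specific rates
    return chi2_sf(lr, df) if df > 0 else 1.0


def posterior_mean_pvalue(y, group, theta_draws, n_bins=5):
    """Monte Carlo estimate of the posterior mean p-value over proficiency draws."""
    pvals = [dif_pvalue(y, group, theta, n_bins) for theta in theta_draws]
    return sum(pvals) / len(pvals)


def simulate_demo(n=2000, n_draws=50, dif_shift=1.0, seed=1):
    """Hypothetical data: one DIF-free item, one item harder for group 1, and fake draws."""
    rng = random.Random(seed)
    group = [rng.randint(0, 1) for _ in range(n)]
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]

    def item(shift):
        # Rasch-type response curve; `shift` lowers the logit for group 1 only.
        return [int(rng.random() < 1.0 / (1.0 + math.exp(-(t - shift * g))))
                for t, g in zip(theta, group)]

    y_fair, y_dif = item(0.0), item(dif_shift)
    # Stand-in for MCMC output: true proficiency plus noise, NOT a real posterior.
    draws = [[t + rng.gauss(0.0, 0.3) for t in theta] for _ in range(n_draws)]
    return y_fair, y_dif, group, draws


if __name__ == "__main__":
    y_fair, y_dif, group, draws = simulate_demo()
    print("DIF-free item:", posterior_mean_pvalue(y_fair, group, draws))
    print("DIF item:     ", posterior_mean_pvalue(y_dif, group, draws))
```

Running the module prints the two averaged p-values. With a shift of 1.0 on the logit scale and 2,000 simulated respondents, the item generated with DIF should give a value near zero, while the DIF-free item's value varies with the seed. Bins are used only to keep the sketch dependency-free. A spline or generalized additive model smoother would be closer to the nonparametric regressions the abstract describes.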
Acknowledgments
This research was supported by Grant R01 MH58240 from the National Institute of Mental Health and by the Veterans Administration Health Services Research and Development program.
Additional information
The views expressed in this article are those of the authors and do not necessarily reflect the views of the Department of Veterans Affairs.
About this article
Cite this article
Glickman, M.E., Seal, P. & Eisen, S.V. A non-parametric Bayesian diagnostic for detecting differential item functioning in IRT models. Health Serv Outcomes Res Method 9, 145–161 (2009). https://doi.org/10.1007/s10742-009-0052-4
DOI: https://doi.org/10.1007/s10742-009-0052-4