A non-parametric Bayesian diagnostic for detecting differential item functioning in IRT models

Glickman, Mark E.; Seal, Pradipta; Eisen, Susan V.

doi:10.1007/s10742-009-0052-4

Published: 24 July 2009

Volume 9, pages 145–161, (2009)

Health Services and Outcomes Research Methodology

Abstract

Differential item functioning (DIF) in tests and multi-item surveys occurs when the response to one or more items is not conditionally independent of membership in a particular group, given equal levels of proficiency. We develop an approach to detecting DIF in the context of item response theory (IRT) models based on a diagnostic that is the posterior mean of a p-value. IRT models are fit in a Bayesian framework, and simulated proficiency parameters from the posterior distribution are retained. Monte Carlo estimates of the p-value diagnostic are then computed by comparing the fit of nonparametric regressions of item responses on simulated proficiency parameters and group membership. Some properties of our approach are examined through a simulation experiment. We apply our method to responses to the BASIS-24, a widely used self-report mental health assessment instrument, collected in two separate studies, to examine DIF between the English and Spanish-translated versions of the survey.
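The Monte Carlo step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binned (piecewise-constant) regression, the likelihood-ratio comparison, the simulated "posterior draws", and every function name below are assumptions chosen to keep the example self-contained. For each retained draw of the proficiency parameters, a binary item is regressed on proficiency alone and on proficiency plus group, a p-value compares the two fits, and the p-values are averaged over draws.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return min(1.0, q)


def cell_loglik(k, n):
    """Maximized binomial log-likelihood for k endorsements out of n responses."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def lr_pvalue(theta, y, group, n_bins=4):
    """P-value comparing a binned regression of y on theta with and without group.

    Assumption: a regressogram over rank-based bins of theta stands in for the
    nonparametric regression; group is coded 0/1 and y is a binary response.
    """
    n = len(theta)
    order = sorted(range(n), key=lambda i: theta[i])
    counts = [[[0, 0], [0, 0]] for _ in range(n_bins)]  # [bin][group] -> [n, k]
    for rank, i in enumerate(order):
        cell = counts[rank * n_bins // n][group[i]]
        cell[0] += 1
        cell[1] += y[i]
    stat, df = 0.0, 0
    for (n0, k0), (n1, k1) in counts:
        if n0 > 0 and n1 > 0:  # a bin is informative only if both groups appear
            stat += 2.0 * (cell_loglik(k0, n0) + cell_loglik(k1, n1)
                           - cell_loglik(k0 + k1, n0 + n1))
            df += 1
    return chi2_sf(stat, df) if df > 0 else 1.0


def posterior_mean_pvalue(theta_draws, y, group, n_bins=4):
    """Monte Carlo estimate: average the p-value over posterior draws of theta."""
    pvals = [lr_pvalue(theta, y, group, n_bins) for theta in theta_draws]
    return sum(pvals) / len(pvals)


# Hypothetical data: noisy copies of the true proficiencies stand in for MCMC output.
rng = random.Random(0)
n = 600
group = [i % 2 for i in range(n)]
theta_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
theta_draws = [[t + rng.gauss(0.0, 0.3) for t in theta_true] for _ in range(50)]


def simulate_item(shift):
    """Binary responses from a logistic curve; shift is the group effect (DIF)."""
    return [int(rng.random() < 1.0 / (1.0 + math.exp(-(t + shift * g))))
            for t, g in zip(theta_true, group)]


y_nodif = simulate_item(0.0)
y_dif = simulate_item(1.5)
p_nodif = posterior_mean_pvalue(theta_draws, y_nodif, group)
p_dif = posterior_mean_pvalue(theta_draws, y_dif, group)
print("no DIF:", p_nodif, "DIF:", p_dif)
```

A small averaged p-value flags an item whose responses depend on group even after conditioning on proficiency. In a real analysis the draws would come from the fitted Bayesian IRT model, and a smoother suited to the item's response format would replace the binning used here.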

Acknowledgments

This research was supported by Grant R01 MH58240 from the National Institute of Mental Health and by the Veterans Administration Health Services Research and Development program.

Author information

Authors and Affiliations

Department of Health Policy and Management, Boston University School of Public Health, Boston, MA, USA
Mark E. Glickman & Susan V. Eisen
Center for Health Quality, Outcomes and Economics Research, a Veteran Administration Center of Excellence, Edith Nourse Rogers Memorial Hospital (152), Bldg 70, 200 Springs Road, Bedford, MA, 01730, USA
Mark E. Glickman & Susan V. Eisen
Department of Mathematics and Statistics, Boston University, Boston, MA, USA
Pradipta Seal

Corresponding author

Correspondence to Mark E. Glickman.

Additional information

The views expressed in this article are those of the authors and do not necessarily reflect the views of the Department of Veterans Affairs.

About this article

Cite this article

Glickman, M.E., Seal, P. & Eisen, S.V. A non-parametric Bayesian diagnostic for detecting differential item functioning in IRT models. Health Serv Outcomes Res Method 9, 145–161 (2009). https://doi.org/10.1007/s10742-009-0052-4

Received: 01 July 2008
Revised: 13 April 2009
Accepted: 10 July 2009
Published: 24 July 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10742-009-0052-4
