Differential Item Functioning Analyses of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Measures: Methods, Challenges, Advances, and Future Directions

  • Application Reviews and Case Studies
  • Published in Psychometrika

Abstract

Several methods used to examine differential item functioning (DIF) in Patient-Reported Outcomes Measurement Information System (PROMIS®) measures are presented, including effect size estimation. A summary is provided of factors that may affect DIF detection and of challenges encountered in PROMIS DIF analyses, e.g., anchor item selection. One concern in PROMIS was that inadequately modeled multidimensionality could result in false DIF detection. Section 1 presents the unidimensional models used by most PROMIS investigators for DIF detection, together with their multidimensional expansions. Section 2 builds on previous unidimensional analyses of depression and anxiety short forms to illustrate DIF detection using a multidimensional item response theory (MIRT) model. The Item Response Theory-Log-likelihood Ratio Test (IRT-LRT) method was used for a real data illustration with gender as the grouping variable. The IRT-LRT method flexibly handles group differences in trait distributions, known as impact in the DIF literature, and was examined with both real data and simulations comparing its performance within the unidimensional IRT (UIRT) and MIRT contexts. Additionally, different effect size measures were compared for the data presented in Section 2. In the real data illustration, the IRT-LRT method flagged more items within the MIRT context than within the UIRT context. The simulations provided some evidence that while the unidimensional and multidimensional approaches were similar in terms of Type I error rates, power for DIF detection was greater for the multidimensional approach. The effect size measures presented in Section 1 and applied in Section 2 varied in terms of estimation methods, choice of density function, methods of equating, and anchor item selection. Despite these differences, there was considerable consistency in results, especially for the items showing the largest effect size values. Future work is needed to examine DIF detection in the context of polytomous, multidimensional data. PROMIS standards included the incorporation of effect size measures in determining salient DIF. Integrated methods for examining effect size measures in the context of IRT-based DIF detection procedures are still in early stages of development.
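To make the two ingredients of the abstract concrete, the following is a minimal sketch of (a) the IRT-LRT decision rule for one binary item, comparing a model that constrains the item's parameters equal across groups against one that frees them (df = 2 for a two-parameter logistic item), and (b) a density-weighted area between the two groups' item characteristic curves as a simple DIF effect size. The log-likelihoods and item parameters below are hypothetical illustrations, not values from the article; real analyses fit the constrained and free models with IRT software.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irt_lrt(loglik_nodif, loglik_dif):
    """IRT-LRT for one binary item: G2 = 2 * (LL_free - LL_constrained).
    With df = 2 (discrimination and difficulty freed), the chi-square
    survival function has the closed form exp(-G2 / 2)."""
    g2 = 2.0 * (loglik_dif - loglik_nodif)
    return g2, math.exp(-g2 / 2.0)

def weighted_area(a_ref, b_ref, a_foc, b_foc, step=0.01):
    """Effect size: absolute area between the two groups' curves,
    weighted by a standard normal density (a stand-in for the focal
    group's trait distribution) and integrated over theta in [-4, 4]."""
    total, theta = 0.0, -4.0
    while theta <= 4.0:
        w = math.exp(-theta * theta / 2.0) / math.sqrt(2.0 * math.pi)
        diff = abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
        total += w * diff * step
        theta += step
    return total

# Hypothetical fitted values for one studied item:
g2, p = irt_lrt(loglik_nodif=-5061.3, loglik_dif=-5057.1)  # G2 = 8.4
es = weighted_area(1.4, -0.2, 1.4, 0.3)  # uniform DIF: difficulty shift of 0.5
flagged = p < 0.05
```

An item would be flagged when the LRT p-value falls below the chosen alpha (often after a multiple-comparison correction), with the effect size then consulted to judge whether the flagged DIF is salient.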


References

  • Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.

    Article  Google Scholar 

  • Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36, 277–300. https://doi.org/10.1111/j.1745-3984.1999.tb00558.x.

    Article  Google Scholar 

  • Baker, F. B. (1995). EQUATE 2.1: Computer program for equating two metrics in item response theory. Madison: University of Wisconsin, Laboratory of Experimental Design.

    Google Scholar 

  • Bauer, D., Belzak, W., & Cole, V. (2019). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling A: Multidisciplinary Journal,. https://doi.org/10.1080/10705511.2019.1642754.

    Article  Google Scholar 

  • Belzak, W., & Bauer, D. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods,. https://doi.org/10.1027/met0000253.

    Article  PubMed  PubMed Central  Google Scholar 

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling for the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300. https://doi.org/10.2307/2346101.

    Article  Google Scholar 

  • Bjorner, J. B., Rose, M., Gandek, B., Stone, A. A., Junghaenel, D. U., & Ware, J. E. (2014). Difference in method of administration did not significantly impact item response: An IRT-based analysis from the Patient-Reported Outcomes Measurement Information System (PROMIS) initiative. Quality of Life Research, 23, 217–227.

    Article  PubMed  Google Scholar 

  • Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15, 113–141. https://doi.org/10.1207/S15324818AME1502_01.

    Article  Google Scholar 

  • Boorsboom, D. (2006). Commentary: When does measurement invariance matter? Medical Care, 44(11), S176–81.

    Article  Google Scholar 

  • Boorsboom, D., Mellenbergh, G. J., & van Heerdon, J. (2002). Different kinds of DIF: A distinction between absolute and relative forms of measurement invariance and bias. Applied Psychological Measurement, 26, 433–450.

    Article  Google Scholar 

  • Bulut, O., & Suh, Y. (2017). Detecting multidimensional differential item functioning with the multiple indicators multiple causes model, the item response theory likelihood ratio test, and logistic regression. Frontiers in Education, 2, 51. https://doi.org/10.3389/feduc.2017.00051.

    Article  Google Scholar 

  • Byrne, B. M., Shavelson, R. J., & Muthén, B. O. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–566. https://doi.org/10.1037/0033-2909.105.3.456.

    Article  Google Scholar 

  • Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309–329. https://doi.org/10.1348/000711007X249603.

    Article  PubMed  Google Scholar 

  • Cai, L. (2013). FlexMIRT version 2: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

    Google Scholar 

  • Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT Modeling [Computer software]. Lincolnwood, IL: Scientific Software International Inc.

  • Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253–260.

    Article  Google Scholar 

  • Carle, A. C., Cella, D., Cai, L., Choi, S. W., Crane, P. K., Curtis, S. M., et al. (2011). Advancing PROMIS’s methodology: Results of the third PROMIS Psychometric Summit. Expert Review of Pharmacoeconomics & Outcome Research, 11(6), 677–684. https://doi.org/10.1586/erp.11.74.

    Article  Google Scholar 

  • Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., Ader, D., Fries, J. F., Bruce, B., & Rose, M., on behalf of the PROMIS Cooperative Group. (2007). The patient-reported outcomes measurement information system (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl 1), S3–S11. https://doi.org/10.1097/01.mlr.0000258615.42478.55.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of statistical software, 48(6), 1–29.

    Article  Google Scholar 

  • Chalmers, R. P. (2016). A differential response functioning framework for understanding item, bundle, and test bias. Doctoral Dissertation, York University, Toronto, Ontario. https://pdfs.semanticscholar.org

  • Chalmers, R. P. (2018). Model-based measures for detecting and quantifying response bias. Psychometrika, 83, 696–732. https://doi.org/10.1007/s11336-018-9626-9.

    Article  PubMed  Google Scholar 

  • Chalmers, R. P., Counsell, A., & Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76, 114–140.

    Article  PubMed  Google Scholar 

  • Chang, Y.-W., Hsu, N.-J., & Tsai, R.-C. (2017). Unifying differential item functioning in factor analysis for categorical data under a discretization of a normal variant. Psychometrika, 82(2), 382–406. https://doi.org/10.1007/s11336-017-9562-0.

    Article  PubMed  Google Scholar 

  • Chen, J.-H., Chen, C.-T., & Shih, C.-L. (2013). Improving the control of type I error rate in assessing differential item functioning for hierarchical generalized linear models when impact is present. Applied Psychological Measurement, 38, 18–36. https://doi.org/10.1177/0146621613488643.

    Article  Google Scholar 

  • Cheng, C.-P., Chen, C.-C., & Shih, C.-L. (2020). An exploratory strategy to identify and define sources of differential item functioning. Applied Psychological Measurement, 4, 548–560. https://doi.org/10.1177/014662/620931/90.

    Article  Google Scholar 

  • Cheng, Y., Shao, C., & Lathrop, Q. N. (2016). The mediated MIMIC model for understanding the underlying mechanisms of DIF. Educational and Psychological Measurement, 76(1), 43–63.

    Article  PubMed  Google Scholar 

  • Cheung, G. W., & Rensvold, R. B. (2003). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. https://doi.org/10.1207/S15328007SEM0902_5.

    Article  Google Scholar 

  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30. https://doi.org/10.18637/jss.v039.i08.

    Article  PubMed  PubMed Central  Google Scholar 

  • Choi, S. W., Reise, S. P., Pilkonis, P. A., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research, 19, 125–136.

    Article  PubMed  Google Scholar 

  • Clauser, B. E., Mazor, K. M., & Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel–Haenszel procedure. Applied Measurement in Education, 6, 269–279.

    Article  Google Scholar 

  • Cohen, A. S., Kim, S.-H., & Baker, F. B. (1993). Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17, 335–350. https://doi.org/10.1177/014662169301700402.

    Article  Google Scholar 

  • Cohen, P., Cohen, J., Teresi, J., Marchi, P., & Velez, N. (1990). Problems in the measurement of latent variables in structural equation causal models. Applied Psychological Measurement, 14(2), 183–196. https://doi.org/10.1177/014662169001400207.

    Article  Google Scholar 

  • Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques: Difdetect and difwithpar. Medical Care, 44, S115–S123. https://doi.org/10.1097/01.mlr.0000245183.28384.ed.

    Article  PubMed  Google Scholar 

  • Crane, P. K., Gibbons, L. E., Ocepek-Welikson, K., Cook, K., Cella, D., & Teresi, J. A. (2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research, 16, 69–84. https://doi.org/10.1007/s11136-007-9185-5.

    Article  PubMed  Google Scholar 

  • Crane, P. K., van Belle, G., & Larson, E. B. (2004). Test bias in a cognitive test: Differential item functioning in the CASI. Statistics in Medicine, 23, 241–256. https://doi.org/10.1002/sim.1713.

    Article  PubMed  Google Scholar 

  • Culpepper, S. A., Aguinis, H., Kern, J. L., & Millsap, R. (2019). High-stakes testing case study: A latent variable approach for assessing measurement and prediction invariance. Psychometrika, 84, 285–309. https://doi.org/10.1007/s11336-018-9549-2.

    Article  PubMed  Google Scholar 

  • DeMars, C. E. (2010). Type 1 error inflation for detecting DIF in the presence of impact. Educational and Psychological Measurement, 70, 961–972. https://doi.org/10.1177/0013164410366691.

    Article  Google Scholar 

  • DeMars, C. E. (2015). Modeling DIF for simulations: Continuous or categorical secondary trait? Psychological Test and Assessment Modeling, 57, 279–300.

    Google Scholar 

  • Edelen, M., Stucky, B., & Chandra, A. (2015). Quantifying “problematic” DIF within an IRT framework: Application to a cancer stigma index. Quality of Life Research, 24, 95–103. https://doi.org/10.1007/s11136-013-0540-4.

    Article  PubMed  Google Scholar 

  • Egberink, I. J. L., Meijer, R. R., & Tendeiro, J. N. (2015). Investigating measurement invariance in computer-based personality testing: The impact of using anchor items on effect size indices. Educational and Psychological Measurement, 75, 126–145. https://doi.org/10.1177/0013164414520965.

    Article  PubMed  Google Scholar 

  • Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel–Haenszel, SIBTEST and the IRT likelihood ratio test. Applied Psychological Measurement, 29, 278–295. https://doi.org/10.1177/0146621605275728.

    Article  Google Scholar 

  • Fleer, P. F. (1993). A Monte Carlo assessment of a new measure of item and test bias (p. 2266, Vol. 54, No. 04B), Illinois Institute of Technology, Dissertation Abstracts International.

  • Flowers, C. P., Oshima, T. C., & Raju, N. S. (1999). A description and demonstration of the polytomous DFIT framework. Applied Psychological Measurement, 23, 309–32. https://doi.org/10.1177/01466219922031437.

    Article  Google Scholar 

  • Furlow, C. F., Ross, T. R., & Gagné, P. (2009). The impact of multidimensionality on the detection of differential bundle functioning using simultaneous item bias test. Applied Psychological Measurement, 33(6), 441–464. https://doi.org/10.1177/0146621609331959.

    Article  Google Scholar 

  • Gelin, M. N., & Zumbo, B. D. (2003). Differential item functioning results may change depending on how an item is scored: An illustration with the center for epidemiologic studies depression scale. Educational and Psychological Measurement, 63(1), 65–74. https://doi.org/10.1177/0013164402239317.

    Article  Google Scholar 

  • González-Betanzos, F., & Abad, F. J. (2012). The effects of purification and the evaluation of differential item functioning with the likelihood ratio test. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 8, 130–145. https://doi.org/10.1027/1614-2241/a000046.

    Article  Google Scholar 

  • Gómez-Benito, J., Dolores-Hidalgo, M., & Zumbo, B. D. (2013). Effectiveness of combining statistical tests and effect sizes when using logistic discriminant function regression to detect differential item functioning for polytomous items. Educational and Psychological Measurement, 73, 875–897. https://doi.org/10.1177/0013164413492419.

    Article  Google Scholar 

  • Gregorich, S. E. (2006). Do self-report instruments allow meaningful comparisons across diverse population groups?: Testing measurement invariance using the confirmatory factor analysis framework. Medical Care, 44(11), S78–S94.

    Article  PubMed  PubMed Central  Google Scholar 

  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, California: Sage Publications Inc.

    Google Scholar 

  • Herrel, F. E. (2009). Design; design package. R package version 2:3.0. Retrieved from http://CRANR-project.org/package=Design

  • Hidalgo, M. D., Gomez-Benito, J., & Zumbo, B. D. (2014). Binary logistic regression analysis for detecting differential item functioning: Effectiveness of \(\text{ R}^{{2}}\) and delta log odds ratio effect size measures. Educational and Psychological Measurement, 74, 927–949. https://doi.org/10.1177/0013164414523618.

    Article  Google Scholar 

  • Houts, C. R., & Cai, L. (2013). FlexMIRT user’s manual version 2: Flexible multilevel multidimensional item analysis and test scoring. Chapel Hill, NC: Vector Psychometric Group.

    Google Scholar 

  • Jensen, R. E., Moinpour, C. M., Keegan, T. H. M., Cress, R. D., Wu, X.-C., Paddock, L. A., et al. (2016a). The Measuring Your Health Study: Leveraging community-based cancer registry recruitment to establish a large, diverse cohort of cancer survivors for analyses of measurement equivalence and validity of thepatient-reported Outcomes Measurement Information System®(PROMIS®) short form items. Psychological Test and Assessment Modeling, 58(1), 99–117.

    Google Scholar 

  • Jensen, R. E., King-Kallimanis, B. L., Sexton, E., Reeve, B. B., Moinpour, C. M., Potosky, A. L., et al. (2016b). Measurement properties of the PROMIS\(^{\textregistered }\) Sleep Disturbance short form in a large, ethnically diverse cancer cohort. Psychological Test and Assessment Modeling, 58(2), 353–370.

    Google Scholar 

  • Jin, K. Y., Chen, H. F., & Wang, W. C. (2018). Using odds ratios to detect differential item functioning. Applied Psychological Measurement, 42, 613–29.

    Article  PubMed  PubMed Central  Google Scholar 

  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349. https://doi.org/10.1207/S15324818AME1404_2.

    Article  Google Scholar 

  • Jones, R. N. (2006). Identification of measurement differences between English and Spanish language versions for the Mini-Mental State Examination: Detecting differential item functioning using MIMIC modeling. Medical Care, 44(11 Suppl 3), S124–S133. https://doi.org/10.1097/01.mlr.0000245250.50114.0f.

    Article  PubMed  Google Scholar 

  • Jones, R. N. (2019). Differential item functioning and its relevance to epidemiology. Current Epidemiology Reports,. https://doi.org/10.1007/s40471-019-00194-5.

    Article  PubMed  PubMed Central  Google Scholar 

  • Jones, R. N., Tommet, D., Ramirez, M., Jensen, R. E., & Teresi, J. A. (2016). Differential item functioning in Patient Reported Outcomes Measurement Information System (PROMIS\(^{\textregistered }\)) Physical Functioning short forms: Analyses across ethnically diverse groups. Psychological Test and Assessment Modeling, 58(2), 371–402.

    Google Scholar 

  • Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36(4), 408–426. https://doi.org/10.1007/BF02291366.

    Article  Google Scholar 

  • Jöreskog, K., & Goldberger, A. (1975). Estimation of a model of multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 10, 631–639. https://doi.org/10.2307/2285946.

    Article  Google Scholar 

  • Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36(3), 347–387. https://doi.org/10.1207/S15327906347-387.

    Article  PubMed  Google Scholar 

  • Jöreskog, K., & Sorbom, D. (1996). LISREL8: Analysis of linear structural relationships: Users Reference Guide. Lincolnwood: Scientific Software International Inc.

    Google Scholar 

  • Junker, B. W. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255–278. https://doi.org/10.1007/BF02294462.

    Article  Google Scholar 

  • Kahraman, N., DeBoeck, P., & Janssen, R. (2009). Modeling DIF in complex response data using test design strategies. International Journal of Testing, 8, 151–166. https://doi.org/10.1080/15305050902880744.

    Article  Google Scholar 

  • Kim, E. S., & Yoon, M. (2011). Testing measurement invariance: A comparison of multiple group categorical CFA and IRT. Structural Equation Modeling, 18, 212–228. https://doi.org/10.1080/10705511-2011.557337.

    Article  Google Scholar 

  • Kim, E. S., Yoon, M., & Lee, T. (2012). Testing measurement invariance using MIMIC: Likelihood ratio test with a critical value adjustment. Educational and Psychological Measurement, 72, 469–492. https://doi.org/10.1177/0013164411427395.

    Article  Google Scholar 

  • Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345–355. https://doi.org/10.1177/014662169802200403.

    Article  Google Scholar 

  • Kim, S.-H., Cohen, A. S., Alagoz, C., & Kim, S. (2007). DIF detection and effect size measures for polytomously scored items. Journal of Educational Measurement, 44(2), 93–116. https://doi.org/10.1111/j.1745-3984.2007.00029.x.

    Article  Google Scholar 

  • Kleinman, M., & Teresi, J. A. (2016). Differential item functioning magnitude and impact measures from item response theory models. Psychological Test and Assessment Modeling, 58, 79–98.

    PubMed  PubMed Central  Google Scholar 

  • Kopf, J., Zeileis, A., & Stobl, C. (2015a). A framework for anchor methods and an iterative forward approach for DIF detection. Applied Psychological Measurement, 39, 83–103. https://doi.org/10.1177/0146621614544195.

    Article  PubMed  Google Scholar 

  • Kopf, J., Zeileis, A., & Stobl, C. (2015b). Anchor selection strategies for DIF analysis: Review, assessment and new approaches. Educational and Psychological Measurement, 75, 22–56. https://doi.org/10.1177/0013164414529792.

    Article  PubMed  Google Scholar 

  • Langer, M. M. (2008). A re-examination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation (Doctoral dissertation, University of North Carolina at Chapel Hill library). http://search.lib.unc.edu/search?R=UNCb5878458.

  • Lee, S., Bulut, O., & Suh, Y. (2017). Multidimensional extension of multiple indicators multiple causes models to detect DIF. Educational and Psychological Measurement, 77(4), 545–569.

    Article  PubMed  Google Scholar 

  • Li, Y., Brooks, G. P., & Johanson, G. A. (2012). Item discrimination and Type I error in the detection of differential item functioning. Educational and Psychological Measurement, 72, 847–861. https://doi.org/10.1177/0013164411432333.

    Article  Google Scholar 

  • Liu, Y., Magnus, B. E., & Thissen, D. (2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach. Psychometrika, 81, 371–398.

    Article  PubMed  Google Scholar 

  • Lopez Rivas, G. E., Stark, S., & Chernyshenko, O. S. (2009). The effects of referent item parameters on differential item functioning detection using the free baseline likelihood ratio test. Applied Psychological Measurement, 33, 251–265. https://doi.org/10.1177/0146621608321760.

    Article  Google Scholar 

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

    Google Scholar 

  • Lord, F. M., Novick, M. R., & (with contributions by A. Birnbaum). (1968). Statistical theories of mental test scores. Reading Massachusetts: Addison-Wesley Publishing Company Inc.

  • Mazor, K. M., Hambleton, R. K., & Clauser, B. E. (1998). Multidimensional DIF analyses: The effects of matching on unidimensional subtest scores. Applied Psychological Measurement, 22, 357–367. https://doi.org/10.1177/014662169802200404.

    Article  Google Scholar 

  • McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99–114. https://doi.org/10.1177/01466210022031552.

    Article  Google Scholar 

  • Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of IRT and CFA methodologies for establishing measurement equivalence. Organizational Research Methods, 7, 361–388. https://doi.org/10.1177/1094428104268027.

    Article  Google Scholar 

  • Meade, A., Lautenschlager, G., & Johnson, E. (2007). A Monte Carlo examination of the sensitivity of the differential functioning of items and tests framework for tests of measurement invariance with Likert data. Applied Psychological Measurement, 31, 430–455. https://doi.org/10.1177/0146621606297316.

    Article  Google Scholar 

  • Meade, A. W., & Wright, N. A. (2012). Solving the measurement invariance anchor item problem in item response theory. Journal of Applied Psychology, 97, 1016–1031. https://doi.org/10.1037/a0027934.

    Article  PubMed  Google Scholar 

  • Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143. https://doi.org/10.1016/0883-0355(89)90002-5.

    Article  Google Scholar 

  • Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 302–307. https://doi.org/10.1037/0033-2909.115.2.300.

    Article  Google Scholar 

  • Meredith, W. (1964). Notes on factorial invariance. Psychometrika, 29, 177–185. https://doi.org/10.1007/BF02289699.

    Article  Google Scholar 

  • Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. https://doi.org/10.1007/BF02294825.

    Article  Google Scholar 

  • Meredith, W., & Teresi, J. A. (2006). An essay on measurement and factorial invariance. Medical Care, 44(Suppl 3), S69–S77. https://doi.org/10.1097/01.mlr.0000245438.73837.89.

    Article  PubMed  Google Scholar 

  • Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334. https://doi.org/10.1177/014662169301700401.

    Article  Google Scholar 

  • Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195. https://doi.org/10.1007/BF02293979.

    Article  Google Scholar 

  • Montoya, A. K., & Jeon, M. (2020). MIMIC models for uniform and nonuniform DIF as moderated mediation models. Applied Psychological Measurement, 44(2), 118–136.

    Article  PubMed  Google Scholar 

  • Mukherjee, S., Gibbons, L. E., Kristjansson, E., & Crane, P. K. (2013). Extension of an iterative hybrid ordinal logistic regression/item response theory approach to detect and account for differential item functioning in longitudinal data. Psychological Test and Assessment Modeling, 55(2), 127–147.

    PubMed  PubMed Central  Google Scholar 

  • Muthén, B. O. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132. https://doi.org/10.1007/BF02294210.

    Article  Google Scholar 

  • Muthén, B. (1989). Latent variable modeling in heterogeneous populations. Meetings of Psychometric Society (1989, Los Angeles, California and Leuven, Belgium). Psychometrika, 54(4), 557–585.

    Article  Google Scholar 

  • Muthén, B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81–117.

    Article  Google Scholar 

  • Muthén, B., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus (p 16). Los Angeles: University of California.

    Google Scholar 

  • Muthén, L. K. & Muthén, B. O. (1998–2019). M-PLUS Users Guide. Sixth Edition. Los Angeles, California: Authors Muthén and Muthén.

  • Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished Technical Report. Available at https://www.statmodel.com/wlscv.shtml.

  • Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257–274.

    Article  Google Scholar 

  • Oort, E. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107–124.

    Article  Google Scholar 

  • Orlando-Edelen, M., Stuckey, B. D., & Chandra, A. (2015). Quantifying ‘problematic’ DIF within an IRT framework: Application to a cancer stigma index. Quality of Life Research, 24, 95–103. https://doi.org/10.1007/s11136-013-0540-4.

    Article  Google Scholar 

  • Orlando-Edelen, M., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of differential item functioning using item response theory ad the likelihood-based model comparison approach: Applications to the Mini-Mental State Examination. Medical Care, 44, S134–S142. https://doi.org/10.1097/01.mlr.0000245251.83359.8c.

    Article  PubMed  Google Scholar 

  • Oshima, T. C., Kushubar, S., Scott, J. C., & Raju, N. S. (2009). DFIT8 for Window User’s Manual: Differential functioning of items and tests. St. Paul MN: Assessment Systems Corporation.

  • Oshima, T. C., Raju, N. S., & Nanda, A. O. (2006). A new method for assessing the statistical significance of the differential functioning of items and tests (DFIT) framework. Journal of Educational Measurement, 43, 1–17. https://doi.org/10.1111/j.1745-3984.2006.00001.x.

    Article  Google Scholar 

  • Paz, S. H., Spritzer, K. L., Morales, L., & Hays, R. D. (2013). Evaluation of the Patient-Reported outcomes Information System (PROMIS) Spanish-language physical functioning items. Quality of Life Research, 22, 1819–1830. https://doi.org/10.1007/s11136-012-0292-6.

    Article  PubMed  Google Scholar 

  • Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., & Cella, D. (2011). Item banks for measuring emotional distress from the patient-reported outcomes measurement information system (PROMIS): Depression, Anxiety and Anger. Assessment, 18, 263–283.

    Article  PubMed  PubMed Central  Google Scholar 

  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502. https://doi.org/10.1007/BF02294403.

    Article  Google Scholar 

  • Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197–207. https://doi.org/10.1177/014662169001400208.

    Article  Google Scholar 

  • Raju, N. S. (1999). DFITP5: A Fortran program for calculating dichotomous DIF/DTF [Computer program]. Chicago: Illinois Institute of Technology.

    Google Scholar 

  • Raju, N. S., Fortmann-Johnson, K. A., Kim, W., Morris, S. B., Nering, M. L., & Oshima, T. C. (2009). The item parameter replication method for detecting differential functioning in the polytomous DFIT framework. Applied Psychological Measurement, 33, 133–147. https://doi.org/10.1177/0146621608319514.

    Article  Google Scholar 

  • Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517–528. https://doi.org/10.1037//0021-9010.87.3.517.

    Article  PubMed  Google Scholar 

  • Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353–368. https://doi.org/10.1177/014662169501900405.

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institut (Danish Institute for Educational Research).

  • Raykov, T., Marcoulides, G. A., Menold, N., & Harrison, M. (2019). Revisiting the bi-factor model: Can mixture modeling help assess its applicability? Structural Equation Modeling, 26, 110–118.


  • Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361–373.


  • Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., et al. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcome Measurement Information System (PROMIS). Medical Care, 45(5 Suppl 1), S22–S31. https://doi.org/10.1097/01.mlr.0000250483.85507.04.


  • Reeve, B. B., & Teresi, J. A. (2016). Overview to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms. Psychological Test and Assessment Modeling, 58(1), 31–35.

  • Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696. https://doi.org/10.1080/00273171.2012.715555.


  • Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552–566. https://doi.org/10.1037/0033-2909.114.3.552.


  • Rikis, D. R. J., & Oshima, T. C. (2017). Effect of purification procedures on DIF analysis in IRTPRO. Educational and Psychological Measurement, 77, 415–428.


  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17, 1–25. https://doi.org/10.18637/jss.v017.i05.

  • Rizopoulos, D. (2009). ltm: Latent trait models under IRT. http://cran.r-project.org/web/packages/ltm/index.html.

  • Rouquette, A., Hardouin, J. B., Vanhaesebrouck, A., Véronique Sébille, V., & Coste, J. (2019). Differential item functioning (DIF) in composite health measurement scale: Recommendations for characterizing DIF with meaningful consequences within the Rasch model framework. PLoS ONE, 14(4), e0215073. https://doi.org/10.1371/journal.pone.0215073.


  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114. https://doi.org/10.1007/BF02290599.


  • Schalet, B. D., Pilkonis, P. A., Yu, L., Dodds, N., Johnston, K. L., Yount, S., et al. (2016). Clinical validity of PROMIS depression, anxiety and anger across diverse clinical groups. Journal of Clinical Epidemiology, 73, 119–127. https://doi.org/10.1016/j.jclinepi.2015.08.036.

  • Setodji, C. M., Reise, S. P., Morales, L. S., Fongwa, M. N., & Hays, R. D. (2011). Differential item functioning by survey language among older Hispanics enrolled in Medicare Managed Care: A new method for anchor item selection. Medical Care, 49, 461–468. https://doi.org/10.1097/MLR.0b013e318207edb5.

  • Seybert, J., & Stark, S. (2012). Iterative linking with the differential functioning of items and tests (DFIT) Method: Comparison of testwide and item parameter replication (IPR) critical values. Applied Psychological Measurement, 36, 494–515. https://doi.org/10.1177/0146621612445182.

  • Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.


  • Shih, C.-L., Liu, T.-H., & Wang, W.-C. (2014). Controlling Type I error rates in assessing DIF for logistic regression method with SIBTEST regression correction procedure and DIF-free-then-DIF strategy. Educational and Psychological Measurement, 74, 1018–1048. https://doi.org/10.1177/0013164413520545.

  • Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33, 184–199. https://doi.org/10.1177/0146621608321758.


  • Stark, S., Chernyshenko, O. S., & Drasgow, F. (2004). Examining the effects of differential item (functioning and differential) test functioning on selection decisions: When are statistically significant effects practically important? Journal of Applied Psychology, 89, 497–508. https://doi.org/10.1037/0021-9010.89.3.497.


  • Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292–1306. https://doi.org/10.1037/0021-9010.91.6.1292.


  • Steinberg, L., & Thissen, D. (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11, 402–415. https://doi.org/10.1037/1082-989X.11.4.402.

  • Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.


  • Stout, W. F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589–617.


  • Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika, 55, 293–326.


  • Stout, W., Li, H., Nandakumar, R., & Bolt, D. (1997). MULTISIB—A procedure to investigate DIF when a test is intentionally multidimensional. Applied Psychological Measurement, 21, 195–213.


  • Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80, 289–316. https://doi.org/10.1007/s11336-013-9388-3.

  • Suh, Y., & Cho, S.-J. (2014). Chi-square difference tests for detecting differential functioning in a multidimensional IRT model: A Monte Carlo study. Applied Psychological Measurement, 38(5), 359–375.


  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x.


  • Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408. https://doi.org/10.1007/BF02294363.


  • Taple, B. J., Griffith, J. W., & Wolf, M. S. (2019). Interview administration of PROMIS depression and anxiety short forms. HLRP: Health Literacy Research and Practice, 6, e196–e204. https://doi.org/10.3928/24748307-20190626-01.

  • Teresi, J. A. (2006). Different approaches to differential item functioning in health applications: Advantages, disadvantages and some neglected topics. Medical Care, 44(Suppl. 11), S152–S170. https://doi.org/10.1097/01.mlr.0000245142.74628.ab.


  • Teresi, J. A. (2019). Applying and Acting on DIF. Moderator at the 2019 PROMIS Psychometric Summit, Northwestern University, Chicago, IL.

  • Teresi, J. A., & Jones, R. N. (2013). Bias in psychological assessment and other measures. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology: Vol. 1. Test theory and testing and assessment in industrial and organizational psychology (pp. 139–164). Washington, DC: American Psychological Association. https://doi.org/10.1037/14047-008.

  • Teresi, J. A., & Jones, R. N. (2016). Methodological issues in examining measurement equivalence in patient reported outcomes measures: Methods overview to the two-part series, “Measurement Equivalence of the Patient Reported Outcomes Measurement Information System (PROMIS) Short Form Measures”. Psychological Test and Assessment Modeling, 58(1), 37–78.


  • Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2000). Modern psychometric methods for detection of differential item functioning: Application to cognitive assessment measures. Statistics in Medicine, 19, 1651–1683.


  • Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Cook, K. F., Crane, P. K., Gibbons, L. E., et al. (2007). Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): Applications (with illustrations) to measures of physical functioning ability and general distress. Quality of Life Research, 16, 43–68. https://doi.org/10.1007/s11136-007-9186-4.

  • Teresi, J., Ocepek-Welikson, K., Kleinman, M., Eimicke, J. E., Crane, P. K., Jones, R. N., et al. (2009). Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach. Psychology Science Quarterly, 51(2), 148–180. PMCID: PMC2844669. NIHMSID: 136951.


  • Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Ramirez, M., & Kim, G. (2016a). Psychometric properties and performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) depression short forms in ethnically diverse groups. Psychological Test and Assessment Modeling, 58(1), 141–181.

  • Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Ramirez, M., & Kim, G. (2016b). Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) anxiety short forms in ethnically diverse groups. Psychological Test and Assessment Modeling, 58(1), 183–219.

  • Teresi, J. A., Ramirez, M., Jones, R. N., Choi, S., & Crane, P. K. (2012). Modifying measures based on differential item functioning (DIF) impact analyses. Journal of Aging & Health, 24(6), 1044–1076. https://doi.org/10.1177/0898264312436877.

  • Teresi, J. A., & Reeve, B. B. (2016). Epilogue to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System (PROMIS) short forms. Psychological Test and Assessment Modeling, 58(2), 423–433.

  • Thissen, D. (1991). MULTILOG™ user's guide: Multiple, categorical item analysis and test scoring using item response theory. Chicago: Scientific Software Inc.

  • Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in item response theory likelihood ratio tests for differential item functioning. Unpublished manual, L.L. Thurstone Psychometric Laboratory, University of North Carolina at Chapel Hill.

  • Thissen, D., Steinberg, L., & Kuang, D. (2002). Quick and easy implementation of the Benjamini–Hochberg procedure for controlling the false discovery rate in multiple comparisons. Journal of Educational and Behavioral Statistics, 27, 77–83. https://doi.org/10.3102/10769986027001077.


  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum Inc.


  • Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. https://doi.org/10.1177/109442810031002.


  • Wainer, H. (1993). Model-based standardized measurement of an item's differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123–135). Hillsdale, NJ: Lawrence Erlbaum Inc.

  • Wang, T., Strobl, C., Zeileis, A., & Merkle, E. C. (2018). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83, 132–155. https://doi.org/10.1007/s11336-017-9591-8.

  • Wang, W.-C. (2004). Effects of anchor item methods on detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221–261. https://doi.org/10.3200/JEXE.72.3.221-261.

  • Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34, 166–180. https://doi.org/10.1177/0146621609355279.


  • Wang, W.-C., Shih, C.-L., & Sun, G.-W. (2012). The DIF-free-then-DIF strategy for the assessment of differential item functioning (DIF). Educational and Psychological Measurement, 72, 687–708. https://doi.org/10.1177/0013164411426157.

  • Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713–731. https://doi.org/10.1177/0013164409332228.


  • Wang, W.-C., & Yeh, Y.-L. (2003). Effects of anchor item methods on differential item functioning detection with likelihood ratio test. Applied Psychological Measurement, 27, 479–498. https://doi.org/10.1177/0146621603259902.

  • Wang, M., & Woods, C. M. (2017). Anchor selection using the Wald test anchor-all-test-all procedure. Applied Psychological Measurement, 41, 17–29. https://doi.org/10.1177/0146621616668014.

  • Woods, C. M. (2009a). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42–57. https://doi.org/10.1177/0146621607314044.


  • Woods, C. M. (2009b). Evaluation of MIMIC-model methods for DIF testing with comparison of two group analysis. Multivariate Behavioral Research, 44, 1–27. https://doi.org/10.1080/00273170802620121.


  • Woods, C. M. (2011). DIF testing for ordinal items with Poly-SIBTEST, the Mantel and GMH tests and IRTLRDIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164. https://doi.org/10.1177/0146621610377450.


  • Woods, C. M., Cai, L., & Wang, M. (2013). The Langer-improved Wald test for DIF testing with multiple groups: Evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73, 532–547. https://doi.org/10.1177/0013164412464875.


  • Woods, C. M., & Grimm, K. J. (2011). Testing for nonuniform differential item functioning with multiple indicator multiple cause models. Applied Psychological Measurement, 35, 339–361. https://doi.org/10.1177/0146621611405984.


  • Woods, C. M., & Harpole, J. (2015). How item residual heterogeneity affects tests for differential item functioning. Applied Psychological Measurement, 39, 251–263. https://doi.org/10.1177/0146621614561313.


  • Yost, K. J., Eton, D. T., Garcia, S. F., & Cella, D. (2011). Minimally important differences were estimated for six PROMIS cancer scales in advanced-stage cancer patients. Journal of Clinical Epidemiology, 64(5), 507–516.


  • Yu, Q., Medeiros, K. L., Wu, X., & Jensen, R. E. (2018). Nonlinear predictive models for multiple mediation analysis with an application to explore ethnic disparities in anxiety and depression among cancer survivors. Psychometrika, 83, 991–1006.


  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from http://www.educ.ubc.ca/faculty/zumbo/DIF/index.html.

  • Zwitser, R. J., Glaser, S. F., & Maris, G. (2017). Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika, 82(1), 210–232. https://doi.org/10.1007/s11336-016-9543-8.



Funding

This work was supported by U01AR057971 (PIs: Potosky, Moinpour), NCI P30CA051008, and UL1TR000101 (previously UL1RR031975) from the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, through the Clinical and Translational Science Awards Program (CTSA). Analyses of these data were supported by the Mount Sinai Claude D. Pepper Older Americans Independence Center (National Institute on Aging, 1P30AG028741, Siu) and the Columbia University Alzheimer's Disease Resource Center for Minority Aging Research (National Institute on Aging, 1P30AG059303, Manly, Luchsinger). This research was also supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number R01HD079439 to the Mayo Clinic in Rochester, Minnesota, through subcontracts to the University of Minnesota and the University of Washington. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors thank Katja Ocepek-Welikson, M.Phil., for analytic assistance and Ruoyi Zhu, a doctoral student in the College of Education, University of Washington, for assistance in conducting the simulation study.

Author information


Corresponding author

Correspondence to Jeanne A. Teresi.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Teresi, J.A., Wang, C., Kleinman, M. et al. Differential Item Functioning Analyses of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Measures: Methods, Challenges, Advances, and Future Directions. Psychometrika 86, 674–711 (2021). https://doi.org/10.1007/s11336-021-09775-0

