Abstract
Objective
To evaluate a modified version of the Roland–Morris Disability Questionnaire for differential item functioning (DIF) related to several covariates.
Background
DIF occurs in an item when, after controlling for the underlying trait measured by the test, the probability of endorsing the item varies across groups.
Methods
Secondary data analysis of two studies of participants with back pain (total n = 875). We used a hybrid item response theory/logistic regression approach to detect DIF and obtained scores that accounted for DIF. We evaluated the impact of DIF on individual and group scores, and compared scores that ignored DIF with scores that accounted for it in terms of the strength of their association with SF-36 subscale scores.
Results
DIF was found in 18 of 23 items. Salient scale-level differential functioning was found related to age, education, and employment. Overall, 24 participants (3%) had salient scale-level differential functioning. Mean scores across demographic groups differed minimally when accounting for DIF. The strength of association with SF-36 scores was similar for scores that ignored DIF and scores that accounted for it.
Conclusions
The modified version of the Roland–Morris Disability Questionnaire appears to have largely negligible DIF related to the covariates assessed here.
Abbreviations
- 2PL: 2-parameter logistic model. In this parametric item response theory model, two parameters are modeled for each item: item difficulty and item discrimination.
- DIF: Differential item functioning. DIF occurs when an item has different statistical properties in different groups when controlling for the underlying trait or ability measured by the test.
- IRT: Item response theory. This is a technique for analyzing item-level test data based on the premise that item responses are a function of the relationship between an underlying latent trait and characteristics of the item.
- SIP: Sickness Impact Profile. This is a patient-reported outcome measure of the impact of illnesses.
- SLIP: Seattle Lumbar Imaging Project, one of the two datasets of low back pain subjects analyzed in this study.
References
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks: Sage.
Holland, P. W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334.
Roland, M., & Morris, R. (1983). A study of the natural history of back pain. Part I: Development of a reliable and sensitive measure of disability in low-back pain. Spine, 8, 141–144.
Bergner, M., Bobbitt, R. A., Carter, W. B., & Gilson, B. S. (1981). The sickness impact profile: Development and final revision of a health status measure. Medical Care, 19, 787–805.
Patrick, D. L., Deyo, R. A., Atlas, S. J., Singer, D. E., Chapin, A., & Keller, R. B. (1995). Assessing health-related quality of life in patients with sciatica. Spine, 20, 1899–1908; discussion 1909.
Kucukdeveci, A. A., Tennant, A., Elhan, A. H., & Niyazoglu, H. (2001). Validation of the Turkish version of the Roland–Morris disability questionnaire for use in low back pain. Spine, 26, 2738–2743.
Pietrobon, R., Taylor, M., Guller, U., Higgins, L. D., Jacobs, D. O., & Carey, T. (2004). Predicting gender differences as latent variables: Summed scores, and individual item responses: A methods case study. Health and Quality of Life Outcomes, 2, 59.
Deyo, R. A., Mirza, S. K., Heagerty, P. J., Turner, J. A., & Martin, B. I. (2005). A prospective cohort study of surgical treatment for back pain with degenerated discs; study protocol. BMC Musculoskeletal Disorders, 6, 24.
Jarvik, J. G., Hollingworth, W., Martin, B., Emerson, S. S., Gray, D. T., Overman, S., Robinson, D., Staiger, T., Wessbecher, F., Sullivan, S. D., Kreuter, W., & Deyo, R. A. (2003). Rapid magnetic resonance imaging vs radiographs for patients with low back pain: A randomized controlled trial. JAMA, 289, 2810–2818.
Ware, J. E. Jr. (2000). SF-36 health survey update. Spine, 25, 3130–3139.
StataCorp (2003). Stata statistical software: Release 8.0. College Station, TX: Stata Corporation.
Muraki, E., & Bock, D. (2003). PARSCALE for Windows version 4.1. Chicago: SSI.
Crane, P. K., Hart, D. L., Gibbons, L. E., & Cook, K. F. (2006). A 37-item shoulder functional status item pool had negligible differential item functioning. Journal of Clinical Epidemiology, 59, 478–484.
Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Medical Care, 44, S115–S123.
Crane, P. K., Gibbons, L. E., Narasimhalu, K., Lai, J. S., & Cella, D. (2007). Rapid detection of differential item functioning in assessments of health-related quality of life: The functional assessment of cancer therapy. Quality of Life Research, 16, 101–114.
Crane, P. K., van Belle, G., & Larson, E. B. (2004). Test bias in a cognitive test: Differential item functioning in the CASI. Statistics in Medicine, 23, 241–256.
Crane, P. K., Gibbons, L. E., Ocepek-Welikson, K., Cook, K., Cella, D., Narasimhalu, K., Hays, R., & Teresi, J. (2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research (in press).
Crane, P. K. (2006). Commentary on comparing translations of the EORTC QLQ-C30 using differential item functioning analyses. Quality of Life Research, 15, 1117–1118.
Perkins, A. J., Stump, T. E., Monahan, P. O., & McHorney, C. A. (2006). Assessment of differential item functioning for demographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331–348.
Maldonado, G., & Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138, 923–936.
Acknowledgements
Data were collected under the auspices of grants P60 AR48093 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases, and HS-09499 from the Agency for Healthcare Research and Quality. Data were analyzed under the auspices of U01AR52171-01 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases. Data collection and analyses were reviewed by the University of Washington’s Institutional Review Board.
Appendix 1
Detailed methods of DIF detection
We have developed an approach to DIF assessment that combines ordinal logistic regression and IRT; details are given in earlier publications [14, 17]. Because the modified version of the Roland–Morris Disability Questionnaire contains only dichotomous items, logistic regression was used for all DIF analyses.
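We use IRT scores to evaluate items for DIF initially. For each item and each demographic category selected for analysis (labeled here as “group”), we examine three models:
\({\rm logit}\,p(Y = 1) = \beta_0 + \beta_1\theta + \beta_2\,{\rm group} + \beta_3\,\theta \times {\rm group}\) (model 1)
\({\rm logit}\,p(Y = 1) = \beta_0 + \beta_1\theta + \beta_2\,{\rm group}\) (model 2)
\({\rm logit}\,p(Y = 1) = \beta_0 + \beta_1\theta\) (model 3)
In these equations, p(Y = 1) is the probability of endorsing an item, θ is the IRT estimate of back pain disability, and group is the demographic category.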
Two types of DIF are identified in the literature. In items with non-uniform DIF, demographic interference between ability level and item responses differs at varying levels of back pain disability. In items with uniform DIF, this interference is the same across all levels of back pain disability.
To detect non-uniform DIF, we compare the log likelihoods of models 1 and 2 using a χ² test, α = 0.05. To detect uniform DIF, we determine the relative difference between the parameters associated with θ (β1 from models 2 and 3) using the formula \(\left|\left(\beta_{1({\rm model}\;2)} - \beta_{1({\rm model}\;3)}\right)/\beta_{1({\rm model}\;3)}\right|\). If the relative difference is large, group membership interferes with the expected relationship between back pain disability and item responses. There is little guidance from the literature regarding how large the relative difference should be. A simulation by Maldonado and Greenland on confounder selection strategies used a 10% change criterion in a very different context [21]. We have previously used 10% [17] and 5% [14] change criteria. In this data set, we compared results for each covariate using 5% and 10% criteria. Because the results differed little between the two, we chose to show those from the more sensitive 5% criterion.
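The two checks described above can be sketched for a single dichotomous item and a two-level covariate. This is a minimal illustration, not the software used in the study: it assumes model 1 contains θ, group, and their product, model 2 contains θ and group, and model 3 contains θ alone, with group coded 0/1. The names `solve`, `fit_logistic`, `detect_dif`, and `simulate` are hypothetical, the fitter is a plain Newton-Raphson routine, and the data are simulated rather than drawn from either study.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, iters=50, tol=1e-9):
    """Newton-Raphson logistic regression; returns (coefficients, log likelihood).
    Each row of X must start with 1.0 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, eta))))
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(eta)) written in a numerically stable form
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik

def detect_dif(theta, group, y, alpha=0.05, change=0.05):
    """Non-uniform and uniform DIF checks for one item (group coded 0/1)."""
    x1 = [[1.0, t, g, t * g] for t, g in zip(theta, group)]  # model 1
    x2 = [[1.0, t, g] for t, g in zip(theta, group)]         # model 2
    x3 = [[1.0, t] for t in theta]                           # model 3
    _, ll1 = fit_logistic(x1, y)
    b2, ll2 = fit_logistic(x2, y)
    b3, _ = fit_logistic(x3, y)
    lr = max(0.0, 2.0 * (ll1 - ll2))          # likelihood ratio statistic
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square upper tail, 1 df
    rel_change = abs((b2[1] - b3[1]) / b3[1])  # relative change in beta_1
    return {"p_nonuniform": p_value, "nonuniform": p_value < alpha,
            "rel_change": rel_change, "uniform": rel_change > change}

def simulate(n, b_theta, b_group, b_inter, seed):
    """Simulated responses; group 1 has a lower mean theta (illustrative only)."""
    rng = random.Random(seed)
    theta, group, y = [], [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        t = rng.gauss(-1.0 if g else 0.0, 1.0)
        eta = b_theta * t + b_group * g + b_inter * t * g
        theta.append(t)
        group.append(float(g))
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    return theta, group, y

uniform_item = detect_dif(*simulate(1500, 1.5, 2.0, 0.0, seed=1))
interaction_item = detect_dif(*simulate(1500, 1.5, 0.0, 1.5, seed=2))
print(uniform_item)
print(interaction_item)
```

The first simulated item has a large group effect and no interaction, so the relative change in β1 should be well above the 5% criterion. The second has a large θ × group interaction, so the likelihood ratio test should flag it.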
We have developed an approach to generate scores that account for DIF [14]. When DIF is found, we create new datasets as summarized in Fig. 1. Items without DIF have item parameters estimated from the whole sample, while items with DIF have demographic-specific item parameters estimated.
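One way such a dataset could be laid out is sketched below. This is an assumption for illustration; the exact layout in Fig. 1 may differ. Each item with DIF is replaced by one copy per group, with responses treated as missing outside that group, so that an IRT program would estimate group-specific parameters for it. Items without DIF stay as single columns. The function name `split_dif_items` and the `_g` column suffix are hypothetical.

```python
def split_dif_items(rows, groups, dif_items):
    """Return new response rows in which every item listed in dif_items is
    replaced by group-specific copies (None marks a missing response)."""
    levels = sorted(set(groups))
    out = []
    for row, g in zip(rows, groups):
        new_row = {}
        for item, response in row.items():
            if item in dif_items:
                # one column per group; only the person's own group is filled
                for level in levels:
                    new_row[f"{item}_g{level}"] = response if level == g else None
            else:
                # items without DIF keep a single shared column
                new_row[item] = response
        out.append(new_row)
    return out

responses = [{"item1": 1, "item2": 0}, {"item1": 0, "item2": 1}]
recoded = split_dif_items(responses, groups=[0, 1], dif_items={"item2"})
print(recoded)
```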
Spurious false-positive and false-negative results may occur if the back pain disability score (θ) used for DIF detection includes many items with DIF [2]. We therefore use an iterative approach for each covariate. We generate IRT scores that account for DIF, and use these as the back pain disability score to detect DIF. If different items are identified with DIF, we repeat the process outlined in Fig. 1, modifying the assignments of items based on the most recent round of DIF detection. If the same items are identified with DIF on successive rounds, we are satisfied that we identified items with DIF (as opposed to spurious findings).
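The iteration described above can be sketched as a small loop. The callables `detect` and `rescore` are hypothetical placeholders: `detect` stands for one round of DIF detection against a given set of scores, and `rescore` stands for re-estimating IRT scores that account for the flagged items. The cap on rounds is an added safeguard, not something stated in the text.

```python
def iterate_dif(detect, rescore, initial_scores, max_rounds=10):
    """Repeat DIF detection until two successive rounds flag the same items.

    detect(scores) -> iterable of item names flagged with DIF
    rescore(flagged) -> scores that account for DIF in the flagged items
    Returns (final flagged set, number of detection rounds run)."""
    scores = initial_scores
    previous = None
    for round_no in range(1, max_rounds + 1):
        flagged = frozenset(detect(scores))
        if flagged == previous:
            return flagged, round_no
        scores = rescore(flagged)
        previous = flagged
    raise RuntimeError("flagged item set did not stabilise")

# Toy stand-ins: scores are just labels, and detection is a lookup table.
rounds = {"naive": {"a", "b"}, "adj:a,b": {"a"}, "adj:a": {"a"}}
flagged, n_rounds = iterate_dif(
    detect=lambda s: rounds[s],
    rescore=lambda items: "adj:" + ",".join(sorted(items)),
    initial_scores="naive",
)
print(flagged, n_rounds)
```

In the toy run, the first round flags two items, the second flags one, and the third confirms the same single item, at which point the loop stops.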
We have modified this approach for demographic categories with more than two groups (such as education in this data set). Indicator terms for each group are generated, and interaction terms are generated by multiplying θ by the indicator terms. All indicator terms and interaction terms are included in model 1; all indicator terms are included in model 2; and only the ability term θ is included in model 3. To determine non-uniform DIF, we compare the likelihoods of models 1 and 2 using a χ² distribution with degrees of freedom equal to the number of groups minus 1. The determination of uniform DIF is unchanged, except that all the group terms are included in model 2.
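The model terms for a multi-group covariate can be sketched as follows. The sketch assumes reference-cell coding, in which one group is the reference and the others each get an indicator; the text does not say which coding was used. The function name `design_rows` and the example labels are hypothetical.

```python
def design_rows(theta, labels, reference=None):
    """Build predictor rows for models 1-3 with a multi-group covariate.

    Returns (model1, model2, model3, df), where df is the number of
    interaction terms separating model 1 from model 2 (groups minus 1)."""
    levels = sorted(set(labels))
    ref = levels[0] if reference is None else reference
    others = [level for level in levels if level != ref]
    model1, model2, model3 = [], [], []
    for t, label in zip(theta, labels):
        indicators = [1.0 if label == level else 0.0 for level in others]
        interactions = [t * d for d in indicators]  # theta x indicator
        model3.append([1.0, t])
        model2.append([1.0, t] + indicators)
        model1.append([1.0, t] + indicators + interactions)
    return model1, model2, model3, len(others)

m1, m2, m3, df = design_rows(
    theta=[0.5, -1.0, 2.0],
    labels=["high school", "college", "graduate"],
)
print(df)
print(m1[0])
```

With three education groups there are two indicators and two interaction terms, so the comparison of models 1 and 2 has 2 degrees of freedom.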
Cite this article
Crane, P.K., Cetin, K., Cook, K.F. et al. Differential item functioning impact in a modified version of the Roland–Morris Disability Questionnaire. Qual Life Res 16, 981–990 (2007). https://doi.org/10.1007/s11136-007-9200-x
DOI: https://doi.org/10.1007/s11136-007-9200-x