Differential item functioning impact in a modified version of the Roland–Morris Disability Questionnaire

  • Original Paper
Quality of Life Research

Abstract

Objective

To evaluate a modified version of the Roland–Morris Disability Questionnaire for differential item functioning (DIF) related to several covariates.

Background

DIF occurs in an item when, after controlling for the underlying trait measured by the test, the probability of endorsing the item varies across groups.

Methods

Secondary data analysis of two studies of participants with back pain (total n = 875). We used a hybrid item response theory/logistic regression approach to detect DIF and obtained scores that accounted for DIF. We evaluated the impact of DIF on individual and group scores, and compared scores that ignored or accounted for DIF in terms of the strength of their association with SF-36 subscale scores.

Results

DIF was found in 18 of 23 items. Salient scale-level differential functioning was found related to age, education, and employment. Overall, 24 participants (3%) had salient scale-level differential functioning. Mean scores across demographic groups differed minimally when accounting for DIF. The strength of association with SF-36 scores was similar for scores that ignored and scores that accounted for DIF.

Conclusions

The modified version of the Roland–Morris Disability Questionnaire appears to have largely negligible DIF related to the covariates assessed here.

Fig. 1

Abbreviations

2PL:

2-parameter logistic model. In this parametric item response theory model, two parameters are modeled for each item: item difficulty and item discrimination

DIF:

Differential item functioning. DIF occurs when an item has different statistical properties in different groups when controlling for the underlying trait or ability measured by the test

IRT:

Item response theory. This is a technique for analyzing item-level test data based on the premise that item responses are a function of the relationship between an underlying latent trait and characteristics of the item

SIP:

Sickness Impact Profile. This is a patient-reported outcome measure of the impact of illnesses

SLIP:

Seattle Lumbar Imaging Project, one of the two datasets of low back pain subjects analyzed in this study

References

  1. Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks: Sage.

  2. Holland, P. W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

  3. Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334.

  4. Roland, M., & Morris, R. (1983). A study of the natural history of back pain. Part I: Development of a reliable and sensitive measure of disability in low-back pain. Spine, 8, 141–144.

  5. Bergner, M., Bobbitt, R. A., Carter, W. B., & Gilson, B. S. (1981). The sickness impact profile: Development and final revision of a health status measure. Medical Care, 19, 787–805.

  6. Patrick, D. L., Deyo, R. A., Atlas, S. J., Singer, D. E., Chapin, A., & Keller, R. B. (1995). Assessing health-related quality of life in patients with sciatica. Spine, 20, 1899–1908; discussion 1909.

  7. Kucukdeveci, A. A., Tennant, A., Elhan, A. H., & Niyazoglu, H. (2001). Validation of the Turkish version of the Roland–Morris disability questionnaire for use in low back pain. Spine, 26, 2738–2743.

  8. Pietrobon, R., Taylor, M., Guller, U., Higgins, L. D., Jacobs, D. O., & Carey, T. (2004). Predicting gender differences as latent variables: Summed scores, and individual item responses: A methods case study. Health and Quality of Life Outcomes, 2, 59.

  9. Deyo, R. A., Mirza, S. K., Heagerty, P. J., Turner, J. A., & Martin, B. I. (2005). A prospective cohort study of surgical treatment for back pain with degenerated discs; study protocol. BMC Musculoskeletal Disorders, 6, 24.

  10. Jarvik, J. G., Hollingworth, W., Martin, B., Emerson, S. S., Gray, D. T., Overman, S., Robinson, D., Staiger, T., Wessbecher, F., Sullivan, S. D., Kreuter, W., & Deyo, R. A. (2003). Rapid magnetic resonance imaging vs radiographs for patients with low back pain: A randomized controlled trial. JAMA, 289, 2810–2818.

  11. Ware, J. E. Jr. (2000). SF-36 health survey update. Spine, 25, 3130–3139.

  12. StataCorp (2003). Stata statistical software: Release 8.0. College Station, TX: Stata Corporation.

  13. Muraki, E., & Bock, D. (2003). PARSCALE for Windows version 4.1. Chicago: SSI.

  14. Crane, P. K., Hart, D. L., Gibbons, L. E., & Cook, K. F. (2006). A 37-item shoulder functional status item pool had negligible differential item functioning. Journal of Clinical Epidemiology, 59, 478–484.

  15. Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Medical Care, 44, S115–S123.

  16. Crane, P. K., Gibbons, L. E., Narasimhalu, K., Lai, J. S., & Cella, D. (2007). Rapid detection of differential item functioning in assessments of health-related quality of life: The functional assessment of cancer therapy. Quality of Life Research, 16, 101–114.

  17. Crane, P. K., van Belle, G., & Larson, E. B. (2004). Test bias in a cognitive test: differential item functioning in the CASI. Statistics in Medicine, 23, 241–256.

  18. Crane, P. K., Gibbons, L. E., Ocepek-Welikson, K., Cook, K., Cella, D., Narasimhalu, K., Hays, R., & Teresi, J. (2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research (in press).

  19. Crane, P. K. (2006). Commentary on comparing translations of the EORTC QLQ-C30 using differential item functioning analyses. Quality of Life Research, 15, 1117–1118.

  20. Perkins, A. J., Stump, T. E., Monahan, P. O., & McHorney, C. A. (2006). Assessment of differential item functioning for demographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331–348.

  21. Maldonado, G., & Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138, 923–936.

Acknowledgements

Data were collected under the auspices of grants P60 AR48093 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases, and HS-09499 from the Agency for Healthcare Research and Quality. Data were analyzed under the auspices of U01AR52171-01 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases. Data collection and analyses were reviewed by the University of Washington’s Institutional Review Board.

Author information

Corresponding author

Correspondence to Paul K. Crane.

Appendix 1

Detailed methods of DIF detection

We have developed an approach to DIF assessment that combines ordinal logistic regression and IRT. Details of this approach are outlined in earlier publications [14, 17]. The modified version of the Roland–Morris Disability Questionnaire contains only dichotomous items, so binary (rather than ordinal) logistic regression was used for all DIF analyses.

We use IRT scores to initially evaluate items for DIF. We examine three models for each item for each demographic category (labeled here as “group”) selected for analysis:

$$ \operatorname{logit}\, p(Y = 1 \mid \theta, \text{group}) = \beta_1 \theta + \beta_2\, \text{group} + \beta_3\, \theta \times \text{group} $$
(model 1)
$$ \operatorname{logit}\, p(Y = 1 \mid \theta, \text{group}) = \beta_1 \theta + \beta_2\, \text{group} $$
(model 2)
$$ \operatorname{logit}\, p(Y = 1 \mid \theta) = \beta_1 \theta. $$
(model 3)

In these equations, p(Y = 1) is the probability of endorsing an item, θ is the IRT estimate of back pain disability, and group is the demographic category.
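The three nested models can be illustrated with a minimal Python sketch. It is not the software used for the analyses reported here. The function names (`solve`, `fit_logistic`, `fit_dif_models`), the added intercept column (the equations above are printed without one), and the simulated θ, group, and item responses are all assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression by Newton-Raphson.

    X is a list of rows, y a list of 0/1 responses.
    Returns (coefficients, log likelihood).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * row[a] * row[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def fit_dif_models(theta, group, y):
    """Fit models 1-3 for one item and one dichotomous covariate.

    Column 0 is an intercept (an assumption of this sketch), so the
    coefficient on theta (beta_1 in the text) is at index 1.
    """
    m1 = [[1.0, t, g, t * g] for t, g in zip(theta, group)]
    m2 = [[1.0, t, g] for t, g in zip(theta, group)]
    m3 = [[1.0, t] for t in theta]
    return {1: fit_logistic(m1, y), 2: fit_logistic(m2, y), 3: fit_logistic(m3, y)}

# Simulated data for illustration only (not study data).
rng = random.Random(0)
theta = [rng.gauss(0, 1) for _ in range(400)]
group = [float(i % 2) for i in range(400)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-0.2 + 1.2 * t + 0.8 * g))) else 0
     for t, g in zip(theta, group)]
fits = fit_dif_models(theta, group, y)
```

Each entry of `fits` holds the coefficients and log likelihood of one model, which are the quantities the DIF criteria described below operate on.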

Two types of DIF are identified in the literature. In items with non-uniform DIF, demographic interference between ability level and item responses differs at varying levels of back pain disability. In items with uniform DIF, this interference is the same across all levels of back pain disability.

To detect non-uniform DIF, we compare the log likelihoods of models 1 and 2 using a χ² test, α = 0.05. To detect uniform DIF, we determine the relative difference between the parameters associated with θ (β1 from models 2 and 3) using the formula \(\left| (\beta_{1(\text{model 2})} - \beta_{1(\text{model 3})}) / \beta_{1(\text{model 3})} \right|\). If the relative difference is large, group membership interferes with the expected relationship between back pain disability and item responses. There is little guidance from the literature regarding how large the relative difference should be. A simulation by Maldonado and Greenland on confounder selection strategies used a 10% change criterion in a very different context [21]. We have previously used 10% [17] and 5% [14] change criteria. In this data set, we compared results for each covariate using 5% and 10% criteria. While there was little difference between the two sets of results, we chose to show the results from the more sensitive 5% criterion.
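Both decision rules reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it covers the two-group case, in which models 1 and 2 differ by a single interaction term, so the χ² test has 1 degree of freedom and its p-value can be written with the complementary error function.

```python
import math

def nonuniform_dif_p(loglik_model1, loglik_model2):
    """P-value of the likelihood ratio test of model 1 against model 2 (1 df)."""
    lr = max(2.0 * (loglik_model1 - loglik_model2), 0.0)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lr / 2.0))

def has_nonuniform_dif(loglik_model1, loglik_model2, alpha=0.05):
    """Flag non-uniform DIF when the interaction term improves fit at level alpha."""
    return nonuniform_dif_p(loglik_model1, loglik_model2) < alpha

def uniform_dif_change(beta1_model2, beta1_model3):
    """Relative change in the theta coefficient between models 2 and 3."""
    return abs((beta1_model2 - beta1_model3) / beta1_model3)

def has_uniform_dif(beta1_model2, beta1_model3, criterion=0.05):
    """Flag uniform DIF when the relative change exceeds the chosen criterion."""
    return uniform_dif_change(beta1_model2, beta1_model3) > criterion
```

For example, a θ coefficient of 1.08 in model 2 against 1.00 in model 3 is an 8% change, which is flagged under the 5% criterion but not under the 10% criterion.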

We have developed an approach to generate scores that account for DIF [14]. When DIF is found, we create new datasets as summarized in Fig. 1. Items without DIF have item parameters estimated from the whole sample, while items with DIF have demographic-specific item parameters estimated.
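One way to obtain demographic-specific item parameters is to split each item with DIF into one copy per group, leaving the copy missing for participants outside that group, and then to fit the IRT model to the restructured data. The sketch below illustrates that restructuring under this assumption; it is not a reproduction of Fig. 1, and the function name, item names, and group labels are hypothetical.

```python
def split_dif_items(responses, groups, dif_items):
    """Restructure item responses so that items with DIF become group specific.

    responses: list of dicts mapping item name -> 0/1 response
    groups:    list of group labels, one per participant
    dif_items: set of item names flagged with DIF

    Items without DIF are kept as they are, so their parameters are estimated
    from the whole sample. Each item with DIF is replaced by one column per
    group; a participant's response appears only in the column for his or her
    own group and is None (missing) elsewhere.
    """
    levels = sorted(set(groups))
    restructured = []
    for resp, g in zip(responses, groups):
        row = {}
        for item, value in resp.items():
            if item in dif_items:
                for level in levels:
                    row[f"{item}_{level}"] = value if level == g else None
            else:
                row[item] = value
        restructured.append(row)
    return restructured

# Hypothetical two-participant example.
responses = [{"item1": 1, "item2": 0}, {"item1": 0, "item2": 1}]
groups = ["employed", "not_employed"]
restructured = split_dif_items(responses, groups, dif_items={"item2"})
```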

Fig. 2

Impact of DIF related to eight covariates on estimated modified Roland IRT score. In this box and whisker plot, the box indicates the 25th and 75th percentiles, while the whiskers indicate 1½ times the interquartile range. Observations more extreme are indicated by dots. The graph shows the difference between scores accounting for DIF for each covariate and unadjusted scores. If there were no impact of DIF for an individual that observation would be at 0. Vertical lines placed at multiples of 0.313 and −0.313 indicate the median standard error of measurement found in this study. The three covariates for which participants have differences greater than the median standard error of measurement (age, education, and employment) are said to be associated with salient scale-level differential functioning
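The salience rule described in this caption can be expressed as a short sketch: a participant is flagged when the score that accounts for DIF differs from the unadjusted score by more than the median standard error of measurement (0.313). The function name and the example scores are hypothetical.

```python
MEDIAN_SEM = 0.313  # median standard error of measurement reported in the caption

def salient_differences(unadjusted, adjusted, threshold=MEDIAN_SEM):
    """Return (index, difference) pairs where |adjusted - unadjusted| > threshold."""
    flagged = []
    for i, (u, a) in enumerate(zip(unadjusted, adjusted)):
        diff = a - u
        if abs(diff) > threshold:
            flagged.append((i, diff))
    return flagged

# Hypothetical IRT scores for four participants (not study data).
unadjusted = [0.10, -0.50, 1.20, 0.00]
adjusted = [0.15, -0.90, 1.20, 0.35]
flagged = salient_differences(unadjusted, adjusted)
```

A covariate would be described as having salient scale-level differential functioning when at least one participant is flagged in this way.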

Spurious false-positive and false-negative results may occur if the back pain disability score (θ) used for DIF detection includes many items with DIF [2]. We therefore use an iterative approach for each covariate. We generate IRT scores that account for DIF, and use these as the back pain disability score to detect DIF. If different items are identified with DIF, we repeat the process outlined in Fig. 1, modifying the assignments of items based on the most recent round of DIF detection. If the same items are identified with DIF on successive rounds, we are satisfied that we identified items with DIF (as opposed to spurious findings).
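The stopping rule of this iterative approach can be sketched as a simple loop. The two callables are placeholders for steps described in the text and are assumptions of this sketch: `detect_dif(scores)` stands for a round of DIF detection that returns the set of flagged items, and `rescore(flagged)` stands for generating IRT scores that account for DIF in those items.

```python
def iterate_dif_detection(initial_scores, detect_dif, rescore, max_rounds=20):
    """Repeat DIF detection until the same items are flagged on successive rounds.

    Returns (flagged_items, scores, rounds_used).
    """
    scores = initial_scores
    previous = None
    for round_number in range(1, max_rounds + 1):
        flagged = frozenset(detect_dif(scores))
        if flagged == previous:
            # Same items as in the previous round: treat the findings as stable.
            return flagged, scores, round_number
        # Different items: re-estimate scores accounting for the current set.
        scores = rescore(flagged)
        previous = flagged
    raise RuntimeError("DIF item set did not stabilize within max_rounds")
```

The `max_rounds` guard is an addition of this sketch; the text does not state a limit on the number of rounds.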

We have modified this approach for demographic categories with more than two groups (such as education in this data set). Indicator terms for each group are generated, and interaction terms are generated by multiplying θ by the indicator terms. All indicator terms and interaction terms are included in model 1; all indicator terms are included in model 2; and only the ability term θ is included in model 3. For the determination of non-uniform DIF, we compare the log likelihoods of models 1 and 2 using a χ² distribution with degrees of freedom equal to the number of groups minus 1. The determination of uniform DIF is unchanged, except that all the group terms are included in model 2.
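A minimal sketch of the design rows for a covariate with more than two groups follows. It assumes reference-cell coding (the first group, in sorted order, has no indicator) and an intercept column, neither of which is stated in the text; the function names and education labels are hypothetical.

```python
def design_rows(theta, group, model):
    """Build design rows for model 1, 2, or 3 with a multi-group covariate.

    Each row is [intercept, theta, indicators..., theta * indicators...].
    Model 2 drops the interaction terms; model 3 also drops the indicators.
    """
    levels = sorted(set(group))
    non_reference = levels[1:]  # first level serves as the reference group
    rows = []
    for t, g in zip(theta, group):
        indicators = [1.0 if g == level else 0.0 for level in non_reference]
        row = [1.0, t]
        if model in (1, 2):
            row += indicators
        if model == 1:
            row += [t * d for d in indicators]
        rows.append(row)
    return rows

def nonuniform_df(group):
    """Degrees of freedom for the model 1 vs model 2 comparison: groups minus 1."""
    return len(set(group)) - 1

# Hypothetical example with three education groups.
theta = [-0.5, 0.0, 1.0]
education = ["college", "high_school", "postgraduate"]
rows_model1 = design_rows(theta, education, model=1)
```

With three groups, model 1 carries two indicator terms and two interaction terms, and the comparison with model 2 has 2 degrees of freedom.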

About this article

Cite this article

Crane, P.K., Cetin, K., Cook, K.F. et al. Differential item functioning impact in a modified version of the Roland–Morris Disability Questionnaire. Qual Life Res 16, 981–990 (2007). https://doi.org/10.1007/s11136-007-9200-x
