Differential item functioning impact in a modified version of the Roland–Morris Disability Questionnaire

Crane, Paul K.; Cetin, Karynsa; Cook, Karon F.; Johnson, Kurt; Deyo, Richard; Amtmann, Dagmar

doi:10.1007/s11136-007-9200-x

Original Paper
Published: 19 April 2007

Quality of Life Research, Volume 16, pages 981–990 (2007)

Paul K. Crane¹,
Karynsa Cetin²,
Karon F. Cook²,
Kurt Johnson²,
Richard Deyo¹ &
…
Dagmar Amtmann²

Abstract

Objective

To evaluate a modified version of the Roland–Morris Disability Questionnaire for differential item functioning (DIF) related to several covariates.

Background

DIF occurs in an item when, after controlling for the underlying trait measured by the test, the probability of endorsing the item varies across groups.

Methods

Secondary data analysis of two studies of participants with back pain (total n = 875). We used a hybrid item response theory/logistic regression approach for detecting DIF. We obtained scores that accounted for DIF. We evaluated the impact of DIF on individual and group scores, and compared scores that ignored or accounted for DIF in terms of the strength of association with SF-36 subscale scores.

Results

DIF was found in 18/23 items. Salient scale-level differential functioning was found related to age, education, and employment. Overall 24 participants (3%) had salient scale-level differential functioning. Mean scores across demographic groups differed minimally when accounting for DIF. The strength of association of scores with SF-36 scores was similar for scores that ignored and scores that accounted for DIF.

Conclusions

The modified version of the Roland–Morris Disability Questionnaire appears to have largely negligible DIF related to the covariates assessed here.

Abbreviations

2PL: 2-parameter logistic model. In this parametric item response theory model, two parameters are modeled for each item: item difficulty and item discrimination.
DIF: Differential item functioning. DIF occurs when an item has different statistical properties in different groups when controlling for the underlying trait or ability measured by the test.
IRT: Item response theory. This is a technique for analyzing item-level test data based on the premise that item responses are a function of the relationship between an underlying latent trait and characteristics of the item.
SIP: Sickness Impact Profile. This is a patient-reported outcome measure of the impact of illnesses.
SLIP: Seattle Lumbar Imaging Project, one of the two datasets of low back pain subjects analyzed in this study.

References

1. Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks: Sage.
2. Holland, P. W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
3. Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297–334.
4. Roland, M., & Morris, R. (1983). A study of the natural history of back pain. Part I: Development of a reliable and sensitive measure of disability in low-back pain. Spine, 8, 141–144.
5. Bergner, M., Bobbitt, R. A., Carter, W. B., & Gilson, B. S. (1981). The sickness impact profile: Development and final revision of a health status measure. Medical Care, 19, 787–805.
6. Patrick, D. L., Deyo, R. A., Atlas, S. J., Singer, D. E., Chapin, A., & Keller, R. B. (1995). Assessing health-related quality of life in patients with sciatica. Spine, 20, 1899–1908; discussion 1909.
7. Kucukdeveci, A. A., Tennant, A., Elhan, A. H., & Niyazoglu, H. (2001). Validation of the Turkish version of the Roland–Morris disability questionnaire for use in low back pain. Spine, 26, 2738–2743.
8. Pietrobon, R., Taylor, M., Guller, U., Higgins, L. D., Jacobs, D. O., & Carey, T. (2004). Predicting gender differences as latent variables: Summed scores, and individual item responses: A methods case study. Health and Quality of Life Outcomes, 2, 59.
9. Deyo, R. A., Mirza, S. K., Heagerty, P. J., Turner, J. A., & Martin, B. I. (2005). A prospective cohort study of surgical treatment for back pain with degenerated discs; study protocol. BMC Musculoskeletal Disorders, 6, 24.
10. Jarvik, J. G., Hollingworth, W., Martin, B., Emerson, S. S., Gray, D. T., Overman, S., Robinson, D., Staiger, T., Wessbecher, F., Sullivan, S. D., Kreuter, W., & Deyo, R. A. (2003). Rapid magnetic resonance imaging vs radiographs for patients with low back pain: A randomized controlled trial. JAMA, 289, 2810–2818.
11. Ware, J. E. Jr. (2000). SF-36 health survey update. Spine, 25, 3130–3139.
12. StataCorp (2003). Stata statistical software: Release 8.0. College Station, TX: Stata Corporation.
13. Muraki, E., & Bock, D. (2003). PARSCALE for Windows version 4.1. Chicago: SSI.
14. Crane, P. K., Hart, D. L., Gibbons, L. E., & Cook, K. F. (2006). A 37-item shoulder functional status item pool had negligible differential item functioning. Journal of Clinical Epidemiology, 59, 478–484.
15. Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Medical Care, 44, S115–S123.
16. Crane, P. K., Gibbons, L. E., Narasimhalu, K., Lai, J. S., & Cella, D. (2007). Rapid detection of differential item functioning in assessments of health-related quality of life: The functional assessment of cancer therapy. Quality of Life Research, 16, 101–114.
17. Crane, P. K., van Belle, G., & Larson, E. B. (2004). Test bias in a cognitive test: Differential item functioning in the CASI. Statistics in Medicine, 23, 241–256.
18. Crane, P. K., Gibbons, L. E., Ocepek-Welikson, K., Cook, K., Cella, D., Narasimhalu, K., Hays, R., & Teresi, J. (2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research (in press).
19. Crane, P. K. (2006). Commentary on comparing translations of the EORTC QLQ-C30 using differential item functioning analyses. Quality of Life Research, 15, 1117–1118.
20. Perkins, A. J., Stump, T. E., Monahan, P. O., & McHorney, C. A. (2006). Assessment of differential item functioning for demographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331–348.
21. Maldonado, G., & Greenland, S. (1993). Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138, 923–936.

Acknowledgements

Data were collected under the auspices of grants P60 AR48093 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases, and HS-09499 from the Agency for Healthcare Research and Quality. Data were analyzed under the auspices of U01AR52171-01 from the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases. Data collection and analyses were reviewed by the University of Washington’s Institutional Review Board.

Author information

Authors and Affiliations

Department of Medicine, University of Washington, Harborview Medical Center, 325 Ninth Avenue, Box 359780, Seattle, WA, 98104, USA
Paul K. Crane & Richard Deyo
Department of Rehabilitation Medicine, University of Washington, Seattle, WA, 98195, USA
Karynsa Cetin, Karon F. Cook, Kurt Johnson & Dagmar Amtmann

Corresponding author

Correspondence to Paul K. Crane.

Appendix 1

Detailed methods of DIF detection

We have developed an approach to DIF assessment that combines ordinal logistic regression and IRT. Details of this approach are outlined in earlier publications [14, 17]. The modified version of the Roland–Morris Disability Questionnaire contains only dichotomous items, so logistic regression was used for all DIF analyses.

We use IRT scores to initially evaluate items for DIF. We examine three models for each item for each demographic category (labeled here as “group”) selected for analysis:

$$ \operatorname{logit}\, p(Y = 1 \mid \theta, \text{group}) = \beta_1 \theta + \beta_2\, \text{group} + \beta_3\, \theta \cdot \text{group} \qquad \text{(model 1)} $$

$$ \operatorname{logit}\, p(Y = 1 \mid \theta, \text{group}) = \beta_1 \theta + \beta_2\, \text{group} \qquad \text{(model 2)} $$

$$ \operatorname{logit}\, p(Y = 1 \mid \theta) = \beta_1 \theta \qquad \text{(model 3)} $$

In these equations, p(Y = 1) is the probability of endorsing an item, θ is the IRT estimate of back pain disability, and group is the demographic category.

Two types of DIF are identified in the literature. In items with non-uniform DIF, the interference of demographic group with the relationship between back pain disability and item responses differs across levels of back pain disability. In items with uniform DIF, this interference is the same at all levels of back pain disability.

To detect non-uniform DIF, we compare the log likelihoods of models 1 and 2 using a χ² test, α = 0.05. To detect uniform DIF, we determine the relative difference between the parameters associated with θ (β₁ from models 2 and 3) using the formula $\left| (\beta_{1(\text{model 2})} - \beta_{1(\text{model 3})}) / \beta_{1(\text{model 3})} \right|$. If the relative difference is large, group membership interferes with the expected relationship between back pain disability and item responses. There is little guidance from the literature regarding how large the relative difference should be. A simulation by Maldonado and Greenland on confounder selection strategies used a 10% change criterion in a very different context [21]. We have previously used 10% [17] and 5% [14] change criteria. In this data set, we compared results for each covariate using 5% and 10% criteria. While there was little difference between the results under the two criteria, we chose to show the results from the more sensitive 5% criterion.
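As a minimal sketch of the two checks described above, the Python below fits models 1 to 3 for a single dichotomous item by Newton-Raphson and reports the likelihood-ratio p-value for non-uniform DIF and the relative change in β₁ for uniform DIF. Several things here are assumptions rather than part of the original analysis: every model includes an intercept (the printed equations omit it, but logistic regression software ordinarily fits one), the group variable is binary so the χ² test has 1 degree of freedom, and all function names and the toy data are hypothetical. The study itself used Stata and PARSCALE, not this code.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def predict(X, beta):
    """Endorsement probabilities under a logistic model."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic fit; returns (coefficients, log likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = predict(X, beta)
        grad = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(row[i] * row[j] * pi * (1 - pi) for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = predict(X, beta)
    loglik = sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))
    return beta, loglik

def nonuniform_p(theta, group, y):
    """Likelihood-ratio test of model 1 vs model 2 (1 df, binary group)."""
    m1 = [[1.0, t, g, t * g] for t, g in zip(theta, group)]  # intercept is an assumption
    m2 = [row[:3] for row in m1]
    lr = max(0.0, 2 * (fit_logit(m1, y)[1] - fit_logit(m2, y)[1]))
    return math.erfc(math.sqrt(lr / 2))  # chi-square survival function for 1 df

def uniform_change(theta, group, y):
    """Relative change in the theta coefficient between models 2 and 3."""
    m2 = [[1.0, t, g] for t, g in zip(theta, group)]
    m3 = [row[:2] for row in m2]
    b2, b3 = fit_logit(m2, y)[0][1], fit_logit(m3, y)[0][1]
    return abs((b2 - b3) / b3)

def expand(cells):
    """Turn (theta, group, n_endorsing, n_total) cells into parallel lists."""
    theta, group, y = [], [], []
    for t, g, ones, total in cells:
        theta += [float(t)] * total
        group += [float(g)] * total
        y += [1] * ones + [0] * (total - ones)
    return theta, group, y

# Made-up toy data in which group shifts endorsement by a constant on the logit scale.
theta, group, y = expand([(-1, 0, 2, 10), (1, 0, 5, 10), (-1, 1, 5, 10), (1, 1, 8, 10)])
p_value = nonuniform_p(theta, group, y)
change = uniform_change(theta, group, y)
print(f"non-uniform DIF p = {p_value:.3f}; relative change in beta1 = {change:.3f}")
print("uniform DIF flagged at 5%:", change > 0.05)
```

In this toy data the group effect is parallel on the logit scale, so the likelihood-ratio test gives no evidence of non-uniform DIF, while β₁ changes by roughly 12% and would be flagged under either the 5% or the 10% criterion.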

We have developed an approach to generate scores that account for DIF [14]. When DIF is found, we create new datasets as summarized in Fig. 1. Items without DIF have item parameters estimated from the whole sample, while items with DIF have demographic-specific item parameters estimated.
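Fig. 1 is not reproduced here, so the following is only one plausible reading of how such a dataset could be built: each item flagged with DIF is replaced by one copy per demographic group, with responses kept for members of that group and treated as missing for everyone else, so that an IRT program would estimate group-specific parameters for those items and whole-sample parameters for the rest. The function name, the data layout, and the use of `None` for missing responses are all assumptions made for illustration.

```python
def split_dif_items(responses, groups, dif_items):
    """Replace each DIF item by group-specific copies (None = not administered).

    responses: list of dicts mapping item name -> 0/1 response
    groups:    list of group labels, one per respondent
    dif_items: set of item names flagged with DIF
    """
    levels = sorted(set(groups))
    rows = []
    for resp, grp in zip(responses, groups):
        row = {}
        for item, value in resp.items():
            if item in dif_items:
                # One pseudo-item per group; only the respondent's own group is filled.
                for level in levels:
                    row[f"{item}_{level}"] = value if level == grp else None
            else:
                # Items without DIF stay shared across the whole sample.
                row[item] = value
        rows.append(row)
    return rows

# Hypothetical two-person example with one flagged item.
new_rows = split_dif_items(
    [{"item1": 1, "item2": 0}, {"item1": 0, "item2": 1}],
    ["young", "old"],
    {"item2"},
)
print(new_rows)
```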

Spurious false-positive and false-negative results may occur if the back pain disability score (θ) used for DIF detection includes many items with DIF [2]. We therefore use an iterative approach for each covariate. We generate IRT scores that account for DIF, and use these as the back pain disability score to detect DIF. If different items are identified with DIF, we repeat the process outlined in Fig. 1, modifying the assignments of items based on the most recent round of DIF detection. If the same items are identified with DIF on successive rounds, we are satisfied that we identified items with DIF (as opposed to spurious findings).
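The iterative procedure amounts to repeating detection until the set of flagged items stops changing. A minimal sketch of that control flow follows; `detect` and `rescore` are hypothetical callables standing in for the DIF tests and the IRT scoring step, and the `max_rounds` guard is an added assumption rather than something the text specifies.

```python
def iterate_dif(items, detect, rescore, max_rounds=20):
    """Repeat DIF detection until the flagged item set is stable.

    detect(item, theta) -> bool : hypothetical DIF test for one item
    rescore(flagged) -> theta   : hypothetical scoring that accounts for DIF
                                  in the flagged items (empty set = naive scores)
    Returns (flagged_items, rounds_used).
    """
    flagged = frozenset()
    theta = rescore(flagged)  # first round uses scores that ignore DIF
    for round_no in range(1, max_rounds + 1):
        new_flagged = frozenset(i for i in items if detect(i, theta))
        if new_flagged == flagged:
            return flagged, round_no  # same items on successive rounds
        flagged = new_flagged
        theta = rescore(flagged)  # regenerate scores that account for DIF
    raise RuntimeError("flagged item set did not stabilize")

# Stub example: item "b" only shows DIF once "a" has been accounted for.
def stub_rescore(flagged):
    return flagged  # the stub "score" just records which items were adjusted

def stub_detect(item, theta):
    return item == "a" or (item == "b" and "a" in theta)

result, rounds = iterate_dif(["a", "b", "c"], stub_detect, stub_rescore)
print(sorted(result), rounds)
```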

We have modified this approach for demographic categories with more than two groups (such as education in this data set). Indicator terms for each group are generated, and interaction terms are generated by multiplying θ by the indicator terms. All indicator terms and interaction terms are included in model 1; all indicator terms are included in model 2; and only the ability term θ is included in model 3. To detect non-uniform DIF, we compare the log likelihoods of models 1 and 2 using a χ² test with degrees of freedom equal to the number of groups minus 1. The determination of uniform DIF is unchanged, except that all the group terms are included in model 2.
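To illustrate the multi-group extension, the sketch below builds the predictor terms for the three models from a categorical covariate. It assumes reference-group coding (one indicator per group other than an arbitrarily chosen reference), which is what makes the number of interaction terms, and hence the degrees of freedom, equal to the number of groups minus 1. The function name and the example labels are hypothetical, and any intercept is left to the fitting routine.

```python
def design_terms(theta, groups):
    """Build predictor rows for models 1-3 with a multi-level group covariate."""
    levels = sorted(set(groups))
    others = levels[1:]  # levels[0] serves as the reference group (assumption)
    model1, model2, model3 = [], [], []
    for t, g in zip(theta, groups):
        indicators = [1.0 if g == lv else 0.0 for lv in others]
        interactions = [t * d for d in indicators]  # theta times each indicator
        model3.append([t])
        model2.append([t] + indicators)
        model1.append([t] + indicators + interactions)
    # Models 1 and 2 differ only by the interaction terms.
    df = len(model1[0]) - len(model2[0])
    return model1, model2, model3, df

# Hypothetical three-level education covariate.
m1, m2, m3, df = design_terms([0.5, -1.0, 2.0], ["hs", "college", "grad"])
print(df)
```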

About this article

Cite this article

Crane, P.K., Cetin, K., Cook, K.F. et al. Differential item functioning impact in a modified version of the Roland–Morris Disability Questionnaire. Qual Life Res 16, 981–990 (2007). https://doi.org/10.1007/s11136-007-9200-x

Received: 13 December 2006
Accepted: 10 February 2007
Published: 19 April 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11136-007-9200-x
