IRT health outcomes data analysis project: an overview and summary

Cook, Karon F.; Teal, Cayla R.; Bjorner, Jakob B.; Cella, David; Chang, Chih-Hung; Crane, Paul K.; Gibbons, Laura E.; Hays, Ron D.; McHorney, Colleen A.; Ocepek-Welikson, Katja; Raczek, Anastasia E.; Teresi, Jeanne A.; Reeve, Bryce B.

doi:10.1007/s11136-007-9177-5


Original Paper
Published: 10 March 2007

Volume 16, pages 121–132, (2007)

Quality of Life Research

Abstract

Background

In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, “Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment.” One component of the conference was the presentation of a psychometric and content analysis of a secondary dataset.

Objectives

A thorough psychometric and content analysis of two primary domains within a cancer health-related quality of life (HRQOL) dataset was conducted.

Research design

HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared.
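The abstract names IRT modeling and simulated computerized adaptive administration without giving details. As a rough illustration only, and not the procedure the authors used, the sketch below computes category probabilities and Fisher information under Samejima's graded response model and then picks the next item by the maximum-information rule, one common CAT selection criterion. The item representation, the toy bank, its parameter values, and all function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical item representation: (a, [b_1, ..., b_m]) with ordered thresholds,
# for an item scored 0..m. Values used below are illustrative, not from Q-Score.
Item = tuple[float, Sequence[float]]


def _cumulative(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """Boundary curves P*(X >= k) under Samejima's graded response model,
    padded with P*_0 = 1 and P*_{m+1} = 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta: float, a: float, b: Sequence[float]) -> list[float]:
    """Probability of each response category 0..m at trait level theta."""
    cum = _cumulative(theta, a, b)
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def grm_item_information(theta: float, a: float, b: Sequence[float]) -> float:
    """Fisher information of one GRM item at theta: sum_k (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, b)
    # The derivative of a logistic boundary curve is a * P* * (1 - P*);
    # this is 0 for the padded bounds 1.0 and 0.0, as required.
    dcum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(b) + 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = dcum[k] - dcum[k + 1]
        if p_k > 0.0:  # guard against underflow at extreme theta
            info += dp_k * dp_k / p_k
    return info


def select_next_item(theta: float, bank: Sequence[Item], administered: set[int]) -> int:
    """Maximum-information rule: index of the unadministered item that is most
    informative at the current estimate. Raises ValueError if none remain."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: grm_item_information(theta, *bank[i]))


if __name__ == "__main__":
    toy_bank = [(1.0, [-2.0, -1.0]), (2.0, [-0.5, 0.5]), (1.5, [1.5, 2.5])]
    print(grm_category_probs(0.0, *toy_bank[1]))
    print(select_next_item(0.0, toy_bank, set()))  # prints 1
```

Maximum information is only one selection rule. A full CAT simulation would also re-estimate theta after each simulated response (for example, by maximum likelihood or expected a posteriori estimation) and apply a stopping rule; this sketch omits both.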

Subjects

The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites.

Measures

Items from four HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System–Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey.

Results and conclusions

Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. Areas for future research are suggested in light of these lessons, and the feasibility of developing item banks for broad definitions of health is discussed.

Acknowledgments

Study supported by NIH/NCI (Y1-PC-3028-01) and NIH R01 (CA60068). Additional salary support provided by National Institute of Arthritis and Musculoskeletal and Skin Diseases (1U01AR52171-01).

Author information

Authors and Affiliations

Department of Rehabilitation Medicine, University of Washington School of Medicine, Seattle, Washington, USA
Karon F. Cook
Department of Medicine, Houston Center for Quality of Care & Utilization Studies, Veterans Affairs Health Services Research & Development Center of Excellence and Section of Health Services Research, Baylor College of Medicine, Houston, TX, USA
Cayla R. Teal
QualityMetric Incorporated, Lincoln, RI and Health Assessment Lab, Waltham, MA, USA
Jakob B. Bjorner & Anastasia E. Raczek
Center on Outcomes Research and Education, Evanston Northwestern Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
David Cella
Buehler Center on Aging, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
Chih-Hung Chang
Division of General Internal Medicine, University of Washington School of Medicine, Seattle, WA, USA
Paul K. Crane & Laura E. Gibbons
Department of Medicine, and RAND Health Program, University of California, Los Angeles, CA, USA
Ron D. Hays
Outcomes Research, Merck & Co., Inc., West Point, PA, USA
Colleen A. McHorney
The New York Quality Improvement Organization, IPRO, Lake Success, NY, USA
Katja Ocepek-Welikson
New York State Psychiatric Institute and Research Division, Hebrew Home, Riverdale, NY, USA
Katja Ocepek-Welikson & Jeanne A. Teresi
Faculty of Medicine, Columbia University Stroud Center, Riverdale, NY, USA
Jeanne A. Teresi
Outcomes Research Branch, National Cancer Institute, Bethesda, MD, USA
Bryce B. Reeve

Corresponding author

Correspondence to Karon F. Cook.

Appendix: Items included in factor analytic assessment of item bank(s) and unidimensionality

About this article

Cite this article

Cook, K.F., Teal, C.R., Bjorner, J.B. et al. IRT health outcomes data analysis project: an overview and summary. Qual Life Res 16 (Suppl 1), 121–132 (2007). https://doi.org/10.1007/s11136-007-9177-5

Received: 25 August 2006
Accepted: 11 January 2007
Published: 10 March 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11136-007-9177-5
