Quality of Life Research, Volume 16, Supplement 1, pp 121–132

IRT health outcomes data analysis project: an overview and summary

  • Karon F. Cook
  • Cayla R. Teal
  • Jakob B. Bjorner
  • David Cella
  • Chih-Hung Chang
  • Paul K. Crane
  • Laura E. Gibbons
  • Ron D. Hays
  • Colleen A. McHorney
  • Katja Ocepek-Welikson
  • Anastasia E. Raczek
  • Jeanne A. Teresi
  • Bryce B. Reeve
Original Paper



In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, “Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment.” A component of the conference was presentation of a psychometric and content analysis of a secondary dataset.


A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset.

Research design

HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared.
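As a rough illustration of what a simulated adaptive administration involves, the sketch below computes category probabilities and item information under Samejima's graded response model and picks the most informative unused item at a provisional trait estimate. It is a minimal sketch under stated assumptions, not the project's procedure: the abstract does not say which IRT model or item selection rule the simulations used, and the function names and the three-item bank are hypothetical.

```python
import math


def _cumulative(theta, a, thresholds):
    """Boundary curves P(X >= k | theta), padded with 1 (lowest) and 0 (highest)."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(X = k | theta) for k = 0..len(thresholds).

    Assumes thresholds are in increasing order.
    """
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded item at theta: sum over k of P_k'^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of P(X = k) with respect to theta.
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


def select_next_item(theta, bank, administered):
    """Maximum-information rule (an assumed selection rule, not taken from the paper)."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: grm_item_information(theta, *bank[i]))


# Hypothetical bank of (discrimination, ordered thresholds); not values from the study.
bank = [
    (1.2, [-2.0, -1.0, 0.0]),
    (2.0, [-0.5, 0.0, 0.5]),
    (0.8, [0.5, 1.5, 2.5]),
]

first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
```

A full simulation would repeat the selection step, draw or look up a response to the chosen item, and re-estimate the trait level after each response until a stopping rule is met.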


The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites.


Items from 4 HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System–Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey.

Results and conclusions

Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. Areas for future research are suggested for each of these lessons. The feasibility of developing item banks for broad definitions of health is also discussed.


Quality of Life Health Status Measurement Outcomes 



Study supported by NIH/NCI (Y1-PC-3028-01) and NIH R01 (CA60068). Additional salary support provided by National Institute of Arthritis and Musculoskeletal and Skin Diseases (1U01AR52171-01).



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Karon F. Cook (1)
  • Cayla R. Teal (2)
  • Jakob B. Bjorner (3)
  • David Cella (4)
  • Chih-Hung Chang (5)
  • Paul K. Crane (6)
  • Laura E. Gibbons (6)
  • Ron D. Hays (7)
  • Colleen A. McHorney (8)
  • Katja Ocepek-Welikson (9, 10)
  • Anastasia E. Raczek (3)
  • Jeanne A. Teresi (10, 11)
  • Bryce B. Reeve (12)
  1. Department of Rehabilitation Medicine, University of Washington School of Medicine, Washington, USA
  2. Department of Medicine, Houston Center for Quality of Care & Utilization Studies, Veterans Affairs Health Services Research & Development Center of Excellence and Section of Health Services Research, Baylor College of Medicine, Houston, USA
  3. QualityMetric Incorporated, Lincoln, RI and Health Assessment Lab, Waltham, USA
  4. Center on Outcomes Research and Education, Evanston Northwestern Healthcare, Northwestern University, Feinberg School of Medicine, Chicago, USA
  5. Buehler Center on Aging, Northwestern University, Feinberg School of Medicine, Chicago, USA
  6. Division of General Internal Medicine, University of Washington School of Medicine, Seattle, USA
  7. Department of Medicine, and RAND Health Program, University of California, Los Angeles, USA
  8. Outcomes Research, Merck & Co., Inc., West Point, USA
  9. The New York Quality Improvement Organization, IPRO, Lake Success, USA
  10. New York State Psychiatric Institute and Research Division, Hebrew Home, Riverdale, USA
  11. Faculty of Medicine, Columbia University Stroud Center, Riverdale, USA
  12. Outcomes Research Branch, National Cancer Institute, Bethesda, USA
