Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms
Short-form patient-reported outcome measures are popular because they minimize patient burden. We assessed the efficiency of static short forms and computer adaptive testing (CAT) using data from the Patient-Reported Outcomes Measurement Information System (PROMIS) project.
We evaluated the 28-item PROMIS depressive symptoms bank, using post hoc simulations based on the PROMIS calibration sample to compare scores from several static short-form selection strategies and from the PROMIS CAT with the score based on the full item bank.
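The full simulation mechanics are not reproduced in this abstract; as a rough illustration of the general approach, the sketch below simulates a maximum-information CAT over a small bank of polytomous items under a graded response model, rescoring by expected a posteriori (EAP) estimation after each administered item. The item parameters, bank size, and fixed-length stopping rule are hypothetical placeholders, not the PROMIS values.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one polytomous item under a graded
    response model. a: discrimination; b: ordered thresholds (k-1,)."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(X >= category)
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower                              # k category probs

def item_information(theta, a, b):
    """Fisher information of one item at theta, via a numerical
    derivative of the category probabilities: sum (dP)^2 / P."""
    eps = 1e-4
    p = grm_category_probs(theta, a, b)
    dp = (grm_category_probs(theta + eps, a, b)
          - grm_category_probs(theta - eps, a, b)) / (2 * eps)
    return np.sum(dp ** 2 / np.clip(p, 1e-10, None))

def eap_estimate(responses, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori ability estimate, standard normal prior."""
    log_post = -0.5 * grid ** 2
    for (a, b), x in responses:
        probs = np.array([grm_category_probs(t, a, b)[x] for t in grid])
        log_post += np.log(np.clip(probs, 1e-10, None))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(np.sum(grid * post))

def simulate_cat(true_theta, bank, max_items=8, rng=None):
    """Post hoc style CAT: repeatedly administer the unused item with
    maximum information at the current estimate, then rescore by EAP."""
    rng = rng or np.random.default_rng()
    theta_hat, administered, remaining = 0.0, [], list(range(len(bank)))
    for _ in range(max_items):
        best = max(remaining,
                   key=lambda i: item_information(theta_hat, *bank[i]))
        remaining.remove(best)
        a, b = bank[best]
        x = rng.choice(len(b) + 1, p=grm_category_probs(true_theta, a, b))
        administered.append(((a, b), x))
        theta_hat = eap_estimate(administered)
    return theta_hat

# Toy 10-item bank of 5-category items with hypothetical parameters
rng = np.random.default_rng(0)
bank = [(rng.uniform(1.5, 3.5), np.sort(rng.normal(0, 1, 4)))
        for _ in range(10)]
print(simulate_cat(true_theta=1.0, bank=bank, rng=rng))
```

In a post hoc simulation of this kind, the simulated responses would be replaced by each respondent's actual answers from the full-bank administration, so the CAT estimate can be compared directly against that respondent's full-bank score.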
Scores from all short forms and from the CAT were highly correlated with full-bank scores, and the CAT outperformed each static short form on almost all criteria. The margin, however, was small: static short-form selection strategies performed only marginally worse than the CAT, and the gap narrowed further when a two-stage branching test format was used.
Measuring depressive symptoms with several polytomous items drawn from a calibrated unidimensional bank yielded a CAT whose efficiency was only marginally superior to that of static short forms. The efficiency of a two-stage semi-adaptive testing strategy came so close to that of the CAT that it warrants further consideration and study.
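The two-stage format is semi-adaptive: every respondent answers the same short routing block, and the raw score on that block selects one of a few fixed second-stage blocks, after which all administered items are scored together on the common IRT metric. A minimal sketch of the routing step, with hypothetical cut scores and block names:

```python
import numpy as np

def two_stage_branch(routing_responses, cuts):
    """Map the raw score on a fixed routing block to a second-stage block.

    `cuts` are ascending raw-score thresholds: a raw score at or below
    cuts[0] routes to block 0, from cuts[0]+1 through cuts[1] to block 1,
    and so on. These values are hypothetical, not the PROMIS cuts.
    """
    raw = int(np.sum(routing_responses))
    return int(np.searchsorted(cuts, raw, side="left"))

# Example: four routing items scored 0-4 each (raw score 0-16),
# branching into one of three fixed second-stage blocks.
cuts = [5, 11]                           # <=5 -> 0, 6-11 -> 1, >=12 -> 2
blocks = ["mild_block", "moderate_block", "severe_block"]
responses = [1, 2, 0, 1]                 # raw routing score of 4
print(blocks[two_stage_branch(responses, cuts)])   # -> mild_block
```

Because the branch is chosen once from a simple raw-score lookup rather than recomputed after every item, a two-stage form can be administered without real-time scoring software, which is one practical reason the design merits the further study suggested above.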
Keywords: Computer adaptive testing; PROMIS; Item response theory; Short form; Two-stage testing
This study was supported in part by NIH grant PROMIS Network (U-01 AR 052177-04, PI: David Cella). The Patient-Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) Roadmap initiative to develop a computerized system measuring patient-reported outcomes in respondents with a wide range of chronic diseases and demographic characteristics. PROMIS was funded by cooperative agreements to a Statistical Coordinating Center (Northwestern University, PI: David Cella, Ph.D., U01AR52177) and six Primary Research Sites (Duke University, PI: Kevin Weinfurt, Ph.D., U01AR52186; University of North Carolina, PI: Darren DeWalt, MD, MPH, U01AR52181; University of Pittsburgh, PI: Paul A. Pilkonis, Ph.D., U01AR52155; Stanford University, PI: James Fries, MD, U01AR52158; Stony Brook University, PI: Arthur Stone, Ph.D., U01AR52170; and University of Washington, PI: Dagmar Amtmann, Ph.D., U01AR52171). NIH Science Officers on this project are Deborah Ader, Ph.D., Susan Czajkowski, Ph.D., Lawrence Fine, MD, DrPH, Louis Quatrano, Ph.D., Bryce Reeve, Ph.D., William Riley, Ph.D., and Susana Serrate-Sztein, Ph.D. This manuscript was reviewed by the PROMIS Publications Subcommittee prior to external peer review. See the web site at www.nihpromis.org for additional information on the PROMIS cooperative group.