Quality of Life Research, Volume 19, Issue 1, pp 125–136

Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms

  • Seung W. Choi
  • Steven P. Reise
  • Paul A. Pilkonis
  • Ron D. Hays
  • David Cella

Abstract

Purpose

Short-form patient-reported outcome measures are popular because they minimize patient burden. We assessed the efficiency of static short forms and computer adaptive testing (CAT) using data from the Patient-Reported Outcomes Measurement Information System (PROMIS) project.

Methods

We evaluated the 28-item PROMIS depressive symptoms bank. We used post hoc simulations based on the PROMIS calibration sample to compare several short-form selection strategies and the PROMIS CAT to the total item bank score.
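
To make the simulation machinery concrete, the sketch below (Python) illustrates a post hoc CAT of the kind described: items are scored under Samejima's graded response model, the latent trait is re-estimated by expected a posteriori (EAP) scoring after each response, and the next item administered is the one with maximum Fisher information at the interim estimate. This is a minimal sketch under assumed settings; the item parameters, quadrature grid, and stopping rule are invented placeholders, not the PROMIS calibrations or the operational PROMIS CAT configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical graded-response item parameters (slope a, ordered
    # thresholds b) standing in for the calibrated 28-item bank; the
    # real PROMIS calibrations are not reproduced here.
    N_ITEMS, N_CATS = 28, 5
    a = rng.uniform(1.5, 3.5, N_ITEMS)
    b = np.sort(rng.normal(0.5, 1.0, (N_ITEMS, N_CATS - 1)), axis=1)

    GRID = np.linspace(-4, 4, 81)          # quadrature points for EAP
    PRIOR = np.exp(-0.5 * GRID**2)         # standard normal prior
    PRIOR /= PRIOR.sum()

    def grm_probs(i, theta):
        """Category probabilities for item i under Samejima's GRM."""
        theta = np.atleast_1d(theta)
        p_star = 1.0 / (1.0 + np.exp(-a[i] * (theta[:, None] - b[i])))
        bounds = np.hstack([np.ones((len(theta), 1)), p_star,
                            np.zeros((len(theta), 1))])
        return bounds[:, :-1] - bounds[:, 1:]

    def item_info(i, theta):
        """Fisher information of item i: sum_k (P_k')^2 / P_k."""
        theta = np.atleast_1d(theta)
        p_star = 1.0 / (1.0 + np.exp(-a[i] * (theta[:, None] - b[i])))
        d_star = a[i] * p_star * (1.0 - p_star)
        d = np.hstack([np.zeros((len(theta), 1)), d_star,
                       np.zeros((len(theta), 1))])
        dp = d[:, :-1] - d[:, 1:]
        p = grm_probs(i, theta)
        return ((dp ** 2) / np.maximum(p, 1e-10)).sum(axis=1)

    def eap(responses):
        """EAP estimate of theta given {item_index: category} responses."""
        post = PRIOR.copy()
        for i, k in responses.items():
            post = post * grm_probs(i, GRID)[:, k]
        post /= post.sum()
        return float((GRID * post).sum())

    def simulate_cat(true_theta, max_items=8):
        """Adaptive administration: pick the most informative remaining
        item at the interim EAP, simulate a response, re-estimate."""
        responses, available = {}, set(range(N_ITEMS))
        theta_hat = 0.0                    # start at the prior mean
        for _ in range(max_items):
            i = max(available, key=lambda j: item_info(j, theta_hat)[0])
            available.discard(i)
            responses[i] = rng.choice(N_CATS, p=grm_probs(i, true_theta)[0])
            theta_hat = eap(responses)
        return theta_hat

    print(simulate_cat(true_theta=1.0))

In a post hoc simulation of this kind, true_theta would be replaced by each respondent's full-bank score and the simulated CAT estimate compared against it, which is the comparison the study reports.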

Results

Compared with the full-bank scores, all short forms and the CAT produced highly correlated scores. The CAT outperformed each static short form on almost all criteria, although the best short-form selection strategies performed only marginally worse. The remaining performance gap for static forms was narrowed further by a two-stage branching test format.

Conclusions

A CAT drawn from a calibrated unidimensional bank of polytomous depressive-symptom items was only marginally more efficient than static short forms. The efficiency of a two-stage semi-adaptive testing strategy came so close to that of the CAT that it warrants further consideration and study.
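
For contrast with the fully adaptive procedure, the sketch below (reusing the helpers defined in the earlier sketch) outlines a two-stage semi-adaptive form of the general kind discussed: every respondent answers the same fixed routing block, an interim EAP score routes them to one of several fixed second-stage blocks, and the final score uses all administered items. The routing items, second-stage blocks, and cutpoints are hypothetical illustrations, not the specific design studied in the paper.

    def two_stage(true_theta, routing_items, stage2_blocks, cutpoints):
        """Two-stage form: a fixed routing block, then one of several
        fixed second-stage blocks chosen by the interim EAP score.
        All item indices and cutpoints here are hypothetical."""
        responses = {}
        for i in routing_items:
            responses[i] = rng.choice(N_CATS, p=grm_probs(i, true_theta)[0])
        interim = eap(responses)
        block = stage2_blocks[np.searchsorted(cutpoints, interim)]
        for i in block:
            responses[i] = rng.choice(N_CATS, p=grm_probs(i, true_theta)[0])
        return eap(responses)

    # Route on four items, then branch to a low-, mid-, or high-severity
    # block of four items depending on the interim estimate.
    print(two_stage(1.0,
                    routing_items=[0, 1, 2, 3],
                    stage2_blocks=[[4, 5, 6, 7], [8, 9, 10, 11],
                                   [12, 13, 14, 15]],
                    cutpoints=[-0.5, 0.5]))

Because the second-stage blocks are fixed in advance, such a design adapts only once, which makes it feasible on paper or in settings without item-level computer administration.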

Keywords

Computer adaptive testing · PROMIS · Item response theory · Short form · Two-stage testing

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Seung W. Choi (1)
  • Steven P. Reise (2)
  • Paul A. Pilkonis (3)
  • Ron D. Hays (4, 5)
  • David Cella (1)

  1. Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, USA
  2. Department of Psychology, University of California, Los Angeles, Los Angeles, USA
  3. Department of Psychiatry, University of Pittsburgh Medical Center, Pittsburgh, USA
  4. Department of Medicine, University of California, Los Angeles, Los Angeles, USA
  5. Health Program, RAND, Santa Monica, USA
