Abstract
The PROsetta Stone Project, summarized in this issue by Schalet et al. (Psychometrika 86, 2021), is a major step forward in enabling comparability between different patient-reported outcome measures. Schalet et al. clearly describe the psychometric methods used in the PROsetta Stone Project and in other projects from the Patient-Reported Outcomes Measurement Information System (PROMIS): linking based on unidimensional item response theory (IRT), equipercentile linking, and calibrated projection based on multidimensional IRT. Analyses of a validation data set and simulation studies strongly support the robustness of the linking methods when their basic assumptions are met. The links already established will be of great value to the field, and the methodology described by Schalet et al. will hopefully inspire the next series of linking studies. Potential improvements that new studies should consider include (1) a thorough evaluation of the content of the measures to be linked, to better guide the evaluation of measurement assumptions, and (2) improvements in study design, such as selecting the optimal sample to provide data in the score ranges where linking precision is most critical, and using counterbalanced designs to control for order effects. Finally, it may be useful to consider how the linking algorithms are used in subsequent data analyses. Analytic strategies based on plausible values or latent regression IRT models may be preferable to simply transforming scores one patient at a time.
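As a minimal sketch of one of the three methods named above, the following Python function performs unsmoothed equipercentile linking of integer summed scores, using the uniform-kernel percentile-rank definition described by Kolen and Brennan (2014): a score on measure X is mapped to the score on measure Y that has the same percentile rank. The function names and the toy frequency tables are hypothetical illustrations, not the procedure or data of the PROsetta Stone studies; in particular, the smoothing and other refinements that applied linking studies may use are omitted, and every score point is assumed to have a nonzero frequency.

```python
import numpy as np


def continuized_cdf(freqs):
    """Knots and cumulative proportions of the continuized score distribution.

    Each integer score k is spread uniformly over [k - 0.5, k + 0.5], so the
    continuized CDF is piecewise linear between these knots.
    """
    f = np.asarray(freqs, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(f) / f.sum()))
    knots = np.arange(len(f) + 1) - 0.5
    return knots, cum


def equipercentile_link(x, freqs_x, freqs_y):
    """Map score(s) x on measure X to the measure-Y score with the same percentile rank.

    freqs_x[k] and freqs_y[k] are (hypothetical) counts of respondents with
    summed score k. All counts are assumed positive, so each CDF is strictly
    increasing and can be inverted by linear interpolation.
    """
    knots_x, cum_x = continuized_cdf(freqs_x)
    knots_y, cum_y = continuized_cdf(freqs_y)
    p = np.interp(x, knots_x, cum_x)     # percentile rank of x, as a proportion
    return np.interp(p, cum_y, knots_y)  # inverse continuized CDF of Y at p


# Toy example: a 0-4 measure and a 0-9 measure, both uniformly distributed.
freqs_x = [20, 20, 20, 20, 20]
freqs_y = [10] * 10
print(equipercentile_link([0, 2, 4], freqs_x, freqs_y))
```

Because each score is mapped only through the two observed score distributions, this kind of linking does not require the measures to share an IRT model, but it does assume that both distributions come from the same (or an equivalent) population of respondents.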
References
Bjorner, J. B., Kosinski, M., & Ware, J. E., Jr. (2003). Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales. Quality of Life Research, 12(8), 981–1002.
Bjorner, J. B., Rose, M., Gandek, B., Stone, A. A., Junghaenel, D. U., & Ware, J. E., Jr. (2014). Method of administration of PROMIS scales did not significantly impact score level, reliability, or validity. Journal of Clinical Epidemiology, 67(1), 108–113. https://doi.org/10.1016/j.jclinepi.2013.07.016.
Choi, S. W., Schalet, B., Cook, K. F., & Cella, D. (2014). Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychological Assessment, 26(2), 513–527. https://doi.org/10.1037/a0035768.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders, fourth edition, text revision: DSM-IV-TR (4th ed., text rev.). Washington, DC: American Psychiatric Association.
Dorans, N. J. (2004). Equating, concordance, and expectation. Applied Psychological Measurement, 28(4), 227–246.
Fischer, H. F., & Rose, M. (2019). Scoring depression on a common metric: A comparison of EAP estimation, plausible value imputation, and full Bayesian IRT modeling. Multivariate Behavioral Research, 54(1), 85–99.
Holzapfel, N., Müller-Tasch, T., Wild, B., Jünger, J., Zugck, C., Remppis, A., et al. (2008). Depression profile in patients with and without chronic heart failure. Journal of Affective Disorders, 105(1–3), 53–62.
Katzan, I. L., Fan, Y., Griffith, S. D., Crane, P. K., Thompson, N. R., & Cella, D. (2017). Scale linking to enable patient-reported outcome performance measures assessed with different patient-reported outcome measures. Value in Health, 20(8), 1143–1149.
Kim, J., Chung, H., Askew, R. L., Park, R., Jones, S. M. W., Cook, K. F., et al. (2017). Translating CESD-20 and PHQ-9 scores to PROMIS depression. Assessment, 24(3), 300–307.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York, NY: Springer.
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177–196.
Mor, V., & Guadagnoli, E. (1988). Quality of life measurement: A psychometric tower of Babel. Journal of Clinical Epidemiology, 41(11), 1055–1058.
Orlando, M., Sherbourne, C. D., & Thissen, D. (2000). Summed-score linking using item response theory: Application to depression measurement. Psychological Assessment, 12(3), 354–359.
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & PROMIS Cooperative Group. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, anxiety, and anger. Assessment, 18(3), 263–283.
Schalet, B. D., Lim, S., Cella, D., & Choi, S. W. (2021). Linking scores with patient-reported health outcome instruments: A validation study and comparison of three linking methods. Psychometrika, 86. https://doi.org/10.1007/s11336-021-09776-z.
van Knippenberg, F. C., & de Haes, J. C. (1988). Measuring the quality of life of cancer patients: Psychometric properties of instruments. Journal of Clinical Epidemiology, 41(11), 1043–1053.
Cite this article
Bjorner, J.B. Solving the Tower of Babel Problem for Patient-Reported Outcome Measures. Psychometrika 86, 747–753 (2021). https://doi.org/10.1007/s11336-021-09778-x
DOI: https://doi.org/10.1007/s11336-021-09778-x