Abstract
Aims
A primary advantage of IRT-based patient-reported outcome measures, such as PROMIS short forms and computer-adaptive tests (CATs), is that each estimate of the latent trait comes with its own standard error. This measurement error needs to be acknowledged, particularly when monitoring individual patients over time. In this study, we use plausible values to account for measurement error and to analyze the probability of true within-individual change.
Methods
We use data from a longitudinal observational study of patients with stable or exacerbated COPD (N = 185) who provided PROMIS Physical Function and Fatigue T-scores over 3 months. At each measurement occasion, we imputed 1000 plausible values from the posterior distribution of each score. These were then used to calculate the probability of true change relative to a pre-specified threshold, such as a minimally important difference (MID) supported by the literature or \(\Delta\,\text{T-score} > 0\). We demonstrate how to assess change in individuals and in groups, across different measure types (Short Forms and CATs), and at various levels of confidence.
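The plausible-value procedure described above can be sketched as follows. This is a minimal illustration, not the authors' OSF code. It assumes that each score's posterior is approximately normal, with mean equal to the reported T-score and standard deviation equal to its standard error; this is a common large-sample approximation for IRT posteriors. The function names, the example scores, and the MID of 2 T-score points are all hypothetical.

```python
import random


def draw_plausible_values(t_score, se, n_draws, rng):
    """Draw plausible values from an ASSUMED normal posterior N(t_score, se**2)."""
    return [rng.gauss(t_score, se) for _ in range(n_draws)]


def prob_true_change(t1, se1, t2, se2, threshold=0.0, n_draws=1000, seed=None):
    """Estimate P(true T2 - T1 > threshold) as the share of paired
    plausible-value differences exceeding the threshold.
    Assumes a higher T-score means more of the trait (e.g. Physical Function)."""
    rng = random.Random(seed)
    pv1 = draw_plausible_values(t1, se1, n_draws, rng)  # time 1 draws
    pv2 = draw_plausible_values(t2, se2, n_draws, rng)  # time 2 draws
    exceed = sum((b - a) > threshold for a, b in zip(pv1, pv2))
    return exceed / n_draws


# Hypothetical Physical Function T-scores (with SEs) at baseline and 3 months
p_any = prob_true_change(38.2, 2.8, 43.0, 2.5, threshold=0.0, seed=1)
p_mid = prob_true_change(38.2, 2.8, 43.0, 2.5, threshold=2.0, seed=1)  # hypothetical MID
print(f"P(change > 0) = {p_any:.3f}; P(change > MID) = {p_mid:.3f}")
```

The draws are random, so results vary slightly with the seed. For these hypothetical inputs, the two probabilities should land near 0.90 and 0.77, the values implied by the normal approximation. A stricter threshold always yields a lower probability of true change.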
Results
Using plausible-value imputation, 47.5% of participants in the exacerbated group reported less fatigue with 95% certainty, compared with 26.5% of participants in the stable group. A comparison of Short Forms and CATs suggests that CATs are better able to detect change. We also illustrate the method with a single individual's probability of change at different time points.
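The group-level result above is the share of participants whose improvement reaches a given certainty, and it can be sketched in the same way. This is again a hypothetical illustration with invented scores under an assumed normal posterior, not the study's analysis. It makes one point explicit: for PROMIS Fatigue a lower T-score means less fatigue, so improvement corresponds to a negative change in T-score.

```python
import random


def prob_improvement(t1, se1, t2, se2, lower_is_better, mid=0.0, n_draws=1000, rng=None):
    """P(true improvement > mid), estimated from paired plausible values drawn
    from ASSUMED normal posteriors N(T, SE**2) at each time point."""
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(n_draws):
        change = rng.gauss(t2, se2) - rng.gauss(t1, se1)
        if lower_is_better:
            change = -change  # e.g. Fatigue: a drop in T-score is an improvement
        hits += change > mid
    return hits / n_draws


def share_improved(participants, lower_is_better, certainty=0.95, mid=0.0, seed=None):
    """Fraction of participants whose probability of true improvement is at
    least `certainty`; each participant is a (T1, SE1, T2, SE2) tuple."""
    rng = random.Random(seed)
    flags = [
        prob_improvement(*p, lower_is_better=lower_is_better, mid=mid, rng=rng) >= certainty
        for p in participants
    ]
    return sum(flags) / len(flags)


# Hypothetical Fatigue scores: (T1, SE1, T2, SE2) per participant
fatigue = [(60, 2, 45, 2), (55, 2, 55, 2), (58, 2, 60, 2), (62, 2, 50, 2)]
print(share_improved(fatigue, lower_is_better=True, certainty=0.95, seed=0))  # → 0.5
```

Raising `certainty` or `mid` makes the classification stricter. Reporting the share at several certainty levels shows how sensitive a group comparison is to that choice.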
Conclusion
Plausible values offer a flexible way to include measurement error in analyses at both the individual and the sample level. Assessing the probability of true change can complement existing distribution-based approaches and facilitate the interpretation of improvement or decline.
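Suppose we additionally assume independent normal posteriors, \(\theta_t \sim N(\hat T_t, SE_t^2)\), at the two occasions; plausible-value imputation does not itself require this. Under that assumption, the probability of true change has a closed form that makes the link to the distribution-based reliable change index (RCI) of Jacobson and Truax explicit. Here \(\Phi\) is the standard normal distribution function and \(\delta\) the chosen threshold, such as an MID:

\[
\Pr\left(\theta_2 - \theta_1 > \delta\right) = \Phi\!\left(\frac{\hat T_2 - \hat T_1 - \delta}{\sqrt{SE_1^2 + SE_2^2}}\right),
\qquad
\Pr\left(\theta_2 - \theta_1 > 0\right) \ge 0.975 \iff \frac{\hat T_2 - \hat T_1}{\sqrt{SE_1^2 + SE_2^2}} \ge 1.96 .
\]

The ratio on the right has the form of the RCI, except that each score contributes its own IRT standard error rather than a single sample-level standard error of measurement. In this special case, plausible values approximate the same probability by simulation. They also remain applicable when posteriors are skewed or when the quantity of interest is a group-level summary.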
Data availability
Data for this study are publicly available in the Harvard HealthMeasures Dataverse at https://doi.org/10.7910/DVN/UOQNJF.
Code availability
The code to reproduce the analyses is available on the Open Science Framework (https://osf.io/dbgnv/files/osfstorage).
References
Tulsky, D. S., Kisala, P. A., Victorson, D., Carlozzi, N., Bushnik, T., Sherer, M., & Cella, D. (2016). TBI-QOL: Development and calibration of item banks to measure patient reported outcomes following traumatic brain injury. The Journal of Head Trauma Rehabilitation, 31(1), 40–51. https://doi.org/10.1097/HTR.0000000000000131
Akshoomoff, N., Beaumont, J. L., Bauer, P. J., Dikmen, S., Gershon, R., Mungas, D., & Heaton, R. K. (2013). NIH toolbox cognitive function battery (CFB): Composite scores of crystallized, fluid, and overall cognition. Monographs of the Society for Research in Child Development, 78(4), 119–132. https://doi.org/10.1111/mono.12038
Beaumont, J. L., Havlik, R., Cook, K. F., Hays, R. D., Wallner-Allen, K., Korper, S. P., & Gershon, R. (2013). Norming plans for the NIH toolbox. Neurology, 80(11 Suppl 3), S87–S92. https://doi.org/10.1212/WNL.0b013e3182872e70
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., & Hays, R. (2010). Initial adult health item banks and first wave testing of the patient-reported outcomes measurement information system (PROMIS™) Network: 2005–2008. Journal of Clinical Epidemiology, 63(11), 1179–1194. https://doi.org/10.1016/j.jclinepi.2010.04.011
Cella, D., Yount, S., Rothrock, N., Gershon, R., Cook, K., Reeve, B., & Rose, M. (2007). The patient-reported outcomes measurement information system (PROMIS). Medical Care, 45(5 Suppl 1), S3–S11. https://doi.org/10.1097/01.mlr.0000258615.42478.55
LeBlanc, T. W., & Abernethy, A. P. (2017). Patient-reported outcomes in cancer care—hearing the patient voice at greater volume. Nature Reviews Clinical Oncology, 14(12), 763–772. https://doi.org/10.1038/nrclinonc.2017.153
Basch, E., Deal, A. M., Dueck, A. C., Scher, H. I., Kris, M. G., Hudis, C., & Schrag, D. (2017). Overall survival results of a trial assessing patient-reported outcomes for symptom monitoring during routine cancer treatment. JAMA, 318(2), 197. https://doi.org/10.1001/jama.2017.7156
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 3–9). American Psychological Association.
Yang, J. S., Hansen, M., & Cai, L. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264–290. https://doi.org/10.1177/0013164411410056
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2003). Interpretation of changes in health-related quality of life: The remarkable universality of half a standard deviation. Medical Care, 41(5), 582–592. https://doi.org/10.1097/01.MLR.0000062554.74615.4C
Revicki, D. A., Cella, D., Hays, R. D., Sloan, J. A., Lenderking, W. R., & Aaronson, N. K. (2006). Responsiveness and minimal important differences for patient reported outcomes. Health and Quality of Life Outcomes, 4(1), 70. https://doi.org/10.1186/1477-7525-4-70
King, M. T. (2011). A point of minimal important difference (MID): A critique of terminology and methods. Expert Review of Pharmacoeconomics & Outcomes Research, 11(2), 171–184. https://doi.org/10.1586/erp.11.9
Chalmers, R. P., & Ng, V. (2017). Plausible-value imputation statistics for detecting item misfit. Applied Psychological Measurement, 41(5), 372–387. https://doi.org/10.1177/0146621617692079
Marsman, M., Maris, G., Bechger, T., & Glas, C. (2016). What can we learn from plausible values? Psychometrika, 81(2), 274–289. https://doi.org/10.1007/s11336-016-9497-x
von Davier, M., Gonzalez, E., & Mislevy, R. (2009). What are plausible values and why are they useful? In IERI monograph series: Issues and methodologies in large-scale assessments (pp. 9–36). Educational Testing Service.
Fischer, H. F., & Rose, M. (2019). Scoring depression on a common metric: A comparison of EAP estimation, plausible value imputation, and full Bayesian IRT modeling. Multivariate Behavioral Research, 54(1), 85–99. https://doi.org/10.1080/00273171.2018.1491381
Fischer, F., Gibbons, C., Coste, J., Valderas, J. M., Rose, M., & Leplège, A. (2018). Measurement invariance and general population reference values of the PROMIS Profile 29 in the UK, France, and Germany. Quality of Life Research, 27(4), 999–1014. https://doi.org/10.1007/s11136-018-1785-8
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19.
Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (3rd ed.). Wiley.
Chang, H.-H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58(1), 37–52. https://doi.org/10.1007/BF02294469
Brown, A., & Croudace, T. J. (2015). Scoring and estimating score precision using multidimensional IRT models. In Handbook of item response theory modeling: Applications to typical performance assessment (pp. 307–333). Routledge.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. John Wiley & Sons.
Asparouhov, T., & Muthén, B. (2010). Plausible values for latent variables using Mplus. Mplus.
Yount, S. E., Atwood, C., Donohue, J., Hays, R. D., Irwin, D., Leidy, N. K., & DeWalt, D. A. (2019). Responsiveness of PROMIS® to change in chronic obstructive pulmonary disease. Journal of Patient-Reported Outcomes. https://doi.org/10.1186/s41687-019-0155-9
DeWalt, D. (2016). PROMIS 1 wave 2 chronic obstructive pulmonary disease (COPD). Harvard Dataverse. https://doi.org/10.7910/DVN/UOQNJF
Schalet, B. D., Hays, R. D., Jensen, S. E., Beaumont, J. L., Fries, J. F., & Cella, D. (2016). Validity of PROMIS® physical function measures in diverse clinical samples. Journal of Clinical Epidemiology, 73, 112–118. https://doi.org/10.1016/j.jclinepi.2015.08.039
Lewko, A., Bidgood, P. L., & Garrod, R. (2009). Evaluation of psychological and physiological predictors of fatigue in patients with COPD. BMC Pulmonary Medicine, 9(1), 47. https://doi.org/10.1186/1471-2466-9-47
Breslin, E., van der Schans, C., Breukink, S., Meek, P., Mercer, K., Volz, W., & Louie, S. (1998). Perception of fatigue and quality of life in patients with COPD. Chest, 114(4), 958–964. https://doi.org/10.1378/chest.114.4.958
Wang, Q., & Bourbeau, J. (2005). Outcomes and health-related quality of life following hospitalization for an acute exacerbation of COPD. Respirology, 10(3), 334–340. https://doi.org/10.1111/j.1440-1843.2005.00718.x
Cote, C. G., Dordelly, L. J., & Celli, B. R. (2007). Impact of COPD exacerbations on patient-centered outcomes. Chest, 131(3), 696–704.
Irwin, D. E., Atwood, C. A., Hays, R. D., Spritzer, K., Liu, H., Donohue, J. F., & DeWalt, D. A. (2015). Correlation of PROMIS scales and clinical measures among chronic obstructive pulmonary disease patients with and without exacerbations. Quality of Life Research, 24(4), 999–1009. https://doi.org/10.1007/s11136-014-0818-1
Rose, M., Bjorner, J. B., Gandek, B., Bruce, B., Fries, J. F., & Ware, J. E. (2014). The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. Journal of Clinical Epidemiology, 67(5), 516–526. https://doi.org/10.1016/j.jclinepi.2013.10.024
Fries, J. F., Krishnan, E., Rose, M., Lingala, B., & Bruce, B. (2011). Improved responsiveness and reduced sample size requirements of PROMIS physical function scales with item response theory. Arthritis Research & Therapy, 13(5), R147. https://doi.org/10.1186/ar3461
Lai, J.-S., Cella, D., Choi, S., Junghaenel, D. U., Christodoulou, C., Gershon, R., & Stone, A. (2011). How item banks and their application can influence measurement practice in rehabilitation medicine: A PROMIS fatigue item bank example. Archives of Physical Medicine and Rehabilitation, 92(10), S20–S27. https://doi.org/10.1016/j.apmr.2010.08.033
Ameringer, S., Elswick, R. K., Menzies, V., Robins, J. L., Starkweather, A., Walter, J., & Jallo, N. (2016). Psychometric evaluation of the patient-reported outcomes measurement information system fatigue-short form across diverse populations. Nursing Research, 65(4), 279–289. https://doi.org/10.1097/NNR.0000000000000162
Choi, S. W., & Swartz, R. J. (2009). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement, 33(6), 419–440. https://doi.org/10.1177/0146621608327801
Yost, K., Cella, D., Chawla, A., Holmgren, E., Eton, D., Ayanian, J., & West, D. (2005). Minimally important differences were estimated for the functional assessment of cancer therapy-colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches. Journal of Clinical Epidemiology, 58(12), 1241–1251. https://doi.org/10.1016/j.jclinepi.2005.07.008
Cella, D., Hahn, E. A., & Dineen, K. (2002). Meaningful change in cancer-specific quality of life scores: Differences between improvement and worsening. Quality of Life Research, 11(3), 207–221.
Beaumont, J. L., Davis, E. S., Fries, J. F., Curtis, J. R., Cella, D., & Yun, H. (2021). Meaningful change thresholds for patient-reported outcomes measurement information system (PROMIS) fatigue and pain interference scores in patients with rheumatoid arthritis. The Journal of Rheumatology. https://doi.org/10.3899/jrheum.200990
Wyrwich, K. W. (2004). Minimal important difference thresholds and the standard error of measurement: Is there a connection? Journal of Biopharmaceutical Statistics, 14(1), 97–110. https://doi.org/10.1081/BIP-120028508
Hays, R. D., Spritzer, K. L., Fries, J. F., & Krishnan, E. (2015). Responsiveness and minimally important difference for the patient-reported outcomes measurement information system (PROMIS) 20-item physical functioning short form in a prospective observational study of rheumatoid arthritis. Annals of the Rheumatic Diseases, 74(1), 104–107. https://doi.org/10.1136/annrheumdis-2013-204053
Bartlett, S. J., Gutierrez, A. K., Andersen, K. M., Bykerk, V. P., Curtis, J. R., Haque, U. J., & Bingham, C. O. (2020). Identifying minimal and meaningful change in PROMIS® for rheumatoid arthritis: Use of multiple methods and perspectives. Arthritis Care & Research, 74(4), 588–597.
Snapinn, S. M., & Jiang, Q. (2007). Responder analyses and the assessment of a clinically relevant treatment effect. Trials, 8(1), 31. https://doi.org/10.1186/1745-6215-8-31
Uryniak, T., Chan, I. S. F., Fedorov, V. V., Jiang, Q., Oppenheimer, L., Snapinn, S. M., & Zhang, J. (2011). Responder analyses—A PhRMA position paper. Statistics in Biopharmaceutical Research, 3(3), 476–487. https://doi.org/10.1198/sbr.2011.10070
Funding
The second author is partially funded by National Science Foundation Grant ECR-1760491.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors and uses publicly available data.
Supplementary Information
Electronic supplementary material accompanies the online version of this article.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ho, E.H., Verkuilen, J. & Fischer, F. Measuring individual true change with PROMIS using IRT-based plausible values. Qual Life Res 32, 1369–1379 (2023). https://doi.org/10.1007/s11136-022-03264-2