Abstract
Purpose
Patient-reported outcome (PRO) analyses often involve calculating raw change scores, but limitations of this approach are well documented. Regression estimators can incorporate information about measurement error and potential covariates, potentially improving change estimates. Yet, adoption of these regression-based change estimators is rare in clinical PRO research.
Methods
Simulated data and real PROMIS® pain interference item responses were used to calculate change with three methods: raw change scores and the regression estimators proposed by Lord and Novick (LN) and by Cronbach and Furby (CF). In the simulated data, the estimators' ability to recover true change was compared. Standard errors of measurement (SEM) and of estimation (SEE), with their associated 95% confidence limits, were also used to identify criteria for significant improvement. These methods were then applied to real-world data from the PROMIS® study.
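The contrast between a raw change score and a regression estimate of true change can be sketched in a few lines. The abstract does not give the exact LN and CF formulas, so the example below is an assumption: a generic two-predictor regression of true change on baseline and follow-up scores, derived under classical test theory with uncorrelated errors and known reliabilities. The function names, the simulation settings (means, variances, sample size, seed), and the helper `simulate` are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def raw_change(x, y):
    """Observed follow-up score minus observed baseline score."""
    return [v - u for u, v in zip(x, y)]


def regression_change(x, y, rel_x, rel_y):
    """Regression estimate of true change from baseline x and follow-up y.

    Assumes uncorrelated measurement errors, so cov(T_x, T_y) = cov(x, y),
    and treats the reliabilities rel_x and rel_y as known.
    """
    sxx, syy, sxy = covariance(x, x), covariance(y, y), covariance(x, y)
    # Covariance of true change (T_y - T_x) with each observed score.
    c_x = sxy - rel_x * sxx
    c_y = rel_y * syy - sxy
    # Solve the 2 x 2 normal equations for the regression weights.
    det = sxx * syy - sxy ** 2
    b_x = (syy * c_x - sxy * c_y) / det
    b_y = (sxx * c_y - sxy * c_x) / det
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(my - mx) + b_x * (u - mx) + b_y * (v - my) for u, v in zip(x, y)]


def rmse(estimate, truth):
    """Root mean squared error of an estimate against the true values."""
    return math.sqrt(statistics.fmean((e - t) ** 2 for e, t in zip(estimate, truth)))


def simulate(n=2000, seed=1):
    """Hypothetical two-wave data: true baseline, true change, and noise."""
    rng = random.Random(seed)
    t1 = [rng.gauss(50, 10) for _ in range(n)]      # true baseline scores
    delta = [rng.gauss(-3, 4) for _ in range(n)]    # true change
    x = [t + rng.gauss(0, 5) for t in t1]           # observed baseline
    y = [t + d + rng.gauss(0, 5) for t, d in zip(t1, delta)]  # observed follow-up
    rel_x = 100 / (100 + 25)   # true variance / observed variance at baseline
    rel_y = 116 / (116 + 25)   # same ratio at follow-up
    return x, y, delta, rel_x, rel_y


x, y, true_change, rel_x, rel_y = simulate()
raw = raw_change(x, y)
reg = regression_change(x, y, rel_x, rel_y)
print("RMSE raw:", rmse(raw, true_change), "RMSE regression:", rmse(reg, true_change))
print("SD raw:", statistics.stdev(raw), "SD regression:", statistics.stdev(reg))
```

Under these assumed settings the regression estimate pulls each person's change toward the group mean change, which is why its spread and its error against the simulated true change are smaller than those of the raw difference.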
Results
In the simulation, both regression estimators reduced variability by almost half relative to raw change scores. Compared to CF, the LN regression better recovered true simulated differences. Analysis of the PROMIS® data showed similar patterns: change score distributions from the regression estimators were less dispersed. Using distribution-based approaches to calculate thresholds for significant within-patient change, smaller changes could be detected using both regression estimators.
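One way to see why a distribution-based threshold can shrink is to compare the textbook classical test theory formulas for SEM and SEE. The sketch below uses those standard formulas with hypothetical inputs (a score SD of 10 and a reliability of 0.90); how the authors applied SEM and SEE to each change estimator is not stated in the abstract, so the pairing shown here is an assumption.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def see(sd, reliability):
    """Standard error of estimation: SD * sqrt(r * (1 - r))."""
    return sd * math.sqrt(reliability * (1.0 - reliability))


def threshold_95(standard_error, z=1.96):
    """Half-width of a two-sided 95% interval around zero change."""
    return z * standard_error


# Hypothetical inputs, chosen only for illustration.
sd_score, rel_score = 10.0, 0.90
sem_threshold = threshold_95(sem(sd_score, rel_score))
see_threshold = threshold_95(see(sd_score, rel_score))
print(f"SEM-based threshold: {sem_threshold:.2f}")
print(f"SEE-based threshold: {see_threshold:.2f}")
```

Because SEE equals SEM multiplied by the square root of the reliability, the SEE-based limit is narrower whenever the reliability is below 1.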
Conclusions
These results suggest that calculating change using regression estimates may increase measurement sensitivity. Using these scores in lieu of raw differences can help better identify individuals who experience real underlying change in PROs over the course of a trial, and can enhance established methods for identifying thresholds for meaningful within-patient change in PROs.
Data availability
The PROMIS® 1 Wave 2 Pain Depression dataset can be requested here: https://doi.org/10.7910/DVN/ZDIITC.
Abbreviations
- CAT: Computerized adaptive testing
- CF: Cronbach & Furby (complete estimator)
- CTT: Classical test theory
- EAP: Expected a posteriori
- GRM: Graded response model
- IRT: Item response theory
- LN: Lord & Novick
- MVN: Multivariate normal distribution
- PRO: Patient-reported outcome
- PROMIS®: Patient-reported outcomes measurement information system
- SE: Standard error
- SEM: Standard error of measurement
- SEP: Standard error of prediction
- SS: Sum score
- TS: T-score
References
U.S. Food and Drug Administration. (2019). Patient-Focused Drug Development Guidance Public Workshop discussion document: Incorporating clinical outcome assessments into endpoints for regulatory decision-making. Retrieved from https://www.fda.gov/media/132505/download
Coon, C. D., & Cook, K. F. (2018). Moving from significance to real-world meaning: Methods for interpreting change in clinical outcome assessment scores. Quality of Life Research, 27, 33–40.
U.S. Food and Drug Administration. (2018). Patient-Focused Drug Development Guidance Public Workshop: Methods to identify what is important to patients and select, develop or modify fit-for-purpose clinical outcomes assessments.
Kim-Kang, G., & Weiss, D. J. (2008). Adaptive measurement of individual change. Zeitschrift für Psychologie / Journal of Psychology, 216, 49–58.
Lord, F. M. (1958). Further problems in the measurement of growth. Educational and Psychological Measurement, 18, 437–451.
Lord, F. M. (1956). The measurement of growth. ETS Research Bulletin Series, 1956, i–22.
McNemar, Q. (1958). On growth measurement. Educational and Psychological Measurement, 18, 47–55.
Cronbach, L. J., & Furby, L. (1970). How we should measure change–or should we? Psychological Bulletin, 74, 68–80.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley Pub Co, Reading.
Cascio, W. F., & Kurtines, W. M. (1977). A practical method for identifying significant change scores. Educational and Psychological Measurement, 37, 889–895. https://doi.org/10.1177/001316447703700411
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., Amtmann, D., Bode, R., Buysse, D., Choi, S., Cook, K., Devellis, R., Dewalt, D., Fries, J. F., Gershon, R., Hahn, E. A., Lai, J. S., Pilkonis, P., Revicki, D., … Hays, R. (2010). The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194. https://doi.org/10.1016/j.jclinepi.2010.04.011
Segawa, E., Schalet, B., & Cella, D. (2020). A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile. Quality of Life Research, 29, 213–221.
Amtmann, D. (2016). PROMIS 1 Wave 2 Pain. Harvard Dataverse, V1. https://doi.org/10.7910/DVN/ESOAH5. UNF:6:TYzYcoNorGguhqSjkVdL2Q== [fileUNF]
Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S., Revicki, D., Cella, D., Rothrock, N., Keefe, F., Callahan, L., & Lai, J.-S. (2010). Development of a PROMIS item bank to measure pain interference. Pain, 150, 173–182.
Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modifications. Applied Psychological Measurement, 18, 229–244.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from https://www.R-project.org
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48, 1–29. https://doi.org/10.18637/jss.v048.i06
Van der Elst, W., Molenberghs, G., Hilgers, R.-D., Verbeke, G., & Heussen, N. (2019). CorrMixed: Estimate correlations between repeatedly measured endpoints (e.g., reliability) based on linear mixed-effects models. R package version 1.0.
Revelle, W. (2021). psych: Procedures for personality and psychological research. Northwestern University, Evanston, Illinois, USA. Version 2.1.9. Retrieved from https://CRAN.R-project.org/package=psych
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Taylor & Francis Group.
Funding
The current project did not have explicit extramural funding sources. All authors are employees of their respective institutions.
Author information
Contributions
All authors contributed to the conceptualization, drafting, and review of the manuscript. DAA: conducted the analyses. JDP: supplied the PROMIS® dataset. All authors approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Andrae, D.A., Foster, B. & Peipert, J.D. Comparison of raw and regression approaches to capturing change on patient-reported outcome measures. Qual Life Res 32, 1381–1390 (2023). https://doi.org/10.1007/s11136-022-03196-x
DOI: https://doi.org/10.1007/s11136-022-03196-x