Abstract
Purpose
This study evaluates the interpretability of Patient-Reported Outcomes Measurement Information System® (PROMIS®)-16 profile domain scores (physical function, ability to participate in social roles and activities, anxiety, depression, sleep disturbance, pain interference, cognitive function – abilities, and fatigue) compared to the PROMIS-29 scores and a 5-item PROMIS cognitive function score. The study aims to provide insights into using these measures in clinical and research settings.
Methods
Analyses were conducted using data from 4130 adults from a nationally representative, probability-based internet panel between September and October 2022. A subset of 1256 individuals with back pain was followed up at six months. We compared the PROMIS-16 profile with the corresponding domain scores from the PROMIS-29 and a custom five-item cognitive function measure. We evaluated (1) reliability through inter-item correlations within each domain and (2) criterion validity by comparing PROMIS-16 profile with the corresponding longer PROMIS measures: (a) standardized mean differences in domain scores, (b) correlations, and (c) concordance of change (i.e., got worse, stayed the same, got better) among those with back pain from baseline to six months later using the reliable change index. We report the Kappa coefficient of agreement and the frequency and percentage of participants with concordant classifications.
Results
Inter-item correlations for the PROMIS-16 domains ranged from 0.65 in cognitive function to 0.92 in pain interference. Standardized mean differences between PROMIS-16 and the scores for the corresponding longer PROMIS domains were minimal (< 0.2). Correlations among the corresponding domain scores ranged from 0.82 for sleep disturbance to 0.98 for pain interference. The percentage of concordance in change groups ranged from 63% for sleep disturbance to 88% for pain interference. Except for sleep disturbance, the change groups derived from the PROMIS-16 showed moderate to substantial agreement with scores estimated from the longer PROMIS measures (Kappa coefficients ≥ 0.41).
Conclusion
The PROMIS-16 domain scores perform similarly to the longer PROMIS measures and can be interpreted in the same way. This similarity indicates that PROMIS-16 can be useful for research as a brief health-related quality-of-life profile measure.
Plain English summary
The Patient-Reported Outcomes Measurement Information System® (PROMIS®)-16 Profile assesses eight health-related quality of life domains (physical function, ability to participate in social roles and activities, anxiety, depression, sleep disturbance, pain interference, cognitive function – abilities, and fatigue) using two items per domain. We evaluated the PROMIS-16 profile in a sample drawn from a nationally representative, probability-based internet panel. The study supports the reliability and criterion validity of the PROMIS-16, showing that the domain scores closely align with and have high concordance in change with the PROMIS-29 scores and a custom five-item cognitive function score. The PROMIS-16 has the potential to be a brief health-related quality-of-life profile measure in research and clinical settings.
Introduction
Patient-reported outcome measures (PROMs) are used to assess functioning, well-being, and health-related quality of life (HRQOL) [1,2,3]. The Patient-Reported Outcomes Measurement Information System® (PROMIS®) assesses many aspects of physical, mental, and social health. A subset of domains is included in profile measures that are widely used in research because of their favorable psychometric properties and scores normed to the United States general population [4, 5]. The PROMIS profiles provide both specific, actionable domain scores and global physical and mental health summary scores [6, 7]. However, the shortest adult PROMIS profile measure contains 29 items (4 items from each of seven domains and a pain intensity item), which may be too burdensome for routine clinical practice and some research contexts because of challenges in integrating it into existing clinical workflows and in obtaining a representative set of respondents [8]. This highlights the need for a concise PROMIS profile with adequate psychometric properties and the same interpretation as the longer measures.
The PROMIS-16 profile was developed through empirical evaluation of 50 candidate PROMIS items and item pairs in a sample of 5775 respondents, combined with stakeholder input [9]. It includes two items each to assess physical function, ability to participate in social roles and activities, anxiety, depression, sleep disturbance, pain interference, cognitive function – abilities, and fatigue. Beyond its brevity, the PROMIS-16 was intended to provide scores comparable to those of the PROMIS-29 profile and the PROMIS cognitive function domain. The selected item pair for each domain showed strong psychometric properties.
The developmental work showed comparable results for the PROMIS-16 and corresponding longer PROMIS measures. However, those findings were based on data from a non-representative US sample of Amazon’s Mechanical Turk members [9], and that work did not evaluate similarity longitudinally or in patients with specific health conditions. The overall goal of this study is to re-evaluate the domain scores of the PROMIS-16 profile using data from a sample drawn from a nationally representative, probability-based internet panel. We also evaluate the PROMIS-16 domain scores’ ability to detect individual changes over six months relative to the longer PROMIS measures among participants with back pain. We hypothesize that the PROMIS-16 domain scores will (1) show similar distributions to and strongly correlate with the corresponding domain scores from longer PROMIS measures and (2) demonstrate individual changes comparable to those of the longer measures.
Methods
Data source and study sample
We analyzed longitudinal data from a sample drawn from a nationally representative, probability-based internet panel. These data were collected from KnowledgePanel members between September and October 2022. The KnowledgePanel was established in 1999 by Knowledge Networks and is currently operated by Ipsos Public Affairs. It is a high-quality probability-based panel with over 55000 members recruited through an address-based sampling method that uses the latest delivery sequence file of the US Postal Service [10,11,12]. This approach enhances the representation of the overall population and is particularly effective in recruiting underrepresented groups, such as young adults and minorities. Most KnowledgePanel members have internet access and computers; for those who do not, Ipsos provides them with a device and access as needed so that the panel reflects the full spectrum of US adults.
The survey included questions about demographic and clinical characteristics and PROMIS items. The survey was made available to 7224 individuals randomly selected from the 55000 KnowledgePanel members for completion within 10 days. The sample size (n = 7224) was based on the national prevalence of chronic back pain and our target of 1500 respondents with back pain in the original study [13]. Ipsos conducted quality control and data cleaning, and we included two fake conditions (i.e., “Syndomitis” and “Checkalism”) within a list of chronic health conditions to identify careless or insincere respondents [13]. Respondents who reported having back pain at baseline were followed up at six months, using the same PROMIS items, to assess changes in their PROs. Participants who responded to the baseline survey were entered into a monthly sweepstakes for prizes; those who were eligible for and completed an additional survey on back pain and a 6-month assessment received 5000 points for each, redeemable for cash (about $5).
Of the 4149 respondents (57%, 4149/7224) who consented and enrolled in the survey, 19 (0.5%) were excluded because they reported having a fake condition (i.e., “Syndomitis” or “Checkalism”) [13]. Differences in demographic characteristics between respondents and the general population can be adjusted using survey weights. However, because this study was a psychometric evaluation of the similarities between PROMIS-16 and longer PROMIS domain scores, minor differences between the demographic profiles of the sample and the general population would not change the conclusions of the study. Thus, an unweighted sample was used. The final analytic sample for this study included 4130 participants at baseline, of whom 1533 (37%) reported having back pain and were assigned a follow-up survey at 6 months; 1256 (82% of the 1533 participants) completed the 6-month survey. Figure A1 in the appendix shows the flowchart of the analytic sample.
Measures
PROMIS-16 profile
The PROMIS-16 profile measures eight domains (physical function, pain interference, fatigue, sleep disturbance, depression, anxiety, ability to participate in social roles and activities, and cognitive function – abilities) using two PROMIS items per domain. Many, but not all, of the PROMIS-16 items are included in the PROMIS-29 + 2: five domains have both items, two have only one item, and one has no items from the PROMIS-29 + 2. Following the PROMIS convention and recommended scoring [9], we estimated IRT-based T-scores (mean = 50, SD = 10) for each PROMIS-16 domain.
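As a brief illustration of the T-score metric mentioned above, the usual PROMIS convention is a linear rescaling of the IRT latent trait estimate. The symbol \hat{\theta} (the estimated latent trait, scaled to mean 0 and SD 1 in the reference population) is notation assumed here for illustration and is not taken from the original text.

```latex
% T-score metric: mean 50, SD 10 in the reference population.
% \hat{\theta} is the IRT-based latent trait estimate (assumed notation).
T = 50 + 10\,\hat{\theta}
```

Under this convention, a respondent whose estimated trait level is one SD above the reference mean has a T-score of 60.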
Comparison with the PROMIS-29 and cognitive function domain scores
We compared the PROMIS-16 with the four-item domain scores from the PROMIS-29 + 2 profile for seven domains. For the domain of cognitive function – abilities, we created a five-item score by adding three additional items to the two contained in the PROMIS-29 + 2.
Demographic characteristics
We collected information on age, race/ethnicity, gender, education, and annual income.
Statistical methods
We described the demographic characteristics of the sample using frequencies and percentages. We estimated polychoric correlations among items within each domain to evaluate internal consistency reliability. We classified correlation coefficients as weak (< 0.4), moderate (0.4–0.7), or strong (> 0.7) [14].
We evaluated the criterion validity of the PROMIS-16 profile by comparing it with the longer PROMIS measures (i.e., criterion measures) using (1) standardized mean differences (i.e., Cohen’s d) in domain scores [15], (2) correlations, and (3) concordance between individual changes. Because five PROMIS-16 domains are shortened versions of the longer PROMIS measures, and the other three domains share no more than one item with the longer measures, we hypothesized that there would be only minor differences in the domain scores. A Cohen’s d less than 0.2 was considered a trivial mean difference [15]. The correlations of corresponding domain scores between PROMIS-16 and the longer measures were used to evaluate convergent validity, with the hypothesis that the PROMIS-16 domain scores would be strongly correlated with those of the corresponding longer measures.
To evaluate the concordance of change groups between the PROMIS-16 and its corresponding longer PROMIS domains, we estimated the Reliable Change Index (RCI) for each domain in participants with back pain from baseline to the 6-month follow-up [16, 17]. Using an RCI cutoff of ±1.96 (p < 0.05), we categorized participants into three change groups (got worse, stayed the same, got better). We evaluated the concordance of change groups using the frequency and percentage of participants with concordant classifications. We also calculated unweighted Kappa coefficients and 95% confidence intervals (CIs), and classified agreement as weak (≤ 0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and strong (0.81–1.00) [18].
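The change-group procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' SAS implementation: it uses the classical form of the RCI (score difference divided by the standard error of the difference, derived from a baseline SD and a reliability coefficient), whereas the cited work [16, 17] also discusses IRT-based standard errors. All function names, the `higher_is_better` flag, and the example inputs are hypothetical.

```python
import math
from collections import Counter


def reliable_change_index(baseline, followup, sd_baseline, reliability):
    """Classical RCI (assumed form): change divided by the SE of the difference."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem                    # SE of a difference of two scores
    return (followup - baseline) / se_diff


def change_group(rci, higher_is_better=True, cutoff=1.96):
    """Map an RCI to one of the three change groups using a |RCI| >= cutoff rule."""
    if abs(rci) < cutoff:
        return "stayed the same"
    # A positive change is an improvement only when higher scores mean better
    # health (e.g., physical function); for symptom domains it is a worsening.
    improved = (rci > 0) == higher_is_better
    return "got better" if improved else "got worse"


def concordance_and_kappa(groups_a, groups_b):
    """Return (proportion of concordant classifications, unweighted Cohen's kappa)."""
    n = len(groups_a)
    observed = sum(a == b for a, b in zip(groups_a, groups_b)) / n
    count_a, count_b = Counter(groups_a), Counter(groups_b)
    # Chance agreement from the marginal distributions of the two classifications.
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    # Note: kappa is undefined if expected == 1 (everyone in one group on both).
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For example, with a hypothetical baseline SD of 10 and reliability of 0.85, a 7-point change gives an RCI of about 1.28 ("stayed the same"), while a 13-point change gives about 2.37 and is classified as a reliable change. The direction flag matters because higher T-scores indicate better health in some PROMIS domains and worse health in others.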
Given the small amount of missing data, we assumed that missingness was completely at random, and the analyses were conducted using complete cases. All analyses were performed using SAS version 9.4.
Results
Overview
Table 1 shows the demographic characteristics of the 4130 participants. Most (70%, n = 2887) were non-Hispanic White. Over half of the participants (59%, n = 2424) were 60 or younger. A total of 40% (n = 1667) of the participants had a bachelor’s degree or higher, and 73% had an annual income equal to or greater than $50,000. Those with back pain who completed the 6-month assessment (30%, 1256/4130) tended to be older and have lower annual income than the rest of the sample (Table A1 in Appendix).
Reliability
The inter-item polychoric correlations for the PROMIS-16 domains were generally high, indicating good internal consistency. Six of the eight domains had inter-item correlations greater than 0.8. The domains of sleep disturbance (r = 0.67) and cognitive function – abilities (r = 0.65) had inter-item correlations below 0.8 but above 0.6.
Criterion validity
Standardized mean differences in domain scores between PROMIS-16 and longer PROMIS criterion measures
Table 2 shows domain T-scores and standardized mean differences between the PROMIS-16 and longer PROMIS measures. The PROMIS-16 domain mean scores ranged from 47.2 (SD = 9.4) in the fatigue domain to 54.9 (SD = 8.3) in the ability to participate in social roles and activities domain, while the mean domain scores for the longer PROMIS measures ranged from 47.8 (SD = 9.8) in fatigue to 55.7 (SD = 8.9) in the ability to participate in social roles and activities. The standardized mean differences between the PROMIS-16 and the corresponding domain scores were minimal (< 0.2), ranging from − 0.12 in the cognitive function domain to 0.13 in the anxiety domain.
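To make the size of these differences concrete, the sketch below computes a standardized mean difference from the summary statistics quoted above. It assumes a pooled SD formed as the root mean square of the two SDs and a difference taken as PROMIS-16 minus the longer measure; the source does not state which variant of Cohen’s d or which sign convention was used, so the results are illustrative and need not match Table 2 exactly. The function name is hypothetical.

```python
import math


def cohens_d_from_summary(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the root-mean-square of the two SDs.

    Assumption: both scores come from samples of (nearly) equal size, so the
    pooled SD reduces to sqrt((sd_a^2 + sd_b^2) / 2).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Fatigue: PROMIS-16 mean 47.2 (SD 9.4) vs. longer measure mean 47.8 (SD 9.8)
d_fatigue = cohens_d_from_summary(47.2, 9.4, 47.8, 9.8)

# Social roles and activities: PROMIS-16 54.9 (SD 8.3) vs. longer measure 55.7 (SD 8.9)
d_social = cohens_d_from_summary(54.9, 8.3, 55.7, 8.9)

print(f"fatigue d = {d_fatigue:.2f}, social roles d = {d_social:.2f}")
```

Under these assumptions both values are below 0.1 in absolute size, consistent with the "trivial" range (< 0.2) used in this study.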
Correlations of domain scores estimated by the PROMIS-16 and longer PROMIS criterion measures
The PROMIS-16 domain scores consistently showed strong correlations with those for the longer PROMIS measures, ranging from 0.82 for the sleep disturbance domain to 0.98 for the pain interference domain.
Concordance of significant individual change in domain scores between PROMIS-16 and longer PROMIS criterion measures
The percentage of concordance in individual change ranged from 63% (789/1249) for the sleep disturbance domain to 88% (1100/1243) for the pain interference domain (Table 3). Seven out of the eight domains showed moderate to substantial agreement (Kappa coefficients ≥ 0.5). However, despite a 63% concordance rate of change groups in the sleep disturbance domain between PROMIS-16 and the longer PROMIS measures, the agreement statistic was only fair, with a Kappa coefficient of 0.29 (95%CI: 0.24–0.34).
We found that more participants were classified as “stayed the same” in the PROMIS-16 profile compared to the longer PROMIS measures. For example, 77% of participants were in the “stayed the same” group for the PROMIS-16 versus 71% for the PROMIS-29 physical function domains (Table 3).
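The gap between a 63% concordance rate and a Kappa of only 0.29 for sleep disturbance follows from how Kappa corrects for chance agreement. The back-of-the-envelope calculation below is a sketch based on the rounded figures reported above; the implied chance agreement p_e is derived here and is not a value reported in the study.

```latex
% Cohen's kappa: observed agreement p_o corrected for chance agreement p_e
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}

% Sleep disturbance: p_o = 789/1249 \approx 0.632, \kappa \approx 0.29
p_e \approx \frac{0.632 - 0.29}{1 - 0.29} \approx 0.48
```

In other words, roughly half of the classifications would be expected to match by chance alone, plausibly because most participants fall into the "stayed the same" group on both measures, so the observed 63% concordance represents only a modest improvement over chance.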
Discussion
Using data from a nationally representative, probability-based panel, we evaluated PROMIS-16 domain scores in cross-sectional and longitudinal analyses. The results indicate small standardized mean differences and strong correlations between the corresponding PROMIS-16 and longer PROMIS domain scores, supporting their similarities. The longitudinal analyses of individual changes in participants with back pain from baseline to 6-month follow-up found moderate to substantial agreement in changes, except for sleep disturbance.
Fifty PROMIS items were considered for inclusion in the PROMIS-16 profile, and most of the selected PROMIS-16 items were derived from the PROMIS-29 + 2 profile which has been widely applied in clinical practice and research [6]. In addition to a comprehensive empirical evaluation of the 50 candidate items and their item pairs, the development process included collecting item preference ratings from stakeholders and an online sample from Amazon’s Mechanical Turk. This approach ensured that the selected items for each domain showed sound psychometric properties statistically and optimally reflected the respective domain from the perspectives of stakeholders and respondents [9]. Consequently, the PROMIS-16 profile is a measure that not only reflects the PROMIS-29 + 2 but also reduces the overall burden of data collection, making it feasible and easy to incorporate into research surveys and existing clinical workflows.
Our findings further corroborated the similarities between PROMIS-16 profile domain scores and the criterion measures. In cross-sectional analyses among all participants, the results demonstrated that the PROMIS-16 domain scores closely align with the corresponding criterion measure domains. In longitudinal analyses among participants with chronic back pain, we observed high concordance and moderate to substantial agreement in classifying changes between PROMIS-16 and corresponding criterion measure domains. These findings indicate that although the PROMIS-16 profile contains only two items per domain, it performs similarly to the longer PROMIS measures and can be interpreted similarly for the general population and potentially for those with specific health conditions. This evidence supports using the PROMIS-16 in clinical practice and research when clinicians seek a measure with relatively lower burden but comparable properties to existing longer measures.
Interestingly, sleep disturbance and cognitive function reflected the corresponding domains from longer PROMIS measures less accurately and demonstrated lower inter-item correlations than the other six domains. This discrepancy may stem from the selected items and the longer PROMIS measures used for comparison. In the sleep disturbance domain, neither of the two selected items is included in the PROMIS-29 [9]. Consequently, using the domain score of sleep disturbance from the PROMIS-29 as a reference resulted in a relatively larger mean difference and lower agreement in individual change compared to other domains that shared at least one item with longer PROMIS measures. Further, the fact that the response options for the two sleep disturbance items in the PROMIS-16 were different likely reduced the magnitude of their inter-item correlation. In the cognitive function domain, although one of the two selected items was derived from the PROMIS-29 + 2, the criterion measure domain was constructed using five items, resulting in a relatively larger standardized mean difference for this domain [6, 19,20,21]. However, we found moderate agreement between cognitive function and its five-item criterion measure domain in change groups.
The study’s limitations warrant discussion. First, while the baseline sample was drawn from a panel nationally representative of US adults, only participants who reported back pain were surveyed at the 6-month follow-up. Therefore, caution is needed when interpreting the results of longitudinal analyses. Second, although about one-third reported back pain, future studies are encouraged to replicate our analyses in a clinical sample and in patients before and after a specific clinical event.
Our findings suggest that the domain scores of the PROMIS-16 profile generally reflect those of longer PROMIS measures, such as the PROMIS-29 domain scores plus a 5-item cognitive function domain score. The PROMIS-16 profile can generate psychometrically sound (reliable and valid) domain-specific HRQOL scores. Future research is encouraged to evaluate these domain scores in different clinical contexts.
References
Kingsley, C., & Patel, S. (2017). Patient-reported outcome measures and patient-reported experience measures. BJA Education, 17(4), 137–144. https://doi.org/10.1093/bjaed/mkw060
Basch, E., Deal, A. M., Kris, M. G., Scher, H. I., Hudis, C. A., Sabbatini, P., et al. (2016). Symptom monitoring with patient-reported outcomes during routine cancer treatment: A randomized controlled trial. Journal of Clinical Oncology, 34(6), 557–565. https://doi.org/10.1200/JCO.2015.63.0830
Basch, E., Schrag, D., Henson, S., Jansen, J., Ginos, B., Stover, A. M., et al. (2022). Effect of electronic symptom monitoring on patient-reported outcomes among patients with metastatic cancer: A randomized clinical trial. Journal of the American Medical Association, 327(24), 2413–2422. https://doi.org/10.1001/jama.2022.9265
HealthMeasures (2023). PROMIS® Score Cut Points. https://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/promis-score-cut-points. Accessed December, 2023.
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., et al. (2010). The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63(11), 1179–1194. https://doi.org/10.1016/j.jclinepi.2010.04.011
Hays, R. D., Spritzer, K. L., Schalet, B. D., & Cella, D. (2018). PROMIS®-29 v2.0 profile physical and mental health summary scores. Quality of Life Research, 27(7), 1885–1891. https://doi.org/10.1007/s11136-018-1842-3
Hays, R. D., Bjorner, J. B., Revicki, D. A., Spritzer, K. L., & Cella, D. (2009). Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Quality of Life Research, 18(7), 873–880. https://doi.org/10.1007/s11136-009-9496-9
Palmer, M. J., Mercieca-Bebber, R., King, M., Calvert, M., Richardson, H., & Brundage, M. (2018). A systematic review and development of a classification framework for factors associated with missing patient-reported outcome data. Clinical Trials (London, England), 15(1), 95–106. https://doi.org/10.1177/1740774517741113
Edelen, M. O., Zeng, C., Hays, R. D., Rodriguez, A., Hanmer, J., Baumhauer, J., et al. (2024). Development of an ultra-short measure of eight domains of health-related quality of life for research and clinical care: The Patient-Reported Outcomes Measurement Information System® PROMIS®-16 profile. Quality of Life Research. Published online February 6. https://doi.org/10.1007/s11136-023-03597-6
Torongo, R., KnowledgePanel New York, N. Y., & Ipsos (2023). https://www.knpanel.com/participate/faq.html. Accessed December, 2023.
Bradley, V. C., Kuriwaki, S., Isakov, M., Sejdinovic, D., Meng, X. L., & Flaxman, S. (2021). Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature, 600(7890), 695–700. https://doi.org/10.1038/s41586-021-04198-4
Hays, R. D., Liu, H., & Kapteyn, A. (2015). Use of Internet panels to conduct surveys. Behavior Research Methods, 47(3), 685–690. https://doi.org/10.3758/s13428-015-0617-9
Hays, R. D., Qureshi, N., Herman, P. M., Rodriguez, A., Kapteyn, A., & Edelen, M. O. (2023). Effects of excluding those who report having Syndomitis or Chekalism on data quality: Longitudinal health survey of a sample from Amazon’s Mechanical Turk. Journal of Medical Internet Research, 25, e46421. https://doi.org/10.2196/46421
Schober, P., Boer, C., & Schwarte, L. A. (2018). Correlation coefficients: Appropriate use and interpretation. Anesthesia and Analgesia, 126(5), 1763–1768. https://doi.org/10.1213/ANE.0000000000002864
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037//0033-2909.112.1.155
Hays, R. D., Reise, S. P., & Herman, P. M. (2023). Estimating individual health-related quality of life changes in low back pain patients. BMC Musculoskeletal Disorders, 24(1), 961. https://doi.org/10.1186/s12891-023-07093-3
Hays, R. D., Spritzer, K. L., & Reise, S. P. (2021). Using item response theory to identify responders to treatment: Examples with the Patient-Reported Outcomes Measurement Information System (PROMIS®) physical function scale and emotional distress composite. Psychometrika, 86(3), 781–792. https://doi.org/10.1007/s11336-021-09774-1
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.
Cella, D., Choi, S. W., Condon, D. M., Schalet, B., Hays, R. D., Rothrock, N. E., et al. (2019). PROMIS® adult health profiles: Efficient short-form measures of seven health domains. Value in Health, 22(5), 537–544. https://doi.org/10.1016/j.jval.2019.02.004
Hanmer, J., Cella, D., Feeny, D., Fischhoff, B., Hays, R. D., Hess, R., et al. (2018). Evaluation of options for presenting health-states from PROMIS® item banks for valuation exercises. Quality of Life Research, 27(7), 1835–1843. https://doi.org/10.1007/s11136-018-1852-1
Kinsky, S., Liang, Q., Bellon, J., Helwig, A., McCracken, P., Minnier, T., et al. (2021). Predicting unplanned health care utilization and cost: Comparing Patient-Reported Outcomes Measurement Information System and claims. Medical Care, 59(10), 921–928. https://doi.org/10.1097/MLR.0000000000001601
Funding
This study was supported by the National Center for Complementary and Integrative Health (NCCIH). Grant No. 1R01AT010402-01A1.
Open access funding provided by SCELC
Author information
Contributions
All authors contributed to the study’s conception and design. CZ performed the data analysis and wrote the first draft of the manuscript. All authors provided a critical review of the manuscript.
Ethics declarations
Ethical approval
This study was performed in line with the principles of the Declaration of Helsinki. The study protocol was reviewed and approved by the research team’s institutional review board (RAND Human Subjects Research Committee FWA00003425; IRB00000051).
Consent to participate
Written informed consent was obtained from study participants.
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zeng, C., Hays, R.D., Rodriguez, A. et al. Comparing patient-reported outcomes measurement information system® (PROMIS®)-16 domain scores with the PROMIS-29 and 5-item PROMIS cognitive function scores. Qual Life Res (2024). https://doi.org/10.1007/s11136-024-03747-4
DOI: https://doi.org/10.1007/s11136-024-03747-4