Introduction

Pain catastrophizing is a coping strategy characterized by a pattern of maladaptive cognitive and affective responses to pain1. Evidence from pain research indicates that higher pain catastrophizing is associated with worse pain intensity ratings and greater disability, both concurrently1,2,3 and several months after self-reports of pain catastrophizing4,5. Reductions in pain catastrophizing are associated with lower pain ratings, improved disability status, and an increased likelihood of returning to work6,7. Psychological interventions have been developed that are effective in reducing pain catastrophizing8,9.

Currently, the most well-known measure of pain catastrophizing is the 13-item Pain Catastrophizing Scale (PCS). Substantial evidence has accumulated to support the validity of PCS scores, and the measure is widely used in research1,10,11. Completing the 13 items, however, can be burdensome to research participants, who often complete extensive survey batteries involving multiple measures, and to patients and staff at busy outpatient clinics. Patients seeking treatment are likely to be asked to report on several relevant health outcomes, in addition to pain catastrophizing, including pain intensity, depression, anxiety, pain interference, and physical and social function12,13. Short-forms, such as the 4-item BriefPCS, have been developed to reduce response burden and promote the utility of the PCS14,15,16. The four items of the BriefPCS were selected from the 13-item PCS based on triangulation of qualitative and quantitative evaluations of the items14.

One study compared the PCS and BriefPCS based on Rasch analysis and calculated test information for the PCS, the BriefPCS, and other short form scales14,15,16. Using Rasch analysis, the study identified four items with an acceptable fit, adequate unidimensionality, and adequate reliability. Test information estimates the degree to which a measure discriminates different levels of the outcome being measured17,18. However, test information is the sum of the information in all the items of the scale. Adding items to a measure will always increase the information sum and deleting them will decrease it. What is more relevant is how well the BriefPCS will function in a randomized controlled trial (RCT) compared to the PCS, and this has not been evaluated.

Based on these considerations, the current study aimed to characterize the validity, reliability, and accuracy of the BriefPCS in distinguishing levels of pain catastrophizing in adults with chronic pain who sought treatments at a tertiary pain clinic, and to characterize the BriefPCS’s sensitivity to change following pain psychology interventions. We hypothesized that PCS scores crosswalked from BriefPCS scores would accurately predict actual PCS scores. Furthermore, the concurrent validity of the BriefPCS would be similar to that of the PCS. Finally, we hypothesized that the BriefPCS would perform similarly to the PCS in a large RCT of pain psychology interventions in patients with chronic pain.

Methods

Figure 1 summarizes the methodology used for this study and clarifies which samples and measures were used for the analyses of the study’s three research hypotheses.

Figure 1

Research methodology diagram. PCS Pain Catastrophizing Scale, PCSCW PCS scores crosswalked from BriefPCS, RMSD Root mean squared deviations, MMRM Mixed models for repeated measures.

Samples

Study procedures, which involved exclusively retrospective review of clinical data, were approved by the Institutional Review Board at the Stanford University School of Medicine (IRB No. 28435). All methods were carried out in accordance with relevant guidelines and regulations. Informed consent for the standard care procedures and treatment at our clinic was obtained from all patients or their legal guardians. For the current study, we used data collected using CHOIR (http://choir.stanford.edu), an open-source learning healthcare system that incorporates patient- and clinician-reported outcomes across a variety of clinical domains, including pain intensity, physical and psychosocial function (including pain catastrophizing), and global health19,20. CHOIR administers both traditional long-form assessments (e.g., the PCS) and item response theory (IRT)-based assessments from the Patient-Reported Outcomes Measurement Information System (PROMIS) item banks developed by the National Institutes of Health. Data from CHOIR have been used in prior empirical work21,22,23,24,25,26,27; however, no publications have presented data extracted from CHOIR related to addressing the aims of this study.

We extracted self-report data, collected via CHOIR, from 21,226 consecutive adult patients with chronic pain of mixed etiology seeking treatment at a tertiary pain clinic. The data were collected between October 2014 and January 2021 and included responses to the PCS and measures from the PROMIS. Only the initial survey that participants completed was included in the current analysis. The extracted data were randomly divided into two halves to form a Linking Sample and a Validation Sample. Modeling equations, including linking equations, best fit the originating sample because they model all score variation, including random variation. Using an independent sample for linking and validation mitigates this effect, increasing the generalizability of the results. No missing values were found in PCS scores, and listwise deletion was used to examine the concurrent validity.

The third sample for this study comprised data collected for a previously published RCT in which PCS scores were the primary outcome and other self-reported measures were secondary outcomes8. Hereafter, we refer to these data as the RCT Sample. In this RCT, 263 adults with chronic low back pain were randomized to (1) a single-session pain relief skills intervention (Empowered Relief; ER); (2) 8-session cognitive behavioral therapy (CBT) for chronic pain; or (3) a single-session health and back pain education class (HE). At 3 months (the primary endpoint), ER was found to be noninferior to CBT for pain catastrophizing scores as well as other self-reported outcomes.

Measures

The 13-item Pain Catastrophizing Scale (PCS) was developed to measure the degree of pain catastrophizing in people with chronic pain1. Research has established the validity of its scores for this purpose. For example, in the development study, PCS scores had high internal consistency (Cronbach α = 0.87) and high test–retest reliability (r = 0.70 at a 10-week interval)1. Each item is rated on a 5-point scale (0: not at all to 4: all the time), and total scores range from 0 to 52, with higher scores indicating higher pain catastrophizing. CHOIR records the time to complete each item of the 13-item PCS; we have found that completing the PCS takes, on average, 110 s.

The BriefPCS is a 4-item PCS short-form14 consisting of one item measuring helplessness (item 4 on the original PCS) and three rumination items (items 9, 10, 11). The BriefPCS is rated on a 5-point Likert scale, from 0 (not at all) to 4 (all the time), and total scores range from 0 to 16. BriefPCS scores showed adequate fit to a unidimensional model and were highly correlated with PCS scores (r = 0.94)14. Bootstrapped Pearson’s r correlations between BriefPCS scores and scores on self-reported measures of other domains, such as depression and pain interference, supported their concurrent validity14. The magnitudes of the score correlations when the BriefPCS was used were very similar to those calculated using PCS scores (magnitude differences ≤ 0.04). Based on item-level completion times measured for the 13-item PCS, the estimated time to complete the 4-item BriefPCS is 36 s.
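As a concrete illustration of the scoring described above, the following is a minimal Python sketch (not code from the original study) that computes PCS and BriefPCS totals from a 13-item response vector; the BriefPCS item numbers (4, 9, 10, 11) follow the original PCS numbering, and the response vector is hypothetical.

```python
def score_pcs(items):
    """Total score for the 13-item PCS (each item rated 0-4; range 0-52)."""
    assert len(items) == 13 and all(0 <= x <= 4 for x in items)
    return sum(items)

def score_brief_pcs(items):
    """BriefPCS total (range 0-16) from the full 13-item response vector,
    using original PCS items 4, 9, 10, and 11 (1-based numbering)."""
    assert len(items) == 13
    return sum(items[i - 1] for i in (4, 9, 10, 11))

# Hypothetical response vector, for illustration only
responses = [2, 1, 3, 4, 0, 2, 1, 3, 4, 2, 1, 0, 2]
print(score_pcs(responses))        # 25
print(score_brief_pcs(responses))  # 11
```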

Patient-Reported Outcomes Measurement Information System (PROMIS) The PROMIS depression, anxiety28, average pain29, pain interference30, and physical function (upper extremity and mobility)31 measures are part of the NIH’s PROMIS compendium of measures. These PROMIS measures were used to examine their relationships with BriefPCS scores. All PROMIS measures were administered using computer adaptive testing to reduce response burden while maintaining sufficient precision32,33. PROMIS uses a T-score metric referenced to a sample matching the 2005 US Census population with respect to important demographic values (M = 50, SD = 10)34. Higher T-scores indicate more of the symptom or outcome being measured.
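For readers unfamiliar with the T-score metric, the conversion is a linear rescaling of the IRT theta estimate (mean 0, SD 1 in the reference population); a minimal sketch:

```python
def promis_t_score(theta):
    """Rescale a PROMIS IRT theta estimate to the T-score metric
    (mean 50, SD 10 in the 2005 US Census-matched reference sample)."""
    return 50 + 10 * theta

print(promis_t_score(0.0))   # 50.0 (reference-population mean)
print(promis_t_score(1.5))   # 65.0 (1.5 SD above the mean)
```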

Research questions and analyses

In this study we addressed three different questions. To avoid confusion, we highlight the differences here. For the first question, we evaluated the accuracy of predicting PCS scores from BriefPCS scores using a crosswalk: actual PCS scores were compared to crosswalked PCS scores (PCSCW). The second question did not use crosswalked scores; we compared PCS and BriefPCS scores with respect to their responsiveness and concurrent validity. For the third question, we examined the impact of using BriefPCS scores on RCT results, comparing the RCT results obtained using BriefPCS scores with those obtained using PCS scores.

Question 1: can crosswalked BriefPCS scores accurately predict actual PCS scores?

Developing a crosswalk

Based on the Linking Sample, loglinear modeling was used to continuize and smooth the cumulative distributions of the observed scores of the BriefPCS and the PCS35. Percentile ranks of the scores were calculated, scores with common percentile ranks were identified, and these were used to estimate a curve that best described the relationship between the scores of the two measures35. The direction of linking was from BriefPCS total scores to PCS total scores. The R package “equate” was used for this analysis36. A crosswalk table was created that associated scores on the BriefPCS with scores on the PCS.
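The core of equipercentile linking can be sketched as follows. This is an illustrative Python sketch, not the study's implementation: it computes midpoint percentile ranks for each observed score and inverts the PCS percentile-rank function by linear interpolation, omitting the loglinear presmoothing that the study performed with the R "equate" package.

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Midpoint percentile rank for each possible integer score 0..max_score."""
    scores = np.asarray(scores)
    n = len(scores)
    return np.array([(np.sum(scores < s) + 0.5 * np.sum(scores == s)) / n
                     for s in range(max_score + 1)])

def equipercentile_crosswalk(brief_scores, pcs_scores):
    """Map each BriefPCS score (0-16) to the PCS score (0-52) sharing its
    percentile rank, by interpolating on the PCS score scale.
    Loglinear presmoothing (used in the study) is omitted for brevity."""
    pr_brief = percentile_ranks(brief_scores, 16)
    pr_pcs = percentile_ranks(pcs_scores, 52)
    pcs_grid = np.arange(53)
    return {b: float(np.interp(pr_brief[b], pr_pcs, pcs_grid))
            for b in range(17)}
```

In practice the resulting 17 equated values would be rounded and reported as a crosswalk table like Table 2.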

Crosswalk accuracy

Within the Validation Sample, we used participants’ BriefPCS scores and the crosswalk to obtain crosswalk-predicted PCS scores, hereafter referred to as PCSCW scores. We took actual PCS total scores as the criterion by which we evaluated the accuracy of the crosswalk, calculating the mean differences (PCS-PCSCW), the standard deviation of the differences, and root mean squared deviations (RMSD).
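The agreement statistics named above can be sketched as follows (assuming paired score vectors; illustrative code, not the study's):

```python
import numpy as np

def crosswalk_accuracy(pcs, pcs_cw):
    """Mean difference (PCS - PCSCW), SD of the differences, and root
    mean squared deviation (RMSD) between actual and crosswalked scores."""
    d = np.asarray(pcs, dtype=float) - np.asarray(pcs_cw, dtype=float)
    return {
        "mean_diff": float(d.mean()),
        "sd_diff": float(d.std(ddof=1)),   # sample SD of the differences
        "rmsd": float(np.sqrt(np.mean(d ** 2))),
    }

# Hypothetical paired scores, for illustration only
stats = crosswalk_accuracy([10, 20, 30], [12, 18, 30])
print(stats)  # mean_diff 0.0, sd_diff 2.0, rmsd ~1.63
```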

Question 2: how does the concurrent validity of BriefPCS scores compare to that of PCS scores?

To compare the concurrent validity of BriefPCS and PCS scores, we calculated Bootstrapped Pearson correlations between PCS scores and scores on PROMIS measures of pain intensity, pain interference, physical function (upper extremity movement, mobility), depression, and anxiety. We repeated these analyses using BriefPCS scores. Results were evaluated based on whether 95% CIs of coefficient estimates overlapped, which would indicate that coefficient estimate differences were not statistically significant (alpha ≤ 0.05)14.
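A minimal sketch of this procedure, percentile-bootstrap CIs for Pearson's r plus the CI-overlap check; the resampling details here are illustrative assumptions, not the study's exact implementation:

```python
import numpy as np

def bootstrap_pearson_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) CI for Pearson's r."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample pairs with replacement
        rs[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def cis_overlap(ci_a, ci_b):
    """Overlapping CIs -> coefficient difference not deemed significant."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```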

Question 3: what would be the impact on RCT results of using BriefPCS scores instead of PCS scores?

The RCT Sample was used to repeat analyses conducted in the original published study, but using BriefPCS scores rather than PCS scores8. The details of the analyses used in the original study are published8. For our purposes, we targeted the primary outcome of pain catastrophizing at 3 months after treatment (the primary endpoint), operationalized in this study as BriefPCS scores rather than PCS scores. Mixed models for repeated measures (MMRM) regression analyses were conducted using baseline, 1-month, and 3-month scores. For all comparisons among ER, CBT, and HE in the original study, we recalculated the statistical tests using BriefPCS scores, and the resulting p-values were compared to those reported in the original study. Of particular interest was whether any of the conclusions of the original study would have changed had the BriefPCS been used instead of the PCS.
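The final comparison step, checking whether significance conclusions flip when BriefPCS scores replace PCS scores, can be sketched as follows; all p-values in the example are hypothetical, and the α of 0.025 follows the original noninferiority design:

```python
def conclusions_change(p_pcs, p_brief, alpha=0.025):
    """For each timepoint, flag whether the significance conclusion differs
    between the PCS-based and BriefPCS-based analyses."""
    return {t: (p_pcs[t] < alpha) != (p_brief[t] < alpha) for t in p_pcs}

# Hypothetical p-values, for illustration only
p_pcs = {"month1": 0.010, "month2": 0.004, "month3": 0.003}
p_brief = {"month1": 0.029, "month2": 0.005, "month3": 0.004}
print(conclusions_change(p_pcs, p_brief))
# -> {'month1': True, 'month2': False, 'month3': False}
```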

Results

Demographic and clinical characteristics

Table 1 summarizes the demographic and clinical characteristics for the Linking Sample (N = 10,613) and the Validation Sample (N = 10,613). Demographic and clinical characteristics were not significantly different between the two samples (p ≥ 0.165). Participants in both samples were predominantly female, white/Caucasian, married, and middle-aged. Both samples had a median education level of a bachelor’s degree. The mean average pain intensity rating was 5.4 for the two samples.

Table 1 Demographic and clinical characteristics of the linking and validation samples.

Can crosswalked Brief-PCS scores accurately predict actual PCS scores?

Table 2 is the crosswalk table for converting BriefPCS scores to their equivalent PCS scores based on equipercentile linking. The crosswalked PCS scores were highly correlated with actual PCS scores (r = 0.95). The mean difference (PCS-PCSCW) was 0.04, range = − 18.50 to 17.20. Both the standard deviation of the differences and the RMSD were 4.16. In the context of low back pain, the minimal important change for the PCS has been estimated at 8 points for those with scores < 30 and 11 points for scores > 3037.

Table 2 Crosswalk tables for estimating PCS scores based on scores on the BriefPCS (linking sample n = 10,613).

How does the concurrent validity of BriefPCS scores compare to that of PCS scores?

Table 3 reports the 95% CIs of bootstrapped Pearson correlation coefficients, calculated in the Validation Sample, between pain catastrophizing scores (i.e., PCS and BriefPCS) and scores on PROMIS measures (pain intensity, pain interference, physical function upper extremity and mobility, depression, and anxiety). There were no statistically significant differences between coefficient estimates based on PCS scores and those based on BriefPCS scores, as indicated by the overlap of the 95% CIs.

Table 3 Intercorrelations among PCS scores, Brief PCS scores, and scores on PROMIS® measures (validation sample).

What would be the impact on RCT results of using BriefPCS scores instead of PCS scores?

Detailed demographic information for the RCT Sample has been published8. Briefly, the sample was 49.8% female, 60.2% White, and 60.8% married; 27.4% of the sample held a bachelor’s degree or less. Like the Linking and Validation Samples, this longitudinal sample was predominantly white and middle-aged. The mean pain intensity rating was 4.6.

Table 4 presents comparisons for pain catastrophizing at Months 1 to 3, adjusted for baseline pain catastrophizing scores and based on intention-to-treat analysis. The group comparisons of p-values revealed that, compared to HE, ER and CBT were significantly better at reducing pain catastrophizing at all timepoints after treatment, and that ER and CBT were similar in reducing pain catastrophizing at 2 and 3 months after treatment. Therefore, the results and conclusions at Months 2 and 3 would be the same using BriefPCS total scores instead of PCS total scores8. However, the result at Month 1 (p = 0.029) would differ from the original study (using α = 0.025 to determine noninferiority).

Table 4 Between-group differences in posttreatment pain catastrophizing as measured by the Pain Catastrophizing Scale (PCS) and the Brief Pain Catastrophizing Scale Scores (BriefPCS).

Discussion

Consistent with our study hypotheses, our findings indicate that the BriefPCS scores validly, reliably, and accurately distinguish levels of pain catastrophizing in adults with chronic pain seeking care at a tertiary pain clinic. In addition, the BriefPCS scores are sensitive to changes after pain psychology interventions, with substantially less respondent burden than the PCS. Our findings have important theoretical and clinical implications.

Our study's first aim was to develop a crosswalk table to associate BriefPCS scores with PCS scores. To our knowledge, this is the first study to use equipercentile linking to extend the interpretability of BriefPCS scores. The crosswalk proved quite accurate in predicting mean PCS scores in the Validation Sample. The correlation between crosswalked PCS scores and actual PCS scores was high (0.95). The SD of the differences and the RMSD both had values of 4.15. This is well below the clinically important difference values of 8 and 11 estimated for the PCS based on a low back pain sample37. A caveat, however, is the variability of the results at the individual level. Discrepancies in scores (PCS-PCSCW) ranged from − 18.50 to 17.20. The use of crosswalked scores at the individual level (e.g., for clinical monitoring) or with smaller sample sizes may not be appropriate. For example, if a patient has been administered the PCS and the BriefPCS at different times over a treatment period, it would not be appropriate to evaluate the patient’s trajectory by comparing PCS and PCSCW scores. A better approach would be to compute the individual’s BriefPCS score from the PCS responses, which would avoid the potential impact of linking error. Note that this caveat does not concern the appropriateness of BriefPCS scores in a clinical setting: compared to PCS scores, BriefPCS scores had comparable responsiveness and good construct validity, as evidenced by their correlations with other self-report measures.

In our concurrent validity analysis, BriefPCS scores functioned quite well compared to PCS scores. Correlation coefficients for the association between BriefPCS scores and PROMIS scores were lower than those obtained using PCS scores, but not substantially. Some of this disparity may be due to the restriction of range in BriefPCS total scores compared to PCS total scores; in the former, there are 17 possible scores and, in the latter, there are 53. This difference may have attenuated the values of the correlation coefficients. BriefPCS and PCS scores were concordant regarding the relationship between pain catastrophizing and other relevant domains. Higher catastrophizing was moderately associated with greater pain intensity, pain interference, depression, and anxiety (rs = 0.400 to 0.599) and less physical function (rs = − 0.325 to − 0.295 for upper extremity movement and mobility). These results also concur with those obtained in the study conducted to create and evaluate the BriefPCS14.

Of particular interest for clinical trials of pain interventions was our secondary analysis of a 3-arm RCT investigating the effectiveness of single-session ER and 8-week CBT for pain in comparison to HE8. We found that our conclusions would have been unchanged if the study had used BriefPCS scores instead of PCS scores8. CBT and ER were significantly better at reducing pain catastrophizing than HE. The p-values were smaller when PCS scores were used, but only incrementally so. There is a level of difference between CBT, ER, and HE that would have been statistically significant with PCS scores but not with BriefPCS scores when setting the p-value threshold at < 0.01. Based on the p-value differences, however, any such difference would have been quite small and its clinical relevance questionable. Nevertheless, the researcher is responsible for weighing response burden against increased precision for a particular research question and a specific setting. We conclude that the use of BriefPCS scores would be defensible for most such questions and contexts. Future research should compare the BriefPCS and PCS with regard to statistical power and their impact on the sample sizes needed to detect different levels of treatment effect.

A few limitations should be noted in interpreting our findings. First, we obtained our Linking and Validation Samples at a single tertiary pain clinic in Northern California; thus, our results may not generalize to different clinical or community settings. Although our sample included all patients seeking treatment at a tertiary pain clinic, it was mainly female, White/Caucasian, and highly educated. Hence, the results of the present study may not generalize to male patients with chronic pain, ethnically/racially diverse individuals, or those with lower education levels. Yet, some evidence exists that the PCS is invariant across different clinical and non-clinical samples and across sex38. Second, the utility of the BriefPCS will be limited for researchers interested in discriminating among the impacts of the 3 sub-constructs of the PCS (i.e., magnification, helplessness, and rumination). A strength is the large sample size (N = 10,613 for linking and 10,613 for validation), which allowed us to investigate the full distribution of BriefPCS scores in a large clinical sample. Our sample is approximately 25 times larger than the reference sample in the original study developing and validating the 13-item PCS (N = 851)1. Between these two samples, the means and standard deviations of the PCS total scores were similar (mean = 20.9 for the original sample and 21.1 for our sample; standard deviation = 12.5 for the original sample and 12.8 for our sample). Additionally, 13-item PCS total scores of 10, 20, and 30 correspond to the 25th, 50th, and 75th percentiles in both the original1 and our samples. Therefore, our sample reproduces the distribution of PCS total scores from the original sample to a remarkable degree, and our equipercentile linking results may have high external validity.

Despite these limitations, our current study is the first to extend the interpretability of BriefPCS scores by using equipercentile linking to associate BriefPCS scores with scores on the PCS. The crosswalk could prove helpful in relating the results of research completed with the BriefPCS to the vast body of work that has been done using the PCS. Our results indicate that the BriefPCS is a viable alternative to the PCS, especially when clinicians and researchers have concerns about response burden. In conclusion, the BriefPCS is an efficient tool for assessing pain catastrophizing and its responsiveness to pain psychology interventions.