Background

Chronic pain is a common medical condition that causes significant distress and disability [1]. The prevalence of chronic pain in adults, defined as lasting for at least 6 months, is estimated in the range of 10% to 55% depending on age, sex, setting and type of chronic pain with a weighted mean prevalence of 31% in US adults, and is consistently reported to be higher in women [2, 3]. Psychological interventions, either alone or in combination with pharmacological treatments, are widely recommended for pain management and treatment [4]. Psychological therapies consist of behavioural and cognitive treatments that are designed to ameliorate pain, distress and disability.

Psychological interventions were introduced over 40 years ago and are now well established in clinical practice [5]. Several randomized controlled trials (RCTs) but also uncontrolled trials, observational studies, and clinical case reports have suggested a positive effect of psychological interventions on pain management, although the reported effect sizes vary widely [6]. Moreover, narrative reviews have generally supported the effectiveness of psychological treatments on a range of pain conditions [7,8,9]. Meta-analyses and systematic reviews have provided additional evidence for the effectiveness of psychological treatments in the management of chronic pain [10,11,12]. However, the effect sizes across all meta-analyses are modest, only rising above a medium-size effect (i.e., standardised mean difference larger than 0.5) in lower quality studies [4]. The effectiveness of psychological treatments is shown to be over-estimated in poorly designed studies, and is reduced when controlled for quality and adjusted for potential bias [4, 13]. Thus, the reported heterogeneity in effect sizes is partly explained by the quality of the studies [13]. This observation is indicative of the possibility of bias in this literature, which could be due to publication or other selective reporting biases, where study authors employ several data collection and analysis techniques but publish only the most statistically significant findings [14,15,16,17,18]. Because of the wide implementation of psychological interventions in pain management and the elevated likelihood for biases in this field as shown in prior relevant empirical research [19, 20], we used an umbrella review approach [21, 22] that systematically appraises the evidence on an entire field across many meta-analyses. In the present study we aimed to broaden the scope of a typical umbrella review by further evaluating the strength of the evidence and the extent of potential biases [23,24,25,26,27] on this body of literature.

Methods

Literature search and data extraction

We identified all relevant meta-analyses investigating the effect of psychological interventions on pain management. We searched PubMed (until July 2016) and the Cochrane Database of Systematic Reviews (until September 2016) for papers written in English and performed in humans, using the following three keywords: “pain”, “meta-analysis” and “psychology”. In addition, we manually reviewed the references of available systematic and narrative reviews. In total, 987 publications were identified in the electronic databases and an additional 29 via manual review. Two investigators (GM and ER) independently examined the titles, abstracts and full texts of the shortlisted meta-analyses to decide on eligibility. Discrepancies were resolved by consensus and through discussion with a third investigator (KKT). We considered all age groups (i.e., children, adolescents and adults) and all types of pain, and examined the effect of psychological interventions over both short- and long-term periods. Meta-analyses that did not report study-specific information (i.e., effect size, 95% confidence intervals [CIs], sample size) were excluded. When more than one meta-analysis on the same research question was identified, the one with the largest number of component studies was selected. Only seven meta-analyses were excluded by this criterion, all of which were superseded by updated meta-analyses published by the same author teams; thus, no potentially relevant study was omitted. Two investigators (GM and ER) independently extracted the data from each meta-analysis, and a third investigator (ED) verified the validity of the extracted data. Information was abstracted at both the meta-analysis and the individual study level. At the meta-analysis level, we abstracted information on first author, year of publication, examined interventions, outcomes, and number of included studies.
At the individual study level, we abstracted information on study design, quality assessment/risk of bias score, sample size, effect estimate (i.e., mean difference [MD]; standardised mean difference [SMD]; risk ratio), and 95% CIs. For consistency, risk ratios and the corresponding CIs were converted into SMDs [28]. Positive and negative effect sizes were observed across the different meta-analyses because different outcome metrics were used, but all summary effect sizes were interpreted in terms of pain reduction. For example, assuming that a psychological intervention reduces pain, one can expect a positive effect in a meta-analysis examining the efficacy of the intervention in pain reduction, and a negative effect in another meta-analysis examining the difference in pain levels between intervention and control groups. In the current umbrella review, the primary analysis focused only on meta-analyses of RCTs, and a sensitivity analysis was performed including all study designs. Our study was conducted in accordance with guidelines for conducting and reporting umbrella reviews [21, 22].

Types of interventions and outcomes considered

Meta-analyses of psychological interventions with a variety of theoretical underpinnings were considered. Any type of cognitive intervention, such as hypnosis, guided imagery and distraction, and any type of behavioural intervention, such as biofeedback and relaxation, as well as their combinations, were included [29]. All types of psychotherapy and psycho-education were also included in our umbrella review, whereas meta-analyses of other non-formal psychological interventions, such as acupuncture, massage, yoga and meditation, were excluded. Interventions on single patients, pairs or families were considered, whether delivered through in-person contact between the therapist and the subjects or through web-based platforms. Some studies assessed the effectiveness of a single technique, such as biofeedback, whereas others assessed the effectiveness of a comprehensive psychological approach, such as Cognitive Behavioural Therapy. The complete list of included studies and the interventions they considered is shown in Table 1.

Table 1 Characteristics of the 38 included meta-analysis papers

Assessment of summary effects and heterogeneity

In the present umbrella review, both fixed and random effects meta-analysis methods were applied. Fixed effect meta-analysis is based on the assumption that every study in the meta-analysis is estimating the one true underlying effect and that the observed differences between studies are due to chance alone. A random effects meta-analysis is based on the assumption that every study is estimating a different underlying effect and that all these effects follow a distribution. In order to test for between-study heterogeneity, we implemented the χ²-based Cochran Q test [30] and the I² metric of inconsistency [31], which is defined as the ratio of the between-study variance over the sum of the within-study and between-study variances. The I² metric takes values between 0% and 100% and represents the percentage of the variability in the effect sizes that is due to between-study heterogeneity. I² values of 25%, 50%, and 75% indicate low, moderate, and high heterogeneity, respectively. Ninety-five percent prediction intervals were also calculated, which further take into account the between-study heterogeneity and estimate the effect that would be expected in a future study investigating the same association [32, 33].
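
The quantities described above can be illustrated with a minimal Python sketch. It is not the authors' Stata code: it assumes the DerSimonian-Laird estimator of the between-study variance and the Q-based form of I², neither of which is named explicitly in the text. The function name, the argument names and the caller-supplied `t_crit` are hypothetical.

```python
import math


def random_effects_summary(effects, std_errors, t_crit):
    """Fixed and random effects summaries with Cochran Q, I^2 and a 95% prediction interval.

    Assumptions of this sketch (not stated in the paper):
    - tau^2 is estimated with the DerSimonian-Laird method;
    - I^2 is computed as max(0, (Q - df) / Q) * 100;
    - t_crit is the two-sided 95% critical value of a t distribution with
      k - 2 degrees of freedom, supplied by the caller to avoid third-party imports.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")

    # Inverse-variance (fixed effect) weights and summary
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)

    # Cochran Q and the I^2 metric of inconsistency (in percent)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights, summary and its standard error
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random_ = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))

    # Prediction interval for the effect expected in a future study
    half_width = t_crit * math.sqrt(tau2 + se_random ** 2)
    return {
        "fixed": fixed,
        "random": random_,
        "q": q,
        "i2": i2,
        "tau2": tau2,
        "prediction_interval": (random_ - half_width, random_ + half_width),
    }
```

For a meta-analysis of three studies, the t distribution has one degree of freedom and the two-sided 95% critical value is about 12.71, which is one reason prediction intervals become very wide when few studies are combined.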

Assessment of small-study effects

The assessment of small-study effects was used to investigate whether smaller studies tend to give larger effect estimates compared to larger studies. Differences between small and large studies can reflect genuine heterogeneity, chance or biases. The regression asymmetry test, as proposed by Egger, was used to evaluate small-study effects [34, 35]. Based on the test, a p-value smaller than or equal to 0.10, along with the random effects summary estimate being inflated compared to the point estimate of the largest study in the meta-analysis, was considered an indication of small-study effects. Asymmetry in effect magnitude may arise for several reasons, such as true heterogeneity, publication biases or chance, but the asymmetry test can only indicate its existence and cannot distinguish the reason behind it. However, if the asymmetry is assumed to be a product of bias, the extrapolation of Egger’s regression line to a zero standard error, which corresponds to a theoretical study of infinite size, can be regarded as an estimate of the effect size that is free from biases [35,36,37].
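
As an illustration, the regression asymmetry test can be sketched in Python as an ordinary least-squares regression of the standardised effect (effect divided by its standard error) on precision (one over the standard error). In this formulation the intercept measures asymmetry and the slope is the effect extrapolated to a standard error of zero. The function and key names are hypothetical, and the p-value is left to the caller (a t distribution with k − 2 degrees of freedom) to keep the sketch free of third-party dependencies.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-type regression: (effect / SE) regressed on (1 / SE) by ordinary least squares.

    Returns the asymmetry intercept, its t statistic and degrees of freedom,
    and the slope, which estimates the effect of a hypothetical study with SE = 0.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("at least 3 studies are needed")

    x = [1.0 / se for se in std_errors]                # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardised effect

    mean_x = sum(x) / k
    mean_z = sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; the regression is undefined")

    slope = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
    intercept = mean_z - slope * mean_x

    # Residual variance and standard error of the intercept
    df = k - 2
    residual_ss = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(residual_ss / df * (1.0 / k + mean_x ** 2 / sxx))
    if se_intercept > 0:
        t_stat = intercept / se_intercept
    else:
        t_stat = math.copysign(math.inf, intercept)

    return {
        "intercept": intercept,        # asymmetry (bias) coefficient
        "t_stat": t_stat,              # compare with a t distribution on df degrees of freedom
        "df": df,
        "effect_at_zero_se": slope,    # extrapolation to an infinitely large study
    }
```

Under the rule described in the text, a meta-analysis would be flagged only when the p-value derived from `t_stat` is at most 0.10 and the random effects summary is also larger in magnitude than the estimate of the largest study.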

Evaluation of excess statistical significance

The excess statistical significance test was performed to investigate whether the observed number of studies with nominally statistically significant results (P < 0.05) is greater than the expected number of studies with statistically significant results [38]. An excess of statistically significant findings in a meta-analysis may imply the presence of selective reporting bias, as many underpowered studies with statistically significant results may be identified in the field. The sum of the statistical power estimates for each component study in a meta-analysis was used to calculate the expected number of studies with statistically significant results. The power of each individual component study depends on the effect size that the tested psychological intervention has on pain. The actual size of the true effect is not known but was estimated in the current umbrella review using the effect size of the largest study (i.e., smallest standard error) in each meta-analysis [38, 39]. The statistical power of each study was calculated using the power command in Stata (College Station, TX). Excess statistical significance was claimed if P < 0.10 (one-sided P < 0.05 with observed > expected number of studies with statistically significant results).
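
The logic of this test can be sketched in Python as follows. This is a simplified illustration rather than a reproduction of the Stata analysis: power is approximated with a two-sided normal (z) test based on each study's standard error, and observed and expected counts are compared with a one-degree-of-freedom chi-square statistic. Both choices, and all function and key names, are assumptions of this sketch.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def study_power(true_effect, std_error, alpha=0.05):
    """Approximate power of a two-sided z test to detect true_effect (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(true_effect) / std_error
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def excess_significance(effects, std_errors, alpha=0.05):
    """Compare observed and expected numbers of nominally significant studies."""
    n = len(effects)
    # Plausible true effect: the estimate of the largest study (smallest standard error)
    largest = min(range(n), key=lambda i: std_errors[i])
    plausible_effect = effects[largest]

    # Expected count is the sum of the per-study power estimates
    expected = sum(study_power(plausible_effect, se, alpha) for se in std_errors)

    # Observed count of studies significant at the two-sided alpha level
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    observed = sum(1 for y, se in zip(effects, std_errors) if abs(y / se) > z_crit)

    # Chi-square statistic with 1 degree of freedom (assumed form of the test)
    if 0 < expected < n:
        diff2 = (observed - expected) ** 2
        chi2 = diff2 / expected + diff2 / (n - expected)
        p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    else:
        p_value = 1.0

    return {
        "observed": observed,
        "expected": expected,
        "p_value": p_value,
        # Rule from the text: P < 0.10 with observed > expected
        "excess": observed > expected and p_value < 0.10,
    }
```

Because power is driven by the assumed true effect, using the largest study's estimate as the plausible effect is itself a modelling choice; a smaller assumed effect lowers the expected count and makes an excess easier to detect.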

Quality of the included studies

We assessed the methodological quality of the included meta-analyses using the assessment of multiple systematic reviews (AMSTAR) tool [40]. We categorised the study quality based on the overall AMSTAR score as high (8-11 items achieved), moderate (4-7 items) and low (0-3 items). We further gathered any quality assessment/risk of bias score information pertaining to the primary studies, based on what the meta-analyses reported.
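
For illustration only, the three score bands given above map onto a small Python helper; the function name is hypothetical.

```python
def amstar_category(score):
    """Map an overall AMSTAR score (0-11 items achieved) to the quality bands used in the text."""
    if not 0 <= score <= 11:
        raise ValueError("AMSTAR scores range from 0 to 11")
    if score >= 8:
        return "high"      # 8-11 items achieved
    if score >= 4:
        return "moderate"  # 4-7 items achieved
    return "low"           # 0-3 items achieved
```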

Grading the evidence

Using the criteria mentioned above, associations that presented nominally statistically significant random effects summary estimates (i.e., P < 0.05) were categorised into strong, highly suggestive, suggestive, or weak evidence, following a grading scheme that has already been applied in various fields [23,24,25,26,27]. A strong association was claimed when the p-value of the random effects meta-analysis was smaller than 10⁻⁶, the meta-analysis had more than 1000 participants, the largest study in the meta-analysis was nominally statistically significant (i.e., P < 0.05), the I² statistic of between-study heterogeneity was smaller than 50%, the 95% prediction interval excluded the null value, and there was no indication of small-study effects or excess significance bias. The criteria for a highly suggestive association were met if P < 10⁻⁶, the meta-analysis had more than 1000 participants, and the largest study in the meta-analysis presented a nominally significant estimate (i.e., P < 0.05). An association was supported by suggestive evidence if the meta-analysis included more than 1000 participants and the random effects P was smaller than 10⁻³. All other nominally statistically significant associations (i.e., P < 0.05) were deemed to have weak evidence.
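
The grading rules above can be written as a short decision function. The sketch below is an illustrative Python rendering of the stated criteria; the function name, the argument names and the "not significant" label for associations with P ≥ 0.05 are hypothetical.

```python
def grade_evidence(
    p_random,
    n_participants,
    largest_study_significant,
    i2,
    prediction_interval_excludes_null,
    small_study_effects,
    excess_significance,
    min_participants=1000,
):
    """Grade one association following the criteria described in the text.

    p_random is the random effects p-value, i2 is in percent, and the last
    three criteria are booleans. min_participants is the sample size threshold.
    """
    if p_random >= 0.05:
        return "not significant"

    large_enough = n_participants > min_participants

    if large_enough and p_random < 1e-6 and largest_study_significant:
        no_bias_signals = not small_study_effects and not excess_significance
        if i2 < 50 and prediction_interval_excludes_null and no_bias_signals:
            return "strong"
        return "highly suggestive"

    if large_enough and p_random < 1e-3:
        return "suggestive"

    return "weak"
```

Passing `min_participants=500` corresponds to the lower sample size threshold used in the sensitivity analysis.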

The vast majority of the primary trials in the meta-analyses included very small numbers of participants. However, as the majority of these trials are randomized experiments, one would expect to see valid estimates even with smaller sample sizes. To check the robustness of our evidence grading approach, we therefore conducted a sensitivity analysis in which we lowered the threshold for the number of participants in a meta-analysis and reclassified all associations using a threshold of more than 500 participants instead of 1000. All analyses were performed using Stata version 13 (College Station, TX) [41].

Results

Description of meta-analyses

Of the 1016 articles initially identified, 38 papers [6, 10, 11, 13, 42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75], including 150 meta-analysis models with 865 individual study estimates, were finally selected (Table 1 and Fig. 1). These papers examined associations between several psychological interventions (comprehensive therapies or single techniques) and 29 different types of pain (i.e., acute pain, affective pain, arthritis, breast cancer, cancer in general, cancer pain severity, chest, chronic and recurrent, chronic back, chronic low back, chronic musculoskeletal, chronic pain, chronic pelvic, expected pain, fibromyalgia, headache, irritable bowel syndrome, low back, muscle pain, muscle palpation, myofascial temporomandibular disorder, needle-related pain in children and adolescents, orofacial, osteoarthritis, pain on intercourse, pain relief, recurrent abdominal, rheumatoid arthritis, vaginal pain). Of the 865 individual studies included in this umbrella review, 741 (85.7%) were randomized controlled trials, 42 (4.9%) were non-randomized controlled trials or clinical controlled trials, 6 (0.7%) were quasi-RCTs, and 4 (0.5%) were uncontrolled pre-post clinical trials, whereas for 72 studies this information was not reported. The evaluation of all 150 meta-analyses of the 865 individual studies is presented in detail in Additional file 1: Tables S1 and S2, but the critical appraisal of the evidence from this point on focuses only on associations from the 141 meta-analyses using only RCTs, which are summarized in Additional file 1: Tables S3 and S4. Between 2 and 38 individual studies were combined per meta-analysis, with a median of 3 studies. The median numbers of participants in the intervention and control groups in each meta-analysis were 115 and 107, respectively. The smallest total sample size in a meta-analysis was 44 and the largest was 4270.

Fig. 1 Flow chart of literature selection

Summary effect size

Out of the 141 meta-analyses including only randomized evidence (Additional file 1: Table S3), the summary random effects estimates were statistically significant at the P = 0.05 level in 56 (40%) meta-analyses, whereas the summary fixed effects were significant in 75 (53%) meta-analyses. Reductions in pain were observed in all statistically significant meta-analyses comparing the intervention to the control group. When the P = 0.001 level was used as a threshold for statistical significance, only 28 (20%) and 47 (33%) meta-analyses remained statistically significant using the random and fixed effects method, respectively. Only four associations, on psychological interventions for cancer pain severity, irritable bowel syndrome, headache, and chronic headache in children, produced statistically significant results when a P value of 10⁻⁶ was used as the significance threshold based on the random effects model. The effect of the largest study included in each meta-analysis is also presented in Table S3; it was nominally statistically significant in only 41 (29%) out of the 141 meta-analyses. The findings from the largest studies were more conservative than the summary estimates in 65 (46%) comparisons. Finally, most of the largest studies in each meta-analysis (n = 103; 73%) suggested effects of small or small-to-medium magnitude (i.e., SMD < 0.5), and similar magnitudes were observed in the majority of the summary random effects estimates (n = 98; 70%). When 95% prediction intervals were calculated, the null value was excluded in only 9 meta-analyses that investigated psychological interventions for pain management in patients with irritable bowel syndrome, fibromyalgia, osteoarthritis, rheumatoid arthritis, arthritis and headache (Additional file 1: Table S3).

Between-study heterogeneity

The Q test showed statistically significant heterogeneity (P ≤ 0.10) in 58 (42%) meta-analyses (Additional file 1: Table S4). There was moderate to high heterogeneity (I² = 50%-75%) in 34 (24%) meta-analyses and very high heterogeneity (I² > 75%) in 25 (18%) meta-analyses of eight different types of pain (i.e., chest pain frequency; chronic low back pain; chronic pain excluding headache; needle-related pain/distress in children and adolescents; chronic pelvic pain; headache; fibromyalgia; pain on intercourse). Uncertainty around the heterogeneity estimates was often large, as reflected by the wide 95% CIs of the I² (Additional file 1: Table S4).

Small study effects and excess significance bias

There was no substantial evidence for the presence of small-study effects according to Egger’s regression asymmetry test. In only eight out of 141 (6%) meta-analyses was the p-value smaller than 0.10 and the effect of the largest study more conservative than the summary effect estimate. Nominally statistically significant summary estimates were calculated for only five associations (4%) after extrapolating the Egger regression line on a funnel plot to an infinitely large study (Additional file 1: Table S4). Ten meta-analyses (7%) (i.e., pain in breast cancer patients and survivors; cancer pain severity; chronic pain excluding headache; self-reported needle-related pain in children and adolescents for two different interventions; low back pain; chronic low back pain for two different interventions; frequency of chest pain; and irritable bowel syndrome pain) had evidence of a statistically significant excess of “positive” studies, when the plausible effect was assumed to be equal to the effect of the largest study in each meta-analysis (Additional file 1: Table S4). An excess of significant findings in a meta-analysis coupled with an indication of small-study effects based on Egger’s p-value can provide further evidence for the presence of selective reporting biases in the field. Only two meta-analyses showed indications of both excess significance and small-study effects bias.

Grading the evidence

None of the examined associations could claim either strong (random effects P < 10⁻⁶, > 1000 participants, statistically significant largest study, I² < 50%, 95% prediction interval excluding the null value, and no indication of small-study effects or excess significance bias) or highly suggestive (random effects P < 10⁻⁶, > 1000 participants, statistically significant largest study) evidence (Table 2). Twelve associations (i.e., cancer pain severity; pain from breast cancer; chronic musculoskeletal pain at 4 and 6 months follow-up; chronic pain; arthritis; osteoarthritis; rheumatoid arthritis; fibromyalgia; self-reported needle-related pain in children and adolescents; chronic non-headache pain; irritable bowel syndrome pain) were supported by suggestive evidence, with random effects p-values smaller than 0.001 and more than 1000 participants in the relevant meta-analyses. None of these meta-analyses could reach the higher categories of evidence, for a combination of reasons: only 2 out of the 12 meta-analyses had P < 10⁻⁶, the largest study in the meta-analysis was not statistically significant in 7 out of 12, prediction intervals included the null value in most cases (8 out of 12), and there was potential for small-study effects (3 out of 12) and excess significance bias (4 out of 12). Finally, 44 associations were supported by weak evidence, with random effects estimates that were only nominally statistically significant (P < 0.05).

Table 2 Grading of the evidence for the meta-analyses of RCTs investigating the effectiveness of various psychological interventions for pain reduction

When, in a sensitivity analysis, we altered the threshold of total population size to 500 instead of 1000 participants, seven associations (osteoarthritis; headache; chronic low back pain at two different time points; fibromyalgia in the long term; chronic non-headache pain; chronic and recurrent non-headache pain in children and adolescents) were upgraded from weak to suggestive evidence and one (chronic and recurrent headache in children and adolescents) was upgraded from weak to highly suggestive evidence. When we also included non-RCT evidence in our appraisal, 13 and 51 associations were supported by suggestive and weak evidence, respectively (Additional file 1: Table S5). The evidence grading across all studies did not change compared to the grading using only randomized evidence, with the exception of biofeedback versus control on post-treatment chronic back pain and verbal suggestion on pain relief, which were supported by highly suggestive evidence in studies of unclear design.

Quality of the included studies

Based on the AMSTAR quality assessment tool (Additional file 1: Table S6), the quality of the included meta-analyses ranged widely, from 2 to 11 points, with a median of 7 points. Most of the included meta-analyses had high (n = 16 of 38; 42%) or moderate (n = 16; 42%) quality, and only 6 (16%) meta-analyses had low quality. To further evaluate the potential existence of bias in this evidence base, we collected and summarized in Additional file 1: Table S7 the quality assessment scores that were originally reported in the evaluated meta-analyses. Briefly, most meta-analyses included primary studies that were, on average, of low to moderate quality.

Discussion

In the present large-scale umbrella review, we examined the strength of the evidence and extent of potential biases in 150 published meta-analyses of psychological interventions for pain reduction. None of the 150 associations was supported by either strong or highly suggestive evidence. Only 12 associations from the 141 RCT-only meta-analyses were supported by suggestive evidence indicating reductions in pain from breast cancer, arthritis, rheumatoid arthritis, osteoarthritis, chronic musculoskeletal pain (in two different time points), fibromyalgia, self-reported needle-related pain in children and adolescents, irritable bowel syndrome pain, chronic pain, chronic non-headache pain, and cancer pain severity comparing different psychological interventions to standard care.

Of the 12 associations that were supported by suggestive evidence, six were related to musculoskeletal conditions. Specifically, evidence suggested that the Arthritis Self-Management Program, a program of interventions that aim to increase the individual’s ability to manage pain, had a statistically significant effect in lowering chronic musculoskeletal pain after 4 months (SMD, −0.23; 95% CI, −0.36 to −0.11) or 6 months (SMD, −0.29; 95% CI, −0.42 to −0.16) compared to usual care. There was only weak evidence of the Arthritis Self-Management Program lowering chronic musculoskeletal pain after 1 year of intervention, and the magnitude of the effect was smaller (SMD, −0.13; 95% CI, −0.24 to −0.03), indicating that while such interventions are potentially effective in the short term, the effect seems to wear off with time. Suggestive evidence supported the effect of psychological treatments, such as cognitive-behavioural therapy, hypnosis or stress management, in lowering arthritis pain (SMD, −0.2; 95% CI, −0.3 to −0.1). The evidence was also suggestive for the effect of self-regulation on pain reduction in patients with rheumatoid arthritis (SMD, 0.18; 95% CI, 0.07 to 0.29) compared to standard care and for self-management programs on osteoarthritis pain reduction (SMD, −0.17; 95% CI, −0.26 to −0.08). Finally, the same was true for fibromyalgia (SMD, −0.30; 95% CI, −0.45 to −0.15). The remaining six associations that were supported by suggestive evidence concerned cancer pain severity (SMD, 0.34; 95% CI, 0.23 to 0.46), pain in breast cancer patients (SMD, 0.34; 95% CI, 0.18 to 0.50), self-reported needle-related pain in children and adolescents (SMD, −0.44; 95% CI, −0.67 to −0.21), irritable bowel syndrome (SMD, 0.40; 95% CI, 0.30 to 0.51), chronic non-headache pain (SMD, −0.37; 95% CI, −0.59 to −0.15), and chronic pain (SMD, 0.29; 95% CI, 0.15 to 0.43).
Although the latter associations were statistically significant at P < 10⁻³ and were supported by an adequate sample size in the relevant meta-analyses (>1000 participants), they could not reach the strong and highly suggestive categories of evidence for a combination of reasons related to evidence strength (most did not reach P < 10⁻⁶) and validity, as prediction intervals included the null value in most cases and there was potential for small-study effects and excess significance bias.

Our results are at odds with the generally strong belief in the literature that psychological therapies are universally effective on a variety of pain conditions [76,77,78]. However, this belief rests mainly on a limited number of small primary studies, and future larger studies are warranted. Notably, the median number of individuals in the intervention and control groups in each individual study included in our systematic evaluation was only 33 and 28, respectively, whereas the median number of studies included in each meta-analysis was only three. Our evaluation revealed that the reported effectiveness is usually overstated in the existing studies. The nominally statistically significant associations between psychological interventions and pain were confirmed in less than half of the examined meta-analyses. In addition, the random effects estimates were statistically significant in only 20% of the meta-analyses when a P-value threshold of 0.001 was applied. Furthermore, the prediction interval excluded the null value in only nine meta-analyses, thus suggesting that only 6% of future studies are expected to demonstrate substantial “positive” (i.e., not null) associations between psychological interventions and pain treatment.

Regarding the validity of the examined associations, the effect of the largest study in each meta-analysis, which is expected to provide the most stable and valid estimate, was nominally statistically significant in only 29% of the cases, and the effect size was of small magnitude and often more conservative than the summary effect estimate. Heterogeneity was high or very high (I² > 50%) in 42% of the meta-analyses. The evidence for the presence of small-study effects or excess significance bias was low overall, but the existence of biases cannot be ruled out based only on a negative and potentially underpowered statistical test in meta-analyses with few primary studies. A combination of different forms of bias might still be affecting the results. One such bias is the selective reporting of “positive” versus “negative” findings. In various areas of clinical investigation, “negative” findings are considered to be of “limited impact” and, therefore, often remain unpublished. Statistical significance testing should not be used in the future as a criterion for publication. Moreover, one cannot exclude the possibility of questionable research practices, such as selective reporting of study methods and results, p-value fishing, or deciding whether to collect more data or stop collecting data only after checking whether the results are statistically significant, which have been shown to constitute common research practices [15, 79,80,81]. Most of the included meta-analyses had a moderate or high quality rating based on the AMSTAR quality assessment tool. However, the included meta-analyses rated the quality of their primary studies as low to moderate, with only a few exceptions of high quality studies.

Pain is a challenging clinical entity to assess due to its multifaceted and subjective nature. In our approach, we assessed pain reduction as the outcome of interest. The pain management literature includes many more outcomes including, but not limited to, measures of function, quality of life, depression and perception of coping abilities, which lie beyond the scope of the present work. Nevertheless, the selection of valid outcome measures for pain and pain-related disability is of great importance due to its close relationship to the replication of treatment efficacy. Moreover, in pain-related clinical trials, there is generally a lack of standardization both in pain-related outcome measurement and in pain-related outcome reporting, hampering efforts to synthesize evidence [82]. Even for the assessment of pain reduction per se, there are a number of parameters that can contribute to the observed heterogeneity and/or affect the level of bias operating in the field: statistical versus clinical significance and the usual lack of minimal important difference metrics, daily home data collection challenges, questionnaire and scale structure variations, and length of follow-up and appropriateness thereof. These parameters, together with the validity and feasibility of objective pain measurements, are all attributes of the study design that affect the validity of the evidence base and jeopardize its translational potential.

A crisis of confidence in psychological science has recently emerged [83], following a series of revelations of questionable research practices and the presence of bias, coupled with a reluctance to publish study protocols and conduct replication studies [14, 15, 80]. Psychotherapies have been questioned as effective approaches to reduce mental suffering in many conditions, such as depression [84, 85]. There are few studies investigating potential biases in the reported associations of psychological interventions for pain management [86], although such interventions are widely used in clinical practice. A further strength of our study was that the main analysis used only evidence from randomized controlled trials, which are considered the gold standard for evidence. Some limitations of our work should also be acknowledged. Excess statistical significance and asymmetry tests offer hints of bias, not definitive proof thereof, but our estimates are likely to be conservative, as a negative test result does not exclude the potential for bias.

Conclusions

In conclusion, the present findings suggest that the effectiveness of psychological treatments for pain management is overstated and that the supporting empirical evidence is weak. The present findings, combined with the fact that psychological intervention trials are still at an early research stage and fall short compared to drug trials [87], underline the necessity for larger and better-conducted RCTs [85]. Future research should further focus on building networks involving all stakeholder groups to achieve consensus and develop guidance on best practices for assessing and reporting pain outcomes [88, 89]. The use of standardized definitions and protocols for exposures, outcomes, and statistical analyses may diminish the threat of biases and improve the reliability of this important literature.