Background

Recovery from anesthesia and surgery is often hampered by side effects such as pain or postoperative nausea and vomiting (PONV), by disturbances of well-being and even complications. Pain and PONV are usually prevented and treated by medications that carry their own side effects. Other challenges such as anxiety, hopelessness and negative expectations further impair recovery and outcome [1], or lead to nocebo effects [2] which usually cannot be treated with drugs and call for non-pharmacological approaches.

Among psychological interventions to improve recovery and well-being, hypnotherapeutic approaches are most effective [3]. Several meta-analyses show small to large effect sizes of therapeutic suggestions given pre- or postoperatively, with or without hypnosis induction, on various outcomes [3–6].

Some of the studies included suggestions presented during general anesthesia to the unconscious patient [5, 6]. In this context, suggestions are defined “as verbal or nonverbal messages that the receiver involuntarily accepts and follows” [7] and that might affect emotions, behavior and autonomous body functions. This approach is based on the consideration that anesthesia does not interrupt perception of sounds and words by the brain [8]. Intraoperative measurement of auditory evoked potentials has shown that the central auditory pathway remains intact during general anesthesia [9, 10]. Even further processing of words in the central nervous system including development of memory and appropriate responses has been demonstrated by postoperative recognition of intraoperatively presented words [11, 12], and postoperative nonverbal responses to instructions given during anesthesia [13–15]. In some cases, intraoperative awareness occurs under general anesthesia with explicit memory of the situation and of conversations [16]. In addition, the occurrence of implicit memory has been proven much more frequently [17]. Moreover, strong impact of negative intraoperative remarks on prognosis has been reported [18, 19].

One meta-analysis so far investigated the efficacy of therapeutic suggestions presented during general anesthesia to encourage well-being and recovery of surgical patients and has found mixed results [20]. Even though the effect on postoperative hospitalization was not statistically significant, the small positive effect of suggestions on patient-controlled analgesia reached statistical significance. However, these results must be interpreted with caution since a) the inclusion of non-randomized trials threatens the validity of meta-analytic results and b) the effects on patient-controlled analgesia are based on four studies only.

Hence, the present meta-analysis investigates the efficacy of therapeutic suggestions under general anesthesia on surgically relevant postoperative outcomes, i.e., pain intensity, mental distress, recovery, or the use of medication, and intraoperative outcomes, i.e., length of procedure and physiological parameters, by including randomized controlled trials only.

Methods

Objectives, inclusion criteria, and methods have been pre-specified in a review protocol [21].

Identification and selection of studies

Eligible studies were randomized controlled trials that investigated therapeutic suggestions presented during general anesthesia to adult patients undergoing surgery or medical procedures. If the intervention group received a combination of therapeutic suggestions and another psychological intervention, or if therapeutic suggestions were not implemented solely intraoperatively, the study was excluded. Eligible control groups were “treatment as usual” (defined as the standard surgical care policy of the hospital) and “attention control” groups (defined as providing the same amount of time and attention in addition to standard surgical care; e.g., blank tape, white noise). The included trials reported on at least one of the following outcomes, measured postoperatively during hospitalization via self- and/or observer reports: pain intensity, mental distress, recovery, use of medication. In addition, intraoperative outcomes, i.e., length of procedure and physiological parameters, were included (Additional file 1: Table S1).

Deviating from the protocol [21], we did not limit study inclusion to trials with a sample size of at least 20 participants in each trial arm, but rather tested this restriction in sensitivity analyses.

Electronic searches were carried out in the following databases (last search February 23, 2015): MEDLINE, CENTRAL, Web of Science, and PsycINFO, according to a search strategy that specified terms referring to the patient population (e.g., surg$.ti.ab.kw, General Surgery/, Anesthesia, General/), treatment (e.g., suggestion$.ti.ab.kw, Suggestion/), and study design (e.g., randomized controlled trial.pt). The search strategy was developed with consideration of validated search strategies for retrieving randomized controlled trials [22]. The MEDLINE search strategy is shown in the Appendix. We adapted the strategy for the Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science and PsycINFO.

In order to identify further trials, lists of references of relevant articles and previous reviews were also checked. Additionally, we screened ProQuest Dissertations and Theses Full Text Database to identify any unpublished material. One author (DJ) screened titles and abstracts of database records and retrieved full texts for eligibility assessment.

Data extraction and management

A pilot-tested data extraction form was used to collect the following information from eligible trials: characteristics of patients, intervention, control group, outcomes, bibliographic information, and effect size related data.

Data were independently extracted by two raters (DJ, JR). Inter-rater disagreement was resolved through consensus. In case of missing information, study authors were contacted. If information on effect sizes was missing and could not be retrieved, data had to be approximated using different estimation methods (e.g., estimating statistics from graphs without numerical data, setting an effect size to zero if non-significant results were mentioned without reporting statistical parameters).

Assessing the risk of bias in included studies

To assess risk of bias in the included studies, common markers of internal validity from the Cochrane Risk of Bias Tool were extracted [23]. The risk of bias assessment was conducted by two independent raters (DJ, SK) who were previously trained and blinded to extracted effect size estimates. Disagreements were resolved by discussion with one author (JR). Inter-rater agreement for the risk of bias assessment using Cohen’s kappa (κ) was excellent, κ = 0.76 [24].
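
As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen’s kappa for two raters who judge the same items. The function name and the example ratings are hypothetical and are not taken from the included studies; the sketch only shows the standard definition, kappa = (observed agreement − chance agreement) / (1 − chance agreement).

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical judgments on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of items on which both raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical risk-of-bias judgments for eight items (illustration only)
rater_1 = ["low", "low", "unclear", "unclear", "low", "unclear", "low", "unclear"]
rater_2 = ["low", "low", "unclear", "low", "low", "unclear", "low", "unclear"]
kappa = cohens_kappa(rater_1, rater_2)
```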

Summary measures

Standardized mean differences were calculated for each assessment time point and measure and multiplied by a small-sample bias correction factor, yielding Hedges’ g [25]. An effect size of 0.5 thus indicates that the mean of the experimental group is half a standard deviation larger than the mean of the control group. The magnitude of Hedges’ g was interpreted within the same ranges as Cohen’s d, regarding 0.20, 0.50, and 0.80 as small, medium, and large effect sizes, respectively [26]. Since such effect sizes are generally not easy to interpret in terms of clinical significance, Hedges’ g values were transformed into numbers needed to treat (NNT) [27]. For all dichotomous outcomes, log odds ratios were computed and converted to Hedges’ g [28] in order to pool across different effect size formats.
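
The effect size conversions described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the software routine used for the analysis: the correction factor uses the common approximation J = 1 − 3/(4·df − 1), the log odds ratio is converted with the logistic approximation d = ln(OR)·√3/π, and the NNT conversion assumes the formula NNT = 1/(2Φ(g/√2) − 1). The text does not spell out these formulas, so all three are assumptions, and the function names and input values are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with small-sample bias correction."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # assumed approximation of the correction factor
    return j * d


def log_odds_ratio_to_d(log_or):
    """Assumed logistic conversion of a log odds ratio to a standardized mean difference."""
    return log_or * math.sqrt(3) / math.pi


def nnt_from_effect_size(g):
    """Assumed conversion: NNT = 1 / (2 * Phi(g / sqrt(2)) - 1), which equals 1 / erf(g / 2)."""
    if g <= 0:
        raise ValueError("this conversion is only defined here for positive effect sizes")
    return 1 / math.erf(g / 2)


# Hypothetical trial arm summaries (illustration only)
g_example = hedges_g(mean_t=5.0, sd_t=2.0, n_t=30, mean_c=4.0, sd_c=2.0, n_c=30)
d_from_or = log_odds_ratio_to_d(math.log(2.0))
nnt_example = nnt_from_effect_size(0.04)
```

Under the assumed NNT formula, an effect size of g = 0.04 corresponds to an NNT of about 44, which shows how quickly the NNT grows as the effect size approaches zero.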

If a study comprised more than one intervention group [29–31], the shared control group was divided approximately evenly among the comparisons [32].
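
A minimal sketch of this splitting step is given below. It assumes that only the control group sample size is divided, while means and standard deviations are left unchanged for each comparison; the function name and the example numbers are hypothetical.

```python
def split_shared_control(n_control, n_comparisons):
    """Divide a shared control group's sample size approximately evenly among comparisons."""
    if n_comparisons < 1 or n_control < n_comparisons:
        raise ValueError("need at least one control patient per comparison")
    base, remainder = divmod(n_control, n_comparisons)
    # The first `remainder` comparisons receive one extra patient
    return [base + 1 if i < remainder else base for i in range(n_comparisons)]


# Hypothetical trial: 31 control patients shared by two intervention arms
control_sizes = split_shared_control(31, 2)
```

Splitting the sample size in this way avoids counting the same control patients twice when both comparisons enter the same pooled estimate.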

Data synthesis

Outcome data were meta-analyzed using a random-effects approach. The generic inverse variance method was applied with heterogeneity estimated using the DerSimonian-Laird method [33]. Statistical heterogeneity between trials was assessed with χ² heterogeneity tests (Cochran’s Q) and the I² statistic [34]. I² describes the percentage of the variability in effect estimates that is due to heterogeneity rather than chance, with values from 0 to 40% indicating no important heterogeneity, 30 to 60% moderate, 50 to 90% substantial, and 75 to 100% considerable heterogeneity, respectively [35].
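
The pooling approach described above can be sketched as follows. This is a minimal illustration of generic inverse variance pooling with the DerSimonian-Laird estimate of the between-study variance, together with Cochran’s Q and I²; it is not the routine of the software used for the analysis, and the function name and input values are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird between-study variance."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1 / v for v in variances]  # fixed-effect (inverse variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # truncated at zero
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent
    return {
        "pooled": pooled,
        "se": se,
        "ci_95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical effect sizes and sampling variances (illustration only)
result = random_effects_dl([0.0, 0.4, 0.8], [0.04, 0.04, 0.04])
```

When Q does not exceed its degrees of freedom, the between-study variance is set to zero and the random-effects estimate coincides with the fixed-effect estimate.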

Risk of bias across studies

In order to test for publication bias, funnel plots were inspected visually and the Egger test was run [36]. Additionally, Duval & Tweedie’s trim and fill procedure was used to obtain an estimate of the treatment effect adjusted for publication bias and to indicate how many missing trials had to be imputed to correct for it [37].
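
The regression underlying the Egger test can be sketched as follows, assuming its usual formulation: the standardized effect (effect size divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept that differs from zero indicates funnel plot asymmetry. The sketch returns the intercept, its standard error, and the t statistic with k − 2 degrees of freedom; obtaining a p value additionally requires the t distribution, which is not part of the Python standard library. The function name and data are hypothetical, and the implementation in the analysis software may differ in detail.

```python
import math


def egger_intercept(effects, standard_errors):
    """Intercept test of funnel plot asymmetry via ordinary least squares."""
    k = len(effects)
    if k < 3 or k != len(standard_errors):
        raise ValueError("need at least three effects with matching standard errors")
    x = [1 / se for se in standard_errors]                    # precision
    y = [g / se for g, se in zip(effects, standard_errors)]   # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = k - 2
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / df * (1 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": df}


# Hypothetical effect sizes and standard errors (illustration only)
egger = egger_intercept([0.10, 0.25, 0.05, 0.30, 0.15], [0.10, 0.20, 0.15, 0.30, 0.25])
```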

Additional analyses

We conducted sensitivity analyses in order to test the robustness of findings, examining whether meta-analytic results changed when approximated effect sizes and small samples (n ≤ 20 per group) were excluded. Moderator analyses were planned to explain statistical heterogeneity [38]. However, heterogeneity was not important (I² < 40%). Therefore, we conducted stratified analyses to explore potential moderators.

All data analyses were performed using Comprehensive Meta-Analysis (CMA; Version 2.0; Biostat Inc.).

Results

Study selection

A total of 7427 records were screened, and N = 32 randomized controlled trials were included in the meta-analysis. Figure 1 contains a flow chart of the study selection process.

Fig. 1 Flow chart of the study selection process

Description of included studies

Table 1 presents selected study characteristics. The majority of primary studies were published between 1986 and 2001; only one study [39] was published much earlier. Among the primary studies, there were three unpublished dissertations. One study was reported in German [40]; all others were written in English. Altogether, n = 32 randomized controlled trials provided k = 37 comparisons between an intervention and a control group, incorporating a total of n = 1111 patients in intervention groups (M = 30.0, SD = 18.2) and n = 991 patients in control groups (M = 31.0, SD = 18.1). The mean age of patients was 47.7 years (SD = 8.2) in the intervention groups and, similarly, 47.2 years (SD = 9.4) in the control groups. The mean percentage of male patients was 17% in both the intervention groups (SD = 28.1) and the control groups (SD = 29.3). This low percentage of male patients can be ascribed to a high proportion of studies including patients undergoing gynecological surgery; 16 primary studies investigated female patients only. In the majority of primary studies, anesthesia was performed as “balanced anesthesia” with an opioid and an inhalational anesthetic (Table 1). In six studies, neuroleptanesthesia was used and in two studies total intravenous anesthesia (TIVA) with propofol or midazolam, respectively. Nitrous oxide was used in all but one study. In seven studies, a benzodiazepine was applied for premedication. Therapeutic suggestions were presented via tape in all studies and played throughout the surgery in almost every study. Suggestions were judged as affirmative (e.g., “You will feel fine after the operation.”) in 12 intervention groups (32%), as non-affirmative (e.g., “After the operation you will not feel any nausea.”) in one (3%), and as both affirmative and non-affirmative in 14 intervention groups (38%; no information reported for 10 intervention groups). In 19 intervention groups (51%), suggestions were accompanied by or alternated with soothing music or sounds. In all studies, the effects of therapeutic suggestions were compared against attention control. Eighteen studies (56%) used blank tapes/white noise, 7 studies (22%) offered sounds or music, and another 7 studies used spoken text (history of hospital, story of Peter Pan, parts of a cookery book) as control condition.

Table 1 Characteristics of the included studies

Additional file 2: Table S2 contains information on the risk of bias in included studies. Overall, the risk of bias in the included studies was mainly judged as low; no study indicated a high risk of bias in any quality item. However, due to missing information in the studies a high percentage of items was judged as unclear.

Meta-analytic results

Across all included postoperative outcomes, there was a small, but statistically significant and homogeneous effect of therapeutic suggestions compared to attention control (g = 0.13, 95% CI [0.04; 0.23], k = 37, p = .005; I² = 0%).

When outcomes were analyzed separately, we found effects of therapeutic suggestions on pain intensity (g = 0.04, 95% CI [−0.04; 0.12], NNT = 44.3) and mental distress (g = 0.03, 95% CI [−0.11; 0.16], NNT = 68.2) to be close to zero and non-significant. However, small significant effects in favor of therapeutic suggestions appeared on medication use (g = 0.19, 95% CI [0.09; 0.29], NNT = 9.2) and on recovery (g = 0.14, 95% CI [0.03; 0.25], NNT = 13.0). Stratifying analyses on medication use and recovery with respect to outcomes, we found small, significant effects for therapeutic suggestions on PONV (g = 0.21, 95% CI [0.07; 0.36], NNT = 8.3) and analgesic use (g = 0.16, 95% CI [0.06; 0.26], NNT = 11.0). Therapeutic suggestions also showed a small effect on antiemetic use (g = 0.22, 95% CI [−0.003; 0.45], NNT = 7.9) and on all other recovery outcomes (g = 0.11, 95% CI [−0.01; 0.24], NNT = 15.6), although these effects were only marginally significant (Figs. 2, 3, and 4). Heterogeneity for all outcomes was not important (I² < 40%).

Fig. 2 Forest plot of meta-analytic results for mental distress and pain intensity

Fig. 3 Forest plot of meta-analytic results for medication, stratified for use of antiemetics and analgesics

Fig. 4 Forest plot of meta-analytic results for recovery, stratified for postoperative nausea and vomiting (PONV) and recovery (all other outcomes)

Regarding intraoperative outcomes, therapeutic suggestions revealed a small effect on physiological parameters, even though this effect was not significant (g = 0.13, 95% CI [−0.16; 0.42], k = 12, p = .389; I² = 62%). Effects of therapeutic suggestions on length of surgical procedure (g = −0.04, 95% CI [−0.14; 0.07], k = 28, p = .499; I² = 0%) were close to zero and non-significant.

Publication bias

A visual inspection of the funnel plot (see Additional file 3: Figure S1) gave no indication of publication bias, as trials were distributed symmetrically around the pooled effect size. Egger’s test of funnel plot asymmetry did not indicate publication bias (t(35) = 0.18; p = .428), and Duval & Tweedie’s trim and fill procedure resulted in no trimmed studies. Hence, we found no evidence that publication bias poses a threat to the accuracy of our meta-analytic results.

Additional analyses

We tested the robustness of effects for primary outcomes. After excluding approximated effect sizes for all outcome categories, the meta-analytic result patterns (size of effect estimates and significance) did not change considerably, though effect sizes were slightly larger and reached significance for recovery. Furthermore, effects were robust against the exclusion of small samples (n ≤ 20 per group), yielding effect sizes comparable in size and (non-)significance (Additional file 4: Table S3).

Since heterogeneity was not important (I² < 40%), we did not run our pre-specified subgroup analyses. However, in order to gain some insight into potential moderators, we conducted exploratory stratified analyses for PONV and antiemetic use, since results were homogeneous for all other postoperative outcomes (I² = 0%). Studies applying suggestions related to the absence of PONV (e.g., “no sickness”) yielded larger effects than studies without such suggestions, but this difference was not significant for either outcome. There was no indication of an association between treatment effects and affirmativity of suggestions. Furthermore, studies using neuroleptanesthesia did not differ from those with intravenous or inhalation anesthesia.

Stratifying the analyses according to risk of bias, we only found differences with respect to handling of incomplete outcome data. These differences approached significance for PONV (p = .061), with studies evaluated as low risk of bias yielding smaller effects than studies judged as unclear risk of bias. Random sequence generation had no influence on treatment effects (Additional file 5: Table S4).

Discussion

The present meta-analysis aimed at evaluating the efficacy of therapeutic suggestions presented during general anesthesia to patients undergoing surgery or medical procedures. Currently, the efficacy of therapeutic suggestions applied under general anesthesia has been investigated on hospitalization and patient-controlled analgesia exclusively. Our meta-analysis expands this knowledge by adding results on pain intensity, mental distress, use of medication, and recovery.

We found small, significant positive effects of therapeutic suggestions on recovery and medication use, which proved to be robust and showed no indication of publication bias. When analyzing outcomes in more detail, the largest effects were found for PONV and analgesic use. Comparable results of therapeutic suggestions on the amount of morphine administered via patient-controlled analgesia were also reported in the meta-analysis of Merikle and Daneman [20]. However, there was no effect of therapeutic suggestions on pain intensity or mental distress.

One reason for the small or even zero effects might be the level of awareness. Usually, therapeutic suggestions were given during general anesthesia excluding the induction of anesthesia and emergence from anesthesia that are most sensitive to intraoperative awareness [16]. Another reason could be that when suggestions are presented via tape only, rapport and therapeutic relationship are missing, which are essential components of effective hypnosis or therapeutic suggestions [4, 8]. Accordingly, higher effect sizes of suggestions to reduce postoperative side effects spoken live compared to taped suggestions were reported [5, 6].

Since study effects were quite homogeneous, we merely ran stratified analyses on PONV and antiemetic use to get an idea about potential moderators of treatment effects. In this regard, the specificity of suggestions seems to influence their efficacy, since studies with specific PONV-related suggestions yielded significant results on PONV, while studies with unspecific suggestions resulted in non-significant effects only. Thus, our results are in line with studies demonstrating an impact of suggestion specificity on efficacy [6].

Differences in anesthesia methods did not influence the efficacy of therapeutic suggestions, although neuroleptanesthesia is known to carry a higher risk of intraoperative awareness and lower interference with memory in comparison to balanced anesthesia with inhalational or intravenous anesthetics [16]. However, intraoperative awareness and memory are not considered a pre-requisite for effects of suggestions in unconscious patients [8, 41].

When interpreting these results, the exploratory nature of the respective analyses should be considered. Although research on the impact of affirmativity and specificity of therapeutic suggestions on postoperative outcomes is available [6, 29, 30, 42], this issue has not been clarified conclusively. Studies examining the most efficacious phrasing of suggestions are still pending; an optimization of therapeutic suggestions is possible and needed.

Several limitations of the present meta-analysis are noteworthy. First, we excluded studies with children and studies where pre- or postoperative suggestions were presented in addition to those given intraoperatively. Both restrictions of inclusion might have led to smaller effects of suggestions during general anesthesia. There is some evidence of a higher level of efficacy of suggestive techniques in children [5], partly due to their higher suggestibility [43]. Moreover, meta-analytic findings have shown that suggestions are more effective when delivered at least in part prior to the medical procedure rather than solely during the medical procedure [5].

Second, the reporting quality, i.e., completeness and transparency, of the included studies was rather low, making it difficult to adequately evaluate potential risks of bias. In particular, methods of randomization and allocation concealment were reported inadequately in the majority of studies, whereas blinding of participants, personnel, and outcome assessors was reported well. From the information on anesthesia methods provided in the included studies, no conclusion can be drawn about the precise depth of anesthesia and its impact on the results. It can only be stated that standard procedures were used without techniques to control depth, that the dosage of anesthetics, where reported, was reasonable, and that the same procedure was used for intervention and control groups. Finally, the latest available randomized controlled trial dates back to 2001.

It might be argued that insufficient anesthetic depth was more common at that time, but even modern electroencephalography (EEG)-based monitoring of anesthetic depth has only reduced, not eliminated, intraoperative awareness with recall (AWR) [44]. Current recommendations for AWR prevention include earplugs or music via earphones as an essential component. Positive suggestions should be considered as well, since they have been proposed for prophylaxis of posttraumatic stress disorder following AWR [45]. It has been claimed that effects of intraoperative suggestions are limited to insufficient depth of anesthesia [46], but even this pre-requisite still occurs in clinical practice today.

Conclusions

Altogether, we found at least small overall effects of therapeutic suggestions, with no significant negative effect in any primary study. Hence, therapeutic suggestions could be a way to safely improve recovery and to reduce medication. In light of the low effort and cost of implementing and using suggestions, it might be efficient to present suggestions under general anesthesia in clinical practice.

So far, the evidence on the efficacy of therapeutic suggestions applied under general anesthesia has been summarized with respect to hospitalization and patient-controlled analgesia exclusively [20]. Our meta-analysis expands this knowledge by adding results on mental distress, pain intensity, medication, and recovery. By including randomized trials only, the internal validity of the findings should have been increased.

However, we cannot make clinical recommendations since the quality of evidence supporting the beneficial effects of therapeutic suggestions was rated as unclear in a considerable number of included trials, particularly with regard to selection bias and reporting bias. Moreover, there is a lack of respective publications after 2001. We encourage the proliferation of studies with a high methodological and reporting quality to strengthen the promising evidence for the efficacy of therapeutic suggestions presented during general anesthesia for patients undergoing surgery.