Background

In recent years, there has been increasing evidence from large randomized trials and systematic reviews showing that patients receiving acupuncture report better outcomes than patients receiving no treatment or usual care only (for example, [1, 2]). A large trial on low back pain [3] and a meta-analysis of migraine trials [4] even found superiority over guideline-oriented conventional care. At the same time, many recent high-quality trials comparing true acupuncture with a sham acupuncture intervention found only minor or even no differences (see [4-7] for systematic reviews). The interpretation of this evidence is controversial. Some authors argue that the better effects over no treatment and usual care are only due to the usual placebo effects and bias [8]. Others argue that most sham acupuncture interventions are physiologically active [9, 10], and still others contend that sham acupuncture interventions might be associated with particularly potent nonspecific or placebo effects [11, 12].

Treatment effects are considered specific if they are attributable solely, according to the theory of the mechanism of action, to the characteristic component of an intervention [13, 14]. Effects which are associated with the incidental elements of an intervention are considered nonspecific effects (synonymous with placebo effects). Nonspecific effects are mostly thought to be due to psychobiological processes triggered by the overall therapeutic context [15]. They have to be distinguished from the natural course of disease, regression to the mean, effects of being in a study, cointerventions and, as far as possible, from reporting and other biases [16, 17]. The total effect of an intervention consists of both specific and nonspecific effects [18].

Separating characteristic and incidental elements of an intervention is straightforward in pharmacology, but is difficult in other interventions such as psychotherapy [19]. Acupuncture involves the insertion and manipulation of needles into defined points of the body. While a variety of mechanistic models exist, the exact mechanism of action is unclear [20]. This makes it difficult to devise a placebo intervention which is both inert and indistinguishable and reliably separates specific and nonspecific effects. The frequent use of the term sham intervention instead of placebo partly reflects this problem. Sham interventions in clinical trials of acupuncture typically vary from "true" acupuncture in one or both of the following aspects [21]: location of points (for example, stimulation of nonindicated points or outside known points) and skin penetration (for example, use of fixed telescope "placebo" needles with a blunt tip). If some or most of these sham interventions should indeed be physiologically active, such trials would not compare acupuncture to a placebo but to an active intervention, making it more difficult to detect significant differences.

This problem would also apply if (sham) acupuncture were associated with more potent placebo effects than other interventions. Both invasive and noninvasive sham acupuncture interventions exert (like true acupuncture) mildly painful stimuli. It has been hypothesized that such interventions might trigger enhanced placebo effects by simultaneously acting on sensory, cognitive and emotional levels [12]. There is also evidence that the same sham acupuncture intervention can have quite different effects when provided in different contexts [22]. Placebo research indicates that in many situations, the therapeutic context associated with an intervention matters more than the placebo intervention itself [15]. The therapeutic context depends not only on the specific therapeutic ritual applied but also on experiences, attitudes and preferences of patients and providers, the patient-provider interaction, the setting and the cultural background [11]. Given the positive attitudes and expectation toward complementary therapies, it seems possible that complex rituals such as acupuncture could provoke significant psychobiological responses.

The most straightforward way to investigate whether sham acupuncture is associated with larger effects than a pharmacological placebo would be in randomized trials including both these interventions. The only trial using such an approach indeed found a significant superiority of sham acupuncture [23]. Another, albeit methodologically weaker, possibility is to compare differences between sham acupuncture interventions and no-treatment control groups in acupuncture trials with those of (other) placebos and no-treatment control groups in other trials. Hróbjartsson and Gøtzsche [24-26] have repeatedly reviewed all available trials that included both a placebo or sham group and a no-treatment group for any condition. The latest update of their Cochrane review includes a total of 234 trials. In a preplanned subgroup analysis, they found that studies using "physical placebos" (including sham acupuncture) reported larger placebo effects (standardized mean difference (SMD) -0.31; 95% confidence interval (CI) -0.41, -0.22) than studies using "pharmacological placebos" (SMD -0.10; 95% CI -0.20, -0.01) [26]. In a reanalysis of their data, we separated the trials in which the physical placebo was sham acupuncture from those which used other physical placebos. Effect sizes were significantly larger in trials using sham acupuncture than in trials using other physical placebos (SMDs -0.41 (-0.56, -0.24) vs -0.26 (-0.37, -0.15); P = 0.007) [27].

The Cochrane review [26] and our reanalysis of these data did not include a number of recent rigorous, large acupuncture trials which included both a sham group and a no-treatment group. Furthermore, these reviews did not investigate whether large nonspecific effects might make it difficult to detect specific effects. Therefore, we have performed a systematic review of acupuncture trials in any condition including both sham and no-treatment groups published through April 2010. Our primary aim was to investigate the size of nonspecific effects of acupuncture (difference between sham acupuncture vs no acupuncture). Our secondary aims were to investigate factors (such as type of sham intervention, condition, study quality or intensity of cointerventions) possibly influencing the size of such nonspecific effects and to quantify specific (difference acupuncture vs sham acupuncture) and total effects of acupuncture (difference acupuncture vs no acupuncture) in the included trials.

Methods

Selection criteria

To be included, studies had to meet the following criteria: (1) allocation to groups was explicitly randomized; (2) participants were persons treated for any illness or for preventative purposes; trials in healthy volunteers measuring physiological outcomes were excluded; (3) intervention involving the insertion of needles described as acupuncture at acupuncture points, pain or trigger points with or without stimulation; trials on interventions without skin penetration (for example, laser acupuncture) were excluded; (4) sham interventions described as sham, placebo, dummy or fake treatment which differed from true acupuncture in at least one of two key aspects (skin penetration or point location); (5) no-acupuncture control group had to be a second control group in which participants received neither true nor sham acupuncture; participants could be either completely untreated or receive treatments which were also administered in the true and sham acupuncture groups (for example, rescue medication, basic treatment or routine care); and (6) a clinical outcome for which the calculation of an effect size estimate was possible.

Data sources and searches

To identify potentially relevant studies, we searched MEDLINE (from 1966 to April 2010) and Embase (from 1988 to April 2010) for all sham-controlled trials of acupuncture (see Additional file 1, Search strategies). Furthermore, we searched the Cochrane Central Register of Controlled Trials using a search strategy based on a Cochrane review of randomized trials with placebo and no-treatment controls in all medicine [25]. While Chinese trials identified by our search were eligible, we did not search specific Chinese databases. One reviewer screened titles and abstracts of all references identified and excluded those which were clearly irrelevant. Full texts of all remaining articles were obtained and assessed independently for eligibility by two reviewers. Disagreements or uncertainties were resolved by discussion.

Data extraction and quality assessment

One reviewer extracted information on the following aspects from included studies using a standard form: diagnosis; recruitment; number and type of study centers; number and types of intervention and control groups; details on acupuncture and sham interventions; how patients were informed about these interventions; qualification of acupuncturists; cointerventions; study duration, number of patients randomized, analyzed and dropping out (per group); age; gender; results on the main outcome measures; important secondary outcomes and responder data. A second reviewer checked all extraction of study results against the original publications. Trials were considered to have lower risk of bias if they reported an adequate method of allocation concealment and had a dropout rate below 15% [28]. For our main analyses, we used the following strategy to choose the outcome: (1) it should be a continuous outcome (mean and standard deviation available, or the standard deviation could be calculated from standard errors or confidence intervals, for example; we did not impute standard deviations for studies without available data on variability or precision); (2) the timing should be as close as possible to the completion of treatment; (3) when there was a clearly predefined main outcome measure, we chose this measure (but always preferred the measurement at the end of treatment over other time points or change from baseline); (4) when there was no predefined single main outcome measure, two reviewers independently chose the outcome considered most important (two disagreements were resolved by discussion); (5) if available, we used intention-to-treat data; otherwise, we used the data as presented in the publication. If a trial had more than one intervention (for example, an individualized and a standardized intervention) or more than one sham group, the data were pooled. For more recent studies, we tried to contact authors to request further information if data for meta-analysis were missing.
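The recovery of a standard deviation from a reported standard error or confidence interval, as mentioned in criterion (1), can be sketched as follows. This is a minimal illustration, not the extraction procedure actually used: it assumes a symmetric 95% confidence interval for a group mean built with a normal approximation, and the function names and example numbers are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation of a group, given the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Standard deviation from a symmetric CI of a group mean.

    Assumption: the interval was computed as mean +/- z * SE (z = 1.96 for
    a 95% CI under a normal approximation).
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


# Hypothetical values, not taken from any included trial.
print(round(sd_from_se(0.5, 100), 2))
print(round(sd_from_ci(4.02, 5.98, 100), 2))
```

The conversion relies on the interval being symmetric around the mean; an asymmetric interval does not determine a single standard error and cannot be handled this way.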

Data synthesis and analysis

The Cochrane Collaboration's Review Manager RevMan 5 software was used for meta-analyses. Three comparisons were investigated: sham acupuncture versus no acupuncture (primary comparison), acupuncture versus sham acupuncture, and acupuncture versus no acupuncture. Studies were categorized into the clinical categories of chronic pain studies, short-term studies (that is, studies with an observation period of less than 3 days), and other studies.

The main analysis was based on trials reporting a continuous outcome measure using the standardized mean difference (SMD; difference between the means/pooled standard deviation) as an effect size estimate. As we assumed that studies would be clinically heterogeneous, a random effects model with the inverse variance method was used for meta-analysis. Negative SMDs indicated a beneficial effect of sham acupuncture over no acupuncture, acupuncture over sham acupuncture and acupuncture over no acupuncture, respectively. SMDs no more negative than -0.4 were considered small effects, those between -0.41 and -0.7 were considered moderate effects and those more negative than -0.7 were considered large effects [29]. To investigate statistical heterogeneity, RevMan 5 uses Tau², Chi² and I². We considered I² values between 30% and 60% as indicating moderate heterogeneity and higher values as indicating substantial heterogeneity. Subgroup comparisons were performed using the method described by Deeks et al. [30] and implemented in RevMan 5. Egger's test was used to assess funnel plot asymmetry [31].
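For readers who want to see the arithmetic behind such an analysis, the sketch below computes an SMD with a small-sample correction (Hedges' g) and pools several SMDs with an inverse variance random effects model. It is an illustration under stated assumptions, not a reproduction of the RevMan 5 calculation: the between-study variance is estimated with the DerSimonian-Laird method, the variance of g uses a common large-sample approximation, and all function names and numbers are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (SMD, variance) for two groups, with Hedges' correction."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled              # difference in means / pooled SD
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """Inverse variance random effects pooling (DerSimonian-Laird tau^2).

    Assumes at least two studies.
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Chi^2
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"smd": pooled, "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2, "chi2": q, "i2": i2}


# Hypothetical three-trial example (sham vs no acupuncture).
trials = [hedges_g(4.1, 2.0, 60, 5.0, 2.1, 58),
          hedges_g(3.0, 1.5, 40, 3.2, 1.6, 41),
          hedges_g(5.5, 2.5, 120, 7.0, 2.4, 118)]
result = pool_random_effects([g for g, _ in trials], [v for _, v in trials])
print(result)
```

With lower scores meaning fewer symptoms, a negative pooled SMD in this example favors the first group, matching the sign convention described above.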

To check the robustness of results, we performed sensitivity analyses (1) including three-armed studies which had been excluded because they did not meet all inclusion criteria, but still could be considered because they addressed the questions investigated in this review ("borderline" studies; see Results); (2) using different outcomes for studies with more than one relevant outcome at the completion of treatment; and (3) using dichotomous outcome measures (with a relative risk <1 indicating a beneficial effect).
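The relative risk used in sensitivity analysis (3) can be illustrated with a short sketch. It assumes that the counted event is an unfavorable outcome (for example, nonresponse), so that a relative risk below 1 favors the first group, and it uses the usual log-scale normal approximation for the confidence interval; the function name and the counts are hypothetical.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A vs group B with an approximate 95% CI.

    Assumes nonzero event counts in both groups.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, (lower, upper)


# Hypothetical counts: 20 of 100 nonresponders with sham acupuncture
# versus 40 of 100 without acupuncture.
rr, (lower, upper) = relative_risk(20, 100, 40, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```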

For exploratory analyses, we defined further subgroups: larger (at least 100 patients) and smaller (< 100 patients) comparisons; lower and higher risk of bias (see data extraction and quality assessment); studies with intense or less intense cointerventions in all study arms, with and without skin penetration (and depending on where needles were placed) in sham groups; studies with and without a clearly defined main outcome measure; and studies describing sham in the consent procedure as another treatment or placebo. In multivariate random effects meta-regression analyses, we investigated simultaneously the influence of risk of bias, cointerventions, skin penetration in the sham group and condition (chronic pain vs others). Analyses were carried out using the restricted maximum likelihood (REML) method. For meta-regression analyses, PASW versions 17.0 and 18.0 software (SPSS, Chicago, IL, USA) was used together with additional macros described by Wilson [32]. To investigate the hypothesis that there is an inverse correlation between specific and nonspecific effects (that is, trials with large nonspecific effects are less likely to find large specific effects than are trials with small nonspecific effects), we performed a linear regression analysis using the inverse of the squared pooled standard error as a weighting factor.
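The weighted linear regression described in the last sentence can be sketched as below. This is a minimal illustration rather than the PASW analysis itself: it assumes one standard error per trial, uses the closed-form weighted least squares solution for a single predictor, and the function name and data are hypothetical.

```python
def weighted_linear_regression(x, y, std_errors):
    """Weighted least squares of y on x with weights 1 / SE^2.

    x: nonspecific effect per trial (SMD, sham vs no acupuncture)
    y: specific effect per trial (SMD, acupuncture vs sham)
    Returns (intercept, slope).
    """
    w = [1 / se**2 for se in std_errors]
    w_sum = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: larger nonspecific effects paired with smaller
# specific effects (both expressed as negative SMDs).
nonspecific = [-0.9, -0.6, -0.4, -0.2, -0.1]
specific = [-0.05, -0.20, -0.30, -0.45, -0.50]
ses = [0.10, 0.15, 0.20, 0.25, 0.30]
print(weighted_linear_regression(nonspecific, specific, ses))
```

A negative slope on such data corresponds to the hypothesized inverse relation: trials with more pronounced sham effects show less additional benefit of true acupuncture.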

Results

Literature search and selection

The literature search identified a total of 1854 references, of which 1779 were excluded in the screening process as they clearly did not meet the inclusion criteria (see Figure 1). The full text of the remaining 75 references was formally assessed for eligibility. A total of 37 studies [33-69] met the inclusion criteria. Eleven additional publications reported protocols or treatment details of trials included in the review or reported the same results in another language (see Additional file 1, Table S1). Eighteen articles did not meet the inclusion criteria, and two were protocols of ongoing trials (see Additional file 1, Table S2). Two abstracts reported minimal information on probably eligible trials including results for a dichotomous outcome [70, 71]; attempts to obtain further information from the authors were unsuccessful. In four other studies, patients in the no-acupuncture control group received minor interventions not provided in the other two groups [72-75]. Finally, for one study presenting an asymmetric confidence interval for the continuous main outcome measure, we were unable to unambiguously calculate the standard deviation [76]. The latter five trials were included in a sensitivity analysis as "borderline" studies.

Figure 1 Flow chart.

Description of included studies

The 37 eligible trials included a total of 5754 patients (median 75, minimum 30, and maximum 638). Fourteen trials (3369 patients) addressed chronic pain or a condition associated with chronic pain (Table 1); eight were short-term trials with a duration of less than 3 days (522 patients; Table 2) investigating whether acupuncture is helpful for sedation, anxiety, pain or nausea associated with surgical operations, endoscopic interventions or labor; and 15 trials (1863 patients) addressed a variety of other conditions (Table 3). Ten of the 14 chronic pain trials, but only six of the remaining 23 studies, reported an adequate method of allocation concealment. Dropout rates were between 54% and 95% in three addiction trials, but low in most other studies. Ten chronic pain trials and three trials of other conditions reported an adequate method of allocation concealment and a dropout rate below 15% and were classified as having a lower risk of bias.

Table 1 Characteristics of included trials: Chronic pain trials
Table 2 Characteristics of included trials: Short-term trials
Table 3 Characteristics of included trials: Trials on various other conditions

Fifteen studies had a clearly predefined main outcome measure. For 32 trials, a continuous effect size measure could be calculated, and for 24 trials a dichotomous effect size measure (for 19 trials both a continuous and a dichotomous effect size measure could be calculated). Acupuncture interventions varied strongly regarding number of sessions, type of acupuncture (that is, classical acupuncture, electroacupuncture, ear acupuncture), level of individualization for point selection and number of needles used. In 31 trials, the sham procedure involved skin penetration (in 7 trials at acupuncture points not indicated for the condition treated and in 24 trials outside known acupuncture points); six trials used approaches without skin penetration (in three trials at the same points as in the acupuncture group and in three trials outside known points).

Meta-analysis of nonspecific effects (sham acupuncture vs no acupuncture)

The main analyses are based on the 32 trials reporting data on a continuous outcome. For the comparison of sham acupuncture with no acupuncture, the pooled SMDs were -0.53 (95% CI -0.67, -0.39) among chronic pain trials, -0.23 (95% CI -0.50, 0.04) among short-term studies and -0.42 (95% CI -0.66, -0.18) in other studies (Figure 2). The test for differences between diagnostic subgroups missed statistical significance at the 5% level (P = 0.08). Effect sizes showed moderate statistical heterogeneity among chronic pain studies, no heterogeneity among short-term studies and marked heterogeneity among the other studies. If studies were pooled across clinical subgroups, the SMD was -0.45 (95% CI -0.57, -0.34). In seven trials, effects over no-treatment groups were large (SMDs more negative than -0.7); in nine trials, these effects were moderate (between -0.4 and -0.7); and in 16 trials, these effects were small (SMDs less negative than -0.4). Results were similar when borderline studies were included, when in studies without a predefined main outcome measure other outcomes were chosen or when dichotomous outcomes were analyzed (see Additional file 1, Table S3). Egger's test did not suggest funnel plot asymmetry (P = 0.25; asymmetry coefficient 0.21) (Figure 3). In exploratory subgroup analyses (see Additional file 1, Table S3), effect sizes differed significantly according to the level of cointervention (larger with fewer cointerventions) and according to the type of sham intervention (larger if no skin penetration). Nonspecific effects tended to be larger in trials with a larger sample size, a lower risk of bias, and a clearly predefined outcome, but the differences were not statistically significant. In multivariate meta-regression analyses, only the association with level of cointerventions approached statistical significance (P = 0.07). Trials with larger effects of sham over no acupuncture reported smaller effects of acupuncture over sham intervention than trials with smaller nonspecific effects (β = -0.39, P = 0.029).

Figure 2 The nonspecific effect of acupuncture (difference between groups receiving sham acupuncture and no acupuncture). SD, standard deviation; Total, number of patients; 95% CI, 95% confidence interval; IV, inverse variance method; Random, random effects model; df, degrees of freedom.

Figure 3 Funnel plot of studies comparing sham acupuncture versus no acupuncture. SE, standard error; SMD, standardized mean difference.
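The regression behind an Egger-type asymmetry check of such a funnel plot can be sketched as follows. This is an illustration of the general technique, not the computation that produced the reported values: each trial's standardized effect (SMD/SE) is regressed on its precision (1/SE) by ordinary least squares, and the intercept is read as the asymmetry estimate. Whether the reported asymmetry coefficients were obtained in exactly this form is an assumption; the function name and data are hypothetical.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger-type regression; returns (intercept, standard error of intercept).

    Requires at least three trials with differing standard errors.
    """
    precision = [1 / se for se in std_errors]
    snd = [y / se for y, se in zip(effects, std_errors)]  # standardized effects
    n = len(effects)
    x_mean = sum(precision) / n
    z_mean = sum(snd) / n
    sxx = sum((x - x_mean) ** 2 for x in precision)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(precision, snd))
    slope = sxz / sxx
    intercept = z_mean - slope * x_mean
    residual_ss = sum((z - intercept - slope * x) ** 2
                      for x, z in zip(precision, snd))
    se_intercept = math.sqrt(residual_ss / (n - 2)
                             * (1 / n + x_mean**2 / sxx))
    return intercept, se_intercept


# Hypothetical trials in which less precise studies report larger effects.
smds = [-0.20, -0.35, -0.60, -0.75, -0.95]
ses = [0.08, 0.12, 0.20, 0.28, 0.35]
print(egger_intercept(smds, ses))
```

An intercept close to zero is compatible with a symmetric funnel plot; the ratio of the intercept to its standard error can be compared with a t distribution on n - 2 degrees of freedom.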

Meta-analysis of specific effects (acupuncture vs sham acupuncture) and total effects (acupuncture vs no acupuncture)

For the comparison of acupuncture with sham acupuncture, the pooled random effects SMDs were -0.46 (95% CI -0.72, -0.20) for chronic pain studies, -0.34 (95% CI -0.79, 0.12) for short-term studies, and -0.28 (-0.59, 0.03) for other studies (see Additional file 1, Figure S1). There were no statistically significant (P = 0.71) differences between diagnostic subgroups, but there was substantial statistical heterogeneity in all three clinical categories. If trials were pooled across categories, the SMD was -0.37 (95% CI -0.55, -0.19). The funnel plot was highly asymmetrical (Additional file 1, Figure S2; P = 0.002; asymmetry coefficient -0.52). Larger trials yielded significantly less positive results than smaller trials (SMDs -0.15 (95% CI -0.31, 0.01) vs -0.59 (95% CI -0.93, -0.24); P < 0.001). Specific effects were also smaller in trials with lower risk of bias and more intense cointerventions, while skin penetration and condition did not have a significant influence.

The pooled SMDs between acupuncture and no acupuncture were -0.94 (95% CI -1.20, -0.67) for chronic pain studies, -0.60 (95% CI -1.08, -0.12) for short-term studies, and -0.63 (-0.91, -0.35) for other studies (see Additional file 1, Figure S3) with marked heterogeneity in all three categories. If all studies were pooled, the SMD was -0.77 (95% CI -0.94, -0.59). There was significant funnel plot asymmetry (P = 0.03; asymmetry coefficient -0.38), with smaller studies yielding larger effect estimates (Additional file 1, Figure S4, for the funnel plot).

Discussion

Summary of main findings

According to our findings (sham) acupuncture interventions are often associated with noteworthy nonspecific effects. Differences between sham acupuncture and no-acupuncture groups tended to be smaller in trials in which there were intense cointerventions in all study groups. Indicators of study quality (that is, sample size, risk of bias, predefinition of a main outcome measure) were not associated significantly with effect size. Trials with larger effects of sham over no acupuncture reported smaller effects of acupuncture over sham intervention than trials with smaller nonspecific effects. In our analyses, we also found small to moderate specific effects of acupuncture interventions over sham acupuncture; however, trials with large sample size and low risk of bias yielded less positive results. In our study set, the total effect of acupuncture interventions including both specific and nonspecific effects was, on average, at least moderate in size.

Strengths and limitations

Although we did not systematically search Chinese language databases, our review is currently the most comprehensive and largest analysis of randomized trials of acupuncture including both a sham and a no-treatment control group. It includes many more and larger trials than previous analyses [26-28]. The overall findings are highly robust to sensitivity analyses and indicators of study quality. The most important limitation of our review is the strong heterogeneity of our trial set regarding patients, interventions, outcomes and methodological quality. We do not think that pooling such a heterogeneous set of studies would be adequate if the aim were primarily to assess effectiveness for clinical decision making. However, our primary aim was to investigate whether (sham) acupuncture interventions are, on average, associated with relevant nonspecific effects. To assess the size of nonspecific effects, it is necessary to include trials with both a sham and a no-acupuncture control group. As the number of such trials is limited, pooling all available information can be justified for generating hypotheses and has been performed in the Cochrane review on placebo effects in all conditions in a much more radical manner [26].

The comparisons between sham acupuncture and no acupuncture in the primary studies included in our review are unblinded. As almost all trials focused on patient-reported outcome measures, there is considerable risk of bias. Patients randomized to the no-treatment group might be disappointed and experience "nocebo" effects, or they might give overly negative ratings for subjective symptoms. On the other hand, patients randomized to no-treatment groups might use larger doses of rescue medication or cointerventions which would lead to an underestimation of the differences. In fact, in some of the trials included in our review, patients in no-acupuncture control groups had higher analgesic use than patients in the sham groups (for example, [56, 58]). Insufficient blinding is also a problem for the comparison between acupuncture and placebo acupuncture [28]. However, if patients find out that they are in a sham group, one would expect an underestimation of the effect of sham over no treatment. In summary, it is difficult to assess to what extent and in which direction biases can distort effect estimates between sham and no-acupuncture groups. It is noteworthy that although indicators of study quality were not significantly associated with the size of nonspecific effects, better and larger studies tended to report larger effects. It seems that our estimate of nonspecific effects is less subject to small study bias and other biases than those for specific and total effects.

Interpretation

Our findings are highly consistent with smaller analyses available in the literature [27, 28]. The reanalysis of the 21 acupuncture trials included in the Cochrane review on placebo effects yielded a SMD of -0.41 [27]. Owing to slightly different inclusion criteria, five trials were excluded from the current analyses. A meta-analysis by Madsen et al. [28], who reviewed 13 three-armed trials on acupuncture for acute and chronic pain, found a SMD of -0.42. Nine of the studies included in their review were also included in our review, while we excluded four trials due to slightly different selection criteria. Our main analysis includes 23 additional trials (including seven trials addressing chronic or acute pain).

It has been argued that sham interventions in which needles penetrate the skin (particularly if applied in the same dermatomes as the true acupuncture intervention) are physiologically not inert and therefore should not be considered as placebos [10]. Our exploratory subgroup analyses (as well as similar analyses in the review by Madsen et al. [28]) do not provide evidence that sham interventions involving needle penetration are associated with larger nonspecific effects than those which do not. Thus the limited available data suggest that skin penetration or no skin penetration does not seem to make a big difference.

If acupuncture should indeed have relevant total effects but only very limited specific effects, this would have major implications for the conduct and interpretation of clinical trials. On the basis of our data and available systematic reviews [4-7, 28], it seems reasonable to assume an average SMD of 0.4 (or more) for nonspecific effects and SMD of 0.2 (or less) for specific effects at least for a number of conditions. To achieve 80% power, a two-armed, sham-controlled clinical trial investigating a specific effect of 0.2 SMD would have to recruit about 800 patients. This suggests that almost all available trials comparing true and sham acupuncture would be underpowered.
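The sample size figure can be checked with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z_power)² / SMD². The sketch below assumes a two-sided significance level of 5%, equal allocation and no dropouts; these settings and the function name are assumptions made for illustration, as the text states only the power and the effect size.

```python
import math
from statistics import NormalDist


def total_sample_size(smd: float, alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Total patients for a two-arm trial (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n_per_group = math.ceil(2 * (z_alpha + z_power) ** 2 / smd**2)
    return 2 * n_per_group


print(total_sample_size(0.2))  # specific effect assumed in the text
print(total_sample_size(0.4))  # for comparison: a nonspecific-sized effect
```

Under these assumptions the requirement for an SMD of 0.2 comes out just under 800 patients, and halving the detectable effect roughly quadruples the required sample.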

One could argue that a SMD of 0.2 is clinically irrelevant. In line with that reasoning, Madsen et al. [28] questioned in their review "the prevailing hypothesis that acupuncture has an important effect on pain in general" (page 7). However, we believe that another conclusion is possible, too. As we did, Madsen et al. found, on average, a moderately large effect of sham interventions over no-acupuncture groups, and both reviews found at least small specific effects of acupuncture over sham interventions. The total effect of acupuncture seems to be at least moderate in size in a number of conditions, and such effects can well be clinically relevant. For many established drug treatments, SMDs over placebo are in the range between 0.3 and 0.5 (for example, [77, 78]). If, as the available data suggest [26], clinical effects associated with pharmacological placebos are small compared to no treatment (with a SMD of 0.1 on average), the total effects of these treatments could be in a similar range (around a SMD of 0.4 to 0.6) as those of several acupuncture interventions. It could be argued that for a suffering individual, it does not matter whether relief is due to specific or nonspecific effects. However, as the evidence for larger nonspecific effects of acupuncture compared to other treatments comes with one exception [23] from indirect comparisons open to confounding, firm conclusions are not yet possible.

We think that our findings are of major relevance to the question how the clinical effectiveness of complex nondrug interventions should be assessed. It is likely that nonspecific effects vary between different types of complex treatment interventions. The concept of specific and nonspecific effects might not be fully adequate in that case, as so-called nonspecific effects might turn out to be characteristic for a given therapeutic setting. If the total effect of an intervention in clinical practice would indeed consist of variable contributions of specific and nonspecific effects, it could be that a treatment which has only minor or even no specific but clinically relevant nonspecific effects has a larger total effect than a treatment with moderate specific but only minor nonspecific effects. This has been denoted the efficacy paradox [79]. Should such a treatment be readily available? The position of a pragmatic decision maker could be yes if the comparative treatment represents adequate standard treatment. In fact, in Germany, acupuncture is routinely reimbursed for chronic low-back pain as in a large randomized trial acupuncture (but also sham acupuncture which is not reimbursed) was more effective than treatment based on German guidelines [3]. Skeptical scientists would argue that these results are likely to be biased because of lack of blinding and that acupuncture should not be considered effective. Furthermore, if issues such as expectancies, beliefs and trust should have a relevant influence on the effectiveness of a treatment, the findings of clinical trials might no longer be valid when attitudes in a population change over time.

Conclusions

Sham acupuncture interventions are often associated with moderately large nonspecific effects, which could make it difficult to detect small additional specific effects. Compared to inert placebo interventions, effects associated with sham acupuncture might be larger, which would have considerable implications for the design and interpretation of clinical trials. Total effects of acupuncture interventions including both specific and nonspecific effects often seem to be at least moderate in size. We believe that there has to be a discussion involving scientists, decision makers, health care providers and patients whether and when the evidence for clinically relevant total effects from nonblinded comparisons is sufficient to consider a treatment effective, even if specific effects due to the postulated mechanism of action might be minor or even nonexistent.