Background

Acupuncture, a widely used therapy, involves inserting needles into the body for healing purposes [1]. Numerous studies worldwide have evaluated the efficacy of acupuncture; however, acupuncture-related clinical studies have been impeded by difficulties in designing an appropriate control group [2, 3]. When comparing the therapeutic efficacy of acupuncture with non-treatment controls, it is crucial to consider the general placebo effect and potential bias. Because the therapeutic efficacy of acupuncture tends to be overestimated in such comparisons, the specific effect of acupuncture remains to be established. To mitigate the control group problem, noninvasive sham acupuncture (SA) interventions, including the Streitberger and Park sham needles, have been developed and used [4].

To facilitate the application of these noninvasive SA techniques in clinical research, relevant clinical validation studies are warranted. Accordingly, we aimed to conduct a systematic review of SA validation studies to investigate their characteristics, including participants, intervention and control group settings, and evaluation indicators. Our findings could inform the development and validation of novel and improved SA techniques.

Methods

Information sources and search engines

We searched three databases (PubMed, EMBASE, and the Cochrane Central Register of Controlled Trials) for relevant articles published from database inception to July 2022. We used the following search string: (acupuncture or needle) AND (sham or placebo) AND (validation or validity or validating or validate or credible or credibility). Author names were used to identify additional relevant articles. This study adheres to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Statement, and the research protocol has been published in a previous paper [5].
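As an illustration of how the search string above is composed, the following Python sketch joins the three term groups from the text with Boolean operators. The helper name `build_query` is hypothetical, and database-specific syntax (field tags, truncation, controlled vocabulary) is not described in this section and is therefore not modeled.

```python
# Term groups copied from the search string given in the text.
TERM_GROUPS = [
    ["acupuncture", "needle"],
    ["sham", "placebo"],
    ["validation", "validity", "validating", "validate", "credible", "credibility"],
]


def build_query(groups):
    """Join terms within a group with 'or' and the groups with 'AND'.

    Hypothetical helper for illustration; it only reproduces the
    generic Boolean string, not any database-specific syntax.
    """
    return " AND ".join("(" + " or ".join(terms) + ")" for terms in groups)


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the term groups as data makes it easier to document the strategy and to adapt it per database if needed.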

Selection criteria

To select eligible articles for this systematic review, two independent reviewers (SML and EJG) assessed the retrieved articles based on the following inclusion criteria: 1) original articles, 2) clinical trials, and 3) SA validation studies using SA control groups. We excluded studies unrelated to manual acupuncture or those testing the effects of acupuncture. In the primary title/abstract-based screening, articles considered irrelevant to the research topic were excluded. Subsequently, a secondary full-text screening was performed on articles with unclear abstracts. Disagreements were discussed until a consensus was reached.

Data extraction and risk of bias assessment

Data extraction was conducted by two independent reviewers (SML and EJG) using a predetermined data extraction form. The following data were extracted from the selected studies: 1) study design; 2) information regarding acupuncturists and participants; 3) general and treatment-related characteristics of the intervention and control groups; 4) participants’ experience of acupuncture; and 5) research outcomes.

The literature quality was assessed using the Cochrane risk of bias assessment tool. The assessment items included random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias), and other bias. Additionally, two researchers (SML and EJG) independently evaluated the literature quality, with disagreements resolved through discussion.

Data analysis

Descriptive analyses (mean, standard deviation, and frequency analysis) were conducted on the outcomes of the SA validation studies.

Results

Search and article selection

The database query yielded 673 articles, of which 644 articles were excluded during the screening process based on title/abstract and full texts. Finally, 29 studies were included in this systematic review (Fig. 1) [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34].

Fig. 1 Flow chart of the trial selection process

Characteristics of the selected studies

The 29 selected articles were published between 1998 and 2016. Among them, five, seven, and six studies described validation tests for the Streitberger, Park, and Takakura devices, respectively. The remaining 11 studies described validation tests for other devices, including self-developed needles. Specifically, six studies used a blunted placebo needle and a block, cylinder, or pad foam [24, 26, 27, 30, 32, 33], one study used a toothpick and guide tube [25], two studies used an endermic acupuncture device with a flat, non-puncturing needle tip [28, 31], one study used a blunt, noninvasive needle that comprised a diamond honing stone and a guide tube [29], and one study used a sham device designed to prevent skin penetrations of needles using a hollow inner tube with a central base channel [34]. All studies were randomized controlled trials (RCTs) (Table 1).

Table 1 Summary of validation studies on sham acupuncture

Regarding participants, 21 studies involved healthy adults; of the remaining eight studies, four involved patients and four involved both healthy adults and patients. Seven studies (including all studies that used the Takakura device) attempted to blind the acupuncturists. Moreover, 17 studies included both intervention and control groups, while 12 administered both acupuncture therapy (AT) and SA to the intervention group. Notably, three studies that used the Takakura device performed validation experiments on two SA types: skin-touch and non-touch.

The most frequently used acupoint for SA validation was LI4, followed by BL23, TE5, and ST36. Further, 14 and 13 studies involved single and multiple acupoints, respectively. Four of the 13 studies that used multiple acupoints assessed acupoint-dependent differences in outcomes. Two studies did not mention the acupoint chosen.

Acupuncture manipulation was performed in 21 studies. Four studies used a Streitberger device [7,8,9,10], five studies used a Park device [11, 12, 15,16,17], six studies used a Takakura device [18,19,20,21,22,23], and six studies used other devices [25, 26, 29, 30, 32, 34]. The manipulation method was usually rotation.

Twenty studies considered the participants’ acupuncture experience. Among them, 11 and nine studies recruited participants with and without acupuncture experience, respectively. The most frequently used SA validation method was guessing the applied acupuncture type (n = 21). Other SA validation methods included penetration, pain, and deqi sensation.

Reliability of acupuncturist blinding

Acupuncturist blinding was evaluated in seven studies: all six that used the Takakura device and one that used a different device. These studies tested whether the acupuncturists could correctly guess the acupuncture type after administering two (AT/SA) or three (AT/skin-touch SA/non-touch SA) different acupuncture treatments; each response was recorded as a correct guess, an incorrect guess, or “don’t know” (DK).

Among the studies that used the Takakura device, incorrect and DK answers outnumbered correct answers in four [18,19,20, 23] and two studies [21, 22] with AT and SA treatments, respectively, suggesting that the Takakura device is effective in acupuncturist blinding. In the studies that compared three acupuncture types, non-touch SA led to more incorrect answers than skin-touch SA [20, 21, 23]. In the study that used a different device, the rate of incorrect and correct answers was higher when the needle was shown before and after treatment, respectively [28].

In the study conducted by Takakura et al. [20], participants were instructed to indicate the reason for the answer, with the most frequent reason being deqi sensation.

Reliability of participant blinding

Participant blinding was evaluated in two, five, four, and eight studies using the Streitberger, Park, Takakura, and other devices, respectively. In all these studies, the participants were instructed to answer in the same format as the acupuncturists. Among these studies, the rate of incorrect answers was higher for AT and SA in four [14, 15, 17, 28] and 14 [8, 11, 13, 15, 19, 21, 22, 23, 25, 27, 29, 30, 33, 34] studies, respectively. In the remaining study, most participants gave the answer “DK”, which contributed to a low rate of correct answers for SA [9].

Two studies compared the blinding success according to the selected acupoint. Participants were more likely to correctly guess the acupuncture type when it was administered to the upper limbs (vs. lower limbs), limbs (vs. torso), and traditional acupoints (vs. non-traditional acupoints) [13, 30]. Chae et al. [14] measured the penetrating force using a computerized system and observed that it was associated with the blinding outcome.

Blinding Index

The blinding effect was analyzed in 24 studies; the remaining five did not provide the data needed to calculate the Blinding Index [35] (Table 2). Among the 24 studies, 11 had blinding scenarios of “unblinded” and “opposite guess” in the experimental (AT) and control (SA) arms, respectively. Additionally, two studies had a blinding scenario of “random guess” in both arms. Accordingly, 13 of the 24 (54%) studies were considered to have achieved effective blinding scenarios. Moreover, six studies showed “unblinded” in the experimental arm (AT) and “random guess” in the control arm (SA), while three studies were unblinded in both arms. Furthermore, one study showed “random guess” in the experimental arm (AT) and “unblinded” in the control arm (SA), while another study showed “random guess” and “opposite guess” in the experimental (AT) and control (SA) arms, respectively (Table 3).

Table 2 Blinding index values computed from 24 validation studies
Table 3 Blinding scenarios
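The scenario labels above (“unblinded”, “random guess”, “opposite guess”) match those commonly used with Bang's Blinding Index. The following Python sketch assumes that [35] refers to that index, computed per arm as (correct − incorrect) / total responses with DK answers counted in the total, and that a cutoff of ±0.2 separates the scenarios. Neither the formula nor the cutoff is stated in this section, and the single-study response counts in the example are hypothetical. The scenario tally at the end uses the study counts reported in the text.

```python
def bang_bi(correct, incorrect, dont_know):
    """Per-arm index, assuming Bang's formulation:
    (correct guesses - incorrect guesses) / all responses, DK included."""
    total = correct + incorrect + dont_know
    if total == 0:
        raise ValueError("arm has no responses")
    return (correct - incorrect) / total


def classify(bi, cutoff=0.2):
    """Label one arm. The 0.2 cutoff is an assumed rule of thumb,
    not a value taken from this review."""
    if bi >= cutoff:
        return "unblinded"
    if bi <= -cutoff:
        return "opposite guess"
    return "random guess"


# Number of studies per (AT arm, SA arm) scenario, as reported in the text.
SCENARIO_COUNTS = {
    ("unblinded", "opposite guess"): 11,
    ("random guess", "random guess"): 2,
    ("unblinded", "random guess"): 6,
    ("unblinded", "unblinded"): 3,
    ("random guess", "unblinded"): 1,
    ("random guess", "opposite guess"): 1,
}
# Scenarios the text counts as effective blinding.
EFFECTIVE = {("unblinded", "opposite guess"), ("random guess", "random guess")}

total_studies = sum(SCENARIO_COUNTS.values())
effective_studies = sum(n for pair, n in SCENARIO_COUNTS.items() if pair in EFFECTIVE)

# Hypothetical response counts for one study (not from any included trial).
# "correct" means the participant named the treatment actually received.
at_label = classify(bang_bi(correct=20, incorrect=5, dont_know=5))
sa_label = classify(bang_bi(correct=6, incorrect=18, dont_know=6))

print(at_label, "/", sa_label)
print(f"{effective_studies}/{total_studies} = {effective_studies / total_studies:.0%}")
```

Because DK answers enter only the denominator, a study in which most participants answer DK is pulled toward “random guess” under this formulation.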

Participants’ responses to acupuncture-related sensations

Twenty studies evaluated participants’ acupuncture-related sensations. Among these, five, four, five, and six studies used the Streitberger, Park, Takakura, and other devices, respectively. Participants were asked to rate the acupuncture-related sensations, including pain and penetration, on a 1–10 or 1–100 visual analog scale (VAS).

Fifteen studies evaluated the participants’ penetration sensation. Among these, 12 and three studies evaluated the presence/absence and level of penetration sensation, respectively. Eleven studies performed pain evaluation, of which four and seven studies evaluated the presence/absence and level of pain, respectively. In six studies that reported the penetration sensation, most participants perceived the penetration in both AT and SA. Penetration sensation was perceived less with SA in six studies [6,7,8, 24, 25, 32] and less with AT in two studies [10, 16]. In four studies, more participants reported the penetration sensation only with AT [12, 18, 21, 30]. Notably, in the studies conducted by Chae et al. [14] and Lee et al. [15], participants who received AT and SA in the LI4 acupoint reported significantly stronger penetration sensation with AT; however, no significant differences were observed in the CV12 and ST36 acupoints [15, 19].

Takakura et al. [22] reported that most participants experienced pain with both AT and SA; however, the perceived pain was weaker with SA. Fink et al. [26] showed that all participants reported pain with both AT and SA. In contrast, Kreiner et al. [30] reported that only 7.8% and 3.1% of the participants felt pain with AT and SA, respectively. Another study showed that 59.6% of the participants reported only AT-induced pain [22]. Regarding the pain level, three studies reported stronger pain in AT than in SA [6, 9, 14]. Liang et al. [16] reported that only group A (AT → wash out → SA) perceived significantly stronger pain with AT. In the remaining three studies, the pain level did not significantly differ between AT and SA [10, 32, 34]. Moreover, responses were sought regarding the feelings of relief, pleasure, facial temperature, acceptability, and comfort. Notably, only the facial temperature measurements showed differences between AT and SA.

Participants’ report on deqi sensation

Fifteen studies evaluated the participants’ deqi sensation, of which three, five, four, and three studies used the Streitberger, Park, Takakura, and other devices, respectively. Notably, 12 and three studies evaluated the presence/absence and level of deqi sensation, respectively. Six studies reported greater deqi sensation with AT than with SA [6, 11, 12, 16, 18, 24]. Five studies reported that most participants did not experience deqi sensation with AT and that even fewer experienced it with SA [10, 19, 21, 23, 30]. Fink et al. [26] showed that 84.4% and 34.4% of participants reported deqi sensation with AT and SA, respectively. Chae et al. [14] reported that participants felt significantly stronger deqi sensations with AT than with SA. White et al. [7] reported no differences between the two groups. Lee et al. [15] reported some differences in deqi sensation at LI4 but no differences between the two groups at CV12 or ST36.

Quality assessment

Figure 2 summarizes the risk of bias assessment. Across the 29 included studies and seven domains, 192 assessments were rated “low risk” and 11 were rated “unclear risk”. The risk of bias was low for random sequence generation (selection bias), allocation concealment (selection bias), blinding of participants and personnel (performance bias), blinding of outcome assessment (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias), and other bias in 27, 22, 29, 29, 28, 29, and 28 studies, respectively. Among the assessment items, allocation concealment (selection bias) had the highest frequency of “unclear risk” evaluation (n = 7) due to the lack of a specific description of the method of concealing the allocation sequence. Random sequence generation (selection bias) had the second highest frequency of “unclear risk” evaluation (n = 2) due to an unmentioned or unclear randomization method. Similar distributions were noted for the low and unclear risks of bias in studies using the Streitberger, Park, and Takakura devices.

Fig. 2 Risk of bias summary
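The domain counts reported above can be cross-checked with a short tally. The Python sketch below takes the per-domain “low risk” counts from the text and derives the “unclear risk” counts, assuming that every assessment was rated either low or unclear; this assumption is consistent with the reported totals (192 + 11 = 29 × 7). The variable names are illustrative only.

```python
N_STUDIES = 29

# Number of studies rated "low risk" in each domain, as reported in the text.
LOW_RISK = {
    "random sequence generation": 27,
    "allocation concealment": 22,
    "blinding of participants and personnel": 29,
    "blinding of outcome assessment": 29,
    "incomplete outcome data": 28,
    "selective reporting": 29,
    "other bias": 28,
}

# Assumption: no domain received a "high risk" rating, so the
# remainder in each domain is "unclear risk".
unclear = {domain: N_STUDIES - n for domain, n in LOW_RISK.items()}

total_low = sum(LOW_RISK.values())
total_unclear = sum(unclear.values())
total_assessments = N_STUDIES * len(LOW_RISK)

for domain, n in LOW_RISK.items():
    print(f"{domain}: low={n}, unclear={unclear[domain]}")
print(f"low={total_low}, unclear={total_unclear}, all={total_assessments}")
```

The derived values reproduce the two “unclear risk” frequencies stated in the text: seven for allocation concealment and two for random sequence generation.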

Discussion

Invasive control groups involving needle insertion into an area other than a traditional acupuncture point or a traditional acupuncture point unrelated to the treatment objective may be unsuitable as placebo control groups since the procedure can induce physiological effects similar to invasive AT [36]. Noninvasive SA needles were developed to overcome these limitations. Noninvasive SA devices, including the Streitberger, Park, and Takakura devices, are characterized by blunt needle tips that cannot penetrate the skin but have the same shape as needles used for AT, which ensures participant blinding [4]. Validation studies on SA devices used across acupoints and participants are important for improving acupuncture-related clinical research that involves SA control groups [37, 38].

All included SA validation studies in this review had an RCT design involving randomly assigned intervention (AT) and control (SA) groups of healthy volunteers or patients. Blinding was influenced by the participants’ acupuncture experience, acupuncturist’s experience, acupoint, and type of SA (skin-touch or non-touch). A higher rate of blinding success was observed for participants without acupuncture experience, experienced acupuncturists, acupoints in body parts other than the hand, non-traditional acupoints, and skin-touch SA. Including DK as a response option may influence the results and their interpretation; therefore, this should be carefully considered.

Other aspects of blinding that were evaluated included penetration, pain, and deqi sensations. Specifically, the presence/absence and level of sensations were evaluated through yes/no responses and a VAS, respectively. Although the evaluation items for deqi varied across studies, it was mostly evaluated based on the level of sensations such as dull pain, heat, stinging, and tingling. Since AT- and SA-related sensations are important factors in studies involving patients, future studies should comprehensively consider the influence of the disease on sensations based on validation study outcomes using healthy volunteers.

In clinical studies evaluating the therapeutic effect of AT, it is important to establish an appropriate control that excludes the placebo effect and thus allows evaluation of the AT-specific effects. However, in real practice, precise assessment of the AT-specific effects is difficult owing to the multiple and complex factors that influence the AT-related experiences and expectations of patients [39]. Therefore, using an SA control intervention that allows effective blinding of patients and assessment of AT-specific effects is crucial for obtaining highly reliable clinical findings [40]. Meta-analyses conducted by Vickers et al. [41, 42] revealed that the AT intervention group showed clinically significant outcomes compared with the SA control group, which indicates that appropriate SA controls can yield high-quality clinical evidence. Moreover, compared with noninvasive SA interventions, penetration of a real acupuncture needle can achieve a significant analgesic effect for a specific condition such as pain [43]. Therefore, future SA-controlled clinical trials that use the optimal AT protocol and adequate sample size for the desired effect size could further improve evidence-based medicine. Additionally, including a no-intervention group in RCTs would help validate the SA control.

According to White et al. [8], compared to healthy participants, patients experience a stronger needle sensation for both real and sham needles and are more likely to report both as real needles. Thus, differences in sensation during AT or differences in treatment expectations between patients and healthy participants could affect the results. Consequently, generalizing the results of sham needle validation studies between healthy adults and patients could be inappropriate. Future studies should focus on identifying the most suitable sham needles for specific diseases.

SA devices that involve skin contact or minimal insertion may pose limitations in controlled clinical studies owing to the potential neurophysiological effects of skin contact or minimal insertion. Ideally, SA controls should have physical features and psychological effects identical to those of AT while minimizing physiological effects on the human body and maintaining blinding of both participants and acupuncturists, even in long-term clinical studies. Since SA validation studies are conducted using a single- or double-randomized design, establishing suitable control groups, including electroacupuncture and intradermal acupuncture, for various AT interventions is crucial to validate their therapeutic efficacy.

A limitation of this study is the possibility of language bias since we did not query Chinese and Japanese databases due to language barriers.

Conclusions

More effort is required to establish control groups suitable for various acupuncture therapy interventions. Moreover, more rigorous sham acupuncture validation studies are needed, as they could improve the quality of clinical studies.