Background

Complementary and alternative medicine (CAM) therapies are widely used to treat diseases [1,2,3,4,5,6]. Acupuncture in particular has been shown to be useful for chronic pain conditions [7,8,9,10,11], and several studies have evaluated its efficacy, effectiveness, and safety [12,13,14]. The rate of patient consultations for CAM treatments has been increasing [15]. As a result, the number of reported adverse reactions might increase, especially of those that are not always avoidable, such as haematomas, nausea, vomiting, and aggravation of symptoms [13, 16]. Previous observational studies showed that acupuncture can be considered a safe therapy [13, 17, 18], although some case reports might give a different impression [12, 19]. Serious, life-threatening adverse reactions such as pneumothorax are very rare [12, 13] but have been described in case reports [20]. Since 2007, acupuncture treatment for chronic low back and knee pain has been routinely reimbursed by the statutory health insurance funds in Germany [21,22,23,24].

For ethical reasons, patients interested in needle acupuncture treatment should be informed about possible adverse reactions [12], and patient safety should be given greater priority in acupuncture training [14]. However, the type and frequency of adverse reactions are difficult to compare across the various studies evaluating acupuncture safety [12].

A limitation of many trials is that only health care professionals, especially physicians, are responsible for documenting (serious) adverse events or adverse reactions [25, 26]. Health care professionals have the professional competence to evaluate adverse events and adverse reactions, whereas patients have first-hand knowledge of their own safety in healthcare [27]. Only a few studies have documented adverse reactions as reported by both physicians and patients, e.g. [28], which found that frequency and severity can differ between physicians’ reports and patients’ self-reports.

The aim of this study was to compare patients’ and physicians’ reporting of adverse reactions to acupuncture, using data from several large acupuncture trials. In addition, we evaluated associations between patient characteristics and the reporting of adverse reactions.

Methods

The present secondary data analysis is based on the Acupuncture in Routine Care (ARC) studies, which evaluated the effectiveness of adding needle acupuncture to usual care compared with usual care alone [23]. In these trials, patients and physicians completed questionnaires documenting safety parameters.

Study design

The ARC studies were part of the German model project on acupuncture (‘Modellvorhaben Akupunktur’) funded by the German statutory health insurance funds [8, 23, 29]. The project was conducted to evaluate the effectiveness of acupuncture treatment in routine medical care, as well as its safety and cost-effectiveness [23].

The ARC studies were large pragmatic randomized trials with an additional non-randomized study arm for patients who refused randomization. The recruitment period was from December 2000 to July 2004, and patients in both acupuncture arms (randomized and non-randomized) received 10–15 sessions of needle acupuncture. Patients were eligible if they were at least 18 years of age and had been suffering from one of the following conditions for more than 6 months: osteoarthritis pain of the knee or hip, low back pain, neck pain, headache, allergic rhinitis/asthma, or dysmenorrhea. Each study applied additional, more detailed eligibility criteria [8, 30,31,32,33,34].

Randomization for needle acupuncture

Of the 50,473 patients (pooled over all trials) who were asked whether they agreed to be randomized, 11,486 agreed and were randomized either to the acupuncture group (ACU, n = 5831) or to the control group (CON, n = 5655). Patients who did not agree to randomization formed the non-randomized acupuncture group (NR-ACU, n = 38,987). Patients in the ACU and NR-ACU groups started acupuncture treatment immediately, whereas the CON group received acupuncture treatment after 3 months. Needle acupuncture was performed by study physicians with at least 140 h of acupuncture training [8].
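As an arithmetic check of the patient flow described above, the following Python sketch (variable names are illustrative; all counts are taken from this section) confirms that the two randomized groups and the non-randomized group add up to the 50,473 patients asked, and that ACU plus NR-ACU gives the 44,818 patients who received immediate acupuncture.

```python
# Patient flow in the pooled ARC studies (counts as reported in the text)
asked = 50_473    # patients asked whether they agreed to be randomized
acu = 5_831       # randomized to immediate acupuncture (ACU)
con = 5_655       # randomized to control, acupuncture after 3 months (CON)
nr_acu = 38_987   # declined randomization, immediate acupuncture (NR-ACU)

randomized = acu + con
immediate_acupuncture = acu + nr_acu  # groups considered in the safety analysis

print(randomized, randomized + nr_acu, immediate_acupuncture)
```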

Data collection

At baseline, patient age, gender, school graduation, highest educational degree, occupational status, living situation, diagnosis, health insurance status, and health insurance type were assessed. Data collection regarding safety parameters was performed by questionnaires for patients and their study physicians after a complete treatment cycle.

If either the patient or the physician reported any side effect caused by acupuncture (adverse reaction) in a short first questionnaire, both received a detailed second questionnaire asking for additional information, including the frequency and duration of the reaction, the time between needling and onset, and whether treatment was needed because of the reaction. All safety-related questions of the ARC questionnaires are listed in the supplementary material (see additional file 1).

In the present analysis, only the acupuncture treatment groups ACU and NR-ACU that received the immediate acupuncture (n = 44,818) were considered because the CON group received different types of questionnaires regarding the safety parameters.

Safety parameters

We used the following CONSORT definitions to distinguish the safety parameters: adverse events are ‘harmful events that occur during a trial’; in contrast, adverse reactions are defined as ‘events for which a causality link to the tested intervention is well established and strong enough (sensitive and specific)’ [35]. Several other institutions in the healthcare sector, such as the Food and Drug Administration (FDA), the European Medicines Agency (EMA), the World Health Organization (WHO), and the German Federal Institute for Drugs and Medical Devices (BfArM), define adverse events and adverse reactions, including drug reactions, in a similar but not identical way. These definitions are listed in the supplementary material (see additional file 2). In a review by Edwards and Aronson, the distinction was explained as follows: ‘The terms adverse effect and adverse reaction are interchangeable’ and ‘must be distinguished from adverse event’ [36].

Statistical analysis

The primary analysis assessed agreement for all adverse reactions, which were classified into six categories: i. BLEEDING/HAEMATOMA, ii. INFLAMMATION, iii. PAIN, iv. VEGETATIVE SYMPTOMS, v. NERVE IRRITATION/INJURIES, and vi. OTHERS. The frequencies of the reported adverse reactions are listed together with a verbal description according to the European Commission guideline: very common (≥1/10), common (≥1/100 to < 1/10), uncommon (≥1/1000 to < 1/100), rare (≥1/10,000 to < 1/1000), and very rare (< 1/10,000) [37].
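The European Commission convention can be written as a small classification function. The Python sketch below is illustrative (the function name is hypothetical); it uses exact fractions so that a proportion lying exactly on a boundary such as 1/100 is assigned to the higher category, as the ≥ signs in the definition require. Under this rule, zero observed events would also be labelled ‘very rare’.

```python
from fractions import Fraction


def frequency_category(events: int, n: int) -> str:
    """Map an observed frequency events/n to the European Commission frequency label."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = Fraction(events, n)  # exact arithmetic avoids rounding errors at the boundaries
    if p >= Fraction(1, 10):
        return "very common"
    if p >= Fraction(1, 100):
        return "common"
    if p >= Fraction(1, 1000):
        return "uncommon"
    if p >= Fraction(1, 10_000):
        return "rare"
    return "very rare"


print(frequency_category(1, 100), frequency_category(99, 10_000))  # prints: common uncommon
```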

In addition, agreement between patients’ and physicians’ reports was assessed using Cohen’s kappa (κ), a coefficient that measures inter-rater agreement corrected for the agreement expected by chance [38]. Kappa can take values from −1 to 1 and can be interpreted according to the six levels proposed by Landis and Koch: less than 0.00, poor; 0.00 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 1.00, almost perfect agreement [39]. Note that kappa depends on prevalence: the agreement expected by chance is determined by how often each rater reports a reaction, so for rare reactions kappa can be low even when the observed agreement is high.
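For a yes/no rating by two raters, kappa compares the observed agreement p_o with the agreement p_e expected if patient and physician reported independently at their own rates: κ = (p_o − p_e)/(1 − p_e). The Python sketch below implements this for a 2 × 2 table together with the Landis and Koch labels; the function names are illustrative, and the cell counts in the usage line are hypothetical, chosen to show a rare reaction with 95% raw agreement but only slight kappa.

```python
def cohens_kappa_2x2(both_yes: int, patient_only: int, physician_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters (patient, physician) giving a yes/no rating."""
    n = both_yes + patient_only + physician_only + both_no
    p_observed = (both_yes + both_no) / n
    # marginal proportion of 'yes' ratings for each rater
    p_patient = (both_yes + patient_only) / n
    p_physician = (both_yes + physician_only) / n
    # agreement expected by chance if both raters reported independently
    p_chance = p_patient * p_physician + (1 - p_patient) * (1 - p_physician)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


def landis_koch(kappa: float) -> str:
    """Verbal label following the Landis and Koch benchmarks."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical rare reaction: 950 of 1000 ratings agree, but the chance agreement is 0.941
kappa = cohens_kappa_2x2(both_yes=5, patient_only=45, physician_only=5, both_no=945)
print(round(kappa, 3), landis_koch(kappa))
```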

To assess the association between patients’ characteristics and reported adverse reactions (yes/no, reported by the patient or by the physician), logistic regression was used. Because several patients were treated by the same study physician, the data are clustered. The effect of clustering was estimated with the intraclass correlation coefficient (ICC) and the design effect (DE) [40]. The ICC estimates the correlation (similarity) between the reports for patients treated by the same physician, computed separately for patients’ and physicians’ reports from a null model, i.e., a regression model containing only the clustering variable and no further covariates. The larger the ICC and the more the design effect exceeds 1, the more strongly the clustered data structure must be taken into account. In the multilevel model developed by Laird and Ware [41], clustering is accounted for by modelling physicians as random effects and patient characteristics as fixed effects. In a sensitivity analysis, generalized estimating equation (GEE) models by Liang and Zeger were used [42].
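The text does not state how the ICC was defined for the binary outcome. One common convention for a random-intercept logistic null model, assumed here rather than taken from the study, is the latent-variable ICC σ²_u / (σ²_u + π²/3), where σ²_u is the between-physician variance of the random intercept and π²/3 is the variance of the standard logistic distribution. In R, σ²_u would be the physician variance component of an lme4 null model such as glmer(report ~ 1 + (1 | physician), family = binomial), where the variable names are hypothetical. A minimal Python sketch of the conversion:

```python
import math

# Variance of the standard logistic distribution, used as the patient-level
# (residual) variance on the latent scale of a logistic model
LOGISTIC_LATENT_VARIANCE = math.pi ** 2 / 3


def latent_icc(physician_variance: float) -> float:
    """Latent-scale ICC of a random-intercept logistic null model (assumed convention)."""
    if physician_variance < 0:
        raise ValueError("a variance cannot be negative")
    return physician_variance / (physician_variance + LOGISTIC_LATENT_VARIANCE)


# Hypothetical random-intercept variance, not a value from the ARC studies
print(round(latent_icc(1.0), 3))  # prints 0.233
```

Because the residual variance is fixed at π²/3 ≈ 3.29 on this scale, an ICC as high as 0.90 would correspond to a between-physician variance nine times that value.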

All analyses were performed with the statistical software R (The R Foundation for Statistical Computing, Vienna, Austria, version 3.1.1) using the packages lme4 and geepack for regression with clustered data; the data set was available in SPSS format (IBM SPSS Statistics 19). An exploratory significance level of 0.05 was used, and no correction for multiple testing was applied. Accordingly, all p-values and confidence intervals should be interpreted as exploratory only. Missing data were not imputed; all analyses were based on the available data.

Results

Patient characteristics

In the ARC studies, n = 44,818 patients in the ACU and NR-ACU groups received immediate acupuncture treatment from 6727 physicians. Patients were on average 48.5 ± 14.1 (mean ± standard deviation) years old, and 67.5% were women (Table 1). Of the included patients, 37.1% had at least a high school degree, 59.5% were employees, and 83.8% lived in a multi-person household. Characteristics were similar in the ACU and NR-ACU groups. The most common diagnoses for inclusion were headache and neck pain in both groups (ACU: 26.9% and 30.1%; NR-ACU: 29.6% and 26.7%, respectively). On average, approximately seven patients were treated per study physician (6.6 ± 9.1, median = 4), ranging from one to more than 50 patients per physician.

Table 1 Baseline characteristics for the acupuncture in routine care (ARC) patients by the treatment groups randomized acupuncture (ACU) and non-randomized acupuncture (NR-ACU)

Comparing patient and physician reports

The comparison of patient- and physician-reported adverse reactions during the trial is provided as absolute frequencies, proportions (%), and frequency categories (Table 2). For five of the six main categories, patients’ and physicians’ reports fell into different frequency categories: BLEEDING/HAEMATOMA (patients: 2458 (6.7%), ‘common’ vs. physicians: 255 (0.6%), ‘uncommon’), PAIN (636 (1.7%), ‘common’ vs. 207 (0.5%), ‘uncommon’), INFLAMMATION (136 (0.4%), ‘uncommon’ vs. 16 (0.04%), ‘rare’), NERVE IRRITATION/INJURIES (90 (0.2%), ‘uncommon’ vs. 35 (0.1%), ‘rare’), and OTHERS (420 (1.1%), ‘common’ vs. 158 (0.4%), ‘uncommon’). Only VEGETATIVE SYMPTOMS fell into the same frequency category for patients (229 (0.6%), ‘uncommon’) and physicians (136 (0.3%), ‘uncommon’). Fig. 1 illustrates the ratio of physicians’ to patients’ reports for each adverse reaction category.
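The proportions above and the physician-to-patient ratios shown in Fig. 1 can be recomputed from the counts. The denominators are an assumption: the Python sketch below takes 36,792 available patient questionnaires and 42,811 available physician questionnaires from the Table 2 caption, an assignment that reproduces the reported percentages. Names such as REPORTS and summary are illustrative.

```python
# Reported adverse reactions per category: (patient reports, physician reports)
REPORTS = {
    "BLEEDING/HAEMATOMA": (2458, 255),
    "PAIN": (636, 207),
    "INFLAMMATION": (136, 16),
    "NERVE IRRITATION/INJURIES": (90, 35),
    "OTHERS": (420, 158),
    "VEGETATIVE SYMPTOMS": (229, 136),
}

# Assumed denominators: available patient and physician data (Table 2 caption)
N_PATIENT_DATA = 36_792
N_PHYSICIAN_DATA = 42_811

summary = {}
for category, (by_patients, by_physicians) in REPORTS.items():
    pct_patients = 100 * by_patients / N_PATIENT_DATA
    pct_physicians = 100 * by_physicians / N_PHYSICIAN_DATA
    # physicians' reports as a percentage of patients' reports (cf. Fig. 1)
    ratio_pct = 100 * by_physicians / by_patients
    summary[category] = (pct_patients, pct_physicians, ratio_pct)
    print(f"{category:<26} {pct_patients:5.2f}%  {pct_physicians:5.2f}%  ratio {ratio_pct:5.1f}%")
```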

Table 2 Frequency of adverse reactions reported by patients and physicians, by category: absolute frequencies and proportions, description in text form, and agreement as Cohen’s kappa (κ) coefficient with 95% confidence interval (CI) (available data: 36,792 patient and 42,811 physician questionnaires of 44,818 patients)
Fig. 1 Ratio of physician-reported (light grey inner circle) to patient-reported (outer circle) adverse reactions in six categories: bleeding/haematoma (255 physician and 2458 patient reports; ratio 10.4%), inflammation, pain, vegetative symptoms, nerve irritation/injuries, and others (for numbers, see Table 2)

Based on the available data, 79% (n = 696) of the patients stated that they had informed their physician about their adverse reaction, whereas only 25% (n = 426) of the physicians reported that they had learned of it from their patients. Most physicians (88.5%, n = 1512) had not observed the adverse reactions themselves.

Agreement between patient- and physician-reported adverse reactions, as measured by Cohen’s kappa, differed across categories (Table 2). Depressive mood had the highest kappa value (0.50), representing moderate agreement between patients and physicians. Anxiety (0.35), tinnitus (0.29), and diarrhoea (0.29) also showed comparatively high values. Many kappa values, however, indicated only slight agreement (between 0.0 and 0.2), which is not unexpected given the low prevalence. For several adverse reactions, e.g., local infection, generalized muscle pain, increase in blood pressure, and nerve injury, kappa was estimated as 0, i.e., agreement no better than chance, even though these adverse reactions had above-average prevalence. The physicians’ questionnaires included specific questions on serious adverse events (see additional file 1), but their frequencies were too low for meaningful analysis.

Association between reported adverse reactions and baseline characteristics

The ICC, estimating the similarity of reports for patients treated by the same physician, was 0.12 for patients’ reports and 0.90 for physicians’ reports. Thus, 12% (patients’ reports) and 90% (physicians’ reports) of the total variability was attributable to differences between study physicians, and this clustering was accounted for in the subsequent logistic regression analyses.
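The text does not give the formula used for the design effect. A common choice, assumed here, is Kish’s approximation DE = 1 + (m̄ − 1) × ICC, with m̄ the mean number of patients per physician. Because cluster sizes varied widely (6.6 ± 9.1 patients per physician), the Python sketch below also includes an extension for unequal cluster sizes that uses the coefficient of variation CV of cluster size, DE = 1 + ((CV² + 1) m̄ − 1) × ICC. The function names are illustrative, and the results are rough approximations rather than values reported by the study.

```python
def design_effect(icc: float, mean_cluster_size: float) -> float:
    """Kish design effect assuming equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def design_effect_unequal(icc: float, mean_cluster_size: float, sd_cluster_size: float) -> float:
    """Design effect allowing for unequal cluster sizes: 1 + ((CV^2 + 1) * m - 1) * ICC,
    where CV is the coefficient of variation of the cluster sizes."""
    cv = sd_cluster_size / mean_cluster_size
    return 1 + ((cv ** 2 + 1) * mean_cluster_size - 1) * icc


# Reported ICCs; 6.6 +/- 9.1 patients per physician (mean +/- SD) from the patient characteristics
for source, icc in (("patients' reports", 0.12), ("physicians' reports", 0.90)):
    print(source, round(design_effect(icc, 6.6), 2), round(design_effect_unequal(icc, 6.6, 9.1), 2))
```

A design effect well above 1 means that the standard errors of an ordinary logistic regression ignoring the physician clustering would be too small, which is why the multilevel and GEE models were used.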

To assess associations between patients’ characteristics and patient- or physician-reported adverse reactions (yes/no), multivariable multilevel logistic regression was applied. Female patients had higher odds of reporting adverse reactions than male patients (OR 1.96, 95% CI [1.76;2.17], Table 3). Patients who had agreed to be randomized had higher odds of reporting (1.24 [1.11;1.39]) than patients who had not agreed to randomization. Older patients reported significantly fewer adverse reactions (0.82 [0.82;0.90] per 10-year increase in age). Patients with a higher educational degree were more likely to report adverse reactions than patients with a lower degree (1.39 [1.22;1.59] for 12/13 years in school; 1.16 [1.05;1.29] for an academic degree from a college/university).
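The age effect is expressed per 10-year increase. If age enters the model in years (an assumption; the text does not say how age was coded), the per-decade odds ratio and its Wald confidence interval follow from the per-year log-odds coefficient β and its standard error by scaling on the log-odds scale. In the Python sketch below, the function name and the coefficient values are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% Wald interval


def odds_ratio_per_increment(beta: float, se: float, increment: float = 10.0):
    """Odds ratio and 95% Wald CI for an `increment`-unit increase of a continuous covariate.

    `beta` and `se` are the log-odds coefficient and its standard error per single unit
    (e.g. per year of age).
    """
    log_or = increment * beta
    half_width = Z_975 * increment * se
    return math.exp(log_or), math.exp(log_or - half_width), math.exp(log_or + half_width)


# Hypothetical per-year coefficient and standard error, not estimates from the study
or_10, lower, upper = odds_ratio_per_increment(beta=-0.02, se=0.004)
print(f"OR per 10 years: {or_10:.2f} [{lower:.2f};{upper:.2f}]")
```

Because the interval is symmetric on the log-odds scale, the point estimate of such a Wald interval equals the geometric mean of its two limits.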

Table 3 Association between patient characteristics and reporting of adverse reaction (yes/no) from patients and physicians (multivariable multilevel logistic regression, yielding adjusted* odds ratios and 95% confidence intervals (CI))

For physicians’ reports, the associations pointed in the same direction, but the ORs were less precise (Table 3). Study physicians reported significantly more adverse reactions for female than for male patients (2.39 [1.87;3.15]), for patients with higher degrees (2.32 [1.65;3.26] for 12/13 years in school), and for patients who had agreed to randomization (2.10 [1.59;2.78]). The results of the two statistical approaches, the multilevel model and the GEE models, differed only negligibly (GEE model results not shown).

Discussion

We evaluated the reporting of adverse reactions in a secondary analysis of data from large, partly randomized controlled clinical trials of acupuncture in chronic pain patients. We compared patients’ and physicians’ reports regarding the frequency of adverse reactions and evaluated their agreement. Overall, patients reported on average three times as many adverse reactions as their physicians. The most commonly reported adverse reaction was bleeding/haematoma for both patients and physicians, similar to the findings of Witt et al. [13]. Many other types of adverse reactions were seldom reported, especially life-threatening adverse reactions such as pneumothorax [13, 20, 43]. For most adverse reactions, chance-corrected agreement was absent or only slight. However, differences in actual frequency did not necessarily translate into differences in the frequency categories commonly used in product information [37]. Moreover, the odds of an adverse reaction being reported, by either the patient or the physician, were higher for patients who had agreed to be randomized at baseline (i.e., who were willing to participate in an RCT), for female patients, and for patients with a higher educational degree.

Various reasons might explain the difference between patients’ and physicians’ reporting. In general, physicians have more medical knowledge than patients owing to long medical training and professional experience, which can affect the reporting of adverse reactions, especially when causality is unclear. On the other hand, it is plausible that patients are best positioned to report their own symptoms [44].

The communication about the treatment and its potential adverse reactions, the motivation and time available for reporting, the disease treated, and the general educational background might also affect reporting. The method of documenting adverse reactions might have an impact as well. In our study, the patients’ question specifying adverse reactions offered tick boxes for bleeding/haematoma and local inflammation as examples, whereas the physicians’ question offered a free-text field. This, together with the fact that acupuncturists might not consider these reactions to be side effects from the perspective of traditional Chinese medicine, has possibly contributed to the differences in these categories. Although we included a comparatively large number of cases, with more than 44,000 chronic pain patients, conclusions about the specific frequency of adverse reactions to acupuncture should be drawn very cautiously if they are based on our study alone. Other, even larger studies were specifically designed, conducted, and published to evaluate acupuncture safety [13, 17, 45].

This study has some limitations. First, we used secondary data collected between December 2000 and July 2004 in trials that were not designed to evaluate differences between physicians’ and patients’ reporting of adverse reactions. The primary aim of the ARC trials was to evaluate effectiveness, and safety was only one of many secondary outcomes. Furthermore, we did not adjust for multiple comparisons.

A further problem is that patients and physicians did not rate completely independently: many adverse reactions are invisible to the study physician and must first be reported by the patient, and the physician might explain the definition of adverse reactions to the patient. Physicians’ reports based on patients’ reports could therefore validate the patients’ reports, assuming the physician’s assessment can serve as the gold standard. However, this might lead to under-reporting, whereas over-reporting of unjustified adverse reactions could make it difficult to explain the safety characteristics of the intervention in a real-life setting. Another limitation of our study is that the assessment of adverse reactions was based on retrospective self-reports, which can be influenced by recall bias [46]. Insufficient differentiation between adverse events and adverse reactions caused by acupuncture could be an additional reason for the differential reporting. Definitions by the WHO, FDA, or EMA differ in nuances [35, 47,48,49], and in the literature these terms are sometimes used synonymously (e.g., [50, 51]). The exact definitions of these institutions are listed in the supplementary material (see additional file 2).

In the physicians’ questionnaire in this study, we included a definition of an adverse reaction that referred to its noxious and unintended character, to distinguish it from an adverse event. In contrast, the patients’ questionnaire did not include such an explanation, both to keep the questionnaire clear and easy to use and because patients were asked to report only adverse reactions, not adverse events. This difference between the questionnaires may partly explain the differences in reporting of some adverse reactions, such as bleeding/haematoma (which acupuncturists sometimes intend) or pain, but not the differences for adverse reactions such as vertigo or fatigue. For future studies, we recommend using the same written definitions in both the physicians’ and the patients’ questionnaires; tick boxes or free-text fields should likewise be used in the same way in both. Not only who performs the assessment but also how it is performed can cause large differences in reported rates, as shown in an RCT by Bent et al. [52]. That study compared three methods of assessing adverse events experienced by study participants: (1) an open-ended question, (2) an open-ended, defined question, and (3) a checklist of 53 common side effects. The percentage of participants reporting any adverse event was much higher with the checklist (77%) than with the first (14%) or second method (13%). This illustrates the complexity of reporting and the need for standards.

Strikingly, most studies on the safety characteristics of acupuncture are based either on therapists’/physicians’ or on patients’ reports, but not on both [53,54,55,56]. Fromme et al. investigated clinician reporting of adverse reactions during chemotherapy [57]: 37 men with prostate cancer reported their adverse events, and agreement with the study physicians was determined using Cohen’s kappa. The overall kappa value was 0.15, representing slight agreement similar to our results. In rheumatoid arthritis, patients’ (n = 4246) and physicians’ reporting of adverse drug events also differed; even for serious adverse events, agreement was only 37% [58], and, as in our study, patients reported more events.

In a study comparing adverse events reported in post-discharge patient interviews with those detected by medical record review, agreement for adverse events (kappa = 0.20) and serious adverse events (kappa = 0.33) was low and comparable to our results [59]. In contrast, in a 2005 oncology study, agreement between 400 patients and their clinicians was higher (kappa up to 0.5), and it was higher for observable than for subjective reactions [60].

Standardized reporting and documentation of both adverse events and adverse reactions is essential [35, 61, 62]. For drug safety, the FDA developed a reporting system in 1998 [63]. For non-interventional pain studies, there are guidelines such as CONSORT or ACTTION for documenting adverse reactions, but even these guidelines do not provide a high degree of detail [64]. In oncology, documentation tools currently exist that combine the physician’s analytical, professional perspective with the individual patient’s perspective, using quality of life, symptoms, and patient-reported outcomes to enhance patient-clinician communication and to enable early detection of toxicities [60, 65]. In a British acupuncture study, adverse events were monitored in a standardized way through patient self-reports at each acupuncture session [66]. However, the documentation of safety through patient reports is still not standardized in clinical trials and health care. Even for the mandatory documentation of adverse drug reactions, various information systems are used [67]. Compared with conventional methods (telephone, questionnaire), standardized web-based documentation software or intuitive mobile apps could support complete and harmonized documentation of adverse reactions [67, 68]. Further, it is important to differentiate between adverse events and adverse reactions and to evaluate a possible causal link to the intervention.

We believe that both patients’ and physicians’ reports should be included when evaluating the safety of a medical intervention, and electronic documentation tools might support this. Patients (or their relatives) can play an important role in signalling safety issues in clinical trials as well as in routine care [62, 69] and can help to advance a patient-centred approach.

Regular communication among physicians, other clinical staff, and patients, together with standardized documents that include clear definitions, might help to minimize these differences.

Conclusions

In our study, patients’ and physicians’ reports of adverse reactions to acupuncture differed substantially, possibly owing to differences between the patients’ and physicians’ questionnaires and definitions. Frequency categories proved useful and partly compensated for these reporting differences. For the assessment of safety parameters, we strongly support including both patients’ and physicians’ reports while ensuring standardized data collection and definitions.