Background

Cancer is a major public health concern [1], with quality of life (QoL) as a fundamental variable of the patients’ well-being [2, 3]. The European white-berry mistletoe (Viscum album L.), an ever-green plant that grows as a semi-parasite on trees, has a long tradition in the treatment of cancer patients, particularly in continental Europe [4].

Viscum album extract (VAE) is applied subcutaneously, normally two to three times per week whereas the complete treatment duration varies from some weeks up to five years and more. Different products are available such as ABNOBAViscum, Helixor, Iscador, or Lektinol.

Mistletoe contains biologically active molecules including lectins, flavonoides, viscotoxins, oligo- and polysaccharides, alkaloids, membrane lipids and other substances [5]. Although the exact pharmacological mode of action of mistletoe is not completely elucidated, there is a growing number of biological studies with a clear focus on lectins. Lectins (from the Latin legere, “to select”) are carbohydrate-binding proteins displayed on cell-surfaces to convey the interaction of cells with their environment [6]. Lectins mediate many immunological activities: For example, lectins show an immunomodulatory effect on neutrophils and macrophages by increasing the natural killer cytotoxicity and the number of activated lymphocytes [7,8,9]. They induce apoptosis in human lymphocytes [10] and boost the antioxidant system in mice [11]. In healthy subjects, the subcutaneous application of mistletoe has stimulated the production of granulocyte-macrophage colony-stimulating factor (GM-CSF), Interleukin 5 and Interferon gamma [12], indicating the immunopotentiating properties of mistletoe. The multiple ways how mistletoe affects the immune system have been recently reviewed elsewhere [13]. In consequence, the immunological pathways of conventional oncological treatments may be influenced by VAE, affecting cancer cells and decreasing adverse effects. This may result in a better quality of life.

A number of reviews has been published over the last two decades that address the effects of VAE on QoL in cancer patients [14,15,16,17,18,19,20]. However, these studies are either out of date, don’t make use of all published evidence, and/or don’t combine the data quantitatively into a pooled effect size.

The aim of this study is therefore to review and analyze the current evidence regarding QoL of cancer patients which were treated with VAE and to calculate a meta-analysis.

Methods

The study has been reported in accordance to PRISMA. The protocol was submitted to PROSPERO (registration number: CRD42019137704).

Sources of evidence

We searched the databases Medline, Embase, PsychInfo, CENTRAL, CINAHL, Web of Science, and clinicaltrials.gov, we used google scholar, hand-searched the reference lists of reviews and identified studies and screened for grey literature via Google and opengrey.org. In case of missing data we contacted the authors.

Search strategy

We developed a search strategy by iteratively combining synonyms and/or subterms of “quality of life” (e.g. well-being, QoL), “cancer” (e.g. neoplasm, sarcom, lymphom) and “mistletoe” (e.g. Helixor, Eurixor) to identify an adequate set of terms. We applied the following search strategy for Medline (Pubmed) and adopted it to the other databases accordingly:

  1. 1.

    quality of life OR HRQoL OR HRQL OR QOL OR patient satisfaction OR well-being OR wellbeing

  2. 2.

    mistel OR mistletoe OR Iscador OR Iscar OR Helixor OR Iscucin OR Abnobaviscum OR Eurixor OR Plenosol OR Lektinol OR Vysorel OR Isorel OR Cefalektin OR Viscum

  3. 3.

    Krebs OR cancer OR neoplasm/ OR tumor OR oncolog* OR onkologie OR carcin* OR malignant OR metastasis

  4. 4.

    Humans [MESH]

  5. 5.

    1 AND 2 AND 3 AND 4

With the exception of #4 the general search fields were applied.

Selection criteria

We included studies that measured QoL or self-regulation of cancer patients treated with mistletoe extracts assessed by performance status scales or patient-reported instruments. Studies were chosen if they were

  • prospective controlled studies with

  • two or more arms,

  • both interventional and non-interventional.

The search was not limited to languages.

Studies were excluded if

  • they did not meet the aforementioned inclusion criteria,

  • if they tested multi-component complementary medicine interventions,

  • if they failed to report sufficient information to be included into the meta-analysis or

  • where this information cannot be gleaned from authors or extracted from graphs.

Data management

The data was extracted from each study and entered into a spreadsheet by two authors independently. Then the extracted spreadsheets were compared and discrepancies were resolved by discussion until consensus was reached. We coded the following characteristics:

  • number of participants in each treatment arm

  • year, when study was conducted; in case this was not given, we estimated a 3 year lag from publication date for the meta-regression

  • duration of study

  • country where the study was conducted

  • cancer type

  • age

  • gender of patients

  • diagnosis according to ICD 10

  • duration of study

  • type of study (interventional vs. non-interventional, randomized vs. non-randomized, blinded vs. not blinded, single vs. multi-center)

  • additional therapy (e.g. chemotherapy)

  • number of drop-outs in each study arm

  • active mistletoe extract preparation (e.g. Eurixor, Iscador, etc.)

  • control treatment (e.g. placebo)

  • effect size of primary outcomes plus standard deviation, or confidence intervals for effect measure provided using the reported global measure of QoL

  • instrument used to measure primary outcomes

  • statistics according to intent-to-treat analysis (yes/no)

  • sponsoring of study (corporate, public, no-sponsoring).

If numerical data provided by the study publication was insufficient to calculate effect sizes, we contacted the authors. In cases where additional data were provided by the authors, these were then used instead of the published data. In older studies this was impossible. In those cases we used the given information (for instance means and confidence intervals, or means and p-values, or statistical information to generate the necessary data). In some cases we had to use medians as means and recover standard errors of the means from the given confidence intervals which also necessitated an adaptation of the confidence intervals into symmetrical ones. In each case we used the more conservative option which yielded larger standard errors and hence larger standard deviations. Thus, we generally opted for an error on the conservative side. When no quantitative information was given, but only graphs were presented, we printed high resolution graphs and derived the mean values and standard errors applying a ruler and used the given statistical information to arrive at the necessary quantitative scores. All these procedures were conducted independently and in duplicate [21].

Risk of bias (quality) assessment

The Cochrane Risk of Bias tool 2 (Rob 2) was used to assess the risk of bias in randomized controlled trials [22]. All studies were assigned to the intention-to-treat-effect-analysis. Non-randomized or non-interventional studies were additionally analyzed with the Newcastle Ottawa Scale [23]. Two reviewers (HW, ML) independently assessed the risk of bias. In case of discrepancies they decided by consensus.

Statistical analysis

The data were analyzed using Comprehensive Meta-Analysis V. 2 and Revman 5.3.5, the summary measure was the standardized mean difference. The meta-analysis was calculated independently by both authors using the two software tools Comprehensive Meta-Analysis and RevMan. The results were compared and underlying discrepancies resolved by discussion until both analyses yielded the same numerical results up to the second decimal. We report the overall analysis according to the results yielded by the RevMan analysis and conducted sensitivity analyses with Comprehensive Meta-Analysis.

The heterogeneity between studies was assessed by the Cochrane Q test and quantified by the index of heterogeneity (I2) [24]. A value of I2 of 25, 50 and 75% indicates low, medium and high heterogeneity, respectively. If heterogeneity was higher than 25% we applied a random effects model for pooling the data, else a fixed effects model was used. As heterogeneity was high for the overall data-set, a random effects model was indicated. Fixed effect models were only used sparingly in exploratory subgroup analyses or sensitivity analyses, when heterogeneity was low.

We conducted subgroup analysis in order to identify possible sources of the heterogeneity. Stratified analyses were performed by: study types (e.g. blinded vs. not blinded, randomized versus non-randomized, types of control), additional treatments, country, risk-of-bias status, type of sponsoring, QoL instruments and related dimensions (in particular self-regulation), and mistletoe compound. Type of cancer was not included, as there were too many different cancer types. We conducted meta-regressions and regressed the three continuous predictors year of study, age of patients and length of treatment on effect size. We checked for publication bias using Egger’s regression intercept method and Duval and Tweedie’s trim and fill method [25].

Results

598 studies were identified by electronic and hand searches, after removing duplicates. 67 full texts were retrieved of which 26 publications with 30 separate data sets met the inclusion criteria (see Fig. 1) [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]. We contacted 14 authors for additional information which was granted by five [26, 30, 40, 49, 51].

Fig. 1
figure 1

Flow of literature search process.*e.g. not human, ongoing trials, finished trials without reports, results published multiple times

90% of the studies were conducted in Europe including Russia, 50% in Germany, and 10% in Asia. Three trials were blinded, four studies or datasets were not randomized. Different mistletoe preparations with varying conventional treatments were compared to conventional treatment (alone in 22 cases or plus an additional comparator in eight cases, respectively) for multiple types and stages of cancer. In nine studies QoL was measured with EORTC-QLQ-C30, six studies assessed self-regulation, and the others used one or multiple other instruments. The study characteristics are displayed in Table 1.

Table 1 Characteristics of included studies

The results of the overall meta-analysis are presented in Fig. 2.

Fig. 2
figure 2

Overall Meta-analysis of all included data sets

As can be seen the studies are highly heterogeneous (I2 = 84%), and hence the random effects model is applied to estimate the combined standardized mean difference as d = 0.61 (95% CI 0.41–0.81, p < 0.00001, z = 6.05).

The meta-analyses of the sub-dimensions of QoL are shown in Table 2. The SMD of seven out of 14 QoL dimensions are significant (p ≤ 0.05). The pooled SMD of role and social functioning are 0.63 (95% CI 0.05–1.22) and 0.62 (95% CI 0.22–1.03), respectively. For pain, it is SMD = − 0.86 (95% CI -1.54-(− 0.18)) and for nausea, it is SMD = − 0.55 (95% CI -1-(− 0.1)).

Table 2 Effect sizes of sub-dimensions of QoL that could be pooled by meta-analyses (positive [negative] values in functioning [symptom] dimensions indicate improvement for VAE vs. control)

The risk of bias assessment is displayed in the Figs. 3 and 4. 65% had an overall high risk of bias which resulted for most studies from the 85% high risk of bias in the measurement of outcome. This can be attributed to the missing blinding process, the QoL assessment as patient-reported outcome, and the uncertain appropriateness of some measurement instruments which may only incompletely capture the concept of QoL.

Fig. 3
figure 3

Summary of risk of bias assessment as percentage (intention-to-treat)

Fig. 4
figure 4

Risk of bias assessment by domain and overall bias

The five non-randomized trials [31, 34,35,36, 42] were additionally assessed with the Newcastle-Ottawa-scale. All studies had an overall score of 7 out of a maximum of 9. The sums in the selection, the comparability and the outcome/exposure domain were 3, 2 and 2, respectively, for all studies.

The sensitivity analyses are presented in Table 3.

Table 3 Sensitivity analyses according to various moderators

The sensitivity analyses confirmed the robustness of the results. Neither the methodological nor other moderator variables showed strong deviations. With the exception of the non-randomized studies (non-randomized: d = 0.38, p = 0.1), the lung cancer studies (d = − 0.18, p = 0.15), the studies conducted with Lektinol (d = 0.67, p = 0.1), and the studies using an index measure (e.g. Karnofsky index) as outcome (d = 0.33, p = 0.1) all other moderator analyses showed no appreciable differences between subgroups and yielded highly significant effect sizes. In tendency, methodologically more rigorous studies yielded higher or equally high effect sizes than less rigorous ones. Most notably, randomized studies yielded a higher effect size (d = 0.70, p = 0.001) than non-randomized ones (d = 0.38, p = 0.1). Studies using active controls (d = 0.6, p = 0.004) did not differ from studies using other controls (d = 0.65, p < 0.001). Various types of additional treatment did not show differential effect sizes, except individualized best care, which, however, is an estimate based on only one study and hence not reliable. Although the effect sizes of the various products vary, their confidence intervals overlap, and hence suggest the conclusion that they are roughly equally effective. There is no difference in effect sizes depending on countries, type of sponsoring, or type of measures. Studies that relied on corporate sponsoring, and studies using only a single index measure yielded a somewhat smaller effect size, although confidence intervals overlap and thus signal non-significant differences.

The three meta-regressions are presented in Table 4 and in Figs. 5 and 6.

Table 4 Meta-regression results
Fig. 5
figure 5

Scatterplot of the meta-regression of study year on effect-size

Fig. 6
figure 6

Scatterplot of the meta-regression of treatment duration on effect size

The study year is positively correlated with effect size. For each year the study was more recent the estimated QoL-effect size is larger by d = 0.03. Although there is a tendency for a larger effect size in younger patients this effect is not significant. The slope of the regression line for the duration of treatment is borderline significant, indicating that longer treatment produces effects that are 0.04 standard deviations larger per additional treatment week. Note though that only treatments between 5 and 52 weeks have entered the analysis and the variance is not large.

Publication bias was estimated using two methods. Egger’s regression intercept model regresses effect size on precision of study with the assumption that smaller studies that are less precise will more often go unpublished. A regression line with a lot of smaller and more imprecise studies missing should thus miss the origin by a large margin. In our analysis the intercept of the regression is 0.82 with a non-significant deviation from the origin (t = 0.65, p-value two tailed = 0.5). Duval and Tweedie’s trim and fill method is an extension of the graphical funnel plot analysis and analyzes how many studies would have to be trimmed to generate a perfectly symmetrical funnel plot. In our analysis this method estimates no studies to be trimmed on the left side, i.e. on the negative or low side of the effect size estimate and an estimate of 7 trimmed studies on the right side, i.e. on the positive side of the effect size funnel with an adjustment that leads to a higher effect size, if the studies are trimmed. These two analyses of publication bias show that publication bias is not a likely explanation of this result and any funnel asymmetries are not due to unpublished studies but due to positive outliers.

Discussion

This meta-analysis shows a significant and robust medium-sized effect of d = 0.61 of Viscum album extract (VAE) treatment on QoL in cancer patients.

The results should be regarded in the light of the following facts:

The included studies vary with regard to the cancer site, the control intervention, the additional oncological treatment, and the VAE. While sensitivity analyses confirmed the robustness and reliability of the findings, they could not account for the heterogeneity of the effect sizes. Neither methodological moderators (blinded vs. unblinded studies, randomized versus non-randomized, studies with high versus low risk of bias, active versus non-active control) nor structural moderators (type of outcome measure, funding, VAE product used, additional treatment) could clarify the heterogeneity. We suspect that this is due to multiple interactions between cancer types and stages, treatments and structural variables that cannot be explored with a limited set of 30 studies. Nevertheless, our sensitivity analyses document the overall robustness of the effect, as none of the levels of moderators exhibits significant deviations from other levels or from the overall effect size. This gives our effect size estimate of d = 0.61 reliability.

Although the overall risk of bias is high in many studies, one should bear in mind two aspects. First, we applied the intention-to-treat-algorithm of Rob2 as the more conservative approach and not the per-protocol evaluation which may have resulted in a better overall bias. Second, due to the local skin reaction of VAE application the blinding of participants and carers is practically impossible and could only have been implemented reliably with an active placebo, which is ethically questionable. In Rob2 this leads to a high risk of bias in the measurement of the outcome. On the one hand, the lack of blinding might have biased the results since most QoL are self-reported and there may be strong beliefs among users of anthroposophic medicine which might additionally be fortified by the severity of the disease and the hope that an additional treatment has a positive impact [53]. It was shown that these attitudes are correlated with a better QoL [54]. On the other hand, there is no evidence from the included studies that the attitudes differed between treatment arms and if patients in the control group searched and used for surrogate medications for the VAE, this bias would favor the comparator. Furthermore, our sensitivity analysis gives no indication that studies with blinding and without blinding estimate different effect sizes and there is also no difference in effect sizes between studies from Germany – where mistletoe is well known – and other countries where mistletoe is less known and used. The results of the Newcastle-Ottawa-Scale, finally, indicate a good methodological quality for the non-randomized trials that were included in the review.

Another limitation is that self-regulation, the Karnofsky performance index, or the ECOG scale cover important aspects of QoL, but are different in content from other measures such as the global QoL of the EORTC-QLQ C30 scale. This source of heterogeneity was also addressed by our sensitivity analyses. This showed that, indeed, as one would expect, single item indices estimate lower effect sizes, although the difference is not significant. In the same vein, the inclusion of non-randomized and non-interventional trials might have biased the results due to their lower internal validity, but their exclusion during sensitivity analyses again did not alter the significance of the pooled outcome. In addition, four of the five non-RCTs had a matched-pair design which increases the comparability between treatment arms compared to other types of group allocation.

The meta-regression shows that more recent studies have higher effect sizes compared to older studies. This is counterintuitive at first sight, as normally more recent studies are implemented with more methodological rigor due to the GCP guidelines and a higher methodological skill of trialists. This, one would think, should, if at all, lead to smaller effect sizes in more recent trials. The fact that this is not the case shows, together with our sensitivity analysis that methodological bias is an unlikely explanation for the effect size found. However, another point is worth bearing in mind: earlier studies were very often implemented with severely ill patients with tumor status IV or in palliative care. Only in more recent studies was VAE also used as add on treatment in first line patients with a relatively good chance of surviving. Thus the higher effect size for more recent studies might also reflect the less severe status of these patients.

Our review has a number of strengths. First, we conducted a comprehensive search for published and grey literature with no time or language limitation to minimize publication bias. Our analysis of publication bias supports the conclusion that the effect size estimate is not due to publication bias. Some authors who we contacted, however, failed to provide additional information and the respective studies were consequently excluded. Second, we calculated a pooled SMD for a global measure of QoL and for its subdomains such as pain or fatigue. Third, we analyzed the data both with Revman 5.3 and CMA software which implements the Hunter-Schmidt-corrections for small sample bias. We did both analyses in parallel and independently, thus preventing coding or typing errors from biasing our results.

The weaknesses of this review are obvious. Any meta-analysis can only be as good as the original studies entered. Some of these studies are large and methodologically strong. But some are also badly reported, small and with a mixed patient load. In some cases we had to recalibrate confidence interval estimates, because the data given were not detailed enough. Although it would have been desirable, the variance between cancer types and stages was too large to allow for detailed assessments and separate analyses, which might have reduced the heterogeneity. Although we can testify to the robustness of the overall effect size estimate, we have not succeeded in clarifying the heterogeneity of the studies. This requires multi-center studies in large cohorts of patients with large budgets. Thus, one consequence of this meta-analysis would be to call for more serious efforts from public funders to study the effects of VAEs in large and homogeneous patient cohorts to confirm or disconfirm the results of this analysis.

Clinical relevance

Our results indicate a statistically significant and clinically valuable improvement of the subjective well-being of patients with different types of cancer after the treatment with VAE. The analyses for the subdomains revealed a significant pooled SMD for important symptoms and functioning indices, whereas other show a positive, yet not significant effect of VAE compared to control. Whether these vital elements of QoL such as emotional functioning or fatigue are influenced remain statistically uncertain. Overall, a robust estimate of an improvement of d = 0.61 in quality of life represents a medium-sized [55] and clinically relevant [56, 57] effect that makes VAE treatment a viable add-on option to any anticancer treatment.

Conclusion

Our analysis provides evidence that global QoL in cancer patients is positively influenced by VAE. Because the risk of bias and the heterogeneity is high, future research needs to better assess the actual impact. Large studies in homogeneous patient populations are required to address these problems.