Introduction

Esophageal cancer is the ninth most frequent cancer worldwide and its incidence is rising. In recent decades knowledge of cancer has improved dramatically, but this has led to only a small improvement in survival rates of patients with esophageal cancer. Many consider radical esophagectomy to be the best treatment that can offer long-term survival with good locoregional control. More than half a century after surgery for esophageal cancer first took off, most clinicians now strongly feel that a multidisciplinary approach is necessary to further improve the outlook for patients with this disease. One such approach is preoperative chemoradiotherapy (CRT), a strategy that has received much attention recently. Quite a number of phase II studies and randomized phase III trials (RCTs) comparing neoadjuvant CRT with surgery alone have been published. The outcome data of these studies have also been combined and reported as meta-analyses. In the present study we examined and critically appraised trials on neoadjuvant CRT in esophageal cancer.

What is a meta-analysis?

A meta-analysis is a review in which bias has been reduced by the systematic identification, appraisal, synthesis and, if relevant, statistical aggregation of all relevant studies on a specific topic according to a predetermined and explicit method [1, 2]. The advantages of meta-analyses are that the process of review generally is transparent, valid, and reproducible. Meta-analyses can be appealing because “significance” can be attained when small groups are pooled into big ones and new scientific hypotheses, that had inconclusive results or that had not been originally tested, can be examined in subgroups. Such reviews of well-performed, adequately powered randomized controlled trials are considered to be the highest level of evidence. A meta-analysis can thus serve as an efficient method to get a quick and valid insight into a clinical question and may serve as a policy foundation for evidence-based practice guidelines.

Discordance of meta-analyses

However, meta-analyses on the same topic can be discordant [3, 4]. They may differ with respect to the direction of the estimated effect or, if the direction is the same, with respect to the effect’s magnitude or statistical significance. This is the case with meta-analyses published on neoadjuvant CRT in esophageal cancer. Two of the six published reviews do not show a significant survival benefit in patients with resectable esophageal cancer (Table 1). This apparent lack of benefit may lead to differences in translation of the available evidence for the use of preoperative CRT. For example, in the United States and Australia, CRT followed by surgery is now considered to be standard treatment for resectable esophageal cancer, whereas in the Netherlands, surgery alone is still favored, and neoadjuvant CRT is only applied within the setting of a clinical trial [5].

Table 1 Characteristics of the meta-analyses that investigated the value of neoadjuvant chemoradiation in resectable esophageal cancer

It is therefore important to gain insight into the methods behind the published reviews in order to make judgments on the applicability of the findings in clinical practice and decision making. Jadad et al. summarized several sources of discordance among meta-analyses [3], including differences in the clinical question, selection of studies and inclusion, data extraction, ability to combine studies, and statistical methods for data analysis. In the following paragraphs we discuss these issues in detail, within the light of the currently available trials on neoadjuvant CRT in esophageal cancer.

Research question, search strategy, inclusion and exclusion criteria

The first step is to examine if the different reviews address exactly the same question under consideration. All 6 meta-analyses on neoadjuvant CRT in esophageal cancer addressed the following research question: “Is there a benefit of preoperative CRT compared with surgery alone for resectable esophageal cancer?”

Next, a search strategy was undertaken to identify all possible studies that should be included in the meta-analysis. Differences in search strategy (e.g., by using different search terms to screen databases) may reflect differences in included trials among reviews. Ideally, researchers search all available sources of information to identify all relevant studies addressing a particular research question. Medline (on PubMed), Cancerlit, Cochrane Library, and EMBASE are generally screened using key words, and manual searches (cross checking reference lists or abstract books from major congresses) are also done in most cases.

As shown in Table 2, not all reviews have included the same primary trials. It is important to determine which search and selection process is least likely to be biased. The publication by Gebski et al. includes data from three recent studies that could not be incorporated in the earlier published meta-analyses [6]. The inclusion of more studies should not lead to systematically different conclusions, but it does lead to increased precision (reduced random error). The review by Urschel et al. included two studies published as an abstract only, whereas other meta-analyses limited their inclusion criteria to peer-reviewed, full articles [7–9].

Table 2 Meta-analyses on neoadjuvant chemoradiotherapy for esophageal cancer and randomized controlled trials (RCTs) included (+) or not included (–) in their analyses

In only one meta-analysis were unpublished data from a thesis incorporated and used for statistical pooling [6]. Although one study found that there were no substantial differences in study quality between published and unpublished clinical research studies, another suggested that intervention effects reported in journals were 33% greater than those reported in doctoral dissertations [10]. Meta-analyses limited to published trials, compared with those that included both published and so-called “gray literature” (literature difficult to locate or retrieve), overestimated the treatment effect by an average of 12% [11]. In conclusion, including unpublished reports tends to reduce bias and should be the aim.

It should also be kept in mind that omitting trials that have not been published because of a negative result could well lead to bias (publication bias). Two meta-analyses investigated the likelihood that publication bias was present. One study did not suggest publication bias against negative trials [9]. Another test for potential publication bias yielded an estimate of nine potentially unpublished studies on chemoradiotherapy [6].

Some reviews excluded non-English language studies [6], and this also introduces a potential source of bias, as there is no evidence to support differences in quality between studies reported in the English literature versus non-English language publications [12–14]. Why Kaklamanos et al. excluded the study of Apinop et al. from their review is unclear [8, 15]. Although the latter study only included patients with locally advanced esophageal cancer, the tumors were otherwise operable.

To further minimize bias, the search and selection for potential trials to be included in meta-analyses should be done by two or more independent investigators, something that was done in only one of the meta-analyses dealt with here [16].

Data extraction and analysis

Meta-analyses of data on individual patients are considered the yardstick against which other reviews of RCTs should be measured. Data on individual patients afford the opportunity for reviewers to measure outcomes more uniformly, to compare outcomes measured at different times, to use intention-to-treat analysis, to conduct flexible subgroup analysis, and to update follow-up information [2]. Despite multiple attempts, Greer et al. and Kaklamanos et al. were unsuccessful in obtaining individual patient data from the trials included in their meta-analyses [8, 16]. Gebski et al. were able to evaluate individual patient data from two studies [6].

Data extracted from the primary studies may differ. It is therefore important to identify the review that takes into account the outcome measure most relevant to the clinical question. As shown in Table 1, the outcome measures vary widely between studies. But there may also be differences in data extraction due to human error, biased extraction, or misprints. Only in the study by Urschel et al. did two independent researchers perform the data extraction, and this was done in duplicate (not blinded) [9].

Quality of the primary studies

Before analyzing all trials together, the design of the trial, the treatments, the population of patients included, the quality of the trial, and a summary of the results must be assessed. It is important to find out if the trials are similar enough to be combined, to get an understanding of the types of patients studied, and finally, to assess the quality and availability of the data. The authors of the meta-analyses should also state the reason for excluding trials from the analysis.

Studies that are not well-designed controlled trials can be included in a meta-analysis. A good meta-analysis of badly designed studies will still result in bad statistics: garbage in, garbage out [17]. Given the rather limited number of RCTs on the role of neoadjuvant CRT in esophageal cancer, the article selection process was designed to be inclusive as opposed to exclusive in all reviews, and trials were not excluded because of trial quality. However, assessment of the methodological quality of the trials included is an essential component of systematic reviews [18]. How the quality of the primary studies was assessed (by what method) should be included in a review. Reviews that address these issues are likely to be more rigorous than those that ignore trial quality. In only two of the reviews in this report was primary trial quality assessment performed (Urschel et al. and Fiorica et al.); the other four reviews ignore this important issue. In the study of Urschel et al. the quality of the included trials was rather poor (mean score of 2.1) as judged on a 5-point Jadad scale [9, 19]. The authors explain this by saying that it is likely due to the importance placed on blinding in this scoring system, and the inherent difficulty in blinding treatment, such as in CRT trials. Furthermore, most of the included primary studies did not report details of the randomization methods. Gebski et al., however, assumed that allocation concealment was not compromised and did not incorporate a quality assessment in their review [6]. Fiorica et al. stated that three studies did not clearly define criteria for handling withdrawal [7].

There are no RCTs that have a sufficiently large sample size. In only two studies were more than 100 patients randomized. The EORTC study by Bosset et al. reported on 293 patients and the study of Burmeister et al. included 256 patients [20, 21]. Moreover, it should be underscored that these two studies are not similar: sequential CRT was given in the study of Bosset et al. whereas Burmeister et al. applied concurrent CRT.

Two trials show a survival benefit of neoadjuvant CRT, but these trials have also been criticized by many investigators [22, 23]. The most recent of the two, by Tepper et al., was stopped prematurely due to lack of accrual [24]. Although the trial indicates a benefit in overall survival and progression-free survival for patients treated with trimodality therapy, the statistical methods used can be heavily criticized [25]. It is clear that a phase III clinical trial designed to definitively indicate whether chemoradiotherapy plus surgery is superior to surgery alone is difficult to conduct. Consequently, the results of meta-analyses to answer this question gain increased importance. In both studies, survival in the control group (no CRT) was very low, with a 5-year survival rate of 16% and a 3-year survival rate of 6%; also, the quality of the surgery performed is open to discussion.

Two other published series on neoadjuvant chemoradiation are from Lee et al. and from Burmeister et al. [21, 26]. The latter study is the largest with concurrent chemotherapy and radiotherapy followed by surgery. However, this study can also be criticized. Tumor staging was based on “old-fashioned” staging modalities, and no stratification for tumor type was performed. There was a low percentage of complete pathological response in the adenocarcinomas (7.5%), and 15% of the included patients received a different chemotherapy regimen. Furthermore, postoperative radiotherapy was allowed but not standardized. Finally, a rather large number of R1/R2 resections was reported (59%) in the control (surgery only) group. Clearly, this study does not reflect current surgical outcomes in medium- to high-volume esophageal cancer centers.

Clinical heterogeneity

In a meta-analysis, data from several studies are combined to come to an estimation of the overall effect of a treatment. This is called pooling of the data from the individual trials. In order to do so the trials need to be homogeneous in terms of clinical as well as statistical methods. Clinical heterogeneity exists when the patients, the interventions, or the outcome measures are not similar. As already briefly mentioned, there is wide variation between the included primary trials in patient characteristics, in tumor histology, and in radiotherapy and chemotherapy regimens (Table 3). Also, surgical technique was not uniform across the studies. Although the debate over the transthoracic versus the transhiatal approaches continues, there has been evidence to support the equality of these techniques with respect to outcome [27]. However, on long-term follow-up, patients with adenocarcinomas of the distal esophagus may benefit from a transthoracic approach as opposed to transhiatal resection [28]. But perhaps more important, there is considerable heterogeneity observed both in the radiotherapy and in the chemotherapy protocols among studies. Variability in chemotherapy protocols among trials was found in the number and type of agents administered and in the dose and scheduling of the drugs. Doses of radiotherapy in some studies are considered suboptimal by today’s standards. Not only does the total dose vary between studies (between 35 Gy and 50.4 Gy), but so do the daily dose (between 1.75 Gy and 3.70 Gy) and the number of fractions (between 10 and 30) administered over 2–4 weeks. Also, the timing of delivery varied: sequential in three studies versus concurrent in seven studies. It is difficult to determine whether differing results with respect to overall survival rate are due to more effective CRT regimens in some trials.

Table 3 Characteristics of randomized controlled trials on neoadjuvant chemoradiotherapy for esophageal cancer

Stage at time of diagnosis by study treatment arm was unavailable in one case [29]. Greer et al. noticed a greater proportion of patients with advanced disease (stage II or greater) in the surgery-alone arm [16]. However, only two studies enrolled patients with advanced disease [15, 20], and Walsh et al. reported only the stage of disease post-CRT [30]. This may have resulted in an overall “downstaging” of patients in the CRT group, giving a false impression that patients in the surgery-alone arm had more advanced disease. Uncertainty about the true baseline characteristics of patients limits our ability to interpret the effect of preoperative stage on outcome.

How to handle heterogeneity?

Judgments based on clinical and biological understanding of the disease and processes and mechanisms of action of interventions can be used to determine whether it makes sense to pool the results of particular studies with those of other studies. This issue rests with the clinician. Reviews that address whether results can be combined and that make efforts to test the underlying assumptions in choosing the statistical method for pooling data are probably more credible than reviews that ignore such issues [31]. If the results from the primary trials widely differ, despite the assumption of clinical homogeneity, this is called statistical heterogeneity. This can be due to differences in outcome measures between the trials, true differences in outcome between the trials or differences in methodological quality between the trials.

To test for heterogeneity one can perform a statistical test. However, in the meta-analyses on neoadjuvant CRT this test is rather unreliable and lacks statistical power because few studies are included [2, 7]. One can also ignore statistical heterogeneity by not adjusting the statistical method. This is called the “fixed-effects model.” The downside of applying this method is that it more frequently shows significant results with narrower confidence intervals than the “random-effects model.” The latter model, however, accounts for heterogeneity, is a more conservative estimator, and minimizes the risk of erroneously assigning benefit to the treatment group if no benefit really exists [2].
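
The difference between the two models can be illustrated with a minimal sketch. It assumes inverse-variance weighting for the fixed-effects estimate and the DerSimonian-Laird estimator for the between-trial variance in the random-effects estimate; the source does not state which estimators the individual reviews used. The function name `pool` and the input values are hypothetical and do not correspond to any of the trials discussed here.

```python
import math

def pool(log_effects, variances):
    """Pool per-trial log effect sizes (e.g., log odds ratios).

    Returns the fixed-effects and DerSimonian-Laird random-effects
    estimates, their standard errors, Cochran's Q, and tau^2.
    """
    k = len(log_effects)
    # Fixed-effects model: weight each trial by the inverse of its variance.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))
    # DerSimonian-Laird estimate of the between-trial variance (tau^2).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects model: add tau^2 to every within-trial variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * y for wi, y in zip(w_re, log_effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum(w)),
        "random": random_est,
        "se_random": math.sqrt(1.0 / sum(w_re)),
        "Q": q,
        "tau2": tau2,
    }

# Hypothetical log odds ratios and variances for five trials (illustrative only).
result = pool([-0.9, -0.1, -0.5, 0.2, -0.7], [0.10, 0.04, 0.20, 0.08, 0.15])
```

When the trials disagree more than chance alone would explain (Q larger than its degrees of freedom), tau^2 is positive and the random-effects standard error exceeds the fixed-effects one, which is why the random-effects model gives wider confidence intervals and is the more conservative choice.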

Kaklamanos et al. assumed that the studies on neoadjuvant CRT are homogeneous and used a fixed-effects model for calculating their summary estimates [8]. Urschel et al. considered that a fixed-effects model was not methodologically sound given the obvious heterogeneity of the trials [9]. Greer et al. also used a random-effects model to calculate a summary value for relative risk of death [16]. These authors also tested for homogeneity across studies. Fiorica et al. calculated the overall odds ratios with models based on both fixed-effects and random-effects assumptions [7].

After calculation of the overall effect in a meta-analysis, one can exclude trials of higher and/or lower methodological quality and measure the effect on the outcome. This is called sensitivity or robust analysis. If the outcome measure of the pooled analysis changes substantially, the result of a review needs to be interpreted with great caution. However, this analysis is not a standard or prerequisite part of a meta-analysis. The robust analysis of Fiorica et al. showed that statistical significance for overall mortality was lost after exclusion of either the trial by Walsh et al. or the trial by Urba et al. [29, 32].
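
One common form of such an analysis is to drop each trial in turn and re-pool the remainder. The sketch below assumes a fixed-effects, inverse-variance pooled estimate with a normal-approximation 95% confidence interval; it is not a reconstruction of the method used by Fiorica et al. The function names, trial labels, and numbers are hypothetical.

```python
import math

def fixed_effect(log_effects, variances):
    """Inverse-variance pooled estimate with a 95% confidence interval."""
    w = [1.0 / v for v in variances]
    est = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return est, est - 1.96 * se, est + 1.96 * se

def leave_one_out(names, log_effects, variances):
    """Re-pool the data with each trial removed in turn."""
    results = {}
    for i, name in enumerate(names):
        ys = log_effects[:i] + log_effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        est, lo, hi = fixed_effect(ys, vs)
        # The result is "significant" if the interval excludes 0 (no effect).
        results[name] = {
            "estimate": est,
            "ci": (lo, hi),
            "significant": hi < 0 or lo > 0,
        }
    return results

# Hypothetical data: one strongly positive trial and three near-null trials.
names = ["Trial A", "Trial B", "Trial C", "Trial D"]
log_or = [-1.0, -0.1, -0.05, 0.0]
var = [0.05, 0.05, 0.05, 0.05]

overall = fixed_effect(log_or, var)
sensitivity = leave_one_out(names, log_or, var)
```

In this constructed example the pooled effect is significant with all four trials but loses significance once "Trial A" is removed, the same pattern of dependence on a single trial that the text describes.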

In the study by Gebski et al., potential clinical heterogeneity because of different follow-up durations between studies starting earlier and later was accounted for by use of the hazard ratio as the effect estimate, because it takes into account the duration of follow-up [6]. The authors performed a test of heterogeneity between studies started before 1994 and after 1993. This test suggested that there was no heterogeneity and that combining all studies is appropriate. The same group also excluded one study without changing the result for the overall effect of CRT, which remains significant. If the two unpublished studies are excluded from this analysis, then there is no appreciable difference in the findings.

Another possibility is to exclude heterogeneity by performing subgroup analyses. This should, however, only be done based on a predefined plan at the time of designing the meta-analysis in order to avoid a fishing expedition and the risk of false-positive results. Subgroup analyses performed afterwards that were not written into the initial study protocol can only be seen as hypothesis generating.

Subgroup analysis

Adenocarcinomas versus squamous cell carcinomas

Fiorica et al. found that the effect of preoperative CRT on overall survival is much more pronounced and statistically significant in patients with adenocarcinoma [7]. At the same time they warn that the sample size of this exploratory subgroup analysis is small (data obtained from only two trials). If only RCTs addressing squamous cell cancers were considered, the 3-year survival advantage of neoadjuvant CRT plus surgery was less apparent and became nonsignificant in one study [9]. Kaklamanos et al. did not find evidence to suggest that tumor histology is associated with variations in the effect of neoadjuvant treatment on 2-year survival [8]. Finally, by using the raw data from two trials, Gebski et al. found a benefit of CRT over surgery for both histological tumor types (both concurrent CRT schedules) [6].

Concomitant versus sequential radiation therapy

The optimal radiation fraction, dose, and time of delivery are not known. Higher postoperative morbidity was observed in two RCTs in which a fraction dose of >2 Gy was delivered [20, 30]. With respect to the time of delivery, concomitant radiotherapy with chemotherapy has theoretical advantages. The aim of combining neoadjuvant chemotherapy and radiotherapy is to use the radiosensitizing effect of chemotherapy to reduce tumor size and to maximize local control [33]. Fiorica et al. concluded that the available information seems inadequate to determine whether a concomitant regimen of CRT is better than induction chemotherapy followed by radiotherapy [7].

Urschel et al. as well as Gebski et al. found that sequential chemoradiation did not demonstrate a survival benefit at 3 years, as opposed to concurrent CRT [6, 9]. Concurrent CRT had a similar significant benefit for both histological types [6]. To date no trial has been performed that compared CRT delivered sequentially or concomitantly to resolve this apparent contradiction.

Treatment-related mortality and morbidity

Kaklamanos et al. could not find a significant difference in treatment-related mortality between neoadjuvant CRT plus surgery versus surgery alone [8]. The overall rate of postoperative adverse events was not different between the CRT group and the surgery alone group. However, there was a significant effect of CRT on postoperative mortality (90 days) with an odds ratio (OR) of 2.1. This increased risk was confirmed in an analysis of observational data including 3,592 patients [34]. Excluding the trial of Bosset et al. [20] resulted in loss of significance [7]. The rate of adverse events was not significantly different between the two treatment arms, but a trend in favor of surgery alone was described for both operative mortality and all treatment mortality.

Are the results clinically relevant?

The magnitude of the overall effect of neoadjuvant CRT is considered clinically relevant by Fiorica et al. [7]. With a number needed to treat (NNT) of 10, preoperative CRT prolongs 3-year overall survival versus surgery alone. On the other hand, there is evidence that CRT significantly increases morbidity and postoperative mortality (number needed to harm; NNH = 25). But fewer patients need to be treated to benefit from the treatment over the long term than need to be treated to be harmed immediately post-surgery. Gebski et al. also calculated the number of patients needed to treat to save one life for a (theoretical) patient population in which the 2-year survival was 20% (high risk), 35% (moderate risk), or 50% (low risk). Based on their findings of a relative risk reduction of 19% for CRT, the absolute risk reduction was greatest in those at high risk who were receiving CRT: to prevent one death, seven patients would need treatment (NNT = 7). The smallest benefit was for patients with a low risk who were receiving CRT: NNT = 10 [6].
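
The arithmetic behind such NNT figures can be sketched as follows. The sketch assumes that the relative risk reduction is applied to the 2-year mortality of the untreated group (1 minus the 2-year survival), so that the absolute risk reduction is their product and the NNT is its reciprocal. This is an assumption about how the figures were derived and is not confirmed by the source. The function name is hypothetical.

```python
def nnt_from_rrr(relative_risk_reduction, control_survival):
    """Number needed to treat, given a relative risk reduction in mortality
    and the survival proportion in the untreated (control) population."""
    control_mortality = 1.0 - control_survival
    # Absolute risk reduction = baseline risk x relative risk reduction.
    absolute_risk_reduction = relative_risk_reduction * control_mortality
    return 1.0 / absolute_risk_reduction

# The three theoretical risk groups described in the text (2-year survival).
for label, survival in [("high risk", 0.20), ("moderate risk", 0.35), ("low risk", 0.50)]:
    print(label, round(nnt_from_rrr(0.19, survival), 1))
```

Under this assumption the unrounded values are about 6.6, 8.1, and 10.5, which is in line with the reported NNT of 7 for high-risk patients and close to the reported 10 for low-risk patients; the exact integer depends on the rounding convention used. The calculation also shows why the NNT rises as baseline risk falls: the same relative reduction removes fewer absolute deaths when fewer deaths are expected.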

With more rigorous staging, e.g., using positron emission tomography/computed tomography (PET-CT) pretreatment stage stratification, and more complete resection of the primary tumor and associated lymph nodes, the effect from the neoadjuvant therapies is, however, likely to be less. It might lead to a smaller treatment effect in terms of absolute risk reduction but is unlikely to hold for relative treatment effects [6]. However, to identify individuals with operable esophageal cancer who are most likely to benefit from CRT, future trials will need to carefully stratify patients for the stage of disease by use of EUS, high quality CT, and PET scanning.

Finally, no correction for the extra time that CRT brings for patients was made in any of the trials or meta-analyses. For instance, if chemoradiation yields a survival benefit of a few months and it also costs a few months of extra time for recovery compared with surgery alone, the benefit would be completely lost.

Summary and conclusions

The pace of medical research, our increasing need for valid, relevant health care information, and our limited resources to find, appraise and apply this information underscore the need for rigorous reviews to guide health care decisions. There has been exponential growth in systematic reviews, and this has led to an increase in the number of reviews that address similar therapeutic problems and that yield discordant results [3].

Meta-analyses of evidence can be criticized because of heterogeneity among the trials included in a meta-analysis. Furthermore, the conclusions from meta-analysis depend on the selection of the statistical technique used. This was recognized in the late 1990s. Also the reporting of meta-analyses was frequently shown to be of inferior quality. In 1999, the so-called QUOROM (Quality of Reporting of Meta-Analyses) statement was published to address standards for improving the quality of reporting of meta-analyses of clinical RCTs [11].

The most recently published meta-analysis, which includes 10 RCTs, showed a significant survival benefit for preoperative CRT in esophageal cancer patients [6]. Can this study convince us to adopt this regimen for our patients? In other words: is this meta-analysis of sufficient quality? Does it meet the A1 level of evidence? Studies with an A1 level of evidence can be defined as systematic reviews that have included homogeneous studies with level A2 evidence. Studies that are defined as level A2 evidence are randomized controlled clinical trials of good quality, with an adequate sample size. It is questionable whether the primary RCTs included in that meta-analysis can be considered as such. Most of them were started before 1994, and hence methods for diagnosis, staging, treatment delivery, and outcome measurement reflect clinical practice during that period. Pretreatment staging did not include routine CT scanning in some primary trials, and stage stratification was attempted in very few. Furthermore, trial design issues (effect size justification, statistical power, sample size, and study duration) were not rigorously applied. This has resulted in many small trials being done with negative results.

When we look at histological types, squamous cell cancers are overrepresented in these studies: 70% vs. 30% adenocarcinomas. Given the increasing incidence of adenocarcinomas and the fact that nowadays in the West more than half of the esophageal cancers are adenocarcinomas, the results of these studies have to be interpreted with care. With regard to the adenocarcinomas: these tumors do not show a difference in 2-year survival in one study [8]. A number of studies have been prematurely stopped due to lack of accrual and as such are underpowered.

Publication bias could have an effect, although extensive screening of the literature by some authors, in addition to personal contacts made directly with principal investigators, makes this issue less likely. Finally, no correction for the extra time that chemoradiotherapy brings to patients was made in any of the trials or meta-analyses. For instance, if chemoradiation yields a survival benefit of a few months and it also costs a few months of extra time for recovery compared with surgery alone, then the benefit would be completely lost.

Studies have shown that neoadjuvant treatment may downstage the tumor and induce complete pathological response, but only a few trials reported separate survival data on patients who responded to treatment. Although separate analyses have not been performed in the published meta-analyses, the results suggest that neoadjuvant therapy may offer a survival benefit in this group of patients. It is likely that any such improvement would be greater for patients with a complete pathological response. Therefore, the results of other phase III RCTs that are underway may contribute to a better understanding of the role of neoadjuvant treatment for resectable cancer of the esophagus and help to identify patient subgroups who would benefit. The current data also strongly indicate the need for designing future trials considering the clinical difference between adenocarcinomas and squamous cell carcinomas and its potential influence on patient response to therapy.