Introduction

The standard treatment for locally advanced esophagogastric adenocarcinoma in Europe is either preoperative chemotherapy or chemoradiotherapy [1–3]. However, the majority of patients do not respond [4]. Response-based treatment stratification would therefore be of great interest.

Despite an enormous number of studies on predictors of response and prognosis in esophagogastric adenocarcinomas, no molecular marker can be used in clinical routine to tailor treatment apart from HER2 expression in the palliative setting [5]. Three different types of response evaluation have been studied: morphological response evaluation by histopathology, metabolic response evaluation by functional imaging, and clinical response evaluation by conventional imaging modalities. Histopathological response is regarded as the reference method according to recent studies [6]. A clear disadvantage is that information about histopathological regression can only be obtained after resection, and thus can only be used as a prognostic marker. The use of early metabolic response evaluation by FDG-PET is restricted as well, owing to the limited FDG avidity of gastric cancer, its limited availability, and the lack of validation in multicenter trials [7].

Clinical response evaluation has been used to describe the effects of neoadjuvant treatment for more than 10 years; however, it is still not widely accepted because it is judged to be investigator dependent. Indeed, the data on clinical response assessment are conflicting: judgments range from calling it an important prognostic factor to regarding it as meaningless information [8, 9]. One of the reasons for this uncertainty is that the association of clinical response with histopathological response is not well studied. Furthermore, clinical response evaluation can be performed at different time points, with different criteria for response, and after different treatment regimens (chemotherapy/chemoradiotherapy). Especially after chemoradiotherapy, the value of restaging with a computed tomography (CT) scan and/or endoscopic ultrasound seems to be limited, because discrimination between residual tumor and post-therapeutic changes (edema, scar) is difficult [8, 9].

To resolve some of the controversies in this field, our exploratory study aimed to evaluate clinical response in a large patient cohort with respect to later histopathological response and prognosis, with emphasis on the question of whether nonresponding patients can be identified by clinical response defined by a combination of CT scan and endoscopy.

The same clinical response assessment was subsequently applied in a second, independent patient population. Additionally, in the second population, an interim response evaluation after 4–6 weeks of chemotherapy was performed and tested for its correlation with subsequent histopathological response, preoperative clinical response and prognosis.

Patients and methods

Out of 954 patients with esophagogastric adenocarcinomas (esophagus, gastroesophageal (GE) junction, stomach) treated with neoadjuvant chemotherapy, 860 patients with pretherapeutic and post-therapeutic CT scan and endoscopy followed by resection were included in this study. The initial tumor categories were cT3/4 and cN0/+. Data were documented in a prospective database.

We retrospectively analyzed 686 patients from the Klinikum Rechts-der-Isar, Technische Universität München, Germany between 1987 and 2007 (cohort A). For validation, we analyzed the data of corresponding patients (n = 184) of the Surgical Department, University of Heidelberg, Germany from 2007 to 2011 (cohort B). Additionally, in 118 patients of cohort B, an interim evaluation of clinical response after 4–6 weeks of chemotherapy was performed (Fig. 1).

Fig. 1 Flow chart

Preoperative staging

Preoperative staging consisted of a CT scan as well as upper gastrointestinal endoscopy in all patients. Endoscopy and CT scan were repeated after the end of chemotherapy. In a subgroup of patients, an additional staging was done after 4–6 weeks of chemotherapy within study protocols (interim assessment).

Clinical response assessment

Clinical response was evaluated and standardized after chemotherapy, before surgery, by the respective interdisciplinary tumor boards.

A decrease of the maximal transverse primary tumor diameter of > 50 % measured on CT and a decrease of the endoluminal tumor size of > 75 % as visualized by endoscopy were classified as clinical response [10]. Both criteria had to be fulfilled for a patient to be categorized as a clinical responder. These criteria have been used in previous studies. A detailed description of clinical response evaluation is presented in Supplemental Table 1.
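The two-part rule above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, the CT diameters are assumed to be given in the same unit, and the endoscopic size reduction is assumed to be recorded as an estimated percentage.

```python
def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline measurement must be positive")
    return 100.0 * (before - after) / before


def is_clinical_responder(ct_diameter_before: float,
                          ct_diameter_after: float,
                          endoscopic_decrease_pct: float) -> bool:
    """Both criteria must hold: CT diameter decrease > 50 % AND
    endoscopic endoluminal size decrease > 75 %."""
    ct_decrease = percent_decrease(ct_diameter_before, ct_diameter_after)
    return ct_decrease > 50.0 and endoscopic_decrease_pct > 75.0
```

Because the thresholds are strict inequalities, a CT decrease of exactly 50 % or an endoscopic decrease of exactly 75 % would count as nonresponse in this sketch.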

Chemotherapy

Chemotherapy was given according to one of the following regimens: OLF/PLF, consisting of at least 6 weeks of either oxaliplatin 85 mg/m² or cisplatin 50 mg/m² on days 1, 15 and 29 (1 h infusion time) and folinic acid (500 mg/m² over 2 h) plus fluorouracil (2000 mg/m² over 24 h) on days 1, 8, 15, 22, 29 and 36, all repeated on day 49. Patients aged 60 years or younger with a good health status were additionally given paclitaxel (80 mg/m² over 3 h) on days 0, 14 and 28.

In Heidelberg, most patients (63 %) were treated with EOX: epirubicin 50 mg/m² (day 1), oxaliplatin 130 mg/m² (day 1), and capecitabine 1,250 mg/m² (days 1–21), all repeated on day 22. Other regimens delivered were PLF (see above) and FLOT: oxaliplatin 85 mg/m² (day 1), docetaxel 50 mg/m² (day 1), folinic acid 200 mg/m² (day 1), and 5-fluorouracil 2,600 mg/m² (day 1), all repeated on day 15.

Surgery

In patients with esophageal cancer, either an abdominothoracic approach with intrathoracic anastomosis including a two-field lymphadenectomy, or a transhiatal esophagectomy with cervical anastomosis was performed. In patients with carcinoma of the esophagogastric junction, we performed a transhiatal extended gastrectomy and a D2-lymphadenectomy. For patients with tumor localization in the proximal or middle third of the stomach, we performed a total gastrectomy with D2-lymphadenectomy, and for distal gastric tumors, a subtotal gastrectomy including a D2-lymphadenectomy.

Histopathological workup and regression analysis

Histopathological workup was done by pathologists experienced in upper gastrointestinal cancer. Tumors were classified according to the 6th edition of the TNM classification (Munich) and the 7th edition (Heidelberg). Regression was classified using the Becker regression score [11]: tumor regression grade (TRG) 1a (complete regression) and 1b (< 10 % residual tumor) were classified as histopathological response.
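The dichotomization of the regression grade can be written as a small lookup. The sketch below is illustrative only; the function name is hypothetical, and grades 2 and 3 are taken from the Becker score itself rather than from the text above, which names only TRG 1a and 1b.

```python
# Grades counted as histopathological response in this study.
RESPONSE_GRADES = {"1a", "1b"}
# Assumption: the remaining Becker grades (2 and 3) count as nonresponse.
KNOWN_GRADES = RESPONSE_GRADES | {"2", "3"}


def is_histopathological_responder(trg: str) -> bool:
    """Return True for TRG 1a or 1b, False for the other known grades."""
    grade = trg.strip().lower()
    if grade not in KNOWN_GRADES:
        raise ValueError(f"unknown tumor regression grade: {trg!r}")
    return grade in RESPONSE_GRADES
```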

Follow-up

Most patients were followed as outpatients at the Surgical Department, Klinikum Rechts der Isar, Munich, or at the National Center for Tumor Diseases, Heidelberg. Patients who were not followed in one of these departments were contacted by phone to obtain follow-up data.

Statistical analysis

SPSS 17.0 (IBM, Inc., Chicago) was used for statistical analysis. Median survival times were calculated using the Kaplan–Meier method. Survival times were counted in months from the time of diagnosis to death, and differences were assessed with the log-rank test. Univariate and multivariate analyses were performed with a stepwise Cox proportional hazards model. To determine the correlation between different parameters, we used the chi-square test, and Spearman correlation coefficients were calculated to quantify bivariate correlations. To assess the diagnostic value of clinical response with respect to histopathological response and R0 resection, we calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). A p value of less than 0.05 was considered statistically significant.
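For readers less familiar with the four diagnostic measures, the following Python sketch shows how they follow from a 2×2 table in which clinical response is the test and histopathological response (or R0 resection) is the reference. The class and function names are hypothetical, and the counts in the example are invented for illustration; they are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Sensitivity, specificity, PPV and NPV as fractions between 0 and 1."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = TwoByTwo(tp=30, fp=20, fn=10, tn=140)
metrics = diagnostic_metrics(example)
```

In this framing, the NPV is the proportion of clinical nonresponders who are also nonresponders by the reference method, which is the quantity of main interest when the aim is to identify nonresponse.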

Results

A total of 860 patients from two centers were included in the study.

Cohort A (Munich)

686 patients (548 male, 138 female) were included, with a median follow-up of the surviving patients of 51 months (5–204); 356 patients (52 %) died during follow-up. Median survival of the overall population was 40 months.

Clinical response was present in 207 patients (30 %). Median survival of this subgroup was 108 months, with a 3-year and 5-year survival of 71 % and 59 %, respectively. In contrast, patients without clinical response to chemotherapy (n = 479, 70 %) had a median survival of 27 months (3-year and 5-year survival, 43 % and 33 %, respectively) (Table 1a, Fig. 2). One hundred and seventy-seven patients (26 %) were histopathological responders. The median survival of histopathological responders was not reached, in contrast to a median survival of 27 months in histopathologically nonresponding patients (p < 0.001).
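A median survival that is "not reached" means that the Kaplan–Meier survival estimate never falls to 50 % within the observed follow-up. The Python sketch below illustrates this with a minimal product-limit estimator. It is not the SPSS procedure used in the study; the function names and the toy data in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each death time.

    `events[i]` is True if patient i died at `times[i]`, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients with the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With five deaths at months 2, 4, 6, 8 and 10, the estimate drops below 50 % at month 6, so the median is 6. With six patients of whom only two die (months 3 and 12) and the rest are censored, the estimate stays above 50 % and `median_survival` returns `None`, corresponding to "not reached".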

Table 1 Patients’ characteristics and prognostic factors (cohorts A and B)

Fig. 2 Clinical response and survival in center A

All patients’ characteristics of cohort A, including survival times and 3-year and 5-year survival rates, are summarized in Table 1a.

Association of clinical response with histopathological response and R-category

Clinical response correctly predicted the histopathological result in 85 % of clinical nonresponders and in 52 % of clinical responders (chi-square test, p < 0.001). Sensitivity for predicting a histopathological response was 60.5 %, specificity 80.2 %, PPV 51.9 % and NPV 85.2 %. Clinical response also correlated significantly with R0 resection status (chi-square test, p < 0.001; sensitivity 35.7 %, specificity 87.0 %, PPV 89.4 %, NPV 30.7 %) (Table 2).
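The association between two binary classifications, such as clinical and histopathological response, can be tested with a chi-square test on the 2×2 table. The sketch below implements the Pearson statistic without continuity correction. Whether a correction was applied in the original analysis is not stated, so that choice is an assumption, as are the function name and the invented counts used in the usage note.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p value) with 1 degree of freedom and
    no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

For a hypothetical table with 30 and 20 patients in the first row and 10 and 140 in the second, the statistic is about 66.7 and the p value is far below 0.001. A table with identical proportions in both rows gives a statistic of 0 and a p value of 1.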

Table 2 Correlation between clinical response and histopathological tumor regression, correlation between clinical response and R-category (cohort A)

Impact of tumor localization on response evaluation

Clinical response was a statistically significant prognostic factor in all tumor localizations, whereas histopathological regression was significant only in AEG I and II. Response rates and prognostic impact decreased from proximal to distal localizations (Table 3).

Table 3 Prognostic impact of clinical response with respect to tumor localization

Impact of the respective time periods on response evaluation

Clinical and histopathological responses were both stable prognostic factors over the different time periods (Table 4).

Table 4 Prognostic impact of clinical and histopathological responses with respect to the different time periods

Multivariate analysis

Multivariate analysis (forward conditional hazard model) included grading, Laurén’s subtype, clinical response, ypT-category, ypN-category, R-category, lymphangiosis and TRG. Clinical response was an independent prognostic factor (Nonresponse: HR for death 1.4, 95 % CI 1.0–1.8, p = 0.032). Other independent factors were R-category, lymphangiosis carcinomatosa, ypT-category and ypN-category (each p < 0.001) (Table 5).

Table 5 Multivariate analysis (forward conditional hazard model)

Chemotherapy regimens

A total of 447 patients (65.2 %) were treated with platinum-containing chemotherapy, 137 patients (20.0 %) additionally received taxanes, 54 patients (7.9 %) additionally received epirubicin, and 48 patients (6.9 %) received various other regimens. Patients treated with taxane-containing regimens had the longest survival (median 108.0 months), followed by those treated with platinum-based regimens (37.2 months) and other regimens (34.0 months). The shortest survival was observed after treatment with epirubicin (19.5 months) (p < 0.001). The type of chemotherapy did not correlate with clinical or histopathological response.

Cohort B (Heidelberg)

One hundred and eighty-four patients (150 male, 34 female) were analyzed, with a median follow-up of the surviving patients of 29 months. Eighty-two patients (45 %) died during follow-up; median survival was 33 months.

Clinical response was evident in 24 % of patients. The median survival was not reached in clinical responders, in contrast to 27 months in nonresponders (p = 0.003) (Fig. 3). In contrast to cohort A, TRG was not significantly associated with survival (p = 0.312) (Table 1b).

Fig. 3 Clinical response and survival in center B

Association of standard preoperative and interim clinical response evaluation

An interim response evaluation was performed in 118 patients; 72 % were classified as nonresponders and 27 % as responders. The percentage of concordant cases between interim and standard preoperative evaluation was 94 % for responders and 93 % for nonresponders (p < 0.001). Two patients (2 %) were first classified as responders and later as nonresponders, and five patients (4 %) were first classified as nonresponders and later as responders (Table 6). The sensitivity of interim response evaluation with respect to preoperative evaluation was 84.4 %, specificity 97.6 %, PPV 93.3 %, and NPV 94.3 %. Interim response was also significantly associated with survival (p = 0.008) (Table 1b).

Table 6 Correlation between interim clinical response and standard clinical response, correlation between clinical response and histopathological tumor regression, correlation between clinical response and R-category (cohort B)

Association of clinical response with histopathological response and R-category

Correct prediction of histopathological response was 92 % for nonresponding patients and 50 % for responding patients (p < 0.001). Sensitivity with respect to histopathological response was 66.7 %, specificity 85.4 %, PPV 50 %, and NPV 92.1 %.

An R0 resection was achieved in 98 % of clinical responders compared to only 70 % of nonresponders (p < 0.001). Sensitivity with respect to R0 resection was 30.5 %, specificity 85.4 %, PPV 97.8 %, and NPV 30.0 % (Table 6).

Multivariate analysis

The model included clinical response, ypT-category, ypN-category and R-category. Significant factors were ypT-category (p = 0.015) and ypN-category (p < 0.001). Clinical response failed to reach statistical significance (p = 0.536).

Chemotherapy regimens

In the Heidelberg cohort, 25 patients (17.1 %) received platinum-based regimens, 112 patients (76.6 %) additionally received epirubicin, and eight (5.5 %) additionally received taxanes, while one patient received a different chemotherapy. The type of chemotherapy did not influence survival (p = 0.360), but patients treated with platinum-based chemotherapy alone had a significantly higher clinical response rate (48 % versus 37.5 % with taxanes and 19.6 % with epirubicin, p = 0.022). No association was found for histopathological response.

Discussion

Clinical response to preoperative chemotherapy was assessed as a combination of endoscopy and CT scan in a large patient series in two academic centers. It was shown that clinical response assessment is feasible and that it has a strong correlation with histopathological response and survival. Even an interim response evaluation seems to have a significant prognostic impact.

Of note, the main goal of this study was not to correctly identify patients with a complete histopathological response, but to test the hypothesis that nonresponse and poor prognosis can be assessed with clinically available tools. A major limitation of our study is its retrospective exploratory design, although the clinical response assessment was documented prospectively and preoperatively without knowledge of the final histopathological workup. Furthermore, no separate documentation of the results of endoscopy and CT scan is available, so the accuracy of the individual staging methods cannot be analyzed. Detailed data were documented only for subgroups [10]. Additionally, no interobserver variability can be reported, since clinical response was evaluated in an interdisciplinary tumor board. Another drawback is the limited number of patients who had an interim clinical response evaluation. It could also be criticized that AEG I, II and III tumors and gastric cancer were analyzed together, as studies have shown that these subtypes may have a different prognosis [12]. Nevertheless, we think that an analysis across all patients with esophagogastric adenocarcinoma is justified. As in the MAGIC trial, these different tumor localizations are often treated following the same principles [1, 2]. In most centers the treatment regimens do not currently differ, and patients treated with chemoradiotherapy were excluded [3].

Only in the Munich subgroup did chemotherapy regimens influence prognosis. Patients treated additionally with taxanes had the best prognosis, which is in line with the literature [13–15]. However, in the neoadjuvant setting, no superiority could be shown for taxanes [16], and no randomized data are available so far. The association of platinum-only regimens with clinical response might be influenced by the small subgroups, because higher response rates are normally expected for triple chemotherapy regimens, especially for taxane-containing regimens [17–21].

Our subgroup analysis showed a decreasing response rate and prognostic impact from proximal to distal, similar to recently published data [12]. However, clinical response remained statistically significant in all localizations, in contrast to histopathological regression.

It has often been assumed that clinical response assessment is too investigator dependent, and that results may therefore not be reliable enough in inexperienced hands. We cannot rule this out, as the same three experienced investigators (L.F., B.M., O.K.) worked in both centers and their experience and methodology were transferred from center A to center B in 2007. Nevertheless, the combined data from widely available diagnostic tools (CT and endoscopy) are very promising. The correct prediction of a histopathological response remains difficult, but the rate of correct prediction of a histopathological nonresponse was high, at 85 % in center A and 92 % in center B. Furthermore, 94.3 % of the clinical nonresponders at interim assessment remained nonresponders after the end of the full chemotherapy. Consequently, a later response to chemotherapy seems to be rare, and one may assume that identification of nonresponse by interim assessment is possible. Future studies should test the hypothesis that prognosis is not impaired if ineffective chemotherapy is withdrawn early or modified.

Clinical response was strongly associated with prognosis in both centers. Despite the low association with final histopathological response, the prognosis of clinical responders was excellent, with a median survival of 108 months in center A and not reached in center B, and the prognosis of nonresponders was nearly identical, at 27 months in both centers. This indicates that histopathological response is only one of the potentially available surrogate parameters for prognosis [4, 22], although it is regarded as the gold standard to date; even among histopathological responders, up to 30 % die of recurrence [4, 22]. The clinical response rates of 30 and 24 % were not high but realistic, and reflect the data obtained for histopathological response after chemotherapy [6].

In a study by the Cologne group [8], 32 % of the patients were estimated as complete responders after chemoradiotherapy, and 35 % as partial responders. This seems to be overestimated and might explain the missing predictive value for histopathological response and prognosis within this particular study.

In the last two decades, simple clinical response evaluation was almost abandoned and all efforts were put into molecular and metabolic response evaluation or prediction [7, 23–25]. Despite all efforts, no molecular marker has gained relevance in the preoperative setting to tailor treatment. Only for FDG-PET in AEG I and II has a metabolic response-based treatment algorithm been shown to be feasible and meaningful in prospective studies [24–26]. However, the value of FDG-PET has not yet been reproduced in multicenter trials. Only a few studies on clinical response evaluation exist so far. They were mostly retrospective, included small patient numbers and produced conflicting results. Most studies were designed to predict a complete histopathological response and failed [8]. They concluded that clinical response evaluation after preoperative chemoradiotherapy in esophageal cancer had no place in clinical practice, and that no patient with complete response should be harmed by being denied surgical resection, given an accuracy of only 47 % for a subsequent histopathological complete response [8]. As mentioned, our intention was different, and for our purpose, clinical response evaluation seems to be feasible. Additionally, our evaluation was performed after chemotherapy only, which might minimize treatment-related changes such as fibrosis or inflammation and might render response evaluation more accurate. Interim clinical response evaluation may not be possible as early as with FDG-PET after 2 weeks [23–25], but seems to be realistic after 4–6 weeks.

To exclude a relevant bias caused by technical development over time, we analyzed the prognostic impact of response during the different time periods. We found clinical response to be a stable predictor of prognosis.

Our criteria for response may also be debated, as they are mainly based on the former World Health Organization (WHO) classification, not on the Response Evaluation Criteria In Solid Tumors (RECIST). We aimed to keep the evaluation as simple as possible and to continue with established criteria [7, 23, 27]. Overall, our criteria are in line with the published standard criteria [28]. The largest wall diameter on CT scan as a predictor of response after preoperative therapy has been shown to have prognostic relevance in other studies [29]. In contrast, more technically demanding volume-based evaluations by CT scan have rarely been studied and show conflicting results after 2 weeks of therapy [30, 31]. Admittedly, in endoscopy, despite our attempt at quantification, no exact measurement is possible and the results could be biased by the experience of the endoscopist. However, endoscopic response has also been shown to be of prognostic relevance in the literature [7, 23, 32].

Although clinical response evaluation has taken a back seat for more than two decades, our data based on a combination of endoscopy and CT scan are very promising. Clinical response has been shown to be significantly associated with histopathological response and with survival. Nonresponding patients in particular can be identified with high accuracy. Additionally, an interim clinical response evaluation seems to be feasible, which might allow for a tailored preoperative treatment algorithm in the future for patients with esophagogastric adenocarcinomas.