Introduction

Combination chemotherapy is the treatment of choice for patients with advanced or metastatic gastric or gastroesophageal carcinoma because of the survival benefit demonstrated by previous clinical trials and meta-analyses [1, 2]. The availability of multiple active drugs, including fluoropyrimidines, platinum, anthracyclines, taxanes, and irinotecan, encourages clinical trials of combination chemotherapy both for advanced disease and for adjuvant/neoadjuvant therapy [3]. However, heterogeneity in clinical trial design and patient population, an important confounding factor in gastric cancer trials, was not adequately addressed in previous analyses [4].

The first source of heterogeneity is a lack of international consensus on the optimal chemotherapeutic regimens. Cisplatin-fluoropyrimidine combinations are used in many international clinical trials as the control arm treatment, but the types (parenteral or oral) and dosage of fluoropyrimidines vary widely. Many studies were done to compare the efficacy, safety, and pharmacological characteristics of different fluoropyrimidine regimens, but the optimal regimen remains undetermined [5–8]. Addition of epirubicin or docetaxel to the platinum–fluoropyrimidine combination is common in Europe and the USA because of the survival benefit demonstrated by randomized clinical trials [9–11], but it is still under debate whether the survival benefit justifies the additional toxicity of these three-drug combinations.

A second source of heterogeneity is the regional and ethnic difference in the clinical outcome and treatment strategies [12, 13]. The incidences of different subtypes of gastric cancers (intestinal type vs. diffuse type, non-cardiac location vs. cardiac/gastroesophageal junctional location) differ between Asian and non-Asian populations. These different subtypes of gastric cancers are associated with different risk factors, clinical presentation, and treatment outcomes [14–17]. Therefore, clinical trials conducted in different regions of the world may enroll patients with heterogeneous biological backgrounds that make the trial results not comparable. In addition, the incidences of genetic polymorphisms of genes involved in drug metabolism, including those for fluoropyrimidines, taxanes, and platinums, also differ between Asian and non-Asian populations [18–20]. These differences may result in different drug disposition, tolerance to the same chemotherapeutic agents, and preference of Asian and non-Asian investigators in choosing the optimal combinations.

Meta-analysis, which combines results from multiple studies to estimate the treatment outcome, is usually used to explore the heterogeneity issue. Ideally such exploration should be based on individual patient data obtained from the original clinical trials, but collection of individual patient data is time-consuming, and pertinent patient data may not be recorded in a standardized way and are usually not available from all relevant trials [21–24]. Meta-analysis based on aggregated patient data, on the other hand, is more feasible to perform, and the heterogeneity issue is analyzed by subgroup analysis. The major drawbacks of this approach include loss of statistical power due to the decreased sample size in each subgroup and the inability to measure potential interactions between different patient and treatment variables.

A third approach is to perform meta-regression based on aggregated patient data. Meta-regression combines the meta-analytic and linear regression methods to detect the existence and direction of the association between patient and treatment variables and treatment outcome. It can analyze simultaneously the effects of multiple pertinent variables on the outcome as well as the potential interaction among these variables using all the available observations together. Therefore, meta-regression is more efficient statistically than conventional sub-group analysis.

To address these heterogeneity issues in gastric cancer trials, we performed a systematic review on recently published randomized trials of systemic chemotherapy as first-line therapy for gastric or gastroesophageal carcinoma. The purpose of this study is to identify pertinent factors that can help predict safety and efficacy of combination chemotherapy by using meta-analysis and meta-regression, which were done according to the Cochrane guidelines [25]. The potential confounding effects of chemotherapeutic regimens and regional/ethnic difference in treatment strategies were explored.

Methods

Databases and searches

The electronic databases searched included MEDLINE and PubMed. The search strategy was a combination of the MeSH terms ‘stomach neoplasms’ and ‘randomized controlled trial.’ A manual search was made of the reference lists of all identified papers (research articles and review papers) as well as the abstracts presented at the annual meetings and gastrointestinal cancer symposiums of the American Society of Clinical Oncology from 2005 to 2009.

Study selection

The studies were selected for review if they fulfilled the following inclusion criteria: (1) randomized controlled trials published since 2005 that enrolled patients with advanced or metastatic gastric or gastroesophageal carcinoma, and (2) trials testing the safety and efficacy of systemic chemotherapy as the first-line treatment. Trials published before 2005 have been summarized in previous meta-analyses [1, 10]. In addition, the standard of supportive care may change over the years, and this change may confound the interpretation of the meta-analysis results.

Data extraction and synthesis

Two authors (C.H. and Y.C.S.) did the literature search, data extraction, evaluation, and summary independently. Any disagreement between them was resolved through discussion. The data of the following patient and treatment variables, if available, were extracted from the published reports: patient age, male percentage, geographic regions (Asian vs. non-Asian trials), performance status, tumor stage at enrollment (unresectable vs. recurrent disease after surgery, locally advanced vs. metastatic disease), percentage of diffuse-type histology, percentage of gastroesophageal junctional location, percentage of patients with two or more organs involved, prior treatment before enrollment, types of chemotherapeutic agents studied, and percentage of patients who received second-line chemotherapy. A quality score was generated for each study based on the Cochrane guidelines (Online Resource Supplementary Table 1), with higher quality scores indicating poorer study quality.

Patients who received fluoropyrimidines or platinums were further categorized to explore the potential difference between different dosing regimens of the same type of chemotherapeutic agents. Fluoropyrimidine regimens were categorized as conventional 5-fluorouracil (5FU) regimens (e.g., 5FU 800–1,000 mg/m2/day continuous infusion for 5 days, every 4 weeks or 200 mg/m2/day continuous infusion), weekly or biweekly infusional high-dose 5FU (e.g., 5FU 2,600 mg/m2, 24-h continuous infusion, every 1–2 weeks), and oral fluoropyrimidines (capecitabine or S-1). Platinum regimens were categorized as cisplatin and oxaliplatin.

Each treatment arm in a trial was considered an individual treatment group to compare with one another. The safety endpoints analyzed included the percentage of grade 3–4 neutropenia, febrile neutropenia, diarrhea, nausea/vomiting, and mucositis because they were reported in most of the clinical trials. The efficacy endpoints analyzed included the 6-month progression-free survival (PFS) rate and the 1-year overall survival (OS) rate.

Statistical analysis

Statistical analysis proceeded in two steps [26]. First, meta-analysis was conducted using both fixed-effect and random-effects models. Then, meta-regression was performed to identify the most pertinent variables that predicted safety and efficacy. Variables incorporated in the meta-regression analysis included the following: patient age, male percentage, geographic regions, trial quality score, performance status, tumor stage at enrollment, percentage of diffuse-type histology, percentage of gastroesophageal junctional location, percentage of patients with two or more organs involved, prior treatment before enrollment, types of regimens (single-agent vs. 2-drug combination vs. 3-drug combination), types of chemotherapeutic agents, and percentage of patients who received second-line chemotherapy.
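As a rough illustration of the first step, the sketch below pools arm-level event rates under a fixed-effect model and a random-effects model. It assumes inverse-variance weighting and the DerSimonian-Laird estimate of between-arm variance; the text does not state which estimators were configured in RevMan, SAS, or R. The function name and the input numbers are hypothetical.

```python
def pool_proportions(events, totals):
    """Pool arm-level proportions; returns (fixed, random, tau2).

    Assumes every arm has a proportion strictly between 0 and 1,
    so that the binomial variance p(1-p)/n is non-zero.
    """
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]

    # Fixed-effect model: weight each arm by the inverse of its variance.
    w = [1 / v for v in variances]
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-arm variance tau2.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    df = len(props) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects model: add tau2 to every arm's variance.
    w_star = [1 / (v + tau2) for v in variances]
    random_effects = sum(wi * pi for wi, pi in zip(w_star, props)) / sum(w_star)
    return fixed, random_effects, tau2


# Hypothetical grade 3-4 neutropenia counts in three treatment arms.
fixed, random_effects, tau2 = pool_proportions([10, 60, 35], [100, 100, 150])
```

When the arms are heterogeneous, tau2 is positive and the random-effects weights are more nearly equal across arms than the fixed-effect weights.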

In the meta-regression models, the parameter estimates of each factor indicate the influence (the effect size) of each factor on the outcome. For categorical variables (such as the geographic regions), the parameter estimates indicate the effect size with the presence of individual variables. For continuous variables (such as the percentage of patients who received second-line chemotherapy), the parameter estimates indicate the effect size with each incremental unit (e.g., percentage) of the particular variables. The generalized additive models were used to detect the potential non-linear effects of continuous variables if necessary [27].
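The interpretation of the parameter estimates can be sketched with a small weighted least-squares fit. This is a minimal sketch, not the models fitted in the study: weighting each treatment arm by its patient number is an assumption, the helper names are hypothetical, and the data are invented so that the true coefficients are known in advance.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_meta_regression(outcomes, design, weights):
    """Weighted least squares: minimise sum of w_i * (y_i - x_i . beta)^2."""
    p = len(design[0])
    xtwx = [[sum(w * row[j] * row[k] for w, row in zip(weights, design))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * row[j] * y for w, row, y in zip(weights, design, outcomes))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


# Invented arm-level data: a categorical covariate (1 = Asian trial) and a
# continuous covariate (percentage of patients given second-line therapy).
asian = [1, 1, 0, 0, 1, 0]
second_line_pct = [70.0, 60.0, 30.0, 40.0, 80.0, 20.0]
patients = [150, 120, 200, 180, 90, 220]
# Outcome built from known coefficients so the fit can be checked.
os_rate = [30.0 - 5.0 * a + 0.5 * s for a, s in zip(asian, second_line_pct)]

design = [[1.0, float(a), s] for a, s in zip(asian, second_line_pct)]
intercept, beta_region, beta_second_line = fit_meta_regression(
    os_rate, design, patients)
```

Here `beta_region` is the shift in the outcome when the categorical variable is present, and `beta_second_line` is the change in the outcome per one-percentage-point increase in the continuous variable.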

Basic model-fitting techniques for variable selection, assessment of goodness-of-fit, and regression diagnostics were used to assure the quality of analysis results, as described before [28, 29]. The meta-analysis and meta-regression were done using the Cochrane Review Manager (RevMan) software, version 4.2 (Oxford, UK), the SAS statistical software (version 9.1.3, SAS Institute Inc., Cary, NC, USA), and the R statistical software version 2.6.1 (The R Project for Statistical Computing, http://www.r-project.org/). Two-sided P value ≤0.05 was considered statistically significant.

Results

Characteristics of studies included in meta-analysis

The data extraction process and the selection of studies are shown in Fig. 1. Twenty-five trials, enrolling 6,792 patients in total, were eligible for meta-analysis [30–53] (Table 1, Online Resource Supplementary Table 2). Eight trials were performed in Asian countries and 13 in Europe or in the USA. Four trials were performed in both Asian and non-Asian populations. The percentage of Asian patients enrolled was reported in three trials (54% [30], 0.8% [31], and 66% [35], respectively).

Fig. 1 Study flow chart of the data extraction process and selection of studies for meta-analysis

Table 1 Characteristics of randomized controlled trials included for meta-analysis

There were no significant differences between the Asian and non-Asian trials in terms of patient age, male percentage, performance status, tumor stage at enrollment (unresectable vs. recurrent disease after surgery, locally advanced vs. metastatic disease), percentage of patients with two or more organs involved, and treatment before enrollment. Asian trials generally did not enroll patients with gastroesophageal junctional carcinoma, while non-Asian trials enrolled a mean of 18.8% patients with gastroesophageal junctional carcinoma. In trials that reported the percentage of patients with Lauren’s diffuse-type histology (15 trials) and the use of second-line chemotherapy (11 trials), Asian trials reported a significantly higher percentage of diffuse-type histology (mean 57.3 vs. 22.0%, P < 0.001) and more common use of second-line chemotherapy (mean 72.1 vs. 33.9%, P = 0.001) than non-Asian trials.

The 25 trials analyzed consisted of 56 treatment groups. The ToGA trial aimed to evaluate the efficacy of adding trastuzumab to chemotherapy in human epidermal growth factor receptor 2 (HER2)-positive advanced gastric or gastroesophageal junction cancer [30]. Only the patients in the control group, who received chemotherapy alone, were included in this meta-analysis. The choice of regimens differs significantly between Asian and non-Asian trials (see Online Resource Supplementary Table 3). Asian trials used single-agent therapy more frequently, while a three-drug combination was tested in only one treatment group. For the individual chemotherapeutic agents, Asian trials did not incorporate anthracycline as first-line therapy and used platinum less frequently than non-Asian trials.

Safety

Asian trials reported lower incidences of grade 3–4 neutropenia (19.3 ± 15.8 vs. 35.5 ± 25.5%, P < 0.0001 by two-sample t test weighted by the patient number in each treatment group), febrile neutropenia (6.7 ± 7.1 vs. 11.2 ± 9.7%, P = 0.049), diarrhea (7.4 ± 6.5 vs. 8.3 ± 6.6%, P = 0.042), and nausea/vomiting (6.7 ± 7.1 vs. 11.2 ± 9.7%, P < 0.0001). The forest plots of meta-analysis on grade 3–4 neutropenia and diarrhea are shown in Fig. 2. The meta-regression models indicate that geographic region (Asian vs. non-Asian trials) is an independent predictor of neutropenia, febrile neutropenia, and diarrhea (Table 2, Online Resource Supplementary Table 4). After controlling for other patient or treatment factors, Asian trials are associated with 8.2% lower incidence of grade 3–4 neutropenia (P < 0.0001), 2.1% lower incidence of grade 3–4 diarrhea (P < 0.0001), and 2.2% lower incidence of febrile neutropenia (P = 0.03). The difference in the incidence of nausea/vomiting results mainly from the different chemotherapeutic agents used (Supplementary Table 4).
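The group summaries above are weighted by the patient number in each treatment group. One plausible reading of such a weighted "mean ± SD" is sketched below; the exact weighting formula is not given in the text, so the population-style weighted variance used here is an assumption, and the function names and numbers are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Mean of arm-level rates, each arm weighted by its patient number."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Weighted standard deviation around the weighted mean.

    Uses sum(w * (v - mean)^2) / sum(w); other conventions exist.
    """
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)


# Hypothetical grade 3-4 neutropenia rates (%) and arm sizes.
rates = [12.0, 25.0, 31.0]
arm_sizes = [150, 100, 250]
summary = (weighted_mean(rates, arm_sizes), weighted_sd(rates, arm_sizes))
```

Larger arms pull the weighted mean toward their own rate, so a small arm with an extreme toxicity rate has limited influence on the group summary.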

Fig. 2 Comparison of safety of systemic chemotherapy for advanced gastric cancer between the Asian and non-Asian trials

Table 2 Predictors of safety of chemotherapy for patients with advanced gastric or gastroesophageal carcinoma

The use of a three-drug combination, compared with a single-agent or two-drug combination, increased the incidence of grade 3–4 neutropenia by 14.8%. Chemotherapeutic agents that independently increase the risk of grade 3–4 neutropenia include cisplatin, irinotecan, and taxanes. A notable exception is the use of weekly or biweekly high-dose infusional 5FU, which is associated with a 12.3% lower incidence of grade 3–4 neutropenia, compared with other fluoropyrimidine regimens. Irinotecan independently increases the risk of grade 3–4 diarrhea by 16.4%, compared with regimens not using irinotecan. Effects of other agents on the incidence of diarrhea, while statistically significant, are relatively small.

Treatment efficacy

The reported 1-year OS rates are significantly higher in Asian than non-Asian trials (45.0 ± 10.3 vs. 35.5 ± 10.6%, P < 0.0001 by two-sample t test weighted by the patient number in each treatment group). By contrast, the 6-month PFS rates are significantly lower in Asian than non-Asian trials (35.7 ± 11.8 vs. 41.0 ± 7.3%, P = 0.004). The forest plots of meta-analysis on 1-year OS and 6-month PFS are shown in Fig. 3. However, the meta-regression analysis does not identify geographic region as an independent predictor of either 1-year OS or 6-month PFS rates (Table 3).

Fig. 3 Comparison of efficacy of systemic chemotherapy for advanced gastric cancer between the Asian and non-Asian trials

Table 3 Predictors of treatment efficacy of chemotherapy for patients with advanced gastric or gastroesophageal carcinoma

Our meta-regression models indicate that two-drug and three-drug combination chemotherapy improves 6-month PFS rates, compared with a single-agent regimen, by 17.3 and 25.0%, respectively. This finding may partly explain the better PFS rate reported by non-Asian trials, in which two-drug or three-drug combinations are more commonly used. As for individual chemotherapeutic agents, the use of high-dose infusional 5-FU, oral fluoropyrimidines, or taxanes is associated with independent improvement of PFS. A higher percentage of patients with gastroesophageal junctional carcinoma independently predicts a poor PFS rate. The meta-regression models indicate that the 6-month PFS will decrease by 4% for every 10% increase in patients with gastroesophageal junctional carcinoma.
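The last estimate can be read as a slope. The short sketch below assumes that the effect is linear and that "4%" and "10%" are both percentage points; the 25-point difference between the two trials is a hypothetical input, not a value from the study.

```python
def predicted_change(slope_per_point, delta_points):
    """Change in the outcome (percentage points) for a covariate shift."""
    return slope_per_point * delta_points


# Slope implied by the text: 4 points lower 6-month PFS per 10 points more
# patients with gastroesophageal junctional carcinoma.
gej_slope = -4.0 / 10.0

# Hypothetical comparison: one trial enrolls 25 points more such patients.
change = predicted_change(gej_slope, 25.0)
```

Under these assumptions the second trial would be expected to report a 6-month PFS rate about 10 percentage points lower, all other factors being equal.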

The preliminary meta-regression analysis suggested that a higher percentage of patients with good performance status (ECOG score 0 or 1) was associated with poorer 1-year OS, which is contradictory to previous reports [54]. Further meta-regression analysis by fitting a generalized additive model indicated that the data of ECOG performance status had a non-linear effect on the 1-year OS rate. A few trials having the highest percentage (>95%) of patients with good performance status happened to report relatively poor 1-year OS (Online Resource Supplementary Figure). After adjusting for this non-linear relationship in our meta-regression analysis, the ECOG performance status loses its independent predictive value. The final meta-regression model (Table 3) indicates that higher quality scores (indicating poor trial quality) and high median age are independent patient/trial factors that predict poor 1-year OS. While individual agents used in first-line chemotherapy may help predict OS, the meta-regression models indicate that a higher percentage of patients who received second-line chemotherapy is an independent predictor for a better 1-year OS rate. The 1-year OS rate will increase by 10% for every 10% increase in patients who received second-line chemotherapy. A graphic representation of the effects of second-line chemotherapy on 1-year OS is shown in Fig. 4.

Fig. 4 Correlation between the use of second-line chemotherapy and 1-year overall survival rate. The size of individual circles represents the number of patients in each treatment group

Discussion

Geographic region (Asian vs. non-Asian) has long been considered an important confounding factor for interpreting the results of clinical trials of systemic chemotherapy for advanced gastric cancer. The present study systematically explored the impact of different geographic regions, including the potential difference in patient characteristics and treatment patterns, on safety and efficacy reported by gastric cancer clinical trials. After controlling for all the patient and treatment factors, geographic region remains an independent predictor of treatment safety in terms of chemotherapy-induced neutropenia and diarrhea. Future clinical trials of gastric cancer should consider the geographic region as a stratification factor to control potential bias.

The most recent example illustrating the confounding effects of geographic region on clinical trial results is the AVAGAST trial comparing the effects of chemotherapy with or without bevacizumab [55]. In that trial, Asian patients who received chemotherapy alone apparently had better overall survival (median 12.1 months) than the European (median 8.6 months) or the American counterparts (median 6.8 months). Two thirds of Asian patients received second-line treatment upon tumor progression, while only 31% of European patients and 21% of American patients did. While bevacizumab plus chemotherapy produced significantly better PFS and objective response rate, the benefit of bevacizumab on overall survival was only seen in American patients. These results support our findings that geographic regions and the associated confounders must be carefully evaluated in the design of clinical trials for gastric cancer.

One important issue in clinical trial design is the selection of the primary endpoint. Although overall survival is the gold standard, it is difficult for a new treatment to demonstrate overall survival advantage as first-line therapy when multiple options of second-line therapy are available. This issue is even more complicated in international clinical trials when the clinical practice of second-line therapy varies as widely as is seen in advanced gastric cancer. The use of surrogate endpoints, including PFS or time to tumor progression, to evaluate the efficacy of first-line treatment has been extensively studied in trials of colorectal and breast cancers, and data from many meta-analyses indicate that PFS is a good endpoint to evaluate the efficacy of first-line chemotherapy [56–58]. Standardization of evaluation of tumor progression is needed to validate the usefulness of these surrogate endpoints in trials of systemic therapy for gastric cancer [59].

Another important issue in clinical trial design is the selection of the optimal regimen as control. Our meta-regression models indicate that no specific types of chemotherapeutic agents are clearly superior to others in terms of improvement in PFS, consistent with previous meta-analyses [1, 10, 60]. While multi-drug combinations produce higher PFS, the selection of chemotherapeutic regimens should also take into account treatment safety and impact on the patients’ quality of life. The independent predictive value of geographic region on treatment safety strongly indicates an ethnic or genetic basis of the different safety profiles. Future international clinical trials should incorporate more detailed biomarker and pharmacogenetic studies to explore the optimal regimens for different ethnic groups.

Pharmacogenetic differences in drug targets and drug-metabolizing enzymes have been found to play important roles in both the efficacy and safety of fluoropyrimidines [61, 62]. The potential impact of pharmacogenetic factors must be kept in mind when interpreting and extrapolating results comparing different fluoropyrimidine regimens. For example, capecitabine has shown overall survival benefit in randomized trials compared with conventional 5-FU infusion (200 mg/m2/day by continuous infusion or 800 mg/m2/day for 5 days, every 3 weeks). S-1 has shown benefit in PFS and response rate compared with conventional 5-FU infusion (800 mg/m2/day for 5 days every 4 weeks). Regarding the potential benefit of weekly or biweekly infusion of high-dose 5-FU and leucovorin, our previous studies suggested a pharmacodynamic basis for its higher response and lower toxicity compared with conventional bolus injection of 5-FU. The serum concentrations achieved by the high-dose infusional 5-FU schedule can suppress thymidylate synthase, the target enzyme of 5-FU, more sustainably, while the bone marrow concentrations achieved by this regimen did not produce significant toxicity to myeloid progenitor cells [63, 64]. The impact of pharmacogenetics on this regimen is not known. Comparative studies of genetic polymorphisms for drug targets and drug-metabolizing enzymes in Western and Asian patients should be incorporated into future clinical trials to better understand differences in toxicity and efficacy of fluoropyrimidines in Asian and non-Asian patients.

There are several important limitations of this study. First, because the control arms of the trials analyzed are very heterogeneous, it is difficult to compare the efficacy of different combination regimens by conventional meta-analysis. That is why we used the treatment arm-based approach. Second, many important factors, such as the histological subtypes, tumor extent at the start of chemotherapy, and content of second-line therapy, were not reported in many of the clinical trials analyzed. This significantly limits the number of factors that can be analyzed by meta-analysis and meta-regression. Third, meta-analysis based on aggregated patient data is subject to ecological bias. Future studies based on individual patient data are needed to verify the predictors identified in this study.

In conclusion, geographic region (Asian vs. non-Asian) plays an important role in the heterogeneity of clinical trials of gastric cancers and is an independent predictor of safety of systemic therapy for gastric cancer.