Introduction

The incidence of gastric cancer has been decreasing; however, it remains one of the leading causes of cancer-related death worldwide because of its poor prognosis [1]. Guidelines recommend adjuvant therapy for curatively resected advanced gastric cancer, mostly stage II and III, on the basis of robust evidence from several clinical trials: INT 0116 in the USA, 2001 [2]; MAGIC in Europe, 2006 [3]; ACTS-GC in Japan [4, 5]; CLASSIC in Korea and China, 2012 [6]; and SAMIT in Japan, 2014 [7]. Surgical procedures and adjuvant chemotherapy regimens differed among the trials, so the specific recommendations also differ and there is no global standard. Moreover, prognostic factors other than tumor stage have not been established, even as the number of elderly patients has increased and standard adjuvant chemotherapy regimens have diversified. A recent meta-analysis suggested that tegafur/gimeracil/oteracil (S-1)-based chemotherapy and capecitabine plus oxaliplatin (CAPOX) are likely to be the most effective adjuvant treatments for patients with resected gastric cancer [8].

Since comparing treatments in a new clinical trial takes a long time, it is important to identify prognostic and predictive markers among clinicopathological factors using the individual patient data (IPD) from previous large randomized controlled trials. Recently, the West and East have collaborated to standardize cancer staging and the surgical procedures for lymph node dissection. D2 dissection is the systematic removal of lymph nodes around the stomach and pancreas to prevent loco-regional recurrence, and the three Asian trials included patients who underwent gastrectomy with D2 dissection. We aimed to identify prognostic and predictive markers in adjuvant chemotherapy for gastric cancer by a pooled analysis of these three large Asian randomized trials: ACTS-GC, CLASSIC, and SAMIT (ESM eTable 1) [9]. Using each patient's background, treatment, and clinical course, we sought to identify clinical biomarkers useful for treatment selection in patients with advanced gastric cancer.

Methods

Study design and patients

We sought IPD from the three large randomized clinical trials in gastric cancer that compared surgery followed by adjuvant therapy with surgery alone [6], or compared different adjuvant therapies [7]. The pooled analysis aimed to identify prognostic markers among the clinical and pathologic factors in the surgery alone groups as well as in the adjuvant groups, and to identify markers predictive of the efficacy of adjuvant chemotherapy and of specific regimens in the adjuvant groups. Study design and patient selection followed the previously published protocol [9] and conformed to the PRISMA IPD guidelines [10].

All patients from ACTS-GC, CLASSIC, and SAMIT were included in the overall analysis. The statisticians had access to deidentified data of the primary analysis population of each trial: N = 1059 in ACTS-GC, N = 1035 in CLASSIC, and N = 1433 in SAMIT. We requested information about patient characteristics, tumor characteristics, and treatments, together with the dates of surgery, randomization, therapy, recurrence, and the last visit or death.

The data-sharing agreement is in effect among the primary investigators, a statistician, and the sponsors of each trial. Written informed consent was obtained from all participants in each trial. The protocol of the pooled analysis was approved by the institutional review board at Tsuboi Cancer Center Hospital.

Patient outcome measurements

The primary endpoints and observation periods differed among the trials, so we selected the following endpoints with data accuracy in mind. In the pooled analysis, the primary endpoint was relapse-free survival (RFS), measured as the time from the date of randomization to the date of recurrence of the original gastric cancer or death from any cause, whichever came first. The secondary endpoint was overall survival (OS), calculated from the date of randomization to the date of death from any cause. Surviving patients were censored at the date of the last visit.
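The endpoint definitions above can be illustrated with a minimal sketch. The function and argument names are hypothetical and are not taken from the trial databases; the sketch assumes that each date is either known or absent (None) and that patients without an event are censored at the last visit.

```python
from datetime import date
from typing import Optional, Tuple


def days_between(start: date, end: date) -> int:
    """Number of days from start to end."""
    return (end - start).days


def relapse_free_survival(
    randomized: date,
    recurrence: Optional[date],
    death: Optional[date],
    last_visit: date,
) -> Tuple[int, int]:
    """Return (time in days, event indicator) for RFS.

    The event is recurrence or death from any cause, whichever came first.
    Patients with neither event are censored at the last visit.
    """
    event_dates = [d for d in (recurrence, death) if d is not None]
    if event_dates:
        return days_between(randomized, min(event_dates)), 1
    return days_between(randomized, last_visit), 0


def overall_survival(
    randomized: date,
    death: Optional[date],
    last_visit: date,
) -> Tuple[int, int]:
    """Return (time in days, event indicator) for OS."""
    if death is not None:
        return days_between(randomized, death), 1
    return days_between(randomized, last_visit), 0


# Hypothetical patient: recurrence six months after randomization, death at one year.
rfs = relapse_free_survival(date(2005, 1, 1), date(2005, 7, 1), date(2006, 1, 1), date(2006, 1, 1))
os_ = overall_survival(date(2005, 1, 1), date(2006, 1, 1), date(2006, 1, 1))
```

For this hypothetical patient the RFS event is the recurrence, while the OS event is the later death, so the two endpoints yield different follow-up times for the same person.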

Potential prognostic factors and predictive factors

The factors screened for prognostic and predictive value were age, sex, body mass index (BMI), time since surgery, histologic subtype, the extent of the primary tumor (T stage), the occurrence of lymph node metastases (N stage), and stage of disease, all of which were potentially correlated with RFS and OS.

The three studies categorized stage of disease according to different guidelines: ACTS-GC and SAMIT used the Japanese Classification of Gastric Carcinoma [11], and CLASSIC used the American Joint Committee on Cancer/Union Internationale Contre le Cancer (AJCC/UICC) classification [12]. We therefore unified the classification according to the sixth edition of the AJCC/UICC TNM staging system [12], deriving the stage from the T stage and N stage.
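As an illustration of deriving the stage from T and N, the following sketch encodes the stage grouping for gastric cancer as we understand it from the sixth edition of the AJCC/UICC system; the grouping should be checked against the staging manual before any reuse. The function name and the integer coding of T, N, and M are assumptions made for this example, which also assumes that T and N are already expressed in sixth-edition categories.

```python
# Stage grouping for M0 disease, keyed by (T, N); assumed to follow AJCC/UICC 6th edition.
STAGE_BY_T_AND_N = {
    (1, 0): "IA",
    (1, 1): "IB", (2, 0): "IB",
    (1, 2): "II", (2, 1): "II", (3, 0): "II",
    (2, 2): "IIIA", (3, 1): "IIIA", (4, 0): "IIIA",
    (3, 2): "IIIB",
}


def tnm_stage(t: int, n: int, m: int = 0) -> str:
    """Derive the TNM stage from T (1-4), N (0-3), and M (0-1)."""
    if t not in (1, 2, 3, 4) or n not in (0, 1, 2, 3) or m not in (0, 1):
        raise ValueError(f"unsupported category: T{t} N{n} M{m}")
    # Distant metastasis, N3 disease, or node-positive T4 disease is grouped as stage IV.
    if m == 1 or n == 3 or (t == 4 and n >= 1):
        return "IV"
    return STAGE_BY_T_AND_N[(t, n)]


# Hypothetical examples.
example_stages = [tnm_stage(2, 1), tnm_stage(3, 2), tnm_stage(1, 3)]
```

Under this grouping, stage IV includes M0 patients with N3 disease or node-positive T4 disease.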

Statistical analyses

Identification of prognostic markers

After combining the surgery alone groups of ACTS-GC and CLASSIC into one dataset, prognostic markers for RFS or OS were identified with a Cox regression model that contained age, sex, BMI, time since surgery, histologic subtype, T stage, and N stage as covariates and was stratified by trial. Since the TNM stage is derived from the T stage and N stage, we generated a separate model in which the TNM stage replaced the T stage and N stage. A marker was regarded as prognostic if its p value from the stratified Cox regression model was less than 0.05. The hazard ratios (HRs), 95% confidence intervals, and p values of the models were estimated. The prognostic markers identified were then validated in the adjuvant groups of ACTS-GC, CLASSIC, and SAMIT by fitting a stratified Cox regression with the same covariates but stratified by trial and treatment [13].
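To make the stratification concrete, the sketch below fits a Cox model with a single binary covariate, stratified by trial, by Newton-Raphson on the partial likelihood. It is a simplified illustration and not the analysis code used in this study, which was run in SAS and R with several covariates. The toy data, the function names, and the Breslow handling of ties are all assumptions. Risk sets are formed within each stratum only, so each trial keeps its own baseline hazard.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (trial, time, event, x), where x = 1 marks the older age group.
ROWS = [
    ("ACTS-GC", 2, 1, 1), ("ACTS-GC", 3, 1, 1), ("ACTS-GC", 5, 1, 0),
    ("ACTS-GC", 7, 0, 0), ("ACTS-GC", 8, 1, 1), ("ACTS-GC", 9, 1, 0),
    ("CLASSIC", 1, 1, 1), ("CLASSIC", 4, 1, 0), ("CLASSIC", 6, 1, 1),
    ("CLASSIC", 10, 0, 0),
]


def score_and_information(beta, rows):
    """First derivative and negative second derivative of the stratified
    partial log-likelihood for one covariate (Breslow approximation for ties)."""
    by_stratum = defaultdict(list)
    for stratum, time, event, x in rows:
        by_stratum[stratum].append((time, event, x))
    score, information = 0.0, 0.0
    for subjects in by_stratum.values():
        for time_i, event_i, x_i in subjects:
            if not event_i:
                continue
            # Risk set: subjects of the same stratum still under observation at time_i.
            risk = [x for (t, _, x) in subjects if t >= time_i]
            weights = [math.exp(beta * x) for x in risk]
            s0 = sum(weights)
            weighted_mean = sum(w * x for w, x in zip(weights, risk)) / s0
            weighted_sq = sum(w * x * x for w, x in zip(weights, risk)) / s0
            score += x_i - weighted_mean
            information += weighted_sq - weighted_mean ** 2
    return score, information


def fit_stratified_cox(rows, tol=1e-10, max_iter=50):
    """Return (log hazard ratio, standard error) from Newton-Raphson iterations."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, rows)
        if abs(score) < tol:
            break
        beta += score / information
    _, information = score_and_information(beta, rows)
    return beta, 1.0 / math.sqrt(information)


beta_hat, se = fit_stratified_cox(ROWS)
hr = math.exp(beta_hat)
ci_lower = math.exp(beta_hat - 1.96 * se)  # Wald 95% confidence limits
ci_upper = math.exp(beta_hat + 1.96 * se)
```

In practice a multivariable model of this kind would be fitted with established software; the sketch only shows where the stratification enters the likelihood.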

Identification of predictive markers of efficacy of adjuvant chemotherapy

Predictive markers for adjuvant chemotherapy were identified by examining heterogeneity in treatment effects on RFS or OS, i.e., surgery + adjuvant vs. surgery alone, in the combined patients of ACTS-GC and CLASSIC. Patients with stage I or IV disease, however, were excluded from the analysis of predictive markers because the CLASSIC trial did not recruit such patients. The HRs, 95% confidence intervals, and p values of the treatment groups were estimated by Cox regression stratified by trial. Treatment–subgroup interactions were tested by stratified Cox regression that included a treatment–subgroup interaction term as a covariate and trial as the stratification factor.
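One common way to write such an interaction model is sketched below. The notation is an assumption introduced for illustration: s indexes the trial, Z is the treatment indicator (1 for surgery + adjuvant, 0 for surgery alone), and G is a binary subgroup indicator.

```latex
h(t \mid s, Z, G) = h_{0s}(t)\,\exp\left(\beta_1 Z + \beta_2 G + \beta_3 Z G\right)
```

Under this parameterization the treatment HR is \exp(\beta_1) in the subgroup with G = 0 and \exp(\beta_1 + \beta_3) in the subgroup with G = 1, so the interaction test corresponds to the null hypothesis \beta_3 = 0. A factor with more than two levels would need one interaction term per non-reference level.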

Identification of predictive markers of efficacy of specific regimens

RFS and OS were compared between the treatment groups, i.e., CAPOX in CLASSIC vs. S-1 in ACTS-GC, and within subgroups defined by the clinical and pathological factors. The S-1 group in SAMIT was excluded because its regimen differed from that in ACTS-GC. The HRs, 95% confidence intervals, and p values of the treatment groups were estimated by Cox regression adjusted for clinical and pathological factors. Treatment–subgroup interactions were tested by Cox regression that included a treatment–subgroup interaction term and the clinical and pathological factors as covariates. Regimen selection models were then built by combining the pathological factors.

General

Categorical data were tabulated with frequencies and percentages. Medians and ranges (minimum–maximum) were used to summarize continuous variables. The distributions of clinical and pathological factors common to the three trials were also described and were compared across trials by ANOVA or Fisher's exact test. Missing data were handled by multiple imputation. All reported p values were two-tailed, and p values below 0.05 were considered significant, except for interaction p values for predictive biomarkers, for which a threshold of p < 0.15 was used to identify trends and deviations in efficacy in subset analyses. Statistical analyses were conducted by academic statisticians using SAS version 9.4 (SAS Institute, Cary, NC, USA) and R version 3.5.1 (x64).
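The text does not specify how estimates from the imputed datasets were combined. As an illustration only, the following sketch applies Rubin's rules, a standard way to pool an estimate (for example, a log HR) and its variance across imputations. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log hazard ratios and their variances from three imputed datasets.
pooled_log_hr, total_variance = pool_rubin([0.30, 0.34, 0.32], [0.010, 0.012, 0.011])
```

The total variance exceeds the average within-imputation variance whenever the imputed datasets disagree, which reflects the extra uncertainty due to the missing values.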

Results

Analysis and characteristics

The integrity of the IPD from the three trials was verified, and the data were analyzed as shown in Fig. 1. Of the 3527 eligible patients who had been registered and analyzed in the main papers, data were available for 3527 and 3521 patients for OS and RFS, respectively.

Fig. 1

Individual patients’ data disposition. *Available for multivariate analyses

ESM eTable 1 compares the characteristics of the trials. All three studies were conducted in East Asia, and their eligibility criteria were similar. Patients were recruited from 2001 to 2009, were at least 18 years of age, and were treated by D2 gastrectomy. A slight inconsistency was found in the follow-up period of two trials: the publications stated that patients were followed up for at least 60 months, but the follow-up in the data we received was 2–8 months shorter than that in the publications.

The distributions of patient characteristics differed slightly among the trials, except for sex (Table 1). Regarding patient background, the median age was lower, BMI was higher, and the time since surgery was longer in CLASSIC than in ACTS-GC and SAMIT. Regarding tumors, the proportion of undifferentiated type was higher in CLASSIC, and the proportions of T3, T4, N3, and stage IV were higher in SAMIT than in the other two trials because of the different eligibility criteria (ESM eTable 1).

Table 1 Baseline characteristics of the patients among three clinical trials

Prognostic factors

In the “surgery alone” groups, age was the only significant prognostic factor other than tumor stage (Table 2). The HRs for patients over 65 were 1.42 and 1.35 for RFS and OS, respectively. There was also a trend toward better survival in female patients. The tumor stage variables T, N, and TNM were all significant prognostic factors. These prognostic factors were similar in the “adjuvant” groups; the HRs for the older group were 1.16 and 1.29 for RFS and OS, respectively (Table 3).

Table 2 Prognostic factors of surgery alone groups in ACTS-GC and CLASSIC
Table 3 Prognostic factors of adjuvant groups in ACTS-GC, CLASSIC and SAMIT

Predictive factors for adjuvant therapy and regimens

Adjuvant therapy was more favorable than surgery alone across all clinical and pathological factors evaluated, for both RFS and OS (Fig. 2). The predictive factors for adjuvant therapy, with interaction p values below 0.15, were BMI for RFS and T stage for OS. Adjuvant therapy was effective in low- to middle-BMI patients, with HRs below 0.83 for both OS and RFS, whereas it failed to show efficacy in the high-BMI group. With respect to tumor stage, adjuvant therapy was effective at every T and N stage for RFS and OS; however, the OS HR tended to be lower in T1–2 than in T3–4 (0.56 and 0.77, respectively; p = 0.07 for interaction). There was no difference between histological subtypes.

Fig. 2

Efficacy of adjuvant vs. surgery alone in ACTS-GC and CLASSIC

T stage, N stage, and histologic type all showed heterogeneity in the efficacy of CAPOX vs. S-1, but TNM stage did not (Table 4). The median HRs for RFS and OS were between 0.59 and 0.70 and nearly significant in the T1–2, N2–3, and differentiated type groups, whereas the median HRs in the N0 group were as high as 2.19 and 1.68, although not significant.

Table 4 Predictive factors for adjuvant regimens: CAPOX vs. S-1 in ACTS-GC

Combined predictive models

On the basis of the predictive factors (Table 4), tumors were categorized into 12 types by T stage, histologic type, and N stage (Table 5). The HRs for CAPOX vs. S-1 were classified into three groups of estimated efficacy: high (< 0.75), middle (0.75–1.33), and low (> 1.33). The HRs shown in blue, most of which were significant, were found in the T1–2, differentiated, N1–3 groups and ranged between 0.29 and 0.45 for RFS and OS. The HRs shown in yellow were found in the T3–4 and N0–1 groups and ranged between 1.03 and 2.70, although they were not significant.
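The categorization above can be expressed as a short sketch. The labels for the N categories (N0, N1, N2–3) are an assumption inferred from the subgroups mentioned in the text, and the names are hypothetical; the thresholds are those stated above, with the boundary values 0.75 and 1.33 assigned to the middle group.

```python
from itertools import product

T_GROUPS = ("T1-2", "T3-4")
HISTOLOGY = ("differentiated", "undifferentiated")
N_GROUPS = ("N0", "N1", "N2-3")  # assumed split of the N categories

# 2 T groups x 2 histologic types x 3 N groups = 12 tumor types
TUMOR_TYPES = list(product(T_GROUPS, HISTOLOGY, N_GROUPS))


def efficacy_group(hr: float) -> str:
    """Classify the estimated efficacy of CAPOX relative to S-1 from the HR."""
    if hr <= 0:
        raise ValueError("a hazard ratio must be positive")
    if hr < 0.75:
        return "high"
    if hr <= 1.33:
        return "middle"
    return "low"


# HR values quoted in the text, used here only as inputs to the classifier.
groups = [efficacy_group(hr) for hr in (0.29, 0.45, 1.03, 2.70)]
```

The cut-offs are nearly symmetric on the log scale, since 1.33 is approximately the reciprocal of 0.75.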

Table 5 Subgroup identification for adjuvant regimens: CAPOX vs. S-1

Discussion

The pooled analysis suggested that age was a significant prognostic factor in both the surgery alone and adjuvant groups; elderly patients over 64 years tended to have a poor prognosis. The HRs for age tended to be lower in the adjuvant groups, and in the predictive factor analysis, age was not associated with heterogeneity in treatment effects for RFS or OS. Obesity, nutritional status, and sarcopenia have been reported to be significant prognostic factors [14, 15]. Nutritional and muscle status are related to both age and cancer [16]. Our study could not identify those markers, but BMI was slightly associated with survival: compared with the medium-BMI group, HRs were higher in the low-BMI group and lower in the high-BMI group. Taken together, elderly patients with low BMI may have the worst prognosis. Because gastric surgery significantly affects nutrition and body weight, perioperative nutritional management is needed for those patients [17, 18].

On the other hand, the high-BMI group did not benefit from adjuvant therapy, with HRs close to 1.0. In contrast, the low- and medium-BMI groups benefited from adjuvant therapy, with median HRs between 0.55 and 0.67. Age and sex showed no heterogeneity in treatment effects between the adjuvant and surgery alone groups. For obese patients with serious comorbidities, the strategy for adjuvant therapy should be decided by weighing the risks against the smaller expected benefit.

The tumor characteristics, pT stage, pN stage, and histology, were possible predictive markers for CAPOX vs. S-1, with different trends among the factors, whereas the combined TNM stage was not predictive. Every tumor has all three characteristics, and it is difficult to select the best regimen from three separate HRs. We therefore categorized the tumors into 12 tumor types for a regimen selection model. Interestingly, the present study showed heterogeneity in efficacy between S-1 and CAPOX according to T stage, N stage, and histologic type. In particular, CAPOX was effective for differentiated T1–2 tumors with lymph node metastases. On the other hand, S-1 seemed to have high efficacy for T3–4 and N0–1 tumors. Recently, Yoshida et al. reported that the addition of docetaxel to S-1 is effective in patients with stage III disease [19]; it might be the best choice for undifferentiated T3–4 tumors with lymph node metastasis. The administration periods for CAPOX and S-1 are half a year and one year, respectively, and their toxicities also differ. It may be possible to select a regimen on the basis of T stage, N stage, and histologic type, but the selection must also take into account toxicity, administration period, and patient characteristics in addition to efficacy.

Our study has some limitations. For prognostic factors, the inclusion of SAMIT might have increased statistical power, but at the same time it made a direct comparison between the surgery alone and adjuvant groups difficult. For predictive factors, since the background factors and the tumor classifications in CLASSIC and ACTS-GC do not completely match, we cannot exclude the possibility of unknown confounding factors. In addition, the follow-up periods in the analyzed datasets did not exactly match those in the original papers and tended to be slightly shorter, and there were slight discrepancies between the RFS and OS HRs. Not all subset analyses were accurate and precise, because some groups included few patients, such as obese patients and patients with differentiated T3–4 tumors with N0 or N2–3 disease. Each category showed certain tendencies, and further study is needed to verify the effectiveness of treatment in those small subgroups.

Conclusion

The IPD analysis suggests that age is a significant prognostic factor in both the surgery alone and adjuvant groups. CAPOX is more effective for differentiated T1–2 tumors with lymph node metastasis. It may be possible to select a regimen on the basis of T stage, N stage, and histologic type, but efficacy must be weighed together with the toxicity and administration period of each regimen and with patient characteristics.