Introduction

Gastric cancer remains one of the most common malignancies and a leading cause of cancer death worldwide [1]. The prognosis of patients with advanced or recurrent gastric cancer (AGC) remains poor, with median overall survival (OS) of only 1 year on commonly used first-line combination chemotherapy regimens (a fluoropyrimidine plus a platinum agent, with or without docetaxel or an anthracycline) [2–7]. Trastuzumab, a humanized monoclonal antibody that targets human epidermal growth factor receptor 2 (HER2), has recently been shown to improve the prognosis of HER2-positive AGC [7], although such cases account for fewer than 20 % of all AGCs. Because median progression-free survival (PFS) with these first-line chemotherapies is around 6 months and most patients ultimately experience disease progression, the development of effective second-line chemotherapy is critical. Several phase II studies of second-line chemotherapy have suggested that taxanes (paclitaxel or docetaxel) or irinotecan can be effective, with objective response rates (ORRs) of approximately 10–20 %. Recently, a small randomized study suggested that irinotecan improved outcomes in patients with pretreated AGC [8]. Another randomized study, which compared docetaxel or irinotecan with best supportive care in AGC patients who had received one or two previous lines of chemotherapy, also showed a survival benefit for salvage chemotherapy (OS, 5.8 vs. 3.8 months) [9].

Correlations between PFS or other endpoints and OS have been analyzed in an effort to identify surrogate endpoints for OS [10–15]. A validated shorter-term surrogate endpoint would likely both reduce drug development costs and facilitate the assessment of efficacy [16]. Previously, a literature-based analysis and an individual patient data meta-analysis evaluated PFS as a surrogate endpoint for OS in patients with AGC who underwent first-line chemotherapy [14, 15]. However, no corresponding analysis has been performed in patients who underwent second-line chemotherapy for AGC. The goal of the present study was therefore to conduct a comprehensive analysis of the correlation between PFS or other endpoints and OS in patients with AGC who underwent second-line chemotherapy.

Materials and methods

Search for studies

We conducted a computer-based literature search for trials in the Medline database (January 2002 to January 2013) and in abstracts from the conference proceedings of the American Society of Clinical Oncology (2002–2012), the Gastrointestinal Cancer Symposium (2002–2013), and the European Cancer Conference and European Society for Medical Oncology (2002–2012). To limit publication bias, both published and unpublished trials were identified. Where possible, data were gathered from meeting presentations as well as from abstracts.

Search keywords included “gastric cancer” and “second-line chemotherapy.” The search was supplemented by a thorough examination of the reference lists of original and review articles. No language restrictions were applied. Unpublished data were included if sufficient information on study design, participant characteristics, interventions, and outcomes was available from an abstract or meeting presentation.

Procedures

Data were abstracted in accordance with the Quality of Reporting of Meta-analyses (QUOROM) guidelines [17]. Prospective trials (single-arm or randomized) of chemotherapy for chemotherapy-pretreated adenocarcinoma of the stomach or gastroesophageal junction (metastatic, unresectable locally advanced, or recurrent disease) were included in the analysis. Because some trials enrolled patients who received the experimental treatment as either second-line or third-line chemotherapy, such studies were also included; however, studies in which all patients received the experimental treatment as third-line chemotherapy were excluded. Trials that compared chemotherapy with best supportive care were also included, as were those that enrolled patients with adenocarcinoma of the distal esophagus. Eligibility was limited to trials that reported OS together with PFS, time to progression (TTP), or both. Trials designed to assess combined-modality treatment, including radiotherapy and surgery (neoadjuvant or adjuvant chemotherapy), were excluded.

For each trial, the following information was extracted: first author’s name, year of publication or report, trial design, trial region, number of enrolled patients, and treatment regimens. Previous treatment regimens and the proportion of patients with measurable lesions were also extracted if reported. For trials with more than two treatment arms, each investigational arm was paired with the reference arm to form a separate comparison.
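As a minimal illustration of the arm-pairing rule, the Python sketch below pairs every investigational arm with the reference arm of the same trial, so a three-arm trial contributes two comparisons and a single-arm study contributes none. The `Arm` record, the `build_comparison_pairs` function, and the example trials are hypothetical; the sketch assumes each randomized trial has exactly one reference arm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """One treatment arm of a trial (hypothetical record layout)."""
    trial_id: str
    label: str
    is_reference: bool


def build_comparison_pairs(arms):
    """Pair each investigational arm with its trial's single reference arm.

    Trials without exactly one reference arm (e.g. single-arm studies)
    contribute no pairs.
    """
    by_trial = {}
    for arm in arms:
        by_trial.setdefault(arm.trial_id, []).append(arm)

    pairs = []
    for trial_arms in by_trial.values():
        refs = [a for a in trial_arms if a.is_reference]
        if len(refs) != 1:
            continue  # no (or ambiguous) reference arm: nothing to compare against
        for arm in trial_arms:
            if not arm.is_reference:
                pairs.append((arm, refs[0]))
    return pairs


# Hypothetical example: a two-arm trial, a three-arm trial, and a single-arm study.
arms = [
    Arm("T1", "irinotecan", False), Arm("T1", "BSC", True),
    Arm("T2", "drug A", False), Arm("T2", "drug B", False), Arm("T2", "control", True),
    Arm("T3", "single arm", False),
]
for inv, ref in build_comparison_pairs(arms):
    print(inv.trial_id, inv.label, "vs", ref.label)
```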

Statistical methods

For each trial, median PFS, median TTP, ORR, disease control rate (DCR; proportion of patients who achieved a complete or partial response or stable disease), and median OS were abstracted. For randomized studies, the hazard ratio (HR) with 95 % confidence interval (CI) for PFS/TTP and for OS was also abstracted. If the HR was not reported, we estimated the HR and its 95 % CI directly or indirectly from the available data [18]. The nonparametric Spearman rank correlation coefficient (ρ) was used to measure the correlation between median PFS/TTP and median OS, and between the HR for PFS/TTP and the HR for OS. Because the number of eligible studies was limited, we used bootstrap resampling [19] with 10,000 bootstrap samples to estimate 95 % CIs for the correlation coefficients.
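The two statistical steps above can be sketched in plain Python. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles tied medians, and the 95 % CI is obtained by resampling trials with replacement. This is a sketch under stated assumptions, not the Stata procedure used in the study: it assumes a percentile bootstrap interval, an arbitrary fixed random seed, and that resamples in which one variable has no variation are redrawn. All function names are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, n_boot=10_000, level=0.95, seed=12345):
    """Percentile bootstrap CI for rho, resampling trials (x, y pairs) with replacement."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(spearman_rho([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample with no variation in x or y: rho undefined, redraw
    stats.sort()
    k = round((1 - level) / 2 * n_boot)  # 250 of 10,000 for a 95 % interval
    return stats[k], stats[n_boot - k - 1]
```

Here `x` would hold each trial's median PFS/TTP and `y` its median OS, in the same trial order; the same functions would apply unchanged to the paired HRs from the randomized comparisons.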

To investigate possible sources of heterogeneity in the correlation, subgroup analyses were conducted according to trial region (Asian vs. non-Asian), year of report (older trials, before 2009, vs. recent trials, 2009 or later), publication status (published vs. presentation only), progression endpoint (PFS vs. TTP), previous chemotherapy [fluoropyrimidine plus platinum (FP) mandatory vs. not defined], treatment line (second-line only vs. second-line and third-line), and treatment regimen (taxane-based vs. irinotecan-based). Global trials were classified as both Asian and non-Asian unless suitable subset analysis results were provided. The median of each endpoint across trials was calculated for each subgroup, and differences between subgroups were evaluated using the Mann–Whitney test. Statistical analyses were performed using Stata version 10 (StataCorp, College Station, TX, USA). All tests were two-sided, and p values less than 0.05 were considered statistically significant.
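The subgroup comparison (for example, the median OS reported by Asian vs. non-Asian trials) can be sketched as a two-sided Mann–Whitney test. The version below uses the large-sample normal approximation with a tie-corrected variance and no continuity correction. Whether Stata was run in exactly this way is an assumption, and exact p values can differ for very small groups. Function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(a, b):
    """Two-sided Mann-Whitney test (normal approximation, tie-corrected variance).

    Returns (U for sample a, two-sided p value).
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample a

    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())  # 0 when there are no ties
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every value tied: no evidence of a difference

    z = (u - n1 * n2 / 2) / var_u ** 0.5
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

In this setting, `a` and `b` would hold one value per trial (for example, each trial's reported median OS) for the two subgroups being compared.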

Results

Selection of studies

A total of 640 potentially relevant reports were identified, of which 472 were excluded on the basis of title review (Fig. 1). After review of the remaining reports, 64 trials comprising 75 treatment arms and 4,286 patients were identified as eligible for this meta-analysis (Supplement 1). Forty-four trials were published, and the other 20 were available only as presentations or abstracts. Table 1 shows the characteristics of the 64 trials. Only 10 trials were randomized (5 phase II and 5 phase III), and 54 were single-arm phase II studies. By region, 39 trials were conducted in Asia, 23 in non-Asian regions, and 2 were global studies that included Asia. Sixteen trials enrolled only patients whose first-line chemotherapy had included FP. Forty-nine trials enrolled only patients with measurable lesions. Forty-one studies required disease progression on previous chemotherapy as an inclusion criterion. The most common primary endpoint was ORR (n = 39), followed by OS (n = 10). Only 16 studies assessed tumor response by independent review. The most commonly used regimens were taxane-based, followed by irinotecan-based and platinum-based regimens. As the time-to-event endpoint for progression, more studies reported PFS (n = 41) than TTP (n = 23); no trial reported both. Subset analysis by region (Asia and non-Asia) was reported in one global phase II trial, and these subset data were therefore included in the analyses comparing Asian and non-Asian trials.

Fig. 1

Selection process for trials. PFS/TTP progression-free survival/time to progression; OS overall survival

Table 1 Characteristics of the 64 clinical trials analyzed in the present study

Results of each endpoint according to subsets

Across the 64 trials, the median of the reported median OS values was 7.6 months, and that of the reported median PFS or TTP values was 3.0 months (Table 2). Median OS was significantly longer in Asian trials than in non-Asian trials (8.1 vs. 6.0 months; p < 0.001), whereas median PFS or TTP did not differ significantly between Asian and non-Asian trials (3.0 vs. 3.1 months; p = 0.19). Unpublished trials were associated with longer OS than published trials (8.1 vs. 6.7 months; p = 0.02). No other subset analysis showed significant differences in OS or PFS/TTP. The median reported ORR and DCR were 17.9 % and 53.8 %, respectively. DCR tended to be higher in trials of second-line therapy only than in trials of second- and third-line therapy (p = 0.09); no other subset showed significant differences in DCR.

Table 2 Results of each endpoint according to subsets

Correlation between PFS or TTP and OS

Median PFS or TTP and OS were moderately correlated (ρ = 0.56, 95 % CI 0.34–0.74; Fig. 2; Table 3). The correlation tended to be stronger for PFS (ρ = 0.65) than for TTP (ρ = 0.28), stronger in non-Asian trials (ρ = 0.74) than in Asian trials (ρ = 0.37; Fig. 3; Table 3), and stronger in trials of second-line therapy only (ρ = 0.77) than in trials of second-line and third-line chemotherapy (ρ = 0.47). The correlation was similar in published trials and in trials available only as presentations (ρ = 0.52 and 0.60, respectively).

Fig. 2

Correlation between median progression-free survival/time to progression (PFS/TTP) and overall survival (OS). The size of each gray marker (circle) corresponds to the number of patients in the trial included in this analysis. Median PFS or TTP and OS were moderately correlated (ρ = 0.56, 95 % CI 0.34–0.74)

Table 3 Correlation between PFS/TTP, ORR, DCR, and OS
Fig. 3

Correlation between median PFS/TTP and OS according to trial area. The correlation tended to be stronger in non-Asian trials (ρ = 0.74) than in Asian trials (ρ = 0.37)

Correlation between ORR, DCR, and OS

ORR and DCR were not strongly correlated with OS (ρ = 0.38 for ORR, 95 % CI 0.16–0.61; ρ = 0.54 for DCR, 95 % CI 0.33–0.75; Fig. 4), although DCR correlated more strongly with OS than ORR did, both in the whole cohort and in every subset (Table 3).

Fig. 4

Correlation between objective response rate (ORR) or disease control rate (DCR) and OS. ORR and DCR were not strongly correlated with OS (ρ = 0.38 for ORR, 95 % CI 0.16–0.61; ρ = 0.54 for DCR, 95 % CI 0.33–0.75)

Correlation between HR for PFS/TTP and OS in randomized trials

A total of 11 pairs of HRs for PFS/TTP and OS between treatment arms were available from the 10 randomized trials (reported in 9 trials and estimated in 1 trial). The HR for PFS/TTP and the HR for OS showed a weak correlation (ρ = 0.36, 95 % CI −0.30 to 1.00; Fig. 5). The wide 95 % CI indicates that the number of randomized comparisons was too small for this type of analysis.

Fig. 5

Hazard ratio (HR) of PFS/TTP and OS in ten randomized studies. The HR of PFS/TTP was weakly correlated with the HR of OS (ρ = 0.36, 95 % CI −0.30 to 1.00)

Discussion

This is the first study to evaluate the correlation between PFS, TTP, or other endpoints and OS in patients with AGC who underwent second-line chemotherapy. Our results suggest that PFS/TTP, ORR, and DCR do not correlate sufficiently with OS to be used as surrogate endpoints for OS in this setting. These results should be interpreted cautiously, because this study is exploratory and has several limitations: (1) the analysis is based on literature-based data without individual patient data; (2) most of the included studies were single-arm studies, and only ten were randomized trials; and (3) little information was available on subsequent treatment, including crossover treatment, which may weaken surrogacy. Despite these limitations, we believe our work highlights important aspects of trial conduct and data collection for future trials of second-line therapy for AGC.

Previously, two meta-analyses examined whether PFS could be a surrogate endpoint for OS in patients with AGC who underwent first-line chemotherapy [14, 15]. In a literature-based analysis of 36 randomized trials [14], median PFS or TTP was moderately correlated with median OS (ρ = 0.70), and the correlation coefficient between the HR for PFS or TTP and the HR for OS was 0.80. Another meta-analysis, the GASTRIC project (Global Advanced/Adjuvant Stomach Tumor Research through International Collaboration), analyzed individual data from 4,102 AGC patients enrolled in 20 randomized trials [15]. The correlation between treatment effects on PFS and on OS was only moderate (trial-level coefficient of determination R² adjusted for estimation errors, 0.61), similar in strength to the relationship seen in the literature-based analysis [14]. Correlations between PFS and OS were lower in AGC than in advanced colorectal cancer [10] or in studies of adjuvant treatment for colorectal or gastric cancer [20, 21]. These results suggest that PFS is not a good surrogate for OS in patients undergoing first-line chemotherapy for AGC.

Recently, two randomized studies have suggested that second-line chemotherapy prolongs OS in patients with AGC [8, 9]. We therefore conducted a literature-based analysis of endpoints in clinical trials of patients who underwent second-line chemotherapy for AGC. The present analysis showed an insufficient correlation between OS and other endpoints, similar to the findings in the first-line setting. There are several possible reasons for these results. First, heterogeneity of treatment, especially of subsequent chemotherapy, may affect the results. In this analysis, median PFS was almost the same in Asian and non-Asian trials, whereas OS was significantly longer in Asian trials. One possible reason for this difference in survival after progression is the effect of subsequent treatment, as already suggested in the first-line setting [22]. Indeed, the proportion of patients who receive subsequent chemotherapy is higher in Asian trials than in Western trials [22, 23]; in the AVAGAST study (a study of bevacizumab in combination with capecitabine and cisplatin as first-line therapy in patients with AGC), 66 % of Asian patients received second-line chemotherapy, compared with 31 % of patients in Europe and 21 % in America [23]. Although the proportion of patients who can receive subsequent therapy is expected to be lower in second-line trials than in first-line trials, 40 % of patients in a Korean randomized study received subsequent therapy after second- or third-line chemotherapy [9]. Also, in the West Japan Oncology Group (WJOG) 4007 study, which compared irinotecan with weekly paclitaxel as second-line chemotherapy, more than 70 % of patients received third-line chemotherapy in both arms [24]. Therefore, subsequent therapy may contribute to the difference in OS according to trial region and confound the correlation in the current analysis, similar to the phenomenon seen in a previous analysis [14].

Another possible reason for the moderate correlation between PFS and OS is heterogeneity in inclusion criteria and patient characteristics. The types of prior chemotherapy and of investigational agents varied widely across studies, as did the definition of failure of prior chemotherapy. Although subset analyses according to prior treatment or treatment regimen did not reveal a strong correlation between any of the endpoints, these sources of heterogeneity may have contributed to the weak correlations observed in our analysis. Further, although most studies enrolled only patients with measurable lesions, the Japan Clinical Oncology Group (JCOG) 0407 study included patients with peritoneal metastasis, which is associated with a low frequency of measurable lesions [25]. By contrast, the WJOG 4007 study excluded patients with apparent peritoneal metastasis [24]. These variations in inclusion criteria might affect the correlation results.

Although this study showed an insufficient correlation between OS and all endpoints examined, the correlation between ORR and OS was weaker than that between PFS/TTP and OS or between DCR and OS. These results suggest that a single-arm phase II study with ORR as the primary endpoint may not be adequate to evaluate the efficacy of second-line chemotherapy for AGC. Randomized phase II studies that compare standard and investigational treatments may be a better means of screening for effective treatments to take forward into phase III trials [26].

This study has several methodological limitations. First, as already described, most of the component studies were single-arm studies, and only ten were randomized trials. Although there is no consensus on what defines a valid surrogate endpoint, a candidate endpoint must correlate with the true endpoint, and treatment effects on the surrogate endpoint must correlate with those on the true endpoint [27, 28]. However, treatment effects on the surrogate endpoints are difficult to analyze here because relatively few randomized trials were available. Second, the present study was not based on individual patient data, which would allow a confirmatory evaluation of individual-level agreement between the two endpoints (PFS/TTP and OS) [29]. Additional individual patient data analyses, especially using ongoing randomized studies, may therefore be necessary to characterize the surrogacy of these endpoints. Finally, most trials analyzed in this study provided little information on disease progression after prior chemotherapy, only a few studies evaluated tumor response by external review, and the interval between imaging evaluations varied. Therefore, it is impossible to confirm whether disease progression was evaluated consistently across trial arms.

In conclusion, our exploratory analysis suggests that PFS/TTP, ORR, and DCR do not correlate sufficiently with OS to be used as surrogate endpoints in patients with AGC who have undergone second-line chemotherapy. Further research based on individual patient data from ongoing randomized trials is needed to identify an optimal surrogate endpoint.