Introduction

Advanced gastric cancer has a relatively poor prognosis after failure of standard treatments, and thus effective treatment options for refractory disease are needed [1]. The INTEGRATE trial (ANZCTR 12612000239864) was undertaken in 147 evaluable patients to help address this unmet need by evaluating the activity and safety of regorafenib, an orally administered multikinase inhibitor with anti-angiogenic properties that had previously been shown to be effective in treating other solid tumors [2, 3]. INTEGRATE demonstrated that regorafenib had substantial activity on the primary endpoint of progression-free survival (PFS) [4]. That promising result suggests a phase III trial is warranted; however, a well-informed decision about the merit of such a trial should include an appraisal of the positive efficacy signals relative to the impact of treatment side effects on quality of life (QoL).

Previous trials of regorafenib have reported skin reactions, fatigue, hypertension, and diarrhoea as among the most common side effects [5]. Adverse events occurring more frequently in the regorafenib arm of INTEGRATE similarly included skin reactions and hypertension, but an increase in the incidence of fatigue and diarrhoea was not readily apparent [4].

Given its demonstrated activity, regorafenib has the potential to positively affect QoL by reducing some disease symptoms via effective cancer control, as was found for chemotherapy in advanced gastric cancer [6, 7], but also to negatively affect QoL through side effects such as skin rash. Understanding the combined effect of these factors is important to determine whether plausible gains on clinical endpoints have the potential to be overshadowed by substantial reductions in overall QoL. On-treatment QoL information may also help clinicians improve care by highlighting key symptoms and the side effects of greatest relevance. A better understanding of the prognostic value of the baseline QoL information collected in INTEGRATE could furthermore help clarify the potential of QoL information as a supplement to clinical indicators used to inform estimates of likely survival time.

The general aim of the INTEGRATE QoL substudy was to generate preliminary evidence that would help inform the planning of a phase III trial. The primary objective was to compare the QoL impact of regorafenib versus placebo. The secondary exploratory objectives were to identify the more common troublesome QoL problems reported and to assess whether baseline QoL was prognostic for survival.

Methods

Details of the design of the randomized placebo-controlled INTEGRATE trial have been published previously [4]. Using the EORTC QoL Questionnaire (QLQ-C30) with the gastric cancer module (STO22) [8] and the EQ-5D [9], QoL was assessed at baseline and every 4 weeks until discontinuation of study treatment. English-speaking participants were invited to also complete the patient disease and treatment assessment (PTDATA) form [10], which includes relevant items not covered in the other instruments (e.g., rash, light-headedness, headaches, sore hands/feet, drowsiness). The protocol was approved by the human research ethics committee of each participating institution.

Analysis of the PTDATA form

The proportion of patients experiencing troublesome symptoms, or troublesome impacts on general aspects of QoL, was calculated across the post-baseline assessment period (based on the worst grade reported during the on-treatment period) and compared between treatment arms using logistic regression, adjusting for baseline in exploratory analyses. Symptoms were defined as troublesome if they were rated with an intensity of 3 or more points relative to the scale in which 0 = “no trouble at all” and 10 = “worst I can imagine.” Reductions in general aspects of QoL (e.g., physical well-being) were defined as troublesome if a 3-point decrement from the optimal score of 10 was reported (i.e., a rating of ≤7 points relative to 0 = “worst possible” and 10 = “best possible”).
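The two thresholds above can be expressed as a minimal sketch. The cut-offs (symptom rating of 3 or more; QoL aspect rating of 7 or less) come from the definitions in the text, whereas the function names and the handling of patients with no post-baseline ratings are hypothetical choices made only for illustration.

```python
from typing import List, Optional

# Cut-offs as stated in the text; all names below are hypothetical.
SYMPTOM_CUTOFF = 3      # symptoms: 0 = "no trouble at all", 10 = "worst I can imagine"
QOL_ASPECT_CUTOFF = 7   # QoL aspects: 0 = "worst possible", 10 = "best possible"


def is_troublesome_symptom(rating: int) -> bool:
    """A symptom is troublesome when rated 3 or more."""
    return rating >= SYMPTOM_CUTOFF


def is_troublesome_qol_aspect(rating: int) -> bool:
    """A QoL aspect is troublesome when rated 7 or less (3-point decrement from 10)."""
    return rating <= QOL_ASPECT_CUTOFF


def troublesome_on_treatment(ratings: List[int], kind: str) -> Optional[bool]:
    """Classify a patient from the worst rating over the post-baseline period.

    Returns None when no post-baseline ratings exist (an assumption of this
    sketch, not a rule stated in the text).
    """
    if not ratings:
        return None
    if kind == "symptom":
        return is_troublesome_symptom(max(ratings))     # worst = highest
    if kind == "qol_aspect":
        return is_troublesome_qol_aspect(min(ratings))  # worst = lowest
    raise ValueError(f"unknown kind: {kind}")
```

Note that the direction of "worst" differs between the two item types: the highest rating is the worst for a symptom, and the lowest rating is the worst for a general aspect of QoL.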

Analysis of EORTC and EQ-5D questionnaires

A mixed model for repeated measures was applied to the scales from the EORTC and EQ-5D instruments. The models included the relevant baseline score, treatment allocation, time point, and a treatment allocation-by-time point interaction as covariates. We also tested whether treatment effects varied by region by fitting the corresponding interaction term. In a sensitivity analysis, a multiple imputation (MI) strategy was used to account for each patient who discontinued treatment before supplying any post-baseline information. Such patients were assumed to have unfavorable QoL at the time of discontinuation (i.e., the values were not missing at random), and an imputation model was specified that drew random values with uniform intensity from the unfavorable half of each QoL scale. The imputation process was repeated to construct three augmented data sets, and the analysis result from each was appropriately combined to yield MI estimates [11, 12].
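The imputation and pooling steps can be illustrated with a minimal sketch. It assumes scales scored from 0 to 100 (as for the EORTC instruments) and assumes that the combination step follows Rubin's rules; the text states only that results were "appropriately combined", so the pooling formula here is an assumption, and all function names are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def draw_unfavorable(low: float, high: float, higher_is_better: bool,
                     rng: random.Random) -> float:
    """Draw a value uniformly from the unfavorable half of a scale.

    For a functional scale (higher is better) the unfavorable half is the
    lower half; for a symptom scale (higher is worse) it is the upper half.
    """
    mid = (low + high) / 2
    if higher_is_better:
        return rng.uniform(low, mid)
    return rng.uniform(mid, high)


def pool_estimates(estimates: List[float],
                   variances: List[float]) -> Tuple[float, float]:
    """Pool per-data-set results with Rubin's rules (assumed combination rule).

    Returns the pooled estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    return pooled, within + (1 + 1 / m) * between


# Example: impute a functional-scale score for one patient in each of
# three augmented data sets, as described in the text.
rng = random.Random(2024)
imputed = [draw_unfavorable(0, 100, higher_is_better=True, rng=rng)
           for _ in range(3)]
```

In a full analysis, the mixed model would be refitted to each augmented data set and the resulting treatment-effect estimates and variances passed to the pooling function.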

A deterioration-free survival (DFS) endpoint was constructed as a marker of overall net clinical benefit of treatment [13, 14]. This endpoint was defined as the time until the first of the following events occurred: a 10-point deterioration in health status from baseline (without subsequent 10-point improvement compared with baseline), disease progression, death, or treatment discontinuation. The 10-point QoL criterion was specified to reflect a clinically important QoL reduction [15]. Two DFS endpoints were derived using different markers of health status deterioration based on the EORTC QLQ-C30 as per the report by Au et al. [14]: one used the Physical Function scale (DFSPF), and the other used the General Health scale (DFSGHS). The distribution of DFSPF and DFSGHS was estimated using the Kaplan–Meier approach. Hazard ratios were estimated using Cox regression, and the proportional hazards assumption was tested.
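The derivation of a DFS time for a single patient can be sketched as follows. This is one possible reading of the definition, not the trial's actual derivation code: a deterioration is counted at the first assessment scoring at least 10 points below baseline, provided no later assessment is at least 10 points above baseline. It assumes a scale on which higher scores are better, and the function names and the censoring rule at last follow-up are hypothetical.

```python
from typing import List, Optional, Tuple

Assessment = Tuple[float, float]  # (weeks since randomization, scale score)


def deterioration_time(baseline: float, assessments: List[Assessment],
                       threshold: float = 10.0) -> Optional[float]:
    """Time of the first qualifying deterioration, or None if there is none.

    `assessments` must be sorted by time. Higher scores are assumed better.
    """
    for i, (time, score) in enumerate(assessments):
        if baseline - score >= threshold:
            # Disqualify if any later score is >= threshold above baseline.
            recovered = any(later - baseline >= threshold
                            for _, later in assessments[i + 1:])
            if not recovered:
                return time
    return None


def dfs_event(baseline: float, assessments: List[Assessment],
              progression: Optional[float] = None,
              death: Optional[float] = None,
              discontinuation: Optional[float] = None,
              last_follow_up: float = 0.0) -> Tuple[float, bool]:
    """Return (time, event_observed) for the composite DFS endpoint.

    The event time is the earliest of deterioration, progression, death,
    or treatment discontinuation; otherwise the patient is censored at
    last follow-up (a censoring rule assumed for this sketch).
    """
    times = [deterioration_time(baseline, assessments),
             progression, death, discontinuation]
    observed = [t for t in times if t is not None]
    if observed:
        return min(observed), True
    return last_follow_up, False
```

The resulting (time, event) pairs are in the form expected by standard Kaplan–Meier and Cox regression routines.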

An exploratory analysis of the prognostic (for survival) value of baseline information from these instruments was undertaken using Cox proportional hazards regression, adjusting for treatment allocation. Baseline QoL scores were classified as high or low relative to the median to form binary indicator variables for inclusion in the proportional hazards regression models. In addition to p values based on standard Wald tests, p values from log-rank tests stratified for treatment allocation were produced, because the latter remain valid when the proportional hazards assumption is violated. Associations were also tested, as part of a sensitivity analysis, using multivariate models that adjusted for ECOG performance status plus the following previously reported univariate prognostic factors [4]: number of metastatic sites, baseline vascular endothelial growth factor (VEGF)A, and neutrophil-to-lymphocyte ratio (NLR).
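The median dichotomization can be sketched as below. The text does not state how scores exactly equal to the median were classified; this sketch assigns them to the "low" group, which is an assumption, and the function name is hypothetical.

```python
import statistics
from typing import List


def median_split(scores: List[float]) -> List[int]:
    """Return 1 for scores above the sample median and 0 otherwise.

    Scores equal to the median are coded 0 (an assumption of this sketch).
    """
    cut = statistics.median(scores)
    return [1 if score > cut else 0 for score in scores]
```

The resulting indicator would enter the Cox model as a covariate alongside treatment allocation.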

Results

Of the 147 eligible randomized patients, 142 consented to participate in the QoL substudy, 136 completed a baseline QoL assessment, and 95 completed at least one post-baseline QoL assessment (see Fig. 1). For these 95 patients, the mean age was 61 years; 18% were women; 42% were from Korea; and the primary site was “esophago-gastric junction” for 33% and “stomach/other” for 67%.

Fig. 1

Flowchart is truncated at week 16 because information was sparse beyond this point; nevertheless, all eligible QoL questionnaires (i.e., even those beyond week 16) were included in analyses. Note that 3 of the REG (regorafenib) patients with a missing assessment at week 4 completed the week 8 assessment. The N = 66 REG patients in the ‘post-baseline QoL cohort’ (Table 1) thus comprise the N = 63 completing week 4 assessments (Fig. 1) plus these 3.

Although QoL form completion rates to treatment discontinuation were high, a substantial proportion of patients had discontinued treatment because of disease progression by the time of the first post-baseline assessment. Furthermore, as PFS was longer in the regorafenib arm, a differential arose between groups in the availability of post-baseline QoL information. No post-baseline QoL information was available for 41% of placebo patients compared to 29% of regorafenib patients. The patients contributing baseline QoL information, and those contributing post-baseline QoL information, were generally representative of the full set of all eligible INTEGRATE patients (Table 1). The likelihood of contributing no post-baseline data was higher for patients with ECOG performance status score 1 (versus 0), with liver metastases, and with distant lymph node metastases (Table 1).

Table 1 Baseline characteristics

PTDATA form

Of the subset of English-speaking participants invited to complete the PTDATA form, the most common troublesome symptoms reported at baseline were fatigue (54%), pain (44%), drowsiness (39%), poor appetite (41%), anxiety (33%), trouble sleeping (31%), and altered sense of taste (30%) (eAppendix Table 1). Patients also commonly reported troublesome impacts across all aspects of QoL measured by the PTDATA form at baseline. The incidence of troublesome symptoms, and impacts on aspects of QoL, was numerically higher across the groups during the post-baseline period compared to baseline. For both groups combined, there was an increase of more than 25% post baseline in the incidence of troublesome levels of diarrhoea, trouble sleeping, shortness of breath, fatigue, sore mouth or throat, and poor appetite. Adjusting for baseline, there was some statistical evidence that the post-baseline incidence of troublesome cough was lower in the regorafenib arm (p = 0.004), and that the incidence of troublesome sore mouth or throat was higher in the regorafenib arm (p = 0.05); however, this evidence is weak given the multiple comparisons performed. There was no statistical evidence of other post-baseline differences between the groups on the PTDATA form after adjusting for baseline.

EORTC and EQ-5D questionnaires

At baseline, the more intense symptoms measured by the EORTC instruments for both groups combined were anxiety (mean = 44.5), fatigue (mean = 38.7), body image (mean = 32.8), and appetite loss (mean = 32.3) (see Table 2). Although the repeated-measures models were fitted to all data available for the EORTC and EQ-5D assessments, the reliability of the modeled estimates decreased over time as the number of patients contributing information to the analysis declined. For each treatment group, the last time point at which at least 10% of the baseline numbers remained in the analysis set was chosen as a limit beyond which results are not presented. This cutpoint corresponded to week 8 for the placebo group and week 16 for the regorafenib group (Table 2; eAppendix Figures). There was some evidence for increased reporting of diarrhoea for regorafenib at week 4 compared to placebo (24.6 versus 11.4; p = 0.02), but no evidence of differences for other symptom scales at week 4 or 8. The 95% confidence intervals for the group differences provide an indication of the plausible range for the treatment effect. Estimates were not systematically different across geographic regions. The MI estimates were systematically less favorable toward the placebo group compared to the estimates from the original analysis of available data (results not shown).

Table 2 Estimates from repeated measures analysis of EQ-5D and EORTC instruments

The overall QoL indices, functional scales, and symptom scores tended to worsen for both groups from baseline to week 8. Increases in symptom intensity to week 4 were most prominent for Diarrhoea, Dyspnoea, and Fatigue. In addition, prominent increases to week 8 were observed for Appetite Loss, Insomnia, and Dysphagia.

The DFS rate was significantly improved with regorafenib (Fig. 2). The hazard ratio (HR) was 0.50 for DFSPF [95% confidence interval (CI) 0.35–0.72; p < 0.0001] and 0.53 for DFSGHS (95% CI 0.37–0.75; p = 0.0002).

Fig. 2

Deterioration-free survival (DFS) was defined as the time from randomization until the first of the following events occurred: death, a 10-point deterioration in health status from baseline (without subsequent 10-point improvement compared with baseline), disease progression, or other reason for treatment discontinuation (clinician/patient preference, adverse event, or other). Two DFS endpoints were derived using different markers of health status deterioration: one used the Physical Function scale (a), and the other used the General Health scale (b). PBO placebo, REG regorafenib

The results of the analyses investigating the prognostic value of baseline QoL scales for overall survival are shown in eAppendix Table 2. Adjusting for treatment allocation, there was evidence of longer survival: (1) in patients with less intense baseline symptoms of General Pain (HR 0.54; 95% CI 0.37–0.79; p = 0.002), Abdominal Pain (HR 0.51; 95% CI 0.35–0.74; p = 0.0005), Appetite Loss (HR 0.50; 95% CI 0.32–0.78; p = 0.002), Constipation (HR 0.60; 95% CI 0.41–0.88; p = 0.009), and Eating Restrictions (HR 0.69; 95% CI 0.47–1.01; p = 0.05); (2) in patients with better baseline levels of Physical Function (HR 0.62; 95% CI 0.42–0.91; p = 0.02) and Role Functioning (HR 0.67; 95% CI 0.46–0.98; p = 0.04); and (3) in patients with a better EQ-5D utility score (HR 0.57; 95% CI 0.39–0.84; p = 0.004). There was some evidence of shorter survival among patients with lower levels of Financial Problems, but this was not maintained in the multivariate analyses. The evidence of prognostic value from the multivariate analyses was, furthermore, weaker for the EQ-5D utility, role functioning, and eating restrictions; stronger for nausea/vomiting; and largely unchanged for other variables tested (eAppendix Table 2).

Discussion

There was no compelling evidence from the INTEGRATE trial that regorafenib had a broad negative effect, relative to placebo, across the spectrum of QoL indices evaluated. Other recently published randomized trials of regorafenib have obtained similar findings [16,17,18]. In fact, the hazard of a DFS event, a marker of net clinical benefit that amalgamated the risk of death, clinical progression/treatment discontinuation, and self-rated deterioration in health, was approximately halved with regorafenib (Fig. 2). Nevertheless, the plausibility of clinically important reductions, or benefits, in any aspect of QoL cannot be definitively ruled out given the limited size of this phase II trial. For example, the 95% confidence intervals for the treatment effect on EORTC scales (Table 2) often extended into a region indicative of clinically relevant effects (i.e., ±10 points [15]).

The amount of QoL data available for analysis declined sharply post baseline because of the rapid onset of disease progression, with 41% of placebo and 29% of regorafenib patients providing no post-baseline information. Given the likely association between progressing disease and worsening QoL, this differential may have introduced a bias favoring the placebo group. Our conclusion that there was no convincing statistical evidence that QoL was generally worse for the regorafenib group up to the time of progression is therefore likely to be conservative. The results of the MI analysis and the DFS analysis provide further reassurance of this.

The side effects of regorafenib appeared to have had limited impact on QoL and not to have eroded the gains in PFS demonstrated by INTEGRATE; however, the lack of QoL improvement from reversal of tumor-related symptoms (e.g., pain) is disappointing. This result may be explained in part by the effects of anti-angiogenic therapies in preventing, rather than reversing, tumor growth [19]. This explanation is consistent with the findings from INTEGRATE that showed regorafenib delayed disease progression but had limited activity on RECIST objective tumor response (objective response rate was 3%) [4].

Baseline levels of pain, appetite, constipation, and physical functioning were found to be significant prognostic factors for survival. The prognostic value of QoL information has been highlighted previously in advanced and metastatic esophago-gastric cancer [20, 21] on the basis of analyses performed on pooled trial data. The evidence from these studies, and from INTEGRATE, underscores the potential value of incorporating QoL information into risk assessments when reaching treatment recommendations for individual patients and for cohort stratification in future randomized trials.

In conclusion, regorafenib was associated with a significant improvement in the DFS rate and did not appear to have an excessively negative effect on QoL parameters from toxicity. Accordingly, there was no compelling statistical evidence that regorafenib side effects had a negative impact on QoL that was sufficient to outweigh the PFS gains observed in the INTEGRATE trial. Progressing to a phase III evaluation of regorafenib in patients with advanced and refractory gastric cancer is therefore warranted. That trial, named INTEGRATE II, has recently commenced and will provide more precise evidence on the effect of regorafenib on QoL by virtue of its larger sample size (N = 350). This information will be critical for reliably evaluating any survival gains against the impact of side effects to determine the net clinical effect of regorafenib. Development of a prognostic tool that incorporates QoL information is planned, based on a multivariate analysis of the pooled data from the phase II and III trials.