Introduction

Cancers of the esophagus and stomach are the second and sixth most common causes of cancer-related deaths worldwide [1]. Patients with unresectable locally advanced and metastatic disease can be offered palliative systemic treatment to prolong survival, to offer symptom relief, and to improve or maintain health-related quality of life (HRQoL) [2, 3].

Recognition of the importance of HRQoL is reflected in the fact that HRQoL is assessed in randomized clinical trials (RCTs) with increasing frequency. Over the past decades, the proportion of trials that include an HRQoL assessment was shown to have increased by 3.6% across all disciplines and by 6.7% among cancer trials. However, considerable variation among these HRQoL assessments in the instruments used has been noted, and the reporting of methods and results has often been inadequate [4, 5]. Also, a more recent study has shown that even though more RCT reports are meeting the quality standards for HRQoL assessment, the analysis and reporting of data and the presentation of findings remain highly variable [6]. Inadequate reporting of HRQoL in clinical trials may lead to a loss of valuable information or may even mislead clinical decision-making [7].

In the field of oncological RCTs, Efficace and colleagues showed that the reporting of HRQoL in RCTs of high-incidence diseases (i.e., breast, colorectal, prostate, and lung cancers) has improved over the years [7]. In contrast, in the majority of studies investigating curative treatment for esophageal cancer (a disease with a low incidence), the reporting of HRQoL was limited [8]. The reporting of HRQoL in studies of palliative therapy for advanced esophagogastric cancer has not yet been investigated. Given the limited remaining life span of this patient group, an emphasis on HRQoL is paramount. Therefore, we systematically reviewed the literature to determine the quality of HRQoL reporting in RCTs that involve palliative systemic therapy for patients with esophagogastric cancer. The following research questions were formulated. (1) What is the quality of HRQoL outcome reporting in locally advanced esophagogastric cancer? (2) What aspects of HRQoL reporting require improvement in order to facilitate clinical decision-making? (3) Has the quality of HRQoL reporting improved over time? Given the increasing body of literature regarding patient-reported outcome research, we hypothesized that the reporting of HRQoL in unresectable locally advanced and metastatic esophagogastric cancer had improved over time.

Methods

The PRISMA statement guided the writing of the manuscript. We focused on items that are relevant to the research questions and excluded irrelevant ones (e.g., items related to potential bias with respect to treatment outcomes).

Search methods

The study protocol was not registered in advance. The online databases PubMed, EMBASE, and the Cochrane Central Register of Controlled Trials (CENTRAL) as well as meeting abstracts from the American Society of Clinical Oncology (ASCO) and the European Society for Medical Oncology (ESMO) were searched for RCTs on palliative systemic therapy for advanced esophagogastric cancer up to February 2017. Details regarding this search can be found in Online Resource 1 in the Electronic supplementary material (ESM). Prospective registration databases such as ClinicalTrials.gov were not searched, as our aim was to assess published reports. For the same reason, no contact with study authors was sought for additional information. Titles, abstracts, and full texts were screened by EtV and NHM. Disagreements were discussed with JJvK until consensus was reached.

Study selection

Studies were included that met the following criteria: (1) prospective phase II or III RCT design; (2) unresectable, metastatic, or recurrent esophageal, gastroesophageal junction (GEJ), or gastric cancer; (3) palliative systemic therapy (i.e., chemotherapy and/or targeted therapy); (4) full-text articles published in English; and (5) HRQoL was measured with validated questionnaires. Studies using self-constructed nonvalidated HRQoL questionnaires were not eligible because of their nonreproducible nature or a lack of information regarding their psychometric properties.

Data extraction

Data extraction was conducted by EtV and JJvK using Microsoft Excel. The following baseline characteristics of the included studies and patients were extracted: number of patients enrolled in the study, gender, age, performance status, tumor histology, tumor location, and disease status. The following characteristics regarding HRQoL reporting were extracted: the presence of a hypothesis a priori, the rationale for the HRQoL instrument, psychometric properties, cultural validity, HRQoL domains, instrument administration, baseline compliance, timing of assessments, documentation of missing data, and the clinical significance and presentation of the results in the discussion section. The quality of HRQoL reporting in each article was rated using the Minimum Standard Checklist for Evaluating HRQoL Outcomes in Cancer Clinical Trials [9]. Articles were rated independently by EtV and JJvK. Discrepancies were discussed until a consensus was reached. The checklist consists of eleven items that can be scored as ‘yes’ (one point) or ‘no’ (zero points), and contains four domains: conceptual, measurement, methodology, and interpretation. Two items (‘a priori hypothesis stated’ and ‘cultural validity verified’) could also be evaluated as ‘not applicable’ (N/A) if, respectively, the study explicitly stated that the HRQoL assessment was intended for exploratory investigations only or the HRQoL measure was validated in the same population as that of the trial. When RCTs used validated measures for their study population, all items in the measurement domain (i.e., ‘psychometric properties reported,’ ‘cultural validity verified,’ and ‘adequacy of domains covered’) were scored as ‘yes.’ The checklist has three mandatory items: ‘psychometric properties reported,’ ‘baseline compliance reported,’ and ‘reasons for missing data reported.’ The checklist classifies the HRQoL reporting into the following categories: ‘very limited’ (score 0–4), ‘limited’ (score 5–7 or ‘no’ on one or more of the mandatory items), and ‘probably robust’ (score 8–11 and ‘yes’ on all mandatory items). Studies classified as ‘probably robust’ are most likely to have an impact on clinical decision-making [9].
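The scoring and classification rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure the raters actually used: the item labels are hypothetical shorthand, N/A ratings are assumed to contribute no points to the raw score, and a raw score of 0–4 is assumed to yield ‘very limited’ regardless of the mandatory items.

```python
# Hypothetical labels for the three mandatory checklist items named in the text.
MANDATORY_ITEMS = ("psychometric_properties", "baseline_compliance", "missing_data")


def classify_reporting(ratings):
    """Classify one trial report from its item ratings.

    `ratings` maps each checklist item label to 'yes', 'no', or 'na'.
    Assumption: only 'yes' earns a point; 'na' earns none.
    """
    score = sum(1 for rating in ratings.values() if rating == "yes")
    mandatory_met = all(ratings.get(item) == "yes" for item in MANDATORY_ITEMS)
    if score <= 4:
        return "very limited"
    if score >= 8 and mandatory_met:
        return "probably robust"
    # Score 5-7, or a higher score with at least one mandatory item not met.
    return "limited"


# Illustrative (made-up) ratings: 9 points, but missing data are not reported.
example = {
    "a_priori_hypothesis": "yes",
    "rationale_for_instrument": "no",
    "psychometric_properties": "yes",
    "cultural_validity": "yes",
    "adequacy_of_domains": "yes",
    "instrument_administration": "yes",
    "baseline_compliance": "yes",
    "timing_of_assessments": "yes",
    "missing_data": "no",
    "clinical_significance": "yes",
    "presentation_of_results": "yes",
}
print(classify_reporting(example))  # limited
```

The example shows why the mandatory items matter: a report can collect nine points and still be rated ‘limited’ when one mandatory item is scored ‘no.’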

Statistical analysis

For statistical analysis, the quality of HRQoL reporting (our main outcome) was expressed as an adjusted checklist score (ACS). The ACS was calculated for each study report by dividing the raw item score by the total number of applicable items. A higher ACS implies better quality of reporting. Descriptive statistics were used to gain insight into the quality of reporting. To assess the extent to which the quality of HRQoL reporting has improved over time, the variance and change in the ACS over time were graphically assessed using a scatterplot. Subsequently, a univariate generalized linear regression analysis with a binomial distribution, a logit link function, and robust standard errors was performed. In this model, the independent variable ‘time’ is expressed as the year of publication and the dependent variable ‘quality of HRQoL reporting’ is expressed as the ACS. Predicted values in our model can range between 0 and 1. In order to investigate other associations of study characteristics with the quality of HRQoL reporting, the following covariates were considered in the regression analysis: (1) whether or not the study reported statistically significant differences in HRQoL results at any scale between arms or within arms over time (no vs yes); (2) whether a separate article with HRQoL results was published (no vs yes); (3) the presence of an appendix or supplementary data (no vs yes); (4) intention-to-treat sample size (continuous); (5) type of endpoint (primary vs secondary); and (6) type of therapy line (first vs second vs third). Furthermore, we investigated whether there were differences in the median ACS between studies that were published before versus after the publication of the CONSORT-PRO statement [10] using the Mann–Whitney U test. Statistical significance was set at the 5% level and all analyses were performed using Stata version 14 for Windows.
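The ACS calculation described above amounts to a simple proportion. The following Python sketch illustrates it; the function name and the example ratings are hypothetical, and it assumes that items rated N/A are removed from both the numerator and the denominator.

```python
def adjusted_checklist_score(ratings):
    """Return the share of applicable checklist items rated 'yes'.

    `ratings` is a list of 'yes', 'no', or 'na' values, one per item.
    Assumption: items rated 'na' are dropped before the score is computed.
    """
    applicable = [rating for rating in ratings if rating != "na"]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    return sum(rating == "yes" for rating in applicable) / len(applicable)


# Made-up report: 6 of 11 items met, none N/A.
all_applicable = ["yes"] * 6 + ["no"] * 5
# Made-up report: one item N/A, 5 of the remaining 10 items met.
one_not_applicable = ["na"] + ["yes"] * 5 + ["no"] * 5

print(round(adjusted_checklist_score(all_applicable), 2))      # 0.55
print(round(adjusted_checklist_score(one_not_applicable), 2))  # 0.5
```

Because the denominator shrinks when an item is N/A, the ACS always lies between 0 and 1 and remains comparable across reports with different numbers of applicable items.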

Results

Literature search

One hundred sixty-four RCTs investigating palliative systemic therapy for advanced esophagogastric cancer were eligible. Among these, 37 unique RCTs (N = 10,887 patients) reported on HRQoL [11–56]. More details regarding the number of studies screened, assessed for eligibility, and included in the review can be found in Fig. 1. Eight studies (21.6%) published HRQoL findings separately. The year of publication of the studies ranged from 1997 to 2017. Major baseline characteristics of the included studies are shown in Table 1.

Fig. 1 Flowchart of the studies included

Table 1 Baseline characteristics

Checklist

Conceptual issues

Only nine studies (24.3%) reported an a priori hypothesis, and two studies (5.4%) stated explicitly that the HRQoL assessment had an exploratory nature. No studies provided a rationale for selecting a specific HRQoL questionnaire (see Table 2). Checklist scores per item per RCT are provided in Online Resource 2 in the ESM.

Table 2 Checklist items on the Minimum Standard Checklist for Evaluating HRQoL Outcomes

Measurement issues

All studies used a culturally validated HRQoL questionnaire with previously published psychometric properties and adequate coverage of HRQoL domains. The most frequently used questionnaire was the EORTC QLQ-C30 (32 studies, 86.5%) (Table 1). Disease-specific questionnaires (e.g., EORTC QLQ-OES18 for esophageal cancer, STO22 for gastric cancer, or OG25 for esophagogastric cancer) were used in addition to the QLQ-C30 (as recommended by the EORTC) in 12 of the 37 studies (32.4%). Five studies (13.5%) used the EQ-5D, four of them in combination with the EORTC QLQ-C30. The Spitzer Quality of Life Index (a proxy-based questionnaire) was the sole instrument in one study (2.7%).

Methodology

In eight studies (21.6%), the authors specified by whom and in which clinical setting the HRQoL instrument was administered. Baseline compliance was reported in 30 studies (81.1%), and the timing schedule of the HRQoL assessments was documented in 36 studies (97.3%). Only 13 studies (35.1%) provided reasons for missing data or the number of patients for whom data were missing during the study.

HRQoL interpretation

In 15 studies (40.5%), the authors addressed the clinical significance of the HRQoL findings. In 22 studies (59.5%), the authors commented on the HRQoL assessment in their study, regardless of the results.

Overall quality of HRQoL reporting

Among the 37 studies, the quality of 4 studies (10.8%) was classified as ‘very limited,’ that of 24 studies (64.9%) was classified as ‘limited,’ and that of 9 studies (24.3%) was classified as ‘probably robust.’ Figure 2 shows the adjusted checklist score of each study by year of publication. The ACS varied widely both over time and within publication years. For all studies, the median quality score was 0.55 and ranged from 0.27 to 0.91. Univariate generalized linear regression analysis showed that the year of publication was not associated with an increased ACS (β = 0.004, SE = 0.007, P = 0.57). Moreover, there was no difference between the median ACS of studies published before (median ACS = 0.55, interquartile range (IQR) = 0.5–0.64, N = 20) and after (median ACS = 0.55, IQR = 0.55–0.82, N = 17) the publication of the CONSORT-PRO statement (z = 1.12, P = 0.26). Second-line therapy and the publication of HRQoL results in a separate article were found to be significantly associated with the quality of reporting in the multivariate analysis (Table 3). In addition, post hoc analysis showed similar results when the criterion with the lowest score (‘rationale for instrument reported’) was omitted (data not shown).

Fig. 2 Scatterplot depicting the year of publication versus the adjusted checklist score. The x-axis shows the year of publication, and the y-axis shows the adjusted checklist score. The gray area represents the 95% confidence interval of the mean adjusted checklist score

Table 3 Results of the univariate and multivariate generalized linear regression analysis

Discussion

Although more than half of all the RCTs included in this systematic review were published in the past 5 years, the quality of HRQoL reporting in esophagogastric cancer RCTs involving palliative systemic therapy was limited and did not improve over time. This outcome is independent of the type of endpoint used in the RCT, the usage of supplementary data or appendices in the main publication, and, most importantly, the number of patients in the RCT. The latter indicates that shortcomings in reporting occur in both small and large phase III RCTs. Since larger and otherwise methodologically sound trials are the basis for guideline development and clinical decision-making, we advocate that care should be taken when interpreting HRQoL findings from these trials.

While most included studies report the timing schedule of the HRQoL assessments, describe compliance rates, and use validated questionnaires, the following aspects of HRQoL reporting require improvement: the formulation of a priori hypotheses, a clear description of how the instrument is administered, the interpretation of findings, and the number of missing data as well as how such data are handled (see Table 2). The latter in particular provides valuable information regarding potential bias in HRQoL estimates when there is nonrandom attrition. The importance of reporting missing data is reflected in the checklist, given that it is required before the study can be rated as high quality.

RCTs that presented HRQoL findings in a separate article were significantly more likely to be of better quality than studies that published their HRQoL findings along with the main clinical results. This pattern was also found in the systematic review of Brundage and colleagues [6]. Those authors emphasized that poorer reporting is most likely due to restrictions on manuscript length. Thus, omitting valuable HRQoL data to ensure that the word count is below a particular limit might lead to reporting bias and therefore hamper interpretation and clinical decision-making. Furthermore, publication bias could arise when findings are not significant and/or compliance rates in RCTs are low. Conversely, the publication of HRQoL data separately from the main clinical findings may reduce their clinical impact. For these reasons, one could consider reporting HRQoL findings in an extensive appendix or supplementary dataset along with the main article, so that valuable information regarding both clinical and HRQoL outcomes can be presented within one publication.

As observed previously, we found substantial variability in the quality of HRQoL reporting [4, 6, 7, 57]. The Consolidated Standards of Reporting Trials Regarding Patient-Reported Outcomes (CONSORT-PRO) statement provides detailed information on how to accurately and transparently report HRQoL in RCTs, and is endorsed by prominent journals. The current systematic review suggests that the CONSORT-PRO statement may not have had a significant impact on the reporting of HRQoL findings in esophagogastric cancer yet [10, 57].

Our study has some limitations. First, the Minimum Standard Checklist for Evaluating HRQoL Outcomes in Cancer Clinical Trials was published in 2003 and might not be as extensive as those published later, such as the CONSORT-PRO or the ISOQOL-recommended PRO reporting standards (both published in 2013) [10, 58]. The advantage of the checklist used is the predefined scoring system. In addition, the checklist includes the majority of the essential items of the later published statements and recommendations based on expert consensus by CONSORT-PRO and ISOQOL, respectively. The checklist is based on a minimum set of criteria, whereas the CONSORT-PRO or the ISOQOL-recommended PRO reporting standards elaborate more extensively on different aspects of HRQoL assessments. Extensive tools may be more sensitive to change, which means that the results in the current study might be an underestimation of the true change that occurred over time [7].

Second, the search strategy was limited to reports in English. Consequently, we might have failed to include RCT reports published in other languages—thus limiting our international scope. However, since the major phase II/III trials are published in English, we believe the risk of language bias to be low.

Third, RCTs scored particularly poorly on the item ‘rationale for the instrument used.’ The validated EORTC QLQ-C30 questionnaire is most frequently used in esophagogastric cancer, and this can be regarded as the ‘standard’ HRQoL instrument in esophagogastric cancer RCTs. Therefore, devaluing an RCT for not stating a rationale for the instrument used may be an excessively strict approach, as the EORTC questionnaire is consistently applied in order to permit fair comparisons between trials. However, post hoc analysis showed that the results were not different when the criterion ‘rationale for instrument reported’ was omitted. It should be emphasized that when authors use a newly developed or less frequently applied questionnaire, they should state the rationale.

Finally, one might dispute the interpretation of the outcome (ACS) as an interval scale. We adhered to the general practice of analyzing percentages or values between 0 and 1 to two decimal places using parametric statistical techniques.

To improve the quality of HRQoL reporting in future RCTs, we recommend that researchers and clinicians involve an HRQoL expert in the trial design, execution, analysis, and reporting phases. When the word count is restricted by journals, an extensive appendix or supplementary dataset can be of value. In addition, we would like to affirm the comment by Brundage et al. [6] that researchers and clinicians in advisory positions can stimulate the acceptance of patient-reported outcome reporting standards, such as the CONSORT-PRO, by involving editors, reviewers, and related stakeholders.

Conclusion

Although the number of RCTs on palliative systemic therapy for advanced esophagogastric cancer that include an HRQoL endpoint has increased, the quality of HRQoL reporting is highly variable, limited, and did not improve over time. This systematic review highlights the gaps in the current quality of HRQoL reporting in esophagogastric cancer RCTs. The formulation of a priori hypotheses, a clear description of how the instrument is administered, the number of missing data and how those data are handled, and the interpretation of findings are areas for improvement. We recommend that HRQoL should be extensively described in supplementary appendices if good HRQoL reporting is restricted by the word limit of the manuscript. As results from RCTs are crucial to daily practice, reliable and adequate reporting of HRQoL outcomes from RCTs is needed to facilitate clinical decision-making.