Background

Primary endpoints in clinical cancer trials are preferably objective, well-defined and measurable outcomes, such as overall survival, progression free survival and response rates [1]. However, secondary or exploratory endpoints, such as health-related quality of life (HRQL) measured through patient-reported outcomes (PROs), have become increasingly important to supplement the overall study findings [2]. PROs provide a patient-focused assessment of the impact of a treatment on patients' symptoms and functional abilities, which can inform regulatory label claims and clinical decision making [3,4,5].

Multiple myeloma (MM) is an incurable hematological cancer associated with bone destruction, hypercalcemia, anemia, renal failure and infections [6]. Overall survival has markedly increased after the introduction of novel treatments in the past two decades [7,8,9]. A contributor to improved overall survival is high-dose melphalan with autologous stem-cell transplantation (ASCT), which is standard treatment in patients with newly diagnosed MM younger than 70 years of age without significant comorbidities [10, 11]. High-dose melphalan causes acute toxicity, including anorexia, mucositis with pain and diarrhea, neutropenia and thrombocytopenia, with temporary impairment of HRQL. Some patients recover fully 1–2 months post-ASCT, while others still have moderate to severe symptoms one year post-ASCT [12,13,14,15].

Another contributor to improved overall survival in MM is maintenance treatment, i.e. long term therapy that extends response duration [16,17,18]. An important consideration for individual decision-making about maintenance therapy is the long-term impact on patient HRQL [17, 19]. Results from secondary PRO endpoints have shown that HRQL during maintenance varies with the specific drug used. Despite thalidomide's prolongation of progression free survival (PFS), this drug is not approved for maintenance therapy for MM due to unacceptable impairment of HRQL [20, 21]. Lenalidomide maintenance improves PFS and overall survival and is approved for maintenance therapy in transplant-eligible patients after ASCT [18]. Adverse event registration from clinical trials shows that lenalidomide maintenance after ASCT is tolerable, and includes mainly hematological side-effects, diarrhea and fatigue [22,23,24]. However, randomized studies comparing HRQL during lenalidomide maintenance versus observation after ASCT are lacking. Published studies allowed several drugs as maintenance therapy or were designed with another drug as comparator [25,26,27]. In one of the studies, HRQL comparison between patients receiving maintenance therapy versus no maintenance therapy after first-line ASCT was done [27]. Several maintenance drugs were included, but not carfilzomib. The overall results indicated minimal impact on HRQL by maintenance therapy, but with worsening diarrhea and reduction in future perspectives.

Eventually, almost all patients with MM will experience relapse or progressive disease and for these patients, salvage ASCT will be an option, particularly for patients who achieved a long remission after frontline ASCT [10, 28]. After successful salvage ASCT, it would be desirable to prolong response duration with maintenance therapy, as long as that does not compromise HRQL or hamper recovery after salvage ASCT. However, there is limited randomized evidence on maintenance therapy after salvage ASCT. In the ReLApsE trial, lenalidomide maintenance therapy was included in the transplant arm; but the study was designed to evaluate salvage ASCT, not maintenance therapy [29].

To investigate the impact of maintenance therapy after salvage ASCT, the Nordic Myeloma Study Group initiated the CARFI trial in 2015 [30]. Carfilzomib, a second-generation proteasome inhibitor, was chosen for maintenance therapy in the CARFI trial, administered with dexamethasone every second week. The primary clinical endpoint of the CARFI study was time to progression; as previously reported, time to progression was significantly prolonged by eight months for the carfilzomib-dexamethasone (Kd) maintenance group (25.1 months) compared to observation (16.7 months) [30]. HRQL was one of the secondary endpoints of the CARFI trial; results of the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire-Core 30 items (QLQ-C30) global health status/quality of life (GHS/QoL) subscale showed no between-group difference during the 2-year follow-up period [30]. The aim of this paper is to present an in-depth analysis of the CARFI trial PRO endpoints, as per the statistical analysis plan [31].

PRO objective, endpoints and hypotheses

The objective of the HRQL component of the CARFI trial was to assess the impact of Kd maintenance therapy, relative to observation, post salvage ASCT on a range of relevant aspects of HRQL. The primary PRO endpoint was change in overall HRQL from randomization to eight months follow-up, with the corresponding primary PRO hypothesis that there was no difference between the Kd maintenance therapy and observation groups. The primary PRO endpoint was augmented by the average overall HRQL per-patient based on all completed HRQL forms from randomization to last follow-up time-point prior to progressive disease/death/drug discontinuation/end of study (whichever came first). These two primary endpoints were supplemented by secondary PRO endpoints defined in terms of specific HRQL domains from randomization to 8 months later: (1) mean change; (2) the proportion of patients who improved, remained stable or worsened; (3) time to first recorded improvement; and (4) the proportion of patients with Kd-related symptoms. The final secondary endpoint was quality-adjusted progression free survival (QAPFS) from randomization to progressive disease/death/drug discontinuation/end of study (whichever came first). Hypotheses corresponding to each of these endpoints, including specific HRQL domains and symptoms, the questionnaires used to assess these, and the analysis and interpretation approach are presented in Table 1.

Table 1 Health-related quality of life endpoints, hypotheses, analysis and interpretation approach of the CARFI trial

Methods

Design, patients and treatment

The CARFI trial was a Nordic Myeloma Study Group open-label multi-center, randomized, phase II clinical trial, using a parallel (1:1) group design (clinicaltrials.gov: NCT02572492). Details of the study design, trial treatment and main findings have been published elsewhere [30]. In brief, patients with MM aged 18 years or more with first relapse after prior ASCT and found eligible for salvage ASCT were included at 25 hospitals within four Nordic countries and Lithuania from January 2015 to April 2018. Key exclusion criteria were previous treatment with carfilzomib, maintenance treatment given after first ASCT, World Health Organization performance status ≥ 3, significant neuropathy (grade 3–4, or grade 2 with pain) and any comorbidity that would preclude treatment with carfilzomib or salvage ASCT.

Two months after reinduction with four cycles of carfilzomib-lenalidomide-dexamethasone followed by salvage ASCT, patients were randomized to Kd maintenance or observation. Kd maintenance was given as intravenous carfilzomib (27 mg/sqm every second week with escalation of carfilzomib to 56 mg/sqm if tolerated) and oral dexamethasone 20 mg every second week until progression, unacceptable adverse effects, withdrawal of consent or 1 September 2019 (end of study). Dose modifications of carfilzomib and/or dexamethasone were done in case of toxicity related to the drugs (described in the supplementary appendix).

PRO instruments

HRQL was assessed with three instruments: the EORTC core module (QLQ-C30) [32, 33] and Multiple Myeloma module (QLQ-MY20) [34], and the Functional Assessment of Cancer Therapy/Gynaecologic Oncology Group-Neurotoxicity (FACT/GOG-ntx) instrument [35]. The three questionnaires and domains are described in Table 2.

Table 2 The three questionnaires, domains, descriptions and scoring. EORTC QLQ-C30; quality of life questionnaire-core 30, EORTC QLQ-MY20; quality of life questionnaire-multiple myeloma module, FACT/GOG-ntx; Functional Assessment of Cancer Therapy/Gynaecologic Oncology Group-Neurotoxicity

PRO data collection

Patient inclusion in the PRO sub-protocol was part of the CARFI trial. HRQL was assessed with all three instruments at randomization (two months post salvage ASCT) and then every second month in the outpatient clinic at study visits. The study nurses had access to trial specific guidance in PRO data collection. If a scheduled visit was postponed (e.g. due to vacation, public holidays, acute toxicities beyond grade 2), the PRO data collection was postponed as well. From study initiation until July 2017, the study nurses administered all three PRO instruments on paper for the patients to complete in the clinic. From July 2017, electronic completion of the EORTC instruments via a tablet directly into a REDCap database became an option. Real-time monitoring of non-completion of instruments was not carried out. PRO data collection continued until end of study for each patient, defined as the time of disease progression, death, drug discontinuation or study termination (whichever came first). Study termination was defined as the last protocol visit for the last included patient, which was on 1 September 2019.

Rationale for PRO endpoints

In developing the statistical analysis plan, HRQL studies of patients with MM were reviewed to inform our choice of PRO endpoints and appropriate PRO measures [31]. The EORTC QLQ-C30 Summary (QLQ-C30-sum) score was chosen to assess the primary PRO endpoint because there is no standard HRQL assessment for use in MM maintenance trials, it captures the generic HRQL concepts of importance in cancer, it reduces issues with multiplicity (type I error), and it has been validated in patients with hematological malignancies [36, 37]. Secondary PRO endpoints were specified a priori, aligned with hypotheses (Table 1), assessed by specific domains from the QLQ-C30, QLQ-MY20 and FACT/GOG-ntx. These included four of the functioning domains (physical, role, social, emotional) and GHS/QoL from EORTC QLQ-C30, and two domains from the QLQ-MY20 (body image and future perspectives); these were chosen because they have previously been shown to be affected after primary ASCT [12,13,14]. Based on the literature, Kd-related symptoms were identified and defined as fatigue, dyspnea, nausea/vomiting, diarrhea, insomnia, agitation and restlessness; these were included to assess the non-infusion related symptomatic impact of carfilzomib and dexamethasone treatment [38,39,40,41,42,43,44,45]. Time in QAPFS was included as it captures the benefits and consequences of Kd maintenance therapy in terms of both treatment efficacy (PFS) and HRQL consequences. Further details are provided in the statistical analysis plan [31].

Statistical analyses

The statistical analysis plan for the PRO endpoints was finalized and published before starting the PRO data analyses [31]; a clarifying amendment was added to the published statistical analysis plan before finalizing the PRO data analyses. The statistical analyses are described here in brief. For each patient, the study period was from randomization to that patient’s last follow-up prior to progressive disease/death/drug discontinuation/end of study (whichever came first). The analyses were based on all available questionnaires from randomized patients on protocol until the last follow-up time point, except for the supportive analysis of the primary endpoint, where all available questionnaires were included in the analysis. The last follow-up time point was defined as the time point before the number of patients on protocol in one of the groups became less than 15 patients. Analyses were carried out using Stata 17.
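The rule for the last follow-up time point can be illustrated with a short sketch. The function name and the patient counts after randomization are hypothetical and chosen only for illustration (only the randomization counts of 82 and 86 are trial figures); the trial analyses themselves were carried out in Stata.

```python
def last_followup_month(on_protocol, min_patients=15):
    """Return the last assessment month at which every group still has
    at least `min_patients` patients on protocol (None if there is none)."""
    last = None
    for month in sorted(on_protocol):
        if min(on_protocol[month].values()) < min_patients:
            break  # one group has dropped below the limit; stop here
        last = month
    return last


# Hypothetical counts per assessment month (illustration only).
counts = {
    0: {"Kd": 82, "observation": 86},   # randomization (trial figures)
    8: {"Kd": 70, "observation": 62},   # hypothetical
    16: {"Kd": 41, "observation": 27},  # hypothetical
    22: {"Kd": 24, "observation": 15},  # hypothetical
    24: {"Kd": 21, "observation": 12},  # hypothetical
}
print(last_followup_month(counts))
```

With these illustrative counts, the observation group falls below 15 patients at 24 months, so the preceding time point is selected.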

Patient characteristics and PRO scores at randomization were described with summary statistics (mean and 95% confidence intervals (CI)). PRO completion rates were calculated for each assessment time point as number of randomized patients with enough completed items to calculate the QLQ-C30-sum score as a proportion of patients on protocol at that time point.

HRQL domain scores of the QLQ-C30 and QLQ-MY20 were calculated according to EORTC standard scoring [46]. The QLQ-C30-sum score was calculated from the mean of 13 of the 15 QLQ-C30 domains (excluding the GHS/QoL and financial impact items) [37]. The QLQ-C30-sum score ranges from 0 to 100, with 100 being best. The FACT/GOG-ntx score was calculated according to the FACT manual [47].
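As an illustration of the scoring described above, the following minimal sketch applies the standard EORTC linear transformation to a single scale and averages 13 scales into a summary score. It is not the code used in the trial. The function names and two-letter scale labels are illustrative; the half-of-items rule for missing responses and the requirement that all 13 scales be available are assumptions taken from general EORTC scoring conventions rather than from this paper.

```python
FUNCTIONAL = ("PF", "RF", "EF", "CF", "SF")                 # higher = better
SYMPTOM = ("FA", "NV", "PA", "DY", "SL", "AP", "CO", "DI")  # higher = worse


def scale_score(items, item_range=3, functional=False):
    """Linear 0-100 transformation of one scale from raw item responses.

    `items` holds responses coded 1-4 (item_range=3); None marks a missing
    item. Assumption: a scale is scored only if at least half of its items
    were answered.
    """
    answered = [x for x in items if x is not None]
    if len(answered) * 2 < len(items):
        return None
    raw = sum(answered) / len(answered)
    fraction = (raw - 1) / item_range
    # Functional scales are reversed so that 100 means best functioning.
    return (1 - fraction) * 100 if functional else fraction * 100


def summary_score(scales):
    """Mean of the 13 scales, with symptom scales reversed so that a higher
    summary score is better. Assumption: returns None if any scale is missing."""
    values = []
    for name in FUNCTIONAL + SYMPTOM:
        score = scales.get(name)
        if score is None:
            return None
        values.append(score if name in FUNCTIONAL else 100 - score)
    return sum(values) / len(values)


# A patient with full functioning and no symptoms scores 100.
best = {name: 100.0 for name in FUNCTIONAL}
best.update({name: 0.0 for name in SYMPTOM})
print(summary_score(best))
```

Reversing the symptom scales before averaging is what makes a summary score of 100 the best possible value.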

For the endpoints involving change in HRQL, the longitudinal PRO data were analyzed with a linear mixed effect model for repeated measures. The model included patients as a random variable and time (baseline, two, four, six and eight months), treatment group and country as fixed factors. The results were tested for statistical significance at the 5% level for the primary HRQL endpoint and at the 1% level for the secondary HRQL endpoints. Statistically significant results were interpreted as clinically relevant/meaningful if they exceeded the threshold for minimally important difference/change (MID). As an MID has not yet been established for the QLQ-C30-sum score, a 10-point change/difference was pre-specified as indicating a clinically relevant change (within a group) and a clinically relevant difference in change between groups [48, 49]. Within and between group changes in the individual domains of the QLQ-C30 questionnaire were interpreted according to the evidence-based guidelines for between-group differences [50] and change over time [51] using the threshold for a small difference/change. For the domains of QLQ-MY20, recommended estimates for MIDs were used (disease symptoms and side effects of treatment: 10 points; body image: 13 points; and future perspectives: 9 points) [52]. For the FACT/GOG-ntx subscale, 4.4 points were used as the primary MID [53] and 11.8 points were used for sensitivity analysis [54].
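The two-step interpretation rule (statistical significance first, then comparison with the MID) can be written out as a small sketch. The function name and return labels are hypothetical, and treating a value equal to the MID as reaching the threshold is an assumption. The example uses the magnitude of the primary endpoint estimate reported in the Results (4.62 points, p = 0.032) with the pre-specified 10-point MID.

```python
def interpret(estimate, p_value, mid, alpha):
    """Classify an estimated change or difference.

    A result is called clinically relevant only if it is statistically
    significant at `alpha` AND its magnitude reaches the MID.
    """
    if p_value >= alpha:
        return "not statistically significant"
    if abs(estimate) >= mid:  # assumption: equality counts as reaching the MID
        return "statistically significant and clinically relevant"
    return "statistically significant, below MID"


# Primary endpoint: 5% level, 10-point MID for the QLQ-C30-sum score.
print(interpret(4.62, 0.032, mid=10, alpha=0.05))
# The same estimate judged at the 1% level used for secondary endpoints.
print(interpret(4.62, 0.032, mid=10, alpha=0.01))
```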

For the supportive analysis of the primary endpoint, the average per-patient QLQ-C30-sum score from randomization to progressive disease/death/drug discontinuation/end of study was calculated. The mean per-patient QLQ-C30-sum score was calculated for each group and compared with a rank-sum test.

To evaluate the proportion of patients who improved, remained stable or worsened in HRQL, patients were categorized as having perceived either an improvement or worsening from randomization if they experienced a change in score (in the direction of improvement or worsening, respectively) of at least 20 points [55]. For the single item domain of body image, 33 points were used [52].
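A minimal sketch of this categorization is given below. The function and parameter names are hypothetical; the direction flag is needed because an increase means improvement on functioning scales but worsening on symptom scales.

```python
def classify_change(baseline, followup, threshold=20, higher_is_better=True):
    """Categorize a patient's change from randomization as 'improved',
    'worsened' or 'stable', using a change of at least `threshold` points."""
    change = followup - baseline
    if not higher_is_better:
        change = -change  # on symptom scales a decrease is an improvement
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "stable"


print(classify_change(60, 85))                          # functioning scale
print(classify_change(33, 67, higher_is_better=False))  # symptom scale
print(classify_change(70, 80))                          # change below threshold
```

For the single-item body image domain, the same function would be called with `threshold=33`.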

Time to first recorded improvement in HRQL was calculated for each patient as time from randomization to the first time the patient reported an improvement of at least 20 points for the EORTC multi-item domains [55], and 33 points for the body image single item domain [52]. Patients who did not record an improvement at any time point were censored at end of study. Patients with no score at randomization and patients with high functioning as well as good GHS/QoL (greater than 80 points) and body image (less than 33 points) at randomization were excluded from this analysis since this left no room for improvement. Time to first recorded improvement was compared between groups using a Cox proportional hazards regression model with treatment, PRO score at randomization and country as covariates, and presented as a hazard ratio (HR) with corresponding 95% CI. A post-hoc analysis of average time to first recorded improvement with corresponding standard deviations was performed.
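The derivation of the time-to-event variable for one patient can be sketched as follows, for a domain where higher scores are better. The function name, the data layout (a mapping from month to score, with month 0 as randomization) and the example scores are hypothetical; the resulting (time, event) pairs would then be passed to a Cox model in a statistical package.

```python
def time_to_first_improvement(scores, end_of_study, threshold=20, ceiling=80):
    """Return (months, improved) for one patient, or None if excluded.

    `scores` maps assessment month to domain score; month 0 is randomization.
    Patients without a randomization score, or scoring above `ceiling` at
    randomization, are excluded because no improvement can be recorded.
    """
    baseline = scores.get(0)
    if baseline is None or baseline > ceiling:
        return None
    for month in sorted(m for m in scores if m > 0):
        if scores[month] - baseline >= threshold:
            return month, True   # first recorded improvement (event)
    return end_of_study, False   # no improvement: censored at end of study


print(time_to_first_improvement({0: 50, 2: 60, 4: 75}, end_of_study=8))
print(time_to_first_improvement({0: 50, 2: 55}, end_of_study=8))
print(time_to_first_improvement({0: 90, 2: 100}, end_of_study=8))
```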

For endpoints involving the proportion of patients with Kd-related symptoms, raw item scores were converted to 0-100 scales according to the EORTC manual [46], i.e. “none at all”=0, “a little”=33, “quite a bit”=67, “very much”=100. The proportion of patients with Kd-related symptoms was defined as the number of patients experiencing any of the Kd-related symptoms corresponding to at least “a little” (≥ 33 points) divided by the total number of patients. Similarly for the proportion of patients reporting moderate to severe Kd-related symptoms, a score of at least “quite a bit” (≥ 66 points) was used. This dichotomized grading has previously been used in patients with hematological malignancies [56]. Differences between treatment groups in these proportions were analyzed as binary endpoints using chi-square tests of homogeneity.
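The dichotomization and the between-group test can be sketched as below. The unrounded transformed value of "quite a bit" is 66.7, which is why the rounded score of 67 and the cut-off of 66 points agree. The function names and the example table are hypothetical, and the use of an uncorrected Pearson chi-square statistic is an assumption, as the paper does not state whether a continuity correction was applied.

```python
import math


def item_to_scale(response):
    """Map a single item response coded 1-4 onto the 0-100 scale."""
    return (response - 1) / 3 * 100


def has_symptom(responses, cutoff=33):
    """True if any Kd-related item reaches `cutoff` on the 0-100 scale
    (33 for any symptom, 66 for moderate to severe symptoms)."""
    return any(item_to_scale(r) >= cutoff for r in responses)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the
    table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


print(has_symptom([1, 2, 1]))             # "a little" on one item
print(has_symptom([1, 2, 1], cutoff=66))  # no item at "quite a bit" or above
# Hypothetical table: rows = groups, columns = symptom yes / no.
print(chi_square_2x2(20, 0, 0, 20))
```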

Preference-based utility scores were calculated from the QLQ-C30 data using the utility scoring algorithm of EORTC Quality of Life Utility Measure-Core 10 dimensions (QLU-C10D) [57], which yields a score with a maximum of one (representing the best possible health state), and is anchored at zero (representing the state of being dead). Implementing this algorithm requires a country-specific value set; as a Danish QLU-C10D value set was not available at the time of analysis, the German value set was used [58], as it was considered to be closest culturally of the country-specific value sets available at the time of analysis. The time in QAPFS was calculated using two quality adjustment metrics: the per-patient mean EORTC QLQ-C30-sum score (divided by 100, as required for quality adjustment of life years) and the per-patient mean QLU-C10D utility score [59]. Patient-level QAPFS was calculated by multiplying the per-patient estimated mean time to progression/death (censored at last follow-up time point) by each of the quality adjustment metrics. Group-level QAPFS was then estimated with the Kaplan-Meier method. The mean difference in QAPFS between the two groups was estimated using bootstrap methods for each quality-adjustment metric.
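The quality adjustment and the bootstrap comparison can be illustrated with a simplified sketch. It deliberately ignores censoring and the Kaplan-Meier step, so it is not a reproduction of the trial analysis. The function names, the percentile-type interval, the number of resamples and all example values are assumptions made for illustration.

```python
import random
from statistics import mean


def patient_qapfs(months, quality_weight):
    """Quality-adjusted time for one patient: time multiplied by a weight
    between 0 and 1 (e.g. mean QLQ-C30-sum score / 100, or a utility)."""
    return months * quality_weight


def bootstrap_mean_difference(group_a, group_b, n_boot=2000, seed=0):
    """Mean difference (a - b) with a percentile bootstrap 95% interval.
    Simplification: every value is treated as fully observed (no censoring)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return mean(group_a) - mean(group_b), (lower, upper)


# Hypothetical patient-level QAPFS values in months (illustration only).
kd = [18.0, 20.0, 16.0, 22.0]
observation = [15.0, 14.0, 16.0, 15.0]
difference, interval = bootstrap_mean_difference(kd, observation)
print(difference, interval)
```

A weight derived from the summary score only rescales time by perceived HRQL, whereas a utility weight such as the QLU-C10D additionally reflects population preferences for health states.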

To examine potential informative drop-out (missing data patterns), two pattern mixture model analyses were performed: (1) stratifying patients per treatment group into two groups based on drop-out time, early (drop-out between zero and eight months follow-up) and late (participated in the study at 10 months); and (2) stratifying patients per treatment group into three groups based on drop-out time (early, late and never drop-out).

A post-hoc analysis was performed to investigate compliance with Kd maintenance therapy by assessing the proportion of fully administered, reduced and omitted doses of carfilzomib and dexamethasone relative to scheduled doses.

Sample size in the maintenance phase of the CARFI trial was determined according to the primary trial endpoint (time to progression). According to the trial protocol, it was expected that 150 patients would continue to the maintenance phase. No formal power analysis was performed for the PROs, and no patients were involved in the design or the interpretation phase of the study.

Results

Patient population

In total, 181 patients underwent salvage ASCT, and 168 patients continued to randomization between Kd maintenance or observation; 82 were randomized to Kd maintenance and 86 to observation, constituting the population for this HRQL analysis. Reasons for drop-out before randomization have been reported previously [30]. Reasons for drop-out from randomization until eight months follow-up are presented in the CONSORT flow diagram of Fig. 1 and the supplementary appendix Figure S1 until last follow-up time point (i.e. 22 months). Patient demographics, clinical characteristics and PRO mean scores at randomization are presented in Table 3. PRO completion rate until eight months follow-up was 93% (95% for the Kd maintenance and 91% for the observation group) and 93% until last follow-up time point (supplementary appendix Table S1).

Fig. 1

CONSORT flow diagram

CONSORT flow diagram of the entire study period is presented in the supplementary appendix Figure S1

Kd; Carfilzomib-dexamethasone, p; patient, q; questionnaires (completed items for calculating the EORTC QLQ-C30 summary score)

Table 3 Patient characteristics and patient-reported outcomes scores at randomization. Kd; Carfilzomib-dexamethasone, IQR; Interquartile range 25-75%, MM; multiple myeloma, ASCT; high-dose melphalan with autologous stem-cell transplantation, WHO; World Health Organization, EORTC QLQ-C30, European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core questionnaire; EORTC QLQ‐MY20, European Organisation for Research and Treatment of Cancer Multiple Myeloma module; FACT/GOG‐ntx subscale, Functional Assessment of Cancer Therapy/Gynecologic Oncology Group Neurotoxicity subscale; SD, Standard deviation

Effect of Kd maintenance on HRQL

The between-group difference in mean change from randomization in QLQ-C30-sum score, estimated at eight months, was -4.62 points on the 0-100 scale (95% CI -8.9; -0.4, p = 0.032). This difference was statistically significant but not clinically important (i.e. it was less than the predefined MID). The findings are visualized in Fig. 2. Similarly, none of the differences at the earlier time points (two, four and six months follow-up) reached both statistical significance and clinical relevance. Data until last follow-up time point are provided in the supplemental appendix Table S2. In the pre-specified supportive analysis of the primary endpoint, the mean per-patient average QLQ-C30-sum score was 81.4 (SD 12.9) for the Kd maintenance group and 82.5 (SD 12.3) for the observation group. These means were similar and did not differ statistically significantly between the groups (rank sum test, p = 0.625).

Fig. 2

Mean score change from randomization in EORTC QLQ-C30 Summary Score until 8 months follow-up. Horizontal bars illustrate 95% confidence intervals. Kd; Carfilzomib-dexamethasone

Effect of Kd maintenance on the individual PRO domains

Within the observation group, the patients reported statistically significant and clinically meaningful improvements at eight months in six domains (physical, role and social functioning, appetite loss, fatigue and body image), whereas the patients in the Kd maintenance group did not report statistically significant or clinically meaningful improvements in any domains at eight months follow-up. The findings are presented in Table 4. For these six domains, the results at eight months are presented in Fig. 3, and mean score changes, 95% confidence intervals and p-values until last follow-up time point are presented in supplementary appendix Tables S3-S8.

Table 4 Mean change from randomization to eight months follow-up and between-group difference. The point estimates in bold indicate a statistically significant (p < 0.01) and clinically relevant difference/meaningful change. Improvements are indicated by positive changes for function domains and negative changes for symptom domains
Fig. 3

Mean score improvement from randomization to 8 months follow-up for the six domains with statistically significant and clinically meaningful improvement at 8 months follow-up

Blue* indicates time points with statistically significant and clinically meaningful improvement for the observation group. Red* indicates time points with statistically significant and clinically meaningful improvement for the carfilzomib-dexamethasone (Kd) maintenance group

Proportion of patients who improved, remained stable or worsened

Generally, the majority (range 52-93%) of patients in both groups remained stable from randomization to the eight months follow-up, with some notable exceptions. A higher proportion of patients in the Kd maintenance compared to the observation group (93% versus 81%) reported stable physical functioning at four months (odds ratio 2.99, 95% CI 1.01; 8.83, p = 0.040). Also, a higher proportion of patients in the Kd maintenance group compared to the observation group (13% versus 2%) reported worsening in social functioning at eight months (odds ratio 8.74, 95% CI 1.06; 72.2, p = 0.018). The proportions and odds ratios for all domains from randomization to two, four, six and eight months follow-up are reported in Supplementary Table S9.

Time to first recorded improvement

There were no statistically significant between-group differences in time to first recorded improvement in any of the EORTC QLQ-C30 functional domains, the GHS/QoL scale, or the QLQ-MY20 body image or future perspectives domains (Table 5). The average number of weeks to first recorded improvement is displayed in the supplementary appendix Table S10.

Table 5 Time to first recorded improvement in the functional domains, Global Health Scale/Quality of Life and body image domains. Patients with no score at randomization and patients with high functioning as well as good GHS/QoL and body image at randomization were excluded from this analysis since high/good score at randomization leaves no room for improvement

Proportion of patients with Kd-related symptoms

Significantly more patients in the Kd maintenance group developed Kd-related symptoms of restlessness and agitation (odds ratio 1.90, 95% CI 1.01; 3.58, p = 0.046) and insomnia (odds ratio 2.60, 95% CI 1.15; 5.88, p = 0.019) during the study period (Fig. 4 and the supplementary appendix Table S11).

Fig. 4

Proportion of patients developing carfilzomib-dexamethasone (Kd) related symptoms during the study period for the Kd maintenance and the observation group

Quality-adjusted progression free survival

When quality-adjusting the PFS based on the QLQ-C30-sum score, the QAPFS difference between the two groups was 3.0 months in favor of the Kd maintenance group (95% CI 0.78; 5.28, p = 0.008). Mean QAPFS based on the QLQ-C30-sum score was 18.0 months for the Kd maintenance group (95% CI 16.4; 19.6) and 15.0 months for the observation group (95% CI 13.5; 16.5). Similarly, for QAPFS based on the QLU-C10D, Kd maintenance significantly extended QAPFS by 3.5 months (95% CI 1.2; 5.9, p = 0.004) compared to observation. The mean QAPFS based on the QLU-C10D was 17.5 months (95% CI 15.9; 19.2) for the Kd maintenance group and 14.0 months (95% CI 12.4; 15.5) for the observation group. The quality-adjusted progression-free-survival curves are displayed in Fig. 5. For comparison, the unadjusted progression-free survival was 19.3 months for the Kd maintenance group (95% CI 17.9; 20.7) and 16.8 months for the observation group (95% CI 15.4; 18.3), resulting in a significantly superior PFS of 2.5 months for the Kd maintenance group (95% CI 0.47; 4.5, p = 0.016).

Fig. 5

Quality-adjusted progression-free-survival based on the EORTC QLQ-C30 Summary score and EORTC Quality of Life Utility Measure-Core 10 dimensions (QLU-C10D). Kd; Carfilzomib-dexamethasone

Impact of missing data

Concerning informative drop-out, the pattern mixture models showed a statistically significant and clinically relevant difference for QLQ-C30-sum score reported by patients dropping out early compared to patients dropping out late for the Kd maintenance group (mean difference 12.2 points, 95% CI 5.3; 19.2, p = 0.001, with patients who dropped out early having poorer scores). There were no statistically significant or clinically relevant differences between early, late and never drop-out patients for the observation group or between never and late for the Kd maintenance group.

Post hoc analysis

Our post hoc investigation revealed that 90% (2150 out of 2392) of the dexamethasone doses that were administered to the patients on protocol were full doses of 20 mg, and 83% (1983 out of 2392) of Kd maintenance doses were administered with full dose carfilzomib of 56 mg/sqm. In addition, 12% of the carfilzomib doses were administered as reduced doses (45, 36 or 27 mg/sqm), and for the remaining carfilzomib doses, treatment was omitted or data were not available. Further details of dose escalation and modification as well as administered doses of dexamethasone and carfilzomib can be found in the supplementary appendix Tables S12, S13 and S14.

Discussion

This is the first prospective randomized trial reporting patient-reported HRQL data in patients with relapsed MM receiving maintenance therapy with Kd after salvage ASCT. Kd maintenance therapy after salvage ASCT did not affect the perceptions of overall HRQL compared to observation, confirming our primary hypothesis. When adjusting the PFS for the HRQL impact, Kd maintenance treatment remained beneficial compared to observation, although quality adjustment reduced the PFS from 19.3 to 18.0 or 17.5 months for the Kd maintenance group and from 16.8 to 15.0 or 14.0 months for the observation group, depending on the quality adjustment metric. Progression-free survival with maintenance therapy should be quality-adjusted; quality-adjusted PFS constitutes a relevant estimate of the benefits of maintenance therapy when the HRQL impact is taken into account. For health technology assessment and health reimbursement decisions, the EORTC QLU-C10D score is suitable for estimating quality-adjusted PFS and for use in cost-utility analyses because it is a preference-based HRQL metric, whereas the EORTC QLQ-C30 summary score is not.

Patients randomized in the CARFI study were recovering after salvage ASCT. Evidence-based recovery trajectories in symptoms and functioning have been documented only following primary ASCT, not salvage ASCT [12,13,14]. Introducing maintenance therapy after ASCT raises concerns about hampering patient recovery and/or causing new side effects related to the drugs included in maintenance treatment. Our study showed that salvage ASCT recovery was hampered by Kd maintenance in six domains at eight months follow-up, including physical functioning and fatigue, which are considered core HRQL domains in patients with multiple myeloma. These findings were supported by the higher proportion of patients treated with Kd maintenance reporting stable physical functioning at four months follow-up, a time point where improvement in physical functioning is expected, and by the higher proportion reporting worsening in social functioning at eight months follow-up.

Based on the non-infusion related toxicity profiles of carfilzomib and symptomatic side-effects of dexamethasone, we pre-specified domains and items expected to be affected during Kd maintenance treatment. The analysis showed that more patients reported insomnia, restlessness and agitation during Kd maintenance treatment. Those side-effects can primarily be attributed to dexamethasone treatment [45]. Our investigation of Kd-related symptoms revealed that dexamethasone symptomatic side-effects had a greater impact on patients than carfilzomib-related symptomatic side-effects. This finding is in keeping with our post hoc analysis finding that 90% of dexamethasone doses were administered as full doses of 20 mg.

Dyspnoea is a well-documented side effect of carfilzomib, as revealed by the adverse events registrations from the large phase III clinical trials ASPIRE and ENDURANCE [39, 42]. Trajectories of patient-reported dyspnoea during carfilzomib treatment have to our knowledge not been published. Our investigation revealed no negative impact of carfilzomib maintenance on patient-reported dyspnoea. As our findings of Kd-related symptoms are based on a patient cohort where the majority of patients received full-dose carfilzomib, dose reduction cannot be the explanation. However, in the CARFI study, the patients completed the HRQL questionnaires in the outpatient clinic on the drug administration day. The patients were asked to report symptoms experienced for the past seven days when completing the questionnaires. At that time point, it had been two weeks since the last dose of carfilzomib was administered, and the relevant side-effects might have subsided by then.

This study has several strengths. The objectives, endpoints, hypotheses and analysis methods are based on a pre-published statistical analysis plan guided by an evidence-based guideline for inclusion of PROs in clinical trials [31, 60]. This minimizes statistical multiplicity issues, avoids cherry picking of HRQL findings and ensures high-quality PRO data results to inform clinical decision-making. Another strength is the high PRO completion rate. These factors work together to confer scientific rigor and credibility to our results.

A limitation of related research rather than this study is that an MID for the primary PRO endpoint of the QLQ-C30-sum score is not yet established. An arbitrary cutoff of 10 points was thus chosen as the MID, based on previous EORTC QLQ-C30 MID findings in patients with MM [61]. However, looking at the domain results (where we did have MIDs [51]), we suspect that an MID for QLQ-C30-sum score lower than 10 points may have been more correct. Hence, this study might have been underpowered for detection of smaller, but potentially clinically important differences. The performed sensitivity analysis investigating the impact of missing data due to drop-out suggests that the missing data are not missing at random. The patients in the Kd maintenance group leaving the study late or who were on protocol at the end of the study period reported better HRQL compared to the patients leaving the protocol early. Reasons for drop-out were imbalanced between the two randomized groups during follow-up: 53 out of 86 (62%) patients in the observation group came off study protocol due to progressive disease, while the corresponding number was smaller in the Kd maintenance group (29 out of 82 (35%)). More patients in the Kd maintenance compared to the observation group (7 versus 1) left the study due to withdrawn consent.

Conclusion

HRQL data from the phase II randomized CARFI trial demonstrate that Kd maintenance did not impair the patients' perceptions of their overall HRQL, but Kd maintenance did delay recovery after salvage ASCT in several of the core domains of patients with multiple myeloma, specifically physical, role and social functioning, appetite loss, fatigue and body image. When adjusting the PFS for the HRQL impact, the Kd maintenance treatment still remained beneficial, and the difference in PFS between the two groups became larger when adjusting for HRQL. Progression-free survival of maintenance therapy should be quality-adjusted to be able to balance the benefits against the HRQL impact.