Plain English summary

Safety and treatment tolerability have historically been assessed solely by clinicians via the Common Terminology Criteria for Adverse Events (CTCAE). However, clinician reports under-detect symptoms relative to patient reports. Therefore, the National Cancer Institute contracted development of the Patient-Reported Outcomes version of the CTCAE (PRO-CTCAE) item library, which is now widely implemented in cancer clinical trials. However, missing PRO-CTCAE responses from patients complicate reporting and analysis. As with other patient-completed questionnaires, patients with and without missing PRO-CTCAE responses may systematically differ, thus jeopardizing the validity and generalizability of clinical trial results. For example, patients with more severe symptoms might miss PRO-CTCAE assessments more often than patients with less severe symptoms, potentially leading to invalid conclusions being drawn from analyses based on only the remaining patients’ PRO-CTCAE responses. This paper focuses on analysis of the PRO-CTCAE when some patients’ responses are missing (Alliance A151912), with application to 2 randomized, double-blind, placebo-controlled, phase III trials: Alliance A091105 and COMET-2. In each trial, we applied various methods for comparing patient-reported symptoms across treatment arms while addressing missing PRO-CTCAE responses. We found that optimal methods use patients’ responses to the PRO-CTCAE at other time points to provide information about their would-be responses to the PRO-CTCAE at the time point of interest. Clinicians’ CTCAE grades did not provide useful information about patients’ missing PRO-CTCAE responses.

Introduction

To incorporate the patient perspective into assessments of symptomatic adverse events, the National Cancer Institute contracted development of the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) [1]. The PRO-CTCAE is a library of 124 items assessing the frequency, severity, interference, amount, and/or presence of 78 symptomatic adverse events drawn from the CTCAE (http://healthcaredelivery.cancer.gov/pro-ctcae). Patients complete a subset of items based on symptomatic adverse events most relevant to the treatment(s) under investigation. The National Cancer Institute recommends reporting patients’ PRO-CTCAE scores in conjunction with clinicians’ CTCAE grades to improve the evaluation of symptomatic adverse events in cancer clinical trials [2]. Relative to clinician reports, patient reports show greater sensitivity to changes in daily functioning or symptom burden, thus allowing for improvements in safety monitoring, symptom management, and even survival [3, 4]. Furthermore, the PRO-CTCAE may be better able to capture when treatments produce less severe but more chronically bothersome adverse events relative to the CTCAE (where grades 4 and 5 correspond to life-threatening adverse events and death, respectively), and thereby enable better understanding of treatment tolerability from the patient perspective.

However, missing scores complicate reporting and analysis of the PRO-CTCAE. Patients with and without missing scores may systematically differ, thus jeopardizing the validity and generalizability of clinical trial results. For example, patients with more severe symptomatic adverse events might drop out or feel too unwell to complete PRO-CTCAE assessments, such that these patients miss PRO-CTCAE assessments more often than patients with less severe symptomatic adverse events. Analyses based on only the remaining patients’ PRO-CTCAE scores may result in biased parameter estimates and invalid conclusions.

This paper focuses on optimal analysis methods for incomplete PRO-CTCAE items (Alliance A151912), with application to 2 randomized, double-blind, placebo-controlled, phase III trials: Alliance A091105 and COMET-2. We conduct between-arm comparisons within each trial while comparing the following strategies for addressing missing PRO-CTCAE scores: complete-case two-sample t-test, mixed modeling with contrast, and multiple imputation followed by a two-sample t-test. Because interest lies in whether CTCAE grades can inform missing PRO-CTCAE scores, we then perform multiple imputation with and without CTCAE grades as auxiliary variables to assess the added benefit of including CTCAE grades in the imputation model relative to only including PRO-CTCAE scores across all cycles.

Methods

Patients, measures, and procedures

In A091105, patients with desmoid tumors or deep fibromatosis were randomized 2:1 to receive sorafenib or placebo by mouth once daily. Upon confirmation of progression, patients assigned to the placebo arm were allowed to cross over to open-label use of sorafenib. The primary endpoint was progression-free survival (NCT02066181; see Gounder et al. [5] for results). English-speaking patients were invited to participate in a correlative study of patient-reported pain, symptomatic adverse events, and quality of life while on randomized treatment. Patients enrolled in the correlative study completed 19 PRO-CTCAE items via paper booklets prior to randomization and after each 4-week cycle for eight cycles while on randomized treatment (i.e., weeks 4, 8, 12, 16, 20, 24, 28, and 32). The PRO-CTCAE items assessed the frequency (F), severity (S), interference (I), and/or presence (P) of the following symptoms: insomnia (SI), constipation (S), pain (FSI), fatigue (SI), nausea (FS), vomiting (FS), diarrhea (F), rash (P), hand-foot syndrome (SI), decreased appetite (SI), and mouth or throat sores (S). Clinicians graded patients’ fatigue, papulopustular rash, palmar-plantar erythrodysesthesia syndrome, diarrhea, anorexia, nausea, vomiting, abdominal pain, mucositis oral, hypertension, arthralgia, and myalgia via the CTCAE v4.0 at each cycle. For adverse events beyond those solicited at each cycle, clinicians reported grade 1 and 2 adverse events with attributions of possible, probable, or definite and all grade 3+ adverse events regardless of attribution. The institutional review board or ethics committee at each participating site approved the protocol, and patients provided written informed consent.

In COMET-2, patients with metastatic castration-resistant prostate cancer and narcotic-dependent pain from bone metastases who had progressed after treatment with docetaxel and either abiraterone or enzalutamide were randomized 1:1 to receive cabozantinib by mouth once daily or mitoxantrone every 3 weeks plus prednisone by mouth twice daily (with matching placebos in each arm). The primary endpoint was pain response at week 6 confirmed at week 12 (NCT01522443; see Basch et al. [6] for results). Patients completed 21 PRO-CTCAE items via an interactive voice response system prior to randomization; at weeks 3, 6, and 12; and every 6 weeks thereafter until progression. The PRO-CTCAE items assessed the frequency (F), severity (S), interference (I), and/or presence (P) of the following symptoms: insomnia (SI), constipation (S), pain (FSI), fatigue (SI), nausea (FS), vomiting (FS), diarrhea (F), rash (P), decreased appetite (SI), numbness or tingling in hands or feet (SI), mouth or throat sores (S), and shortness of breath (SI). Clinicians graded patients’ adverse events via the CTCAE v4.0 while recording start/stop dates for each adverse event. The institutional review board or ethics committee at each participating site approved the protocol, and patients provided written informed consent.

Statistical analysis

All analyses were performed separately for A091105 and COMET-2. As recommended by the Setting International Standards in Analyzing Patient-Reported Outcomes and Quality of Life Endpoints Data (SISAQOL) Consortium [7], the PRO-CTCAE completion rate and available data rate were calculated at each cycle. The available data rate was calculated as the ratio of the number of patients with observed PRO-CTCAE scores to the total number of patients. The completion rate was calculated as the ratio of the number of patients with observed PRO-CTCAE scores to the number of patients eligible to complete the PRO-CTCAE assessment at that time point. That is, the completion rate’s denominator excluded patients who were no longer required to complete the PRO-CTCAE assessment per the protocol (e.g., due to death or going off randomized treatment). In A091105, reported reasons for missing PRO-CTCAE scores were summarized at each cycle.
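The two rates defined above differ only in their denominators. The following Python sketch illustrates the arithmetic; the function name, argument names, and counts are hypothetical and are not taken from either trial.

```python
def pro_ctcae_rates(n_observed: int, n_eligible: int, n_total: int) -> dict:
    """Return the available data rate and completion rate as proportions.

    n_observed: patients with observed PRO-CTCAE scores at the time point
    n_eligible: patients still required to complete the assessment per protocol
    n_total:    all patients in the analysis population
    """
    if not 0 <= n_observed <= n_eligible <= n_total or n_eligible == 0:
        raise ValueError("expected 0 <= n_observed <= n_eligible <= n_total and n_eligible > 0")
    return {
        # Denominator includes everyone, even patients who died or went off treatment.
        "available_data_rate": n_observed / n_total,
        # Denominator excludes patients no longer required to complete the assessment.
        "completion_rate": n_observed / n_eligible,
    }


# Hypothetical counts for illustration only.
rates = pro_ctcae_rates(n_observed=45, n_eligible=50, n_total=60)
print(rates)
```

Because the completion rate’s denominator can only be smaller than or equal to the total number of patients, the completion rate is never lower than the available data rate at the same time point.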

Between-arm comparisons on the PRO-CTCAE at week 12 were conducted using the following strategies: complete-case two-sample t-test, mixed modeling with contrast, and multiple imputation followed by a two-sample t-test. The complete-case two-sample t-test assumes a missing completely at random mechanism, meaning the probability of missingness is unrelated to the observed and missing scores. Excluding patients with missing scores yields unbiased parameter estimates under a missing completely at random mechanism because the observed scores can be regarded as a random subsample of the hypothetical complete scores. Mixed modeling and multiple imputation assume a missing at random mechanism, meaning the probability of missingness is unrelated to the missing scores after conditioning on the observed scores. The missing at random mechanism is a much more plausible assumption than the missing completely at random mechanism. Mixed modeling uses maximum likelihood estimation and includes patients’ responses to a PRO-CTCAE item at all available cycles, such that patients with any observed scores contribute to the mixed model. In doing so, mixed modeling accounts for missing PRO-CTCAE scores using patients’ responses to the same PRO-CTCAE item at different cycles (but not patients’ responses to other PRO-CTCAE items or clinicians’ CTCAE grades). Multiple imputation involves creating multiple copies of the dataset with different imputed scores (imputation phase), analyzing the imputed datasets as though they were complete datasets (analysis phase), and pooling the parameter estimates and standard errors across the imputed datasets to yield a single set of results (pooling phase). Multiple imputation quantifies uncertainty due to the missing scores and inflates the standard errors accordingly based on changes in the imputed scores across the imputed datasets.
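The pooling phase is conventionally carried out with Rubin’s rules, in which the pooled variance is the average within-imputation variance plus an inflated between-imputation variance. The Python sketch below is a minimal illustration under that assumption; the function name and the numeric inputs are hypothetical, and the degrees-of-freedom adjustment needed for confidence intervals is omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool point estimates and standard errors across m imputed datasets."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and one SE per estimate")
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance of the estimates (m - 1 denominator)
    # The between-imputation term is what inflates the standard error
    # to reflect uncertainty about the missing scores.
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical treatment-effect estimates from three imputed datasets.
est, se = pool_rubin([0.2, 0.4, 0.3], [0.1, 0.1, 0.1])
print(round(est, 3), round(se, 3))
```

When the imputed scores barely change across datasets, the between-imputation variance approaches zero and the pooled standard error approaches the ordinary complete-data standard error.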

Multiple imputation was performed with and without CTCAE grades as auxiliary variables to assess the added benefit of including CTCAE grades in the imputation model relative to only including PRO-CTCAE scores across all cycles. To impute the missing scores, multiple imputation uses available information about patients’ would-be responses to the incomplete PRO-CTCAE items. This information may include patients’ responses to the same PRO-CTCAE item at different cycles, patients’ responses to other PRO-CTCAE items, and/or clinicians’ CTCAE grades (i.e., so-called auxiliary variables). For example, a patient who reports no pain at Cycle 2 may be more likely to report no pain at Cycle 3. A patient who reports severe insomnia at Cycle 3 may be more likely to report severe fatigue at Cycle 3. If a clinician reports grade 1+ vomiting at Cycle 3, then the patient may be more likely to report severe vomiting at Cycle 3. This information helps multiple imputation make plausible guesses about what patients’ missing scores would have been had they completed the PRO-CTCAE assessment. In the imputation model, auxiliary variables correlate with an incomplete PRO-CTCAE item and/or missingness. Auxiliary variables that correlate with an incomplete PRO-CTCAE item improve power by providing information about the missing scores; auxiliary variables that correlate with an incomplete PRO-CTCAE item and its missingness indicator (0 = missing, 1 = observed) reduce bias [8]. In general, the imputation model should include auxiliary variables that correlate at ≥ 0.40 (in magnitude) with both an incomplete PRO-CTCAE item and its missingness indicator or correlate at ≥ 0.50 (in magnitude) with an incomplete PRO-CTCAE item [8,9,10,11]. Auxiliary variables generally should not have more than 10% of their scores concurrently missing with the incomplete PRO-CTCAE item [9, 12]. For example, in A091105, most patients who did not complete the PRO-CTCAE item on insomnia severity at Cycle 3 also did not complete the PRO-CTCAE item on fatigue severity at Cycle 3 due to missing the PRO-CTCAE assessment altogether. Thus, our imputation model only included patients’ responses to the same PRO-CTCAE item at different cycles and clinicians’ CTCAE grades as auxiliary variables.
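The screening guidance above for auxiliary variables can be expressed as a simple rule. The Python sketch below encodes the correlation and concurrent-missingness thresholds cited in the text; the function name, argument names, and example values are hypothetical, and the sketch is not the procedure used to select variables in either trial.

```python
def is_candidate_auxiliary(r_item: float,
                           r_missingness: float,
                           frac_concurrently_missing: float,
                           both_cutoff: float = 0.40,
                           item_cutoff: float = 0.50,
                           max_concurrent_missing: float = 0.10) -> bool:
    """Apply the rule-of-thumb thresholds for including an auxiliary variable.

    r_item:        correlation with the incomplete PRO-CTCAE item
    r_missingness: correlation with the item's missingness indicator
    frac_concurrently_missing: share of the auxiliary variable's scores that
                   are missing at the same time as the incomplete item
    """
    # An auxiliary variable that is missing whenever the item is missing adds little.
    if frac_concurrently_missing > max_concurrent_missing:
        return False
    related_to_both = abs(r_item) >= both_cutoff and abs(r_missingness) >= both_cutoff
    strongly_related_to_item = abs(r_item) >= item_cutoff
    return related_to_both or strongly_related_to_item


# Hypothetical correlations for illustration only.
print(is_candidate_auxiliary(0.45, -0.42, 0.05))  # related to item and missingness
print(is_candidate_auxiliary(0.45, 0.10, 0.05))   # related to item only, below 0.50
print(is_candidate_auxiliary(0.80, 0.50, 0.60))   # mostly missing alongside the item
```

The third call mirrors the situation described for other PRO-CTCAE items collected at the same assessment: a strong correlation does not help when the auxiliary variable is missing at the same time as the item being imputed.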

All analyses were performed in SAS 9.4 (SAS Institute Inc., Cary, NC). The complete-case two-sample t-test was conducted using the TTEST procedure (for PRO-CTCAE items assessing frequency, severity, or interference); complete-case logistic regression was conducted using the LOGISTIC procedure (for the PRO-CTCAE item assessing presence). Mixed modeling was performed using the MIXED procedure (for PRO-CTCAE items assessing frequency, severity, or interference) or GLIMMIX procedure (for the PRO-CTCAE item assessing presence). Each mixed model included a fixed intercept; fixed effects for time, arm, and arm by time interaction; and autoregressive (lag 1) residual covariance matrix that accounts for repeated PRO-CTCAE assessments within patients. Time was treated as nominal, such that the mixed models did not make assumptions about the trajectory of patients’ PRO-CTCAE scores over time. A logit link function was specified within the mixed model for the PRO-CTCAE item assessing presence. Multiple imputation was conducted twice for each PRO-CTCAE item using the MI procedure. In the first imputation model, each PRO-CTCAE item was imputed based on patients’ responses to that PRO-CTCAE item at all available cycles. In the second imputation model, each PRO-CTCAE item was imputed based on patients’ responses to that PRO-CTCAE item at all available cycles plus relevant CTCAE grades at week 12 (Table 1). Because clinician-reported adverse events were collected by start/stop dates in COMET-2, CTCAE grades for ongoing adverse events at week 12 (± 1 week) were used as auxiliary variables. For example, in A091105, the PRO-CTCAE item on hand-foot syndrome severity was imputed based on patients’ responses to that PRO-CTCAE item at all available cycles plus clinicians’ CTCAE grades for palmar-plantar erythrodysesthesia syndrome at week 12. Similarly, in COMET-2, the PRO-CTCAE item on numbness or tingling in hands or feet was imputed based on patients’ responses to that PRO-CTCAE item at all available cycles plus clinicians’ CTCAE grades for peripheral neuropathy at week 12. Using a fully conditional specification, 50 imputed datasets were generated while setting the number of burn-in iterations (i.e., the number of iterations prior to saving each imputed dataset) to 1000. A two-sample t-test was performed on each imputed dataset using the REG procedure (for PRO-CTCAE items assessing frequency, severity, or interference) or LOGISTIC procedure (for the PRO-CTCAE item assessing presence), and results were pooled across the imputed datasets using the MIANALYZE procedure.
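The autoregressive (lag 1) residual covariance structure specified for the mixed models implies that the correlation between two assessments from the same patient decays geometrically with the number of assessment occasions separating them. The Python sketch below builds such a matrix for illustration; the function name and the variance and correlation values are hypothetical rather than estimates from either trial, and the lag is assumed to be counted in assessment occasions.

```python
def ar1_covariance(n_times: int, sigma2: float, rho: float) -> list:
    """Build an AR(1) covariance matrix: cov(i, j) = sigma2 * rho ** |i - j|."""
    if n_times < 1 or sigma2 <= 0 or not -1 < rho < 1:
        raise ValueError("need n_times >= 1, sigma2 > 0, and -1 < rho < 1")
    return [
        [sigma2 * rho ** abs(i - j) for j in range(n_times)]
        for i in range(n_times)
    ]


# Hypothetical residual variance and lag-1 correlation for three assessments.
for row in ar1_covariance(n_times=3, sigma2=1.0, rho=0.5):
    print(row)
```

Only two parameters (the residual variance and the lag-1 correlation) describe the whole matrix, which keeps the model estimable when many assessments are missing at later cycles.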

Table 1 CTCAE grades used as auxiliary variables in the imputation models

To assess similarity of results, estimates of average treatment effects at week 12 and statistical efficiency were compared across strategies for addressing missing PRO-CTCAE scores. Statistical efficiency was measured by confidence interval width, with narrower confidence intervals indicating greater statistical efficiency.
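The efficiency comparison described above amounts to expressing one confidence interval’s width as a percentage of another’s. The Python sketch below illustrates this calculation; the function name and interval limits are hypothetical and do not correspond to any result in Tables 3 or 4.

```python
def relative_ci_width(ci, reference_ci) -> float:
    """Return the width of `ci` as a percentage of the width of `reference_ci`.

    Each interval is a (lower, upper) pair. Values below 100 indicate that
    `ci` is narrower, i.e., the corresponding strategy is more efficient.
    """
    width = ci[1] - ci[0]
    reference_width = reference_ci[1] - reference_ci[0]
    if width < 0 or reference_width <= 0:
        raise ValueError("intervals must satisfy lower <= upper with a nonzero reference width")
    return 100 * width / reference_width


# Hypothetical intervals: a mixed-model interval versus a complete-case interval.
print(relative_ci_width((-0.4, 0.5), (-0.5, 0.5)))  # roughly 90, i.e., about 10% narrower
```

Averaging these percentages across PRO-CTCAE items gives a single summary of relative efficiency for a pair of strategies.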

Results

Patient demographic and disease characteristics

Of the 87 patients enrolled in A091105, 64 consented to participate in the correlative study. Of these 64 patients, 36 (56.3%) were randomized to the sorafenib arm and 28 (43.8%) were randomized to the placebo arm. The randomization ratio was not 2:1 (sorafenib:placebo) as specified in the protocol due to a randomization algorithm error that was detected and corrected partway through enrollment [5]. Patients were 65.6% female. Median age was 36.5 years (range = 18 to 68 years). Baseline ECOG performance status was 0 for 39 (60.9%) patients and 1 for 25 (39.1%) patients. Intraabdominal disease was present in 21 (32.8%) patients. Disease status was newly diagnosed for 32 (50.0%) patients, recurrent for 30 (46.9%) patients, and not reported for 2 (3.1%) patients. Patients were unblinded on November 17, 2017, and those receiving placebo were allowed to cross over to open-label use of sorafenib if progression had not yet occurred.

Of the 119 patients enrolled in COMET-2, 107 completed the baseline PRO-CTCAE assessment and at least one post-baseline PRO-CTCAE assessment and were thus included in this analysis. Of these 107 patients, 53 (49.5%) were randomized to the cabozantinib arm and 54 (50.5%) were randomized to the mitoxantrone-prednisone arm. Median age was 65 years (range = 44 to 80 years). Baseline ECOG performance status was 0 or 1 for 94 (87.9%) patients. All patients had undergone at least two prior lines of systemic treatment for castration-resistant metastatic prostate cancer.

PRO-CTCAE completion

In A091105, available data rates between baseline and week 32 ranged from 100.0% to 52.8% in the sorafenib arm and from 100.0% to 39.3% in the placebo arm (Table 2). When excluding patients who were no longer required to complete the PRO-CTCAE assessment per the protocol, completion rates ranged from 100.0% to 70.4% and from 100.0% to 73.3%, respectively (Table 2). At week 12, 13 patients did not complete the PRO-CTCAE assessment due to being off randomized treatment (n = 7), not having a clinic visit (n = 2), staff not administering the questionnaire booklet (n = 2), or unspecified reason (n = 2). Of the 51 patients who completed the PRO-CTCAE assessment at week 12, 50 completed all 19 items and 1 missed 5 of the 19 items due to skipping a page of the questionnaire booklet.

Table 2 PRO-CTCAE completion rates and available data rates

In COMET-2, available data rates between baseline and week 24 ranged from 100.0% to 28.3% in the cabozantinib arm and from 100.0% to 22.2% in the mitoxantrone-prednisone arm (Table 2). When excluding patients who were no longer required to complete the PRO-CTCAE assessment per the protocol, completion rates ranged from 100.0% to 68.2% and from 100.0% to 81.0%, respectively (Table 2).

Comparison of missing data strategies

Tables 3 and 4 summarize between-arm comparisons on the PRO-CTCAE at week 12 based on a complete-case two-sample t-test, mixed modeling with contrast, and multiple imputation followed by a two-sample t-test in A091105 and COMET-2, respectively. In both trials, mixed modeling and multiple imputation provided the most similar estimates of the average treatment effect at week 12 for PRO-CTCAE items assessing frequency, severity, or interference (Tables 3 and 4). In A091105, differences between these estimates ranged from −0.313 to 0.131 when comparing the complete-case two-sample t-test and mixed modeling, −0.265 to 0.103 when comparing the complete-case two-sample t-test and multiple imputation, and −0.060 to 0.068 when comparing mixed modeling and multiple imputation. In COMET-2, differences between these estimates ranged from −0.158 to 0.132 when comparing the complete-case two-sample t-test and mixed modeling, −0.194 to 0.174 when comparing the complete-case two-sample t-test and multiple imputation, and −0.065 to 0.085 when comparing mixed modeling and multiple imputation.

Table 3 Comparison of analysis strategies for estimating average treatment effects on the PRO-CTCAE at week 12 in A091105
Table 4 Comparison of analysis strategies for estimating average treatment effects on the PRO-CTCAE at week 12 in COMET-2

In A091105, the sample size used for analysis was consistently 64 (i.e., all patients enrolled who consented to participate in the correlative study) for mixed modeling and multiple imputation, whereas the sample size used for analysis was 50 or 51 (i.e., patients who consented to participate in the correlative study and completed the relevant PRO-CTCAE item at week 12) for the complete-case two-sample t-test (Table 3). Mixed modeling yielded confidence intervals that were 96.5% as wide as those generated by a complete-case two-sample t-test, multiple imputation yielded confidence intervals that were 106.3% as wide as those generated by a complete-case two-sample t-test, and mixed modeling yielded confidence intervals that were 91.2% as wide as those generated by multiple imputation, on average, for PRO-CTCAE items assessing frequency, severity, or interference. In COMET-2, the sample size used for analysis was consistently 107 for mixed modeling and multiple imputation, whereas the sample size used for analysis was 75 or 76 for the complete-case two-sample t-test (Table 4). Mixed modeling yielded confidence intervals that were 87.8% as wide as those generated by a complete-case two-sample t-test, multiple imputation yielded confidence intervals that were 99.1% as wide as those generated by a complete-case two-sample t-test, and mixed modeling yielded confidence intervals that were 88.7% as wide as those generated by multiple imputation, on average, for PRO-CTCAE items assessing frequency, severity, or interference.

CTCAE grades as auxiliary variables

Table 5 provides CTCAE grades for A091105 and COMET-2 at week 12. Notably, the proportion of nonzero CTCAE grades in A091105 was very low for several clinician-reported adverse events. In A091105, correlations between the PRO-CTCAE items and corresponding CTCAE grades at week 12 varied widely (range = 0.014 to 0.627; Table 1). The strongest correlation occurred between patient-reported severity of mouth or throat sores and clinician-reported mucositis oral (r = 0.627), though this correlation was inflated due to observing very few nonzero CTCAE grades for mucositis oral at week 12 (i.e., 3/64, 4.7%; Table 5). Other strong correlations occurred between patient-reported severity of hand-foot syndrome and clinician-reported palmar-plantar erythrodysesthesia syndrome (r = 0.487), patient-reported frequency of nausea and clinician-reported nausea (r = 0.447), patient-reported severity of vomiting and clinician-reported vomiting (r = 0.441), and patient-reported interference of fatigue and clinician-reported fatigue (r = 0.436). The weakest correlations occurred between patient-reported interference of pain and clinician-reported arthralgia (r = 0.014), patient-reported severity of pain and clinician-reported myalgia (r = 0.059), and patient-reported frequency of diarrhea and clinician-reported diarrhea (r = 0.096).

Table 5 CTCAE grades at week 12

Relative to A091105, COMET-2 had a much higher proportion of nonzero CTCAE grades due to targeting a more advanced cancer patient population and administering a more toxic chemotherapy regimen (Table 5). In COMET-2, correlations between the PRO-CTCAE items and corresponding CTCAE grades at week 12 varied widely (range = −0.102 to 0.445; Table 1). The strongest correlations occurred between patient-reported frequency and severity of vomiting and clinician-reported vomiting (r = 0.427 and 0.445, respectively). Clinician-reported insomnia, fatigue, decreased appetite, and peripheral neuropathy did not strongly correlate with their patient-reported counterparts (range = −0.102 to 0.046; Table 1).

In both trials, between-arm comparisons on the PRO-CTCAE at week 12 were similar regardless of whether CTCAE grades were included in the imputation model (Tables 3 and 4). Including clinicians’ CTCAE grades did not consistently narrow the confidence intervals associated with these average treatment effects (Tables 3 and 4). In A091105, on average, multiple imputation with auxiliary variables yielded confidence intervals that were 98.0% as wide as those generated by multiple imputation without auxiliary variables. When included in the imputation model, the CTCAE grades with the weakest correlations with the PRO-CTCAE scores widened the confidence intervals (maximum = 106.7% as wide as the confidence interval generated by multiple imputation without auxiliary variables). Similarly, in COMET-2, on average, multiple imputation with auxiliary variables yielded confidence intervals that were 101.3% as wide as those generated by multiple imputation without auxiliary variables. These results are consistent with the correlation patterns observed among the PRO-CTCAE items and CTCAE grades in this sample. That is, correlations for the same PRO-CTCAE item between different cycles were generally stronger than correlations between each PRO-CTCAE item and its corresponding CTCAE grade at the same cycle. These results are also consistent with the data sparseness observed for several clinician-reported adverse events. That is, in A091105, most CTCAE grades equaled 0 or 1 at week 12 (Table 5).

Conclusion

Properly handling missing PRO-CTCAE scores supports more accurate and generalizable causal inferences regarding treatment tolerability. In this paper, we conducted between-arm comparisons while applying the following strategies for addressing missing PRO-CTCAE scores in A091105 and COMET-2: complete-case two-sample t-test, mixed modeling with contrast, and multiple imputation followed by a two-sample t-test. In both trials, mixed modeling and multiple imputation provided the most similar estimates of the average treatment effect. These results are unsurprising because, unlike a complete-case two-sample t-test, mixed modeling and multiple imputation provide unbiased parameter estimates under a missing at random mechanism and do not exclude patients with missing scores—highly desirable features per the SISAQOL Consortium [7].

We also performed multiple imputation with and without CTCAE grades as auxiliary variables to assess the added benefit of including CTCAE grades in the imputation model relative to only including PRO-CTCAE scores across all cycles. Our results suggest that CTCAE grades can inform missing PRO-CTCAE scores for any adverse events that show strong agreement between clinician and patient reports, though model simplicity and computational ease may warrant use of other strategies. In A091105 and COMET-2, inclusion of CTCAE grades in the imputation model was not worthwhile because the strongest correlations occurred among the same PRO-CTCAE item at different cycles. These results make sense because the information provided by patients often differs from the information provided by clinicians. Thus, we recommend using patients’ PRO-CTCAE scores for the same symptom at different cycles to inform patients’ missing PRO-CTCAE scores. This can be accomplished via mixed modeling or multiple imputation. Although multiple imputation is widely available in statistical software packages, multiple imputation is much more demanding procedurally than mixed modeling. The user must examine convergence diagnostics; ensure the imputation model includes all the analysis variables; create, manage, and analyze multiple imputed datasets; and pool the results. The extent to which statistical software packages automate this procedure varies. Conducting mixed modeling for PRO-CTCAE items assessing frequency, severity, or interference is consistent with recommendations outlined by the SISAQOL Consortium [7]. However, multiple imputation may outperform mixed modeling for PRO-CTCAE items assessing presence as mixed modeling can result in convergence issues with binary endpoints.

Limitations of this work include our focus on two phase III trials with modest sample sizes. Examining other phase II and phase III trials as well as other patient populations may improve the generalizability of these results. However, the A091105 and COMET-2 sample sizes are not atypical for plausible applications of these strategies for addressing missing PRO-CTCAE scores. Another limitation is our focus on the similarity of parameter estimates and differences in statistical efficiency across the complete-case two-sample t-test, mixed modeling with contrast, and multiple imputation followed by a two-sample t-test. Because we did not simulate the data, we cannot calculate bias of the parameter estimates. However, our results can serve as the basis for a future simulation study evaluating strategies for addressing missing PRO-CTCAE scores.

In summary, using patients’ PRO-CTCAE scores for the same symptom at different cycles to inform patients’ missing PRO-CTCAE scores can mitigate problems associated with missing scores. Accurately evaluating patients’ PRO-CTCAE scores promotes the safety and tolerability of treatments as well as improves the implementation and interpretation of cancer clinical trials.