Comparative efficacy of cabozantinib and ramucirumab after sorafenib for patients with hepatocellular carcinoma and alpha fetoprotein ≥ 400 ng/mL: a matching-adjusted indirect comparison

Key Summary Points

Why carry out this study?

Cabozantinib (a once-daily oral tyrosine kinase inhibitor with targets including MET, AXL and vascular endothelial growth factor [VEGF] receptors) and ramucirumab (a monoclonal antibody specific for VEGF receptor-2, administered every 2 weeks by intravenous infusion) are approved for the second-line (2L) treatment of hepatocellular carcinoma (HCC) after sorafenib; however, ramucirumab is restricted to use in patients with serum alpha-fetoprotein (AFP) levels of 400 ng/mL or above.

No clinical trials have compared cabozantinib and ramucirumab directly in patients with HCC and elevated serum AFP. This analysis compared 2L cabozantinib and ramucirumab in patients with HCC and serum AFP of 400 ng/mL or above using a matching-adjusted indirect comparison (MAIC) approach.

What was learned from this study?

In 2L HCC populations matched for prior therapy and clinically relevant baseline characteristics, cabozantinib significantly prolonged median (95% CI) progression-free survival compared with ramucirumab (5.5 [4.6–7.4] months vs. 2.8 [2.7–4.1] months; p = 0.016); overall survival (median [95% CI]) was not significantly different for cabozantinib (10.6 [9.5–17.3] months) and ramucirumab (8.7 [7.3–10.8] months) (p = 0.104).

Discontinuation rates resulting from treatment-related adverse events were not significantly different for the matching-adjusted cabozantinib population and the ramucirumab population (log odds ratio [95% CI] for cabozantinib vs. ramucirumab, 1.16 [−0.89, 3.20]; p = 0.271).

In the absence of direct comparative trial data, this MAIC analysis may help to inform individualized clinical decision-making with respect to 2L treatment for patients with HCC and elevated serum AFP who have received prior sorafenib.

Digital Features

This article is published with digital features, including a summary slide, graphical plain language summary, infographic, author video, and animation video to facilitate understanding of the article. To view digital features for this article, go to https://doi.org/10.6084/m9.figshare.14141435.

Introduction

As many as one-third of patients treated for advanced hepatocellular carcinoma (aHCC) are eligible for second-line (2L) treatment [1]. The recent approval of a number of 2L vascular endothelial growth factor (VEGF)-targeted therapies with proven benefit over placebo holds the potential to preserve and to prolong the life of a substantial number of patients with HCC [2,3,4,5,6,7]. Yet, there is limited comparative evidence to inform the selection of VEGF-targeting agents and to guide optimum treatment sequencing for patients with HCC.

The range of VEGF-targeted therapies approved for 2L use after prior sorafenib treatment in patients with HCC includes the tyrosine kinase inhibitors (TKIs) cabozantinib and regorafenib and the monoclonal antibody (mAb) ramucirumab, which is limited to use in patients with serum alpha-fetoprotein (AFP) of 400 ng/mL or above [2,3,4,5,6,7]. All three agents target VEGF-related angiogenesis and tumor neovascularization, but there are clinically relevant differences in their mechanisms of action, modes of administration and indications. Cabozantinib and regorafenib are both oral TKIs (administered daily) with activity against multiple receptor kinases [2,3,4,5]. Cabozantinib has activity against a broader range of receptor kinases than regorafenib, including MET (hepatocyte growth factor receptor protein), AXL (GAS6 receptor), RET, ROS1, TYRO3, MER, KIT (stem-cell-factor receptor), TRKB, Fms-like tyrosine kinase-3 and TIE-2 as well as VEGF receptors [2,3,4,5]. It is noteworthy that overexpression of MET and AXL has a negative impact on HCC prognosis and that elevated MET expression is associated with sorafenib resistance [8]. Based on the results of the phase 3 CELESTIAL trial (NCT01908426), which involved patients with HCC receiving 2L or third-line (3L) treatment, cabozantinib is approved in the USA and Europe for use in patients with HCC who have been previously treated with sorafenib [2, 3, 9].

Ramucirumab, in contrast, is a recombinant monoclonal human immunoglobulin G1 antibody that is specific for VEGF receptor-2 and is administered as an intravenous (IV) infusion once every 2 weeks [6, 7]. The agent is approved as a 2L treatment for patients with HCC after prior sorafenib treatment based on the results of the phase 3 REACH-2 trial (NCT02435433), but only in the subgroup of patients with serum AFP levels of 400 ng/mL or above [6, 7, 10]. Approximately 50% of HCC tumors secrete AFP, and a plasma AFP level > 400 ng/mL is generally considered to support a diagnosis of HCC; elevated levels are associated with poor prognosis [11, 12].

International management guidelines for HCC broadly align with the approved indications for these VEGF-targeting agents, endorsing their use in the 2L setting but with agent-specific caveats reflecting their respective phase 3 trial designs and resultant labels [13,14,15]. In the USA, guidance from the National Comprehensive Cancer Network also now features checkpoint inhibitor regimens approved by the US Food and Drug Administration (pembrolizumab, or nivolumab with/without ipilimumab) as additional 2L options following prior sorafenib treatment [13]. No randomized controlled trials (RCTs) in HCC have compared the available 2L treatment options directly, resulting in limited evidence to inform optimum treatment sequencing in patients with progressive disease.

A recent matching-adjusted indirect comparison (MAIC) using data from the CELESTIAL trial (cabozantinib; NCT01908426) and the RESORCE trial (regorafenib; NCT01774344) provided indirect comparative data for the efficacy and safety of cabozantinib and regorafenib [16]. The MAIC method is a recognized means of addressing comparative evidence gaps that often exist around new treatments and licensed alternatives [17, 18]. Initially used to inform critical reimbursement decisions, the MAIC method of treatment comparison is increasingly being used to guide clinical decision-making [19, 20]. No MAIC data are currently available for cabozantinib and ramucirumab.

An exploratory analysis of CELESTIAL showed that cabozantinib improved overall survival (OS) and progression-free survival (PFS) compared with placebo across a range of serum AFP levels in a mixed 2L and 3L population [21]. Outcomes were stratified by baseline serum AFP level bifurcated at 400 ng/mL. For OS with cabozantinib, the hazard ratio (HR) (95% confidence interval [CI]) was 0.81 (0.62–1.04) for patients with baseline serum AFP < 400 ng/mL and 0.71 (0.54–0.94) for patients with levels of 400 ng/mL or above. HRs for PFS were consistent with those for OS [21].

Building on these data, here we used the MAIC method to compare the efficacy and safety of cabozantinib and ramucirumab using data from the CELESTIAL and REACH-2 pivotal trials in populations matched in terms of their prior therapy and baseline characteristics [9, 10].

Methods


Data Source and Analysis Approach

The phase 3 CELESTIAL and REACH-2 trial populations were selected for this comparative study based on the similarity of their trial designs and evaluated outcomes (Table S1) [9, 10]. Individual patient data (IPD) were available from the CELESTIAL trial, and the published aggregate population-level statistics were available from REACH-2.

To align with the (post-sorafenib) 2L population of patients with serum AFP levels of 400 ng/mL or above who were eligible for REACH-2, the subgroup of patients from CELESTIAL with baseline serum AFP levels of 400 ng/mL or above and who had received prior sorafenib only was identified. The baseline characteristics of this CELESTIAL subpopulation (N = 178) were compared with those of the REACH-2 population (N = 292) to assess the feasibility of a meaningful standard indirect treatment comparison (ITC) (Table S2). The presence of residual differences in potentially effect-modifying baseline characteristics (e.g., in race, region, duration of prior sorafenib treatment and HCC etiology) indicated that a standard ITC would be subject to potential confounding. Instead, a MAIC method was selected to reduce the risk of bias that would be associated with a standard ITC [17, 18, 20].

The MAIC method requires IPD to be available for one trial (here, CELESTIAL) so that each patient’s contribution to the analysis can be weighted based on their similarity to the comparator trial population (here, REACH-2) at baseline. The aim of this patient-level weighting was to match the clinically relevant baseline characteristics of the CELESTIAL population to those of the REACH-2 population, thereby minimizing potential effect-modifying differences and sources of outcome bias.
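The weighting step can be illustrated with a minimal sketch. It assumes the method-of-moments estimator commonly used for MAIC, in which each patient receives a weight of the form exp(z·a), where z is the patient's covariate vector centered on the comparator means. The function names and the toy covariates (age and a sex indicator) are hypothetical and are not taken from the CELESTIAL or REACH-2 data.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def maic_weights(ipd, target_means, iters=100, tol=1e-10):
    """Weights exp(z.a) whose weighted covariate means equal target_means."""
    p = len(target_means)
    # Center each patient's covariates on the comparator (aggregate) means.
    z = [[row[j] - target_means[j] for j in range(p)] for row in ipd]

    def linpred(zi, a):
        return sum(zi[j] * a[j] for j in range(p))

    def objective(a):
        return sum(math.exp(linpred(zi, a)) for zi in z)

    a = [0.0] * p
    for _ in range(iters):
        w = [math.exp(linpred(zi, a)) for zi in z]
        grad = [sum(wi * zi[j] for wi, zi in zip(w, z)) for j in range(p)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * zi[j] * zi[k] for wi, zi in zip(w, z))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        # Newton step with simple backtracking on the convex objective.
        t, q0 = 1.0, objective(a)
        while objective([a[j] - t * step[j] for j in range(p)]) > q0 + 1e-12 and t > 1e-8:
            t /= 2
        a = [a[j] - t * step[j] for j in range(p)]
    return [math.exp(linpred(zi, a)) for zi in z]

# Hypothetical IPD: (age in years, male indicator) for eight patients.
toy_ipd = [(55, 1), (60, 0), (65, 1), (70, 1), (58, 0), (72, 1), (63, 1), (50, 0)]
toy_target = [64.0, 0.75]  # hypothetical comparator means
toy_weights = maic_weights(toy_ipd, toy_target)
```

After weighting, the weighted mean of each toy covariate equals the corresponding target mean, which is the balancing property the method relies on.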

Population Matching

Primary Analysis

A panel of clinical experts met (April 17, 2020) to review the baseline characteristics of the aligned CELESTIAL subpopulation and the REACH-2 population. They identified 11 potential effect modifiers of OS between the populations that were deemed clinically relevant for matching: age, sex, duration of prior sorafenib treatment, baseline HCC (extrahepatic spread, macrovascular invasion), etiology of HCC (hepatitis B, hepatitis C, non-viral), AFP level (median of log10(AFP)), Barcelona Clinic Liver Cancer stage B and albumin-bilirubin (ALBI) grade 1.

Sensitivity Analysis

As a sensitivity analysis, effect modifiers of OS were also identified empirically using statistical modeling. This was done by adding or eliminating potential effect-modifying baseline characteristics to/from a regression model using a stepwise approach (forwards, backwards, both directions). In all model directions, the model with the lowest Akaike information criterion (AIC) value retained the following seven variables, which were therefore considered the most predictive of treatment effect [22, 23] on OS: age, sex, etiology of alcohol use, macrovascular invasion, duration of prior sorafenib treatment, ALBI grade and AFP value.

Validation Analysis

Despite population alignment by subgroup selection, median (interquartile range) baseline AFP level remained substantially higher for the subgroup of 2L CELESTIAL patients with AFP of 400 ng/mL or higher than for the REACH-2 population (8813 [1648–30,751] ng/mL vs. 3394 [1177–16,812] ng/mL, respectively). To aid matching, AFP measurements were therefore log transformed and median log10(AFP) used as the matching criterion in the primary analysis. Minor residual differences in median log10(AFP) persisted, in part because of the method used to calculate weights (centering moments based on mean rather than median population values) and because baseline AFP values for the eligible CELESTIAL subpopulation did not follow a Normal distribution. To assess the implications of this residual difference, a validation analysis was conducted that repeated the primary analysis but with AFP excluded as a matching criterion. By removing AFP as a matching criterion, it was possible to investigate the extent to which differences in baseline serum AFP level for the comparator populations may have influenced the outcome analysis. If any outcome differences seen in the primary analysis were driven by minor residual differences in baseline AFP level, the impact of much larger baseline AFP differences would be clearly evident in the results of the validation analysis.

Survival Outcomes

Survival outcomes (OS [primary outcome from the trials] and PFS) were evaluated for the matching-adjusted CELESTIAL population and compared with those for the REACH-2 population. Median OS and PFS estimates for the CELESTIAL arms were derived from weighted Kaplan-Meier (KM) curves fitted to the survival data and compared with those fitted to simulated IPD for the REACH-2 population. CIs were generated for the weighted KM analysis using Woodruff’s method [24].
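As an illustration of how a weighted KM median can be obtained, the sketch below replaces the usual counts of events and patients at risk with sums of patient weights. It is a simplified example with hypothetical data and function names; it does not reproduce the confidence interval calculation (Woodruff's method) or the survey-weighted fitting used in the analysis.

```python
def weighted_km(times, events, weights):
    """Return [(t, S(t))] at each distinct event time using weighted counts."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv, curve = 1.0, []
    for t in event_times:
        # Weighted number at risk just before t, and weighted events at t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights) if ti == t and e)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve

def km_median(curve):
    """First event time at which the survival estimate is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical follow-up times (months); event = 1, censored = 0.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 1]
unweighted_curve = weighted_km(toy_times, toy_events, [1, 1, 1, 1, 1])
reweighted_curve = weighted_km(toy_times, toy_events, [3, 1, 1, 1, 1])
```

With unit weights the function reduces to the ordinary KM estimator; up-weighting the patient with the earliest event shifts the toy median from 5 to 3 months.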

Active Treatment Arms

The validity of comparing the survival outcomes for cabozantinib and ramucirumab using an anchored analysis was assessed by testing the proportional hazards (PH) assumption for the OS and PFS outcomes. An anchored analysis derives relative survival estimates for the active treatment arms anchored via a common comparator (here, placebo) and generates a HR for relative effect comparison. For an anchored analysis to be meaningful, however, the PH assumption must hold true (i.e., the ratio of the hazards for any two individuals must be constant over time). In the present analysis, three methods were used to test the PH assumption: (1) visual inspection of the log-cumulative hazard plots for cabozantinib and ramucirumab (a pattern of non-parallelism indicates PH violation); (2) visual inspection of the scaled Schoenfeld residuals over time (systematic departure from the horizontal indicates PH violation); (3) the Grambsch-Therneau statistical test of the scaled Schoenfeld residuals (the test evaluates whether there is a non-zero slope in a generalized linear regression of the scaled residuals as functions of time; non-zero values indicate PH violation) [20, 25, 26].
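The first of these checks can be sketched as follows. The example computes the points of a log-cumulative hazard plot for one arm, assuming the Nelson-Aalen estimator of the cumulative hazard (a KM-based estimate, -log S(t), could be used instead). The data and the function name are hypothetical; plotting and the Schoenfeld residual tests are omitted.

```python
import math

def log_cumulative_hazard(times, events):
    """Nelson-Aalen H(t) at each event time, returned as (log t, log H(t))."""
    points, cum = [], 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        cum += d / at_risk  # hazard increment at this event time
        points.append((math.log(t), math.log(cum)))
    return points

# Hypothetical arm: follow-up times (months); event = 1, censored = 0.
toy_points = log_cumulative_hazard([2, 3, 3, 5, 8], [1, 1, 0, 1, 1])
```

Computing these points for each treatment arm and plotting them on common axes gives the curves that are inspected for parallelism; roughly parallel curves are consistent with proportional hazards.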

If the PH assumption does not hold true, best-practice guidance for MAIC analyses advises use of an unanchored analysis and generation of absolute outcomes for each treatment arm (not relative to a common [placebo] comparator) [20]. This approach involves fitting individual parametric survival curves to each treatment arm with best-fit models identified by AIC/Bayesian information criterion (BIC) analysis. Potential models are ranked (based on AIC and BIC) for each treatment arm. The sum of the model rankings for the two treatment arms is used to identify the best-fit model with the superior model for the analysis indicated by the lowest sum rank. Where more than one model is potentially eligible, the sum of the AIC/BIC values is also taken into consideration (the superior model having the lowest sum value) and the choice of model is validated by visual assessment of its fit to weighted KM and log-cumulative hazard curves [20].
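The sum-rank selection rule described above can be sketched as follows. The AIC values are invented for illustration, the function names are hypothetical, and only one criterion is ranked for brevity; in practice the same ranking would be repeated for BIC.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

def rank(scores):
    """Rank models from 1 (lowest criterion value) upwards."""
    ordered = sorted(scores, key=scores.get)
    return {model: i + 1 for i, model in enumerate(ordered)}

def best_by_sum_rank(arm_a, arm_b):
    """Lowest sum of ranks across arms; ties broken by the summed criterion."""
    ra, rb = rank(arm_a), rank(arm_b)
    return min(arm_a, key=lambda m: (ra[m] + rb[m], arm_a[m] + arm_b[m]))

# Hypothetical AIC values for each candidate model in each treatment arm.
toy_arm_a = {"exponential": 410.2, "weibull": 402.1,
             "log-logistic": 403.5, "log-normal": 405.0}
toy_arm_b = {"exponential": 655.0, "weibull": 649.3,
             "log-logistic": 648.9, "log-normal": 652.4}
toy_best = best_by_sum_rank(toy_arm_a, toy_arm_b)
```

In this toy example the Weibull and log-logistic models tie on sum rank (3 each), and the Weibull model is preferred because its summed AIC is lower. Visual assessment of fit would still be needed to confirm the choice.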

Placebo Comparison

To assess the effectiveness of the MAIC weighting method for matching the baseline characteristics of the trial populations, negative outcome controls (i.e., outcomes for the weighted-CELESTIAL and REACH-2 placebo arms) were evaluated. Similarity of outcomes for the matching-adjusted and comparator placebo arms indicates that any differences between the original trial populations at baseline have been reduced, so the potential for bias in the active treatment arm comparison has also been reduced, thus validating the MAIC approach.

For the matching-adjusted CELESTIAL and REACH-2 placebo arms, median (95% CI) estimates were computed for OS and PFS, and HR (95% CI) estimates were generated using weighted Cox PH regression models.

Safety Outcomes

The primary publication from the CELESTIAL trial reported all-cause adverse events (AEs) [9], but treatment-related AEs were published for REACH-2 [10]. Using the CELESTIAL IPD, the incidences of treatment-related AEs (TRAEs) of interest (any grade, and grade 3 or 4) were identified for the matching-adjusted cabozantinib population and compared with those for the ramucirumab population. Events of interest were TRAEs reported in both CELESTIAL and REACH-2 that affected at least 5% of patients in any of the trial arms. Rates of treatment discontinuation due to TRAEs were also compared for the matching-adjusted cabozantinib population and the ramucirumab population.

In an anchored analysis, TRAE estimates (log odds ratios [ORs]) were generated for the matching-adjusted CELESTIAL population and compared with log ORs for ramucirumab versus placebo, computed from the published REACH-2 data. Anchored estimates were not feasible for AEs that did not occur in one of the trial arms. In such instances, unanchored estimates of the log ORs were calculated, with the number of TRAEs occurring in CELESTIAL being used to compute a weighted estimate [22]. This method multiplied the weighted mean of the TRAE occurrences by the number of patients in the corresponding trial arm. The delta method was then used to compute variances, and 95% CIs and p values were estimated for the log OR of each TRAE.
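A simplified sketch of an anchored log OR comparison is given below. It assumes the standard large-sample (delta method) variance of a log OR and a normal approximation for the CI and p value; the event counts are hypothetical and the function names are illustrative. In the MAIC arm, the event counts would be weighted and need not be whole numbers.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log OR (treatment vs. control) and its delta-method variance."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    estimate = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance

def anchored_comparison(lor_a, var_a, lor_b, var_b):
    """Difference of two placebo-anchored log ORs with 95% CI and p value."""
    estimate = lor_a - lor_b
    se = math.sqrt(var_a + var_b)  # the two trials are independent
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))  # two-sided
    return estimate, ci, p_value

# Hypothetical counts: events / patients in each active and placebo arm.
toy_lor_a, toy_var_a = log_odds_ratio(20, 100, 10, 100)   # drug A vs. placebo
toy_lor_b, toy_var_b = log_odds_ratio(15, 100, 10, 100)   # drug B vs. placebo
toy_result = anchored_comparison(toy_lor_a, toy_var_a, toy_lor_b, toy_var_b)
```

A positive estimate indicates higher odds of the event with drug A than with drug B. When an event does not occur in one arm, a cell count is zero and this calculation is undefined, which is why unanchored estimates were needed in such cases.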

Safety outcomes were not evaluated for the matching-adjusted CELESTIAL placebo population and the REACH-2 placebo population because of their low rates of AEs.

Analyses

The analyses were performed using R version 4.0.2 (R Core Team, 2020). The package ‘survey’ version 4.0 was used to fit weighted survival models, with the weights computed from the MAIC used as sampling weights.

Compliance with Ethics Guidelines

The results presented in this article are based on published studies. All procedures performed in those studies involving human participants were conducted in accordance with the ethical standards of the local Institutional Review Boards for each site and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the CELESTIAL and REACH-2 trials.

Results

Patient Characteristics

In total, 292 REACH-2 patients were included in the analysis, of whom 197 received ramucirumab and 95 received placebo. Of the 707 patients randomized in CELESTIAL, 202 had received prior sorafenib treatment only and had a serum AFP level of 400 ng/mL or above. Of these patients, 178 had complete baseline data and were eligible for matching, of whom 114 received cabozantinib and 64 received placebo. Following baseline weighting and adjustments, the effective sample sizes (ESSs) from CELESTIAL were 63 patients allocated to cabozantinib and 44 to placebo (Table 1).
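For readers unfamiliar with the ESS, the sketch below shows the approximation usually applied in MAIC analyses, the squared sum of the weights divided by the sum of the squared weights. The source does not state the formula used, so this is an assumption, and the weights shown are hypothetical.

```python
def effective_sample_size(weights):
    """Approximate ESS: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical weights: equal weights keep the full sample size,
# whereas uneven weights reduce the ESS.
toy_equal_ess = effective_sample_size([1.0, 1.0, 1.0, 1.0])
toy_uneven_ess = effective_sample_size([3.0, 1.0, 0.5, 0.5])
```

The more the weights vary between patients, the further the ESS falls below the number of patients contributing data.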

Table 1 MAIC population sizes

Application of MAIC weighting to the baseline IPD from CELESTIAL was effective in aligning the potentially effect-modifying baseline characteristics of the 2L CELESTIAL population with serum AFP of 400 ng/mL or above with those of the REACH-2 population (Table 2; Fig. S1).

Table 2 Baseline matching characteristics used in the primary MAIC analysis, before and after matching

Survival Outcomes

KM-Derived Estimates

There was no significant difference in the OS estimates (median [95% CI]) derived from the weighted KM analysis for the matching-adjusted cabozantinib population (10.6 [9.5, 17.3] months) and for the ramucirumab population (8.7 [7.3, 10.8] months) (p = 0.104, log-rank test) (Table 3; Fig. 1a). Median (95% CI) PFS estimates, however, were significantly longer for the matching-adjusted cabozantinib population than for the ramucirumab population: 5.5 (4.6, 7.4) months versus 2.8 (2.7, 4.1) months (p = 0.016, log-rank test) (Table 3; Fig. 1b).

Table 3 Efficacy estimates for the matching-adjusted cabozantinib and the ramucirumab populations
Fig. 1 Kaplan-Meier curves for (a) OS and (b) PFS of the matching-adjusted CELESTIAL population and the REACH-2 population. 2L second line, CI confidence interval, OS overall survival, PFS progression-free survival

Parametric Modeling Estimates

Tests of the PH assumption indicated that an anchored analysis (including hazard ratio estimates) was not supported and that an unanchored parametric modeling analysis would provide more meaningful survival estimates. This assessment was based on visual inspection of the log of cumulative hazard versus time plots, which displayed patterns of non-parallelism for both OS and (more distinctly) PFS, suggesting a violation of the PH assumption. This assessment was confirmed by plots of scaled Schoenfeld residuals versus time, which showed a systematic departure from the horizontal for both outcomes, and by the non-zero slopes (at the 5% significance level) given by the Grambsch–Therneau test (Figs. S2 and S3). Accordingly, and in agreement with recommended practice [20], an unanchored analysis was conducted by fitting individual parametric survival curves to each treatment arm.

The results of the parametric modeling analysis mirrored those of the weighted KM analysis (Table 3). For OS, the Weibull model had the lowest sum rank and AIC/BIC values and, following validation of model choice by visual inspection, was selected as the best-fit model (Table S3). Estimated median (95% CI) OS was 12.0 (9.6, 14.5) months for the matching-adjusted cabozantinib population versus 9.6 (8.4, 10.8) months for the ramucirumab population (Table 3). For PFS, the log-logistic model was selected as the best-fit model, generating median (95% CI) PFS estimates of 5.2 (4.1, 6.4) months for the matching-adjusted cabozantinib population versus 3.2 (2.8, 3.6) months for the ramucirumab population (Tables 3 and S4).
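The way a median is read from a fitted parametric curve can be sketched as follows. The example assumes the common parameterizations S(t) = exp(-(t/scale)^shape) for the Weibull model and S(t) = 1/(1 + (t/scale)^shape) for the log-logistic model; statistical packages may parameterize these distributions differently. The parameter values are hypothetical and are not the fitted values from this analysis.

```python
import math

def weibull_survival(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def weibull_median(shape, scale):
    # Solve S(t) = 0.5 for t.
    return scale * math.log(2) ** (1 / shape)

def loglogistic_survival(t, shape, scale):
    return 1 / (1 + (t / scale) ** shape)

def loglogistic_median(shape, scale):
    # S(scale) = 0.5 for any shape, so the median equals the scale.
    return scale

# Hypothetical parameters (time in months).
toy_weibull_median = weibull_median(shape=1.3, scale=15.0)
toy_loglogistic_median = loglogistic_median(shape=2.0, scale=5.0)
```

In both cases, the survival function evaluated at the returned median equals 0.5.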

Negative Control Estimates

There were no significant differences in OS or PFS estimates for the matching-adjusted CELESTIAL and the REACH-2 placebo groups. Estimated median (95% CI) OS was 5.3 (4.8–8.2) months for the matching-adjusted CELESTIAL placebo population and 7.4 (5.8–9.4) months for the REACH-2 placebo population. For PFS, median (95% CI) estimates were 1.9 (1.8–2.1) months for the matching-adjusted CELESTIAL placebo population and 1.6 (1.4–2.6) months for the REACH-2 placebo population (Table S5). Similarly, the results of the parametric modeling analysis showed no significant difference in either OS or PFS estimates for the matching-adjusted CELESTIAL placebo population and the REACH-2 placebo population (Table S5; Fig. S4).

Safety Outcomes

TRAEs of interest (reported in both trials and occurring in at least 5% of patients in any trial arm) were: increased aspartate aminotransferase (AST; any grade and grade 3 or 4); diarrhea (any grade); fatigue (any grade and grade 3 or 4); decreased appetite (any grade); vomiting (any grade); hypertension (any grade and grade 3 or 4); nausea (any grade) and proteinuria (any grade).

Anchored log OR TRAE estimates found significantly lower rates of any grade diarrhea and hypertension and grade 3 or 4 hypertension in the ramucirumab population than in the matching-adjusted cabozantinib population. Unanchored log OR estimates also found significantly lower rates of grade 3 or 4 fatigue and increased AST with ramucirumab than with cabozantinib. Frequency of any grade proteinuria (unanchored) was lower in the matching-adjusted cabozantinib population than in the ramucirumab population (Table 4; Fig. S5).

Table 4 Log OR (95% CI) and p values for TRAEs reported in at least 5% of patients in any arm of CELESTIAL or REACH-2 (cabozantinib vs. ramucirumab)

Rates of treatment discontinuation due to TRAEs were not significantly different for the matching-adjusted cabozantinib population and the ramucirumab population (Table 4).

Sensitivity and Validation Analyses

Weighting the baseline IPD of patients in the 2L CELESTIAL population with serum AFP levels of 400 ng/mL or above based on seven empirically-identified matching criteria was effective in aligning the potentially effect-modifying baseline characteristics with those of the REACH-2 population (Table S6). Following matching and adjustments for these seven characteristics, the ESSs from CELESTIAL were 73 patients for cabozantinib and 46 for placebo. As was used for the primary analysis, the REACH-2 population included 197 patients randomly allocated to ramucirumab and 95 randomly allocated to placebo (Table 1).

Median (95% CI) survival estimates for the sensitivity analysis mirrored those of the primary analysis, both for the weighted KM analysis (Table 3; Fig. S6) and (following confirmation of PH assumption violation [Tables S3, S4, and S7; Figs. S7 and S8]) for the parametric modeling analysis (Table S7). Results of the safety sensitivity analysis were also consistent with those of the primary analysis (Table S8; Fig. S5).

The validation analysis, which repeated the primary analysis but with baseline AFP level excluded from the 11 matching criteria, further supported the findings of the primary analysis. Excluding the baseline AFP criterion from the weighting and adjustment steps resulted in slightly larger ESSs from CELESTIAL than were available for the primary analysis (80 patients allocated to cabozantinib and 49 to placebo, Table 1). At baseline, the 2L CELESTIAL population with serum AFP levels of 400 ng/mL or above and the REACH-2 population were closely matched (Table S9). Compared with the primary analysis, exclusion of the baseline AFP matching criterion had little impact on the comparative PFS and safety outcomes, but did reduce the magnitude of difference in median OS estimates for the matching-adjusted CELESTIAL population and the REACH-2 population, although the difference was already non-significant (Tables 3, S3, S4, and S10; Figs. S9, S10 and S11).

Discussion

Main Findings

In the absence of head-to-head trials directly comparing cabozantinib and ramucirumab, we conducted a MAIC using data from the pivotal phase 3 CELESTIAL and REACH-2 trials to compare these two VEGF-targeting agents in HCC populations matched in terms of prior therapy and key baseline characteristics. The MAIC method is a recognized means of evaluating comparative outcomes from trials with similar endpoints but heterogeneous populations [19].

In the primary analysis, the baseline data of individual CELESTIAL patients were weighted so that the overall characteristics of the 2L CELESTIAL population with elevated AFP matched those of the REACH-2 population. The assigned weights were intended to minimize residual differences in 11 potentially effect-modifying baseline characteristics selected by a panel of clinical experts. Weighted KM analysis indicated that daily oral administration of cabozantinib was associated with significantly longer PFS than IV administration of ramucirumab every 2 weeks. OS was not significantly different for cabozantinib and ramucirumab. The parametric modeling analysis (undertaken in line with recommendations for MAIC analyses published by the National Institute for Health and Care Excellence’s [NICE] Decision Support Unit [20]) mirrored the results of the weighted KM analysis: significantly prolonged PFS, but no significant difference in OS. A sensitivity analysis (using seven effect modifiers selected using stepwise AIC regression) and a validation analysis (examining the impact of removing baseline AFP from the primary analysis’ matching criteria) also reinforced these findings, further strengthening confidence in the results.

In terms of TRAEs, the odds of experiencing any grade diarrhea (anchored) or hypertension (anchored), or grade 3 or 4 hypertension (anchored), fatigue (unanchored), or increased AST (unanchored) were significantly lower for ramucirumab than for cabozantinib; however, rates of treatment-related proteinuria (unanchored) were significantly lower for cabozantinib than for ramucirumab, and treatment discontinuation rates due to TRAEs were not significantly different between populations.

When considering the results of the analysis, it should be noted that it was only possible to generate comparative rate estimates for TRAEs that occurred in both trials. The most common adverse event (AE) reported for patients treated with cabozantinib in the CELESTIAL trial, for example, was palmar-plantar erythrodysesthesia (PPE; any grade/grade 3 or 4 in 46%/17% of patients), but this was not reported in REACH-2 and so was not evaluable in the current analysis. Similarly, in REACH-2, treatment infusion-related reactions and bleeding or hemorrhage events occurred in 7 and 10% of patients treated with ramucirumab, respectively, but neither was relevant/reported in CELESTIAL, so they did not feature in the present analysis. The difference in TRAEs reported in the CELESTIAL and REACH-2 trials reflects the differing mechanisms of action of the two drugs, with cabozantinib being a TKI with activity against a range of receptor kinases, including VEGF receptors-1, -2 and -3, AXL and MET, and ramucirumab being a mAb that binds specifically to VEGF receptor-2 [2, 3, 6, 7]. Furthermore, when considering the respective efficacy-safety profiles of the two therapies, it is also relevant to note that the occurrence of some TRAEs may be positively correlated with treatment outcomes, as reported for the subgroup of cabozantinib patients from CELESTIAL who experienced any grade PPE or grade 3 or higher hypertension [27].

Strengths of the Approach

Although network meta-analyses (NMAs) or traditional ITCs are perhaps more familiar methods of generating comparative data than MAIC analyses, they are not able to produce meaningful comparisons in all scenarios. The reliability of such methods is compromised, for example, when a common trial comparator is not available or when there is distinct heterogeneity between comparator studies [28]. A review of nearly 181 technology appraisals conducted by NICE found that more than half (54%) of all related assessments did not include a mixed/indirect treatment comparison; trial design heterogeneity was cited as the most common reason for their absence [29]. MAIC analyses can overcome some of the challenges of between-trial population heterogeneity. While NMAs rely on published data, MAIC analyses require patient-level data to be available for at least one of the studies in order to permit adjustments for between-trial population differences and to minimize the potential for outcome bias [28]. Increasing recognition of the MAIC method is reflected in the growing number of related publications apparent within the peer-reviewed medical and cancer-related literature over the past 10 years (Fig. S12).

In the current analysis, weighting of the baseline IPD of the 2L CELESTIAL subgroup with elevated AFP was broadly effective in balancing the distribution of potentially effect-modifying differences between the two trial populations. Although a difference in baseline serum AFP levels persisted between the matching-adjusted 2L CELESTIAL population with serum AFP of 400 ng/mL or higher and the REACH-2 population despite the matching and weighting procedures, the validation analysis provides confidence that this residual difference had minimal impact on the evaluated outcomes. In addition, the similarity of the estimates for the matching-adjusted CELESTIAL placebo arm and REACH-2 placebo arm suggests that the weighting process was successful in reducing clinically relevant baseline differences between the populations and, accordingly, in reducing the potential for associated confounding in the active treatment arm analysis.

It is also relevant and appropriate that the present analysis used published data from the pivotal REACH-2 trial rather than from the earlier REACH trial [10, 30]. In REACH, patients with HCC were randomly allocated to 2L ramucirumab or placebo after 1L sorafenib treatment; patients with a baseline serum AFP of 400 ng/mL or above constituted a pre-specified subpopulation (n = 250). Overall, the REACH trial was negative: ramucirumab did not significantly improve OS compared with placebo, but the subgroup analyses suggested a possible OS benefit with ramucirumab over placebo in patients with baseline serum AFP of 400 ng/mL or above (vs. < 400 ng/mL) [30]. This signal from the REACH trial was explicitly investigated and validated by the subsequent pivotal REACH-2 trial, which led to the regulatory approval of ramucirumab for patients with HCC and serum AFP of 400 ng/mL or above following prior sorafenib treatment [4, 5, 10].

Limitations of the Approach

Despite their strengths, MAIC analyses cannot offer the quality of evidence generated by a head-to-head RCT and cannot adjust for all potential differences in trial populations and designs. The tumor assessment schedules, for instance, were different for the CELESTIAL trial (every 8 weeks) and the REACH-2 trial (every 6 weeks during the first 6 months; every 9 weeks thereafter). The PFS estimates in the current analysis may, therefore, be subject to resulting bias, although the direction of any such bias is primarily discernible at the individual patient level and depends on the relative timing of tumor growth and scheduled assessment. For instance, any tumor growth that occurred at week 5 would have been detected at week 6 in REACH-2 but not until week 8 in CELESTIAL, favoring cabozantinib. Yet, tumor growth occurring just after the first assessment in REACH-2 (e.g., during week 7) would have been assessed at week 8 in CELESTIAL but not until week 12 in REACH-2, favoring ramucirumab. The overall impact of the difference in assessment schedules remains unclear, but may favor cabozantinib because of the lower frequency of tumor assessments conducted in CELESTIAL (versus REACH-2) in the initial 6 months of follow-up.
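The two worked examples in the preceding paragraph can be reproduced with a short sketch. It assumes that the first 6 months of REACH-2 correspond to 24 weeks and that assessments occurred exactly on schedule; the variable and function names are illustrative.

```python
# Scheduled tumor assessment weeks over roughly 2 years of follow-up.
# CELESTIAL: every 8 weeks. REACH-2: every 6 weeks up to week 24
# (assumed to represent the first 6 months), then every 9 weeks.
CELESTIAL_SCHEDULE = list(range(8, 105, 8))
REACH2_SCHEDULE = list(range(6, 25, 6)) + list(range(33, 105, 9))

def detection_week(progression_week, schedule):
    """First scheduled assessment at or after the true progression week."""
    return next(week for week in schedule if week >= progression_week)

# Progression at week 5 and at week 7, as in the examples in the text.
toy_week5 = (detection_week(5, CELESTIAL_SCHEDULE), detection_week(5, REACH2_SCHEDULE))
toy_week7 = (detection_week(7, CELESTIAL_SCHEDULE), detection_week(7, REACH2_SCHEDULE))
```

Progression at week 5 is recorded at week 8 under the CELESTIAL schedule and at week 6 under the REACH-2 schedule, whereas progression at week 7 is recorded at week 8 and week 12, respectively. The direction of the resulting bias therefore depends on when progression occurs.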

For OS, the median estimates may have been influenced by subsequent treatment use, something that is particularly relevant when comparing therapies within a rapidly evolving treatment landscape, like that of HCC. It is plausible that temporal changes in the availability of therapies may have favored the ramucirumab population over the cabozantinib population because of the later start date of the REACH-2 trial (vs. the CELESTIAL trial).

Although it is possible that residual differences persisted between the trial populations despite the matching and weighting procedures, the consistent results of the primary, sensitivity and validation analyses provide reassurance that factors predictive of treatment effect were generally well balanced. In terms of prognostic factors, in an anchored analysis, any variables that are prognostic of disease course do not inhibit interpretation of the results because randomization should ensure equal distribution of any such factors between the active and placebo arms of each trial; between-trial comparisons of results relative to the common placebo arm are therefore unaffected. Unanchored analyses that use parametric modeling (as in the present analysis) are open to greater potential bias because they compare absolute rather than relative estimates. Nevertheless, the similarity of the placebo arm analysis in the present study provides reassurance that the matching and adjustment steps successfully minimized clinically relevant differences between the comparator populations.

It is also relevant to note that MAIC analyses can only compare similar outcomes reported in both trials. This prohibited the evaluation of some TRAEs of potential relevance and the comparison of quality-of-life outcomes, which were evaluated differently for the two trials and so were beyond the scope of the current analysis [31, 32]. Finally, selection of the 2L CELESTIAL subpopulation with serum AFP of 400 ng/mL or above, and the subsequent weighting and adjustment procedures, unavoidably reduced the number of patients eligible for inclusion in the analysis. These steps were necessary to compare ‘like with like,’ but reduced the size of the original CELESTIAL trial population (powered to demonstrate treatment effect versus placebo) from 707 individuals to an ESS of 199 individuals (73 in the cabozantinib arm), limiting the overall statistical power of the analysis.
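To illustrate why weighting reduces the effective sample size (ESS), the sketch below applies the formula commonly used in MAIC analyses, ESS = (sum of weights)² / (sum of squared weights). It is assumed, not confirmed by this section, that the ESS reported here was derived in this way; the function name and the example weights are hypothetical and are not taken from CELESTIAL.

```python
def effective_sample_size(weights):
    """Kish-type effective sample size for a set of non-negative weights:
    (sum of weights)^2 / (sum of squared weights).
    ASSUMPTION: this is the standard MAIC definition; the exact
    computation used in the source analysis is not stated here."""
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    if sum_of_squares == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / sum_of_squares


if __name__ == "__main__":
    # Hypothetical weights for five patients (illustration only).
    equal_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
    uneven_weights = [0.2, 0.5, 1.0, 1.8, 3.0]

    # Equal weights leave the sample size unchanged; uneven weights shrink it.
    print(effective_sample_size(equal_weights))
    print(round(effective_sample_size(uneven_weights), 2))
```

The more unevenly the weights are distributed across patients, the further the ESS falls below the number of patients actually included, which reduces the precision of the weighted estimates.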

Interpretation

This MAIC analysis provides further insights into the therapeutic options available for patients with HCC who have progressed despite prior sorafenib therapy, particularly those with elevated serum AFP and potentially poor prognosis [12].

The results reinforce and build on those of a recent ITC that reported clinical equivalence of cabozantinib, ramucirumab and regorafenib with respect to OS in patients with serum AFP levels of 400 ng/mL or above [33] and of an NMA of available phase 3 trials of 2L agents for advanced HCC [34]. The NMA subgroup analysis of patients with AFP > 400 ng/mL suggested that cabozantinib may significantly prolong PFS compared with ramucirumab (HR [95% CI], 0.59 [0.40, 0.88]), but it found no significant differences in OS estimates for cabozantinib, ramucirumab or regorafenib in this subgroup. AEs were not explicitly reported in the NMA for AFP subgroups, but any-grade diarrhea, fatigue, nausea and decreased appetite were reported in at least 10% of patients receiving cabozantinib or ramucirumab in the included trials. Diarrhea was the most common AE reported for cabozantinib (54% of patients compared with 18% for ramucirumab), and peripheral edema was the most common for ramucirumab (36% of patients compared with 0% for cabozantinib). Overall, the authors considered the AEs in patients treated with ramucirumab to be relatively mild, with 36% of patients experiencing grade 3 or 4 AEs compared with 68% of those treated with cabozantinib [34].

A recent MAIC analysis of 2L cabozantinib and regorafenib in patients with HCC who had received sorafenib therapy has provided some indirect comparative data for the two VEGF-targeting TKIs [16]. Using data from the phase 3 CELESTIAL and RESORCE trials, the authors found no significant difference in OS estimates (median [95% CI]) for cabozantinib and regorafenib (11.4 [8.9–17.0] vs. 10.6 [9.1–12.1] months; p = 0.3474), but significantly prolonged PFS (median [95% CI]) with cabozantinib compared with regorafenib (5.6 [4.9–7.3] months vs. 3.1 [2.8–4.2] months; p = 0.0005) [16]. Perhaps unsurprisingly, given that the current analysis focused on patients with elevated serum AFP levels (known to be associated with poor prognosis [12]), median survival estimates were numerically shorter for the current analysis than for the MAIC of cabozantinib and regorafenib. The absolute differences in median PFS were nevertheless similar: a benefit of 2.7 months with cabozantinib (vs. ramucirumab) in the present analysis compared with a 2.5-month benefit for cabozantinib versus regorafenib [16]. In terms of AEs, the MAIC of the two TKIs reported a trend for lower rates of treatment-emergent grade 3 or 4 hypertension with regorafenib than with cabozantinib, but only differences in rates of grade 3 or 4 diarrhea reached statistical significance. The authors noted that the observed differences in treatment-emergent AEs may, at least in part, reflect the fact that the MAIC methods could not adjust for the exclusion of sorafenib-intolerant patients from the RESORCE trial, but not from CELESTIAL [16].

The present analysis builds on these data, addressing the evidence gap with respect to the scant availability of comparative data for cabozantinib and ramucirumab and utilizing the MAIC analysis method to minimize potential sources of confounding by matching and weighting the baseline data for individual CELESTIAL patients to those of the REACH-2 population. It offers insights into the efficacy and tolerability of cabozantinib in patients with serum AFP levels of 400 ng/mL or above and focuses the analysis on the pure 2L CELESTIAL population rather than on the mixed 2L and 3L population used in previous indirect comparisons.

The results suggest that cabozantinib may offer an efficacious and tolerable alternative to 2L ramucirumab in patients with serum AFP levels of 400 ng/mL or above. In addition to informing efficacy-safety considerations when treating patients with elevated serum AFP, the findings may also provide insights for clinicians seeking to optimize treatment options within the context of local healthcare resourcing, given the potential implications of administering daily oral therapy (cabozantinib) versus fortnightly IV therapy (ramucirumab). Similarly, the findings may be of interest to clinicians seeking to individualize care decisions for patients with clear preferences for particular treatment regimens and methods of administration (e.g., higher-frequency [daily] oral versus lower-frequency [fortnightly] IV therapy).

Conclusion

In this MAIC analysis of 2L cabozantinib and ramucirumab after prior sorafenib therapy in patients with HCC and elevated serum AFP, the alignment and weighting processes effectively balanced the distribution of effect-modifying baseline characteristics between the matching-adjusted CELESTIAL population and the REACH-2 population. The similarity of the placebo arm survival analysis reinforces the assertion that clinically relevant baseline characteristics were largely balanced for the comparator populations. OS estimates were not significantly different for the cabozantinib and ramucirumab treatment arms, but median PFS was significantly longer in the cabozantinib group (by 2.7 months) and was almost double that of the ramucirumab group. These findings were consistent for the primary analysis and the sensitivity analysis. Rates of some grade 3 or 4 TRAEs were lower with ramucirumab than with cabozantinib, likely reflecting the different mechanisms of action of the two drugs, but there was no significant difference in discontinuation rates resulting from TRAEs. These MAIC results do not replace those of a head-to-head RCT, but, in the absence of RCT evidence, they contribute indirect comparative efficacy-tolerability data to inform clinical decision-making around optimum 2L treatment for patients with elevated AFP who have received prior sorafenib.