Key Summary Points

Why carry out this study?

Valoctocogene roxaparvovec provides endogenous factor VIII (FVIII) production for individuals with severe hemophilia A and was evaluated in the GENEr8-1 trial using intra-individual comparisons.

Here, we use propensity scoring to compare, post hoc, GENEr8-1 outcomes with an external control (individuals from the 270-902 study who did not enroll in the trial).

What was learned from the study?

Valoctocogene roxaparvovec was associated with lower bleed rates and a higher proportion of patients without bleeds compared with FVIII prophylaxis.

The results were consistent across multiple sensitivity and scenario analyses and confirmed the original results reported from the GENEr8-1 trial.

This study suggests that the valoctocogene roxaparvovec-mediated benefit observed in the GENEr8-1 trial is not an artifact of the intra-individual comparisons of the study design or biased by observable differences in the study population.

Introduction

Hemophilia A (HA), a rare X-linked recessive bleeding disorder caused by a deficiency of coagulation factor VIII (FVIII) protein [1], affects approximately 17.1 per 100,000 male individuals worldwide [2]. Standard of care (SOC) for individuals with severe HA (FVIII < 1 IU/dL or < 1% of “normal”) is regular prophylaxis with exogenous FVIII or bispecific monoclonal antibodies that mimic the function of FVIII [1]. A multinational, prospective, non-interventional study (270-902) described bleeding outcomes in a global cohort of 294 participants with severe HA receiving FVIII prophylaxis over an extended period, including 6 months of retrospective data and at least 6 months of prospective follow-up. Participants reported all bleeding events, and impaired physical functioning was observed [3].

Valoctocogene roxaparvovec is a gene therapy that uses an adeno-associated virus serotype 5 (AAV5) vector to transfer a B-domain-deleted human FVIII-coding sequence under the regulatory control of a liver-selective promoter. The goal of this gene therapy is to establish endogenous production of FVIII protein from hepatocytes [4, 5]. Results are reported for two clinical trials that evaluated the safety and efficacy of valoctocogene roxaparvovec in adult men with severe HA and without FVIII inhibitors or AAV5 antibodies. A phase 1/2, single-arm, open-label, dose-escalation study (NCT02576795; 270-201) treated 15 participants with severe HA with a single intravenous (IV) infusion of valoctocogene roxaparvovec at varying dosage levels [5]. A phase 3, single-arm, open-label study (NCT03370913; GENEr8-1) treated 134 participants with severe HA with a single IV infusion of 6 × 10¹³ vector genomes of valoctocogene roxaparvovec per kilogram of body weight [4]. Valoctocogene roxaparvovec gene therapy yielded endogenous FVIII production and, relative to FVIII prophylaxis, reduced bleeding and FVIII use significantly [4].

The intra-individual design of the clinical trials, with rollover of participants from 270-902 to GENEr8-1, controls for confounding by utilizing a participant (and their associated disease history) as their own control. Although the intra-individual comparison is efficient in controlling for confounding, randomized controlled trials are considered the gold standard for evaluating the efficacy of alternative therapies and for avoiding issues of temporality. Conducting randomized trials in rare diseases presents challenges, however: principally, the small number of eligible participants impedes the conduct and interpretation of a randomized controlled trial [6]. Owing to the design of the development program for valoctocogene roxaparvovec, there exists a unique opportunity to utilize the 270-902 study population as an external control and to then use propensity scoring to account for differences in observable participant characteristics. Accordingly, this study addresses an important limitation of the intra-individual comparison previously reported for GENEr8-1 by evaluating the potential for differences in observable demographic and clinical characteristics between the non-rollover and rollover populations to influence outcomes related to annualized bleeding rate (ABR) and the proportion of participants with bleeding events.

Methods

Study Design

For the analysis, a cohort was identified in the 270-902 population who completed at least 6 months of prospective follow-up with SOC FVIII prophylaxis and met inclusion criteria for GENEr8-1 but who also did not enroll in GENEr8-1. This cohort was used as an external comparator group for the rollover cohort in GENEr8-1 (n = 112). Propensity scoring was used to account for differences in observable baseline demographics and clinical characteristics between the cohorts, thereby minimizing bias for the comparison of bleeding outcomes between the cohorts. A summary of the study designs and workflow can be found in Supplementary Fig. S1.

The primary objective was to compare the control cohort (270-902) with the treated cohort (GENEr8-1) regarding mean treated and all bleed ABR and the proportion of participants with zero treated and all bleeding events. The treated and all bleed ABR were calculated on the basis of the number of bleeding events captured from the start of the efficacy evaluation period through 52 weeks post-valoctocogene roxaparvovec infusion and were annualized by dividing the number of bleeding events by the number of days in the evaluation period and multiplying by 365.25. The percentage of participants with zero bleeding events was calculated over the same time horizon by taking the number of participants in the intervention cohort with zero bleeds or zero treated bleeds divided by the total number of participants in the intervention cohort. The primary analysis population of interest used for the comparative effectiveness analysis was the treated cohort of rollover participants from GENEr8-1 (n = 112). This cohort included participants who were enrolled in 270-902 for at least 6 months prior to enrolling in GENEr8-1. The control cohort included participants who, on the basis of their clinical characteristics, were eligible for enrollment in GENEr8-1 but elected not to enroll; none were missing data for key variables used in generating the propensity scores (n = 73). The primary analysis then used standardized mortality ratio weighting (SMRW) to re-weight the baseline characteristics of the control cohort in order to better match the treated cohort.
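The two outcome definitions above can be illustrated with a minimal Python sketch. It assumes the usual annualization (bleeding events divided by days observed, multiplied by 365.25); the function names and the example participants are hypothetical and are not study data or the authors' code.

```python
from typing import List

DAYS_PER_YEAR = 365.25


def annualized_bleeding_rate(n_bleeds: int, days_observed: float) -> float:
    """Bleeds per year: (bleeds / days observed) * 365.25."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return n_bleeds / days_observed * DAYS_PER_YEAR


def proportion_zero_bleeds(bleed_counts: List[int]) -> float:
    """Share of participants with no bleeding events in the period."""
    if not bleed_counts:
        raise ValueError("bleed_counts must not be empty")
    return sum(1 for n in bleed_counts if n == 0) / len(bleed_counts)


# Hypothetical participants (illustration only):
# (number of bleeds, days in the evaluation period)
example = [(0, 330), (2, 300), (0, 336), (5, 249)]

abrs = [annualized_bleeding_rate(b, d) for b, d in example]
mean_abr = sum(abrs) / len(abrs)
zero_share = proportion_zero_bleeds([b for b, _ in example])
```

Because each participant's count is scaled by their own observation time, participants with different follow-up lengths contribute rates on a common per-year scale.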

This manuscript provides a post hoc analysis of previously conducted studies and does not include the addition of new studies with human participants or animal subjects. As previously reported, the protocols for 270-902 and GENEr8-1 were approved by the institutional review boards or independent ethics committees of all participating sites, and all participants provided written informed consent [3, 4]. Both trials were performed in accordance with the ethical principles set forth by the Declaration of Helsinki.

Study Population and Data Sources

The study designs, characteristics, and inclusion and exclusion criteria were previously described for 270-902 [3] and GENEr8-1 [4], and Supplementary Table S1 provides an overview of each study. Briefly, 270-902 was a prospective, non-interventional, multicenter study of 294 participants, including men at least 18 years old with severe HA, receiving SOC FVIII prophylaxis for at least 6 months prior to enrollment, 225 of whom were prospectively followed for at least 6 months in the 270-902 study [3]. For the study period, ABR, FVIII utilization, and FVIII infusion rates were calculated. After at least 6 months of prospective follow-up in the study, participants could be screened for eligibility and entry into GENEr8-1 [3].

GENEr8-1 was a phase 3, single-arm, open-label study of 134 participants, 112 of whom rolled over from the prospective non-interventional 270-902 study [4]. All participants were men at least 18 years old, previously on FVIII prophylaxis, negative for FVIII inhibitors, and negative for AAV5 antibodies. FVIII prophylaxis was continued through 4 weeks post-infusion and thereafter could be used if required, per the study protocol. The intention-to-treat (ITT) population included the 134 participants who received valoctocogene roxaparvovec. The modified ITT (mITT) population included the 132 participants who were human immunodeficiency virus (HIV) negative [4].

All participants from the 270-902 and GENEr8-1 studies were considered for the present analysis, except for those participants from 270-902 who did not meet the GENEr8-1 inclusion criteria. Those excluded were participants who were HIV positive, had less than 6 months of prospective follow-up at the end of the study, or were AAV5 antibody positive. Participants with AAV5 antibodies were excluded over concerns that their pre-existing immunity to AAV5 could interfere with the efficacy of the gene therapy [3]. No participants in either study received emicizumab.

Statistical Analysis

Propensity scores were calculated using a logistic regression model and, in this analysis, represent the probability of a participant being treated with valoctocogene roxaparvovec, given their baseline demographics and clinical characteristics. Treated and all bleed ABR were calculated over the period from week 5, or 3 days after the end of routine FVIII prophylaxis, whichever was later, to week 52 (249–336 days; referred to as the efficacy evaluation period hereafter) for the treated cohort and at maximum follow-up for the control cohort (171–427 days). Additionally, the proportions of participants with zero treated and all bleeding events were calculated during the efficacy evaluation period for the treated cohort and at maximum follow-up for the control cohort. The absolute differences in the mean treated and all bleed ABR were then compared between the treated and control cohorts using two-sided, two-sample t tests. The absolute differences in the proportions of participants with zero treated and all bleeding events were compared between the treated and control cohorts using chi-squared tests.
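As an illustration of how SMRW uses the propensity score, the sketch below assumes the scores have already been estimated (for example, by logistic regression) and applies the standard SMR weights: treated participants keep a weight of 1, and each control is weighted by the odds ps / (1 − ps). It is written in Python for illustration rather than with the R packages named in this section, and all names and numbers are hypothetical.

```python
from typing import List, Sequence


def smr_weights(propensity: Sequence[float], treated: Sequence[bool]) -> List[float]:
    """Treated participants keep weight 1; controls get the odds ps / (1 - ps)."""
    weights = []
    for ps, is_treated in zip(propensity, treated):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if is_treated else ps / (1.0 - ps))
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of values; weights need not sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical data (illustration only): propensity score, treatment flag, ABR
propensity = [0.8, 0.6, 0.5, 0.2]
treated = [True, True, False, False]
abr = [0.0, 2.0, 4.0, 8.0]

weights = smr_weights(propensity, treated)

# Split values and weights by cohort, then compare weighted means
treated_mean = weighted_mean(
    [a for a, t in zip(abr, treated) if t],
    [w for w, t in zip(weights, treated) if t],
)
control_mean = weighted_mean(
    [a for a, t in zip(abr, treated) if not t],
    [w for w, t in zip(weights, treated) if not t],
)
difference = treated_mean - control_mean
```

Under this weighting the treated cohort is left unchanged, and controls who resemble treated participants (higher propensity scores) count for more in the control mean.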

To calculate the propensity scores, the following baseline demographics and clinical characteristics were considered: age (years), weight (kg), height (m), body mass index (kg/m²), ethnicity, hepatitis C virus (Y/N), hepatitis B virus (Y/N), geographic location, number of problem joints, baseline treated ABR, baseline FVIII utilization (IU/kg/year), baseline number of FVIII infusions (number/year), and FVIII concentrate type at baseline (standard half-life [SHL], extended half-life [EHL], both SHL and EHL, or plasma derived). Candidate variables for inclusion in the propensity score were based on the statistical relationship with bleeding outcomes (via stepwise regression) and input from the study team and external clinical experts. In order to statistically assess the similarity of groups, the standardized mean difference (SMD) was calculated. SMDs with a value greater than 0.1 indicate an imbalance between groups [7], and these values are presented alongside P values, where a value less than 0.1 is also generally recognized as an imbalance between groups [8]. The SMDs and P values were calculated before and after weighting to investigate any differences remaining after adjustment.
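A minimal Python sketch of an SMD for a continuous covariate follows. It assumes one common convention: the difference in (weighted) means divided by the square root of the average of the two (weighted) group variances. Software packages differ in the variance convention they use, so this is not necessarily the exact computation behind Table 1; the function names and data are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple


def weighted_mean_var(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and weighted (population-style) variance."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(
    treated: Sequence[float],
    control: Sequence[float],
    w_treated: Optional[Sequence[float]] = None,
    w_control: Optional[Sequence[float]] = None,
) -> float:
    """(mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
    w_t = w_treated if w_treated is not None else [1.0] * len(treated)
    w_c = w_control if w_control is not None else [1.0] * len(control)
    mean_t, var_t = weighted_mean_var(treated, w_t)
    mean_c, var_c = weighted_mean_var(control, w_c)
    pooled_sd = sqrt((var_t + var_c) / 2.0)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return (mean_t - mean_c) / pooled_sd


# Hypothetical covariate values (illustration only), e.g. age in years
age_treated = [30.0, 35.0, 40.0]
age_control = [33.0, 38.0, 43.0]

smd = standardized_mean_difference(age_treated, age_control)
imbalanced = abs(smd) > 0.1  # threshold described in the text
```

Passing SMR or IPTW weights as `w_treated` and `w_control` gives the post-weighting SMD, which can be compared with the unweighted value to see whether balance improved.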

Statistical analyses were performed using the statistical software R, version 4.0.3 (R Core Team, 2020), with the “MatchIt” package, version 4.1.0, and “survey” package, version 4.0.

Sensitivity and Scenario Analyses

Sensitivity and scenario analyses were performed to evaluate the robustness of the findings based on the propensity score adjustment methodology, selection bias based on the control cohort utilized, the time horizon over which bleeding event data were captured, and the impact of using alternative baseline data or modeling methodology to develop the propensity scores.

Sensitivity analysis 1 used an alternative propensity score matching (PSM) methodology to implement a matching ratio of 1:1, with replacement for the control and treated cohorts assessed in the primary analysis using the same covariates. Sensitivity analysis 2 used the same PSM methodology but without replacement for the control and treated cohorts assessed in the primary analysis with the same covariates used for SMRW. Sensitivity analysis 3 used inverse probability treatment weighting (IPTW) to re-weight both the control and treated cohorts assessed in the primary analysis with the same covariates used for SMRW. Sensitivity analysis 4 assessed temporality through the use of alternative baseline data from the prospective component of 270-902 that was used to develop the propensity scores for the treated cohort, as opposed to baseline data from the retrospective report.
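The alternative adjustment methods in sensitivity analyses 1 to 3 can be sketched as follows. The IPTW weights are the standard 1/ps for treated participants and 1/(1 − ps) for controls. The matching routine is a simple greedy nearest-neighbour 1:1 match on the propensity score, written here as an assumption for illustration; it is not a reproduction of the matching software used in the study, and all names and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def iptw_weights(propensity: Sequence[float], treated: Sequence[bool]) -> List[float]:
    """IPTW: 1/ps for treated participants, 1/(1 - ps) for controls."""
    weights = []
    for ps, is_treated in zip(propensity, treated):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / ps if is_treated else 1.0 / (1.0 - ps))
    return weights


def match_one_to_one(
    treated_ps: Sequence[float],
    control_ps: Sequence[float],
    replace: bool,
) -> List[Tuple[int, int]]:
    """Greedy nearest-neighbour 1:1 matching on the propensity score.

    Returns (treated_index, control_index) pairs. With replace=False each
    control is used at most once, so later treated participants may be
    left unmatched or receive a more distant match.
    """
    available = set(range(len(control_ps)))
    pairs = []
    for i, ps in enumerate(treated_ps):
        if not available:
            break  # no controls left to match
        # Closest control by score; ties broken by the lower index
        j = min(available, key=lambda k: (abs(control_ps[k] - ps), k))
        pairs.append((i, j))
        if not replace:
            available.remove(j)
    return pairs


# Hypothetical propensity scores (illustration only)
treated_scores = [0.70, 0.72]
control_scores = [0.71, 0.40]

with_replacement = match_one_to_one(treated_scores, control_scores, replace=True)
without_replacement = match_one_to_one(treated_scores, control_scores, replace=False)
```

In this toy example both treated participants share the same control when replacement is allowed, whereas without replacement the second treated participant is paired with a much less similar control. This trade-off between match quality and reuse of controls is what the two matching analyses vary.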

Scenario analyses evaluate the impact of selection bias through the application of primary or sensitivity analysis methodologies to alternative populations. Scenario analysis 1 used the GENEr8-1 mITT population for the treated cohort, with the same SMRW methodology. Scenario analysis 2 used the 270-902 non-rollover population for the control cohort, with the same SMRW methodology. Scenario analysis 3 used all of the 270-902 completed participants for the control cohort, with the same SMRW methodology. Scenario analysis 4 used the GENEr8-1 mITT population for the treated cohort and the 270-902 AAV5-negative population for the control cohort, with PSM and a matching ratio up to 1:2. Summaries of the primary analysis, sensitivity analyses, and scenario analyses for propensity scores with changes to the base case are provided in Supplementary Table S2.

Results

The unadjusted and SMRW-adjusted baseline participant demographics and clinical characteristics, along with SMDs to reflect any remaining imbalances between groups, are summarized in Table 1. The distribution of propensity scores before matching for the probability of receiving treatment between the control and treated cohort demonstrated considerable overlap and ensured the analysis was feasible (Fig. 1a).

Table 1 Unadjusted and SMRW-adjusted baseline participant demographics and clinical characteristics
Fig. 1 Standardized mortality ratio weighting-adjusted comparison between the control and treated cohort for participants with bleeding events. a Histogram of propensity scores before matching; b mean treated ABR; c mean all ABR. ABR annualized bleeding rate, SD standard deviation

Compared with the control cohort, participants who received valoctocogene roxaparvovec had a significantly lower mean treated ABR (control vs treated, mean [standard deviation (SD)], 4.40 [6.14] vs 0.85 [3.59]; P < 0.001) and a significantly lower mean all bleed ABR (5.01 [6.60] vs 1.54 [3.82]; P < 0.001; Fig. 1b, c). A significantly higher proportion of participants who received valoctocogene roxaparvovec also had zero treated bleeds (control vs treated, 32.9% [95% confidence interval (CI), 21.8–45.5%] vs 82.1% [95% CI, 74.2–88.6%]; P < 0.001) and zero all bleeds (28.5% [95% CI, 17.9–41.0%] vs 58.0% [95% CI, 48.6–67.1%]; P < 0.001; Fig. 2a, b).

Fig. 2 Standardized mortality ratio weighting-adjusted comparison between the treated cohort and the control cohort for participants with bleeding events. a Percentage of participants with zero treated bleeds; b percentage of participants with zero bleeds. CI confidence interval

The goal of sensitivity analyses 1 to 3 was to characterize the influence of the propensity score adjustment methodology on the results by utilizing the same populations and variables as the primary analysis but alternative methodologies. Sensitivity analysis 1 used PSM with a matching ratio of 1:1 with replacement, sensitivity analysis 2 used PSM with a matching ratio of 1:1 but without replacement, and sensitivity analysis 3 used IPTW; the results were consistent regardless of the propensity score methodology used. Finally, the prior sensitivity analyses and the primary analysis used retrospectively collected ABR data, which introduce a potential bias regarding temporality. Therefore, in sensitivity analysis 4, temporal consistency was ensured by using prospectively captured data, and the results still confirmed the primary analysis. The results of the sensitivity analyses are summarized in Supplementary Table S3. Additionally, multiple scenario analyses summarized in Supplementary Table S4 demonstrate that the results of the primary analysis are broadly applicable using alternative study populations to generate the propensity scores.

Discussion

Before weighting, the baseline demographic and clinical characteristics of the control and treated cohorts were reasonably similar. While this similarity improved once propensity score weighting was performed (as shown by the SMDs and P values), some variations did remain. For example, there was a small but statistically non-significant imbalance in baseline ABR between the control and treated cohorts even after weighting (5.88 vs 4.17; SMD, 0.189; P = 0.182). Despite this, the benefit of propensity scores is evident when comparing the unadjusted and SMRW-adjusted results for the treated and all bleed ABR, as the differences in these outcomes between the control and treated cohorts increased post-weighting.

Results with or without propensity score weighting showed that valoctocogene roxaparvovec improved ABR significantly (0.85 vs 4.40 and 1.54 vs 5.01, after weighting for treated bleeds and all bleeds) compared to participants on FVIII prophylaxis and increased the proportion of participants with zero bleeds (82.1% vs 32.9% and 58.0% vs 28.5%, after weighting for treated and all bleeds). Importantly, these findings were consistent across multiple sensitivity and scenario analyses. The agreement between these additional analyses covering a range of changes to the methods and populations used for the primary analysis demonstrates that the findings are not an artifact of the parameters and methods used in the primary analysis.

The decision to use SMRW for the primary analysis meant that all of the available control cohort data for participants matching the criteria of the rollover population were used in the analysis and facilitated the interpretability of the comparison with the previously published GENEr8-1 data. This is because, with the SMRW methodology, the treated cohort characteristics and outcomes do not change and, thus, can be more easily compared. This feature ensures consistency in reporting across sample sizes, participant baseline demographic and clinical characteristics, and outcome data in the GENEr8-1 clinical study report and regulatory submissions. The results of the propensity score analyses were consistent with the intra-individual findings of GENEr8-1, which found an absolute difference in mean treated ABR of −4.1 (GENEr8-1) vs −3.6 in the propensity score analysis and an absolute difference in the proportion of participants with zero treated bleeds of 48% (GENEr8-1) vs 49% in the propensity score analysis.
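The absolute differences quoted for the propensity score analysis can be checked against the rounded SMRW-adjusted values reported in the Results. The short Python sketch below does only that arithmetic; the variable names are illustrative, and small discrepancies from the quoted figures would be expected if the authors computed differences from unrounded inputs.

```python
# SMRW-adjusted values as reported in the Results (rounded)
control_treated_abr, treated_treated_abr = 4.40, 0.85
control_all_abr, treated_all_abr = 5.01, 1.54
control_zero_treated_pct, treated_zero_treated_pct = 32.9, 82.1
control_zero_all_pct, treated_zero_all_pct = 28.5, 58.0

# Absolute differences, treated cohort minus control cohort
diff_treated_abr = round(treated_treated_abr - control_treated_abr, 2)
diff_all_abr = round(treated_all_abr - control_all_abr, 2)
diff_zero_treated_pct = round(treated_zero_treated_pct - control_zero_treated_pct, 1)
diff_zero_all_pct = round(treated_zero_all_pct - control_zero_all_pct, 1)
```

From the rounded inputs, the treated ABR difference is −3.55 bleeds per year and the difference in participants with zero treated bleeds is 49.2 percentage points, in line with the values of about −3.6 and 49% quoted above.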

While FVIII prophylaxis for HA effectively maintains hemostatic control, its primary limitation is a short half-life resulting in constantly fluctuating peaks and troughs of FVIII activity [9, 10]. In particular, the declining FVIII levels associated with the trough periods expose individuals to an increased risk of breakthrough bleeding [10]. It is possible that the reduced risk of bleeding observed after valoctocogene roxaparvovec infusion is due to the consistent endogenous FVIII activity levels achieved with gene therapy compared to FVIII prophylaxis. Alternatively, emicizumab is quickly surpassing FVIII concentrate as a preferred choice for prophylaxis in hemophilia A given its extended half-life [11]. Unfortunately, at the time of GENEr8-1 enrollment, emicizumab was still an investigational product, and its use was excluded for participants of the trial. Therefore, data are not available to perform direct comparisons between valoctocogene roxaparvovec and emicizumab, unlike the comparison with FVIII prophylaxis presented here. However, an indirect evaluation was made between valoctocogene roxaparvovec and emicizumab using a matching-adjusted indirect comparisons (MAIC) method [12]. This approach evaluates treatment-related outcomes by accounting for differences between the study populations at baseline. The results demonstrated valoctocogene roxaparvovec generally provided greater protection from bleeding compared with emicizumab prophylaxis dosed at 1.5 mg/kg once weekly. This included a lower ABR for all bleeds and a higher percentage of participants with no treated bleeds [12]. However, FVIII expression levels derived from valoctocogene roxaparvovec decrease over time [13]. The MAIC evaluation comparing valoctocogene roxaparvovec and emicizumab was performed with data collected from the GENEr8-1 trial up to 52 weeks post-infusion [12]. The MAIC was limited by the relevant follow-up data published on emicizumab [12]. Therefore, future indirect evaluations are needed to assess the comparison of valoctocogene roxaparvovec with emicizumab once gene therapy-derived FVIII levels have begun to decline. Regardless of the kinetics for FVIII expression derived from gene therapy, approaches like propensity scoring and MAIC remain valuable tools in rare diseases to make indirect treatment comparisons when head-to-head evaluations are not feasible. Without question, these tools will prove useful in future studies that evaluate the efficacy of valoctocogene roxaparvovec against standards of care based on long-term data for each intervention.

The primary limitation of the present analysis, and all propensity scoring methods, is the potential for selection bias. This bias can be introduced when participants with better or worse prognoses are more likely than other participants to receive treatment. To address this concern, participants who would have been candidates for valoctocogene roxaparvovec based on their clinical characteristics, but who did not receive treatment, were compared to the treated participants using a variety of propensity scoring approaches to mitigate potential selection bias. While multiple sensitivity and scenario analyses support the primary analysis, the potential for selection bias remains an unavoidable limitation of the study design. Furthermore, the development of propensity scores is restricted to the observable participant characteristics that can be found in both the GENEr8-1 and 270-902 clinical study protocols. As both are clinical studies with thorough regulation of data monitoring and data collection, the possibility of missing data among participants is lowered, but missing data remains a risk in all studies. Additionally, since 270-902 effectively served as the entry point into GENEr8-1 for the majority of participants, the inclusion and exclusion criteria were mostly aligned, which increases the robustness of the findings. These beneficial attributes in the analyses were used to reduce the risks of uncertainty associated with cross-study comparisons. Collectively, the similarity between the results presented here using propensity scoring and the intra-individual comparisons reported from GENEr8-1 supports the conclusion that valoctocogene roxaparvovec mediates a true benefit with respect to lowering bleeding risk without potential bias caused by differences in observable participant characteristics between those who elected to directly enroll or not enroll from the observational 270-902 study into GENEr8-1.

Conclusion

Despite the limitations of the present study design, this work leverages the design of the valoctocogene roxaparvovec development program to use best practices in observational data methods. Taking this approach, the outcomes of the valoctocogene roxaparvovec-treated cohort can be compared to those of a matched FVIII prophylaxis cohort. Results demonstrated that participants receiving valoctocogene roxaparvovec have lower bleeding rates and a higher probability of having zero bleeds compared to the control cohort. The propensity score analyses presented here are in alignment with the intra-individual comparisons made in the GENEr8-1 clinical trial and should serve to further strengthen the confidence in valoctocogene roxaparvovec-mediated benefit in bleeding outcomes for participants with hemophilia A.