Background

Chronic hepatitis B (CHB) is responsible for about 600,000 deaths worldwide per year, from end-stage liver disease or hepatocellular carcinoma (HCC) [1]. An estimated 350 to 400 million people have CHB [2], of whom 15 to 40% will eventually experience serious complications (hepatic cirrhosis, hepatic decompensation or HCC) [3]. Development of complications is associated with persistent replication of the hepatitis B virus (HBV) [2]; hence, an important goal of CHB treatment is long-term suppression of HBV replication to undetectable levels, as measured by serum HBV DNA (virologic response) [2, 4]. Normalization of serum alanine transaminase (ALT), loss of hepatitis B e antigen (HBeAg) and improvement in liver histology are other recognized measures of CHB treatment efficacy.

Current European clinical guidelines recommend the following treatment options for CHB: entecavir, lamivudine, telbivudine, adefovir dipivoxil, tenofovir disoproxil fumarate, peginterferon-alfa-2a, interferon-alfa-2a, and interferon-alfa-2b [2]. Information on their relative efficacy is important so that healthcare professionals and payers can make evidence-based decisions about which treatments to prescribe. Because randomised clinical trials (RCTs) comparing these treatment options head to head are not available for all comparators in HBeAg-positive or HBeAg-negative CHB, indirect evidence in the form of network meta-analyses (NMAs) has been used to estimate relative efficacy. NMAs extend conventional pair-wise meta-analysis and are based on the principle that within-trial estimates of relative treatment effects can be added and subtracted [5, 6].

An important assumption with NMAs is that the studies used are sufficiently similar in terms of relative treatment effect modifiers [7] - that is, study-level factors that may influence the size of the treatment effect seen with a particular pair-wise intervention. These include patient characteristics, outcomes measured, and study design. Thus, to ensure a fair comparison of interventions, it is essential to control for differences between studies in terms of potential relative treatment effect modifiers. In particular, baseline differences in patient characteristics between different trials may distort between-trial comparisons if appropriate adjustments are not made.

In CHB, response to treatment varies according to the outcome of interest, the agent used, and the patient’s HBeAg status [2]. Patient and disease characteristics that have been shown to predict response to treatment in at least some categories include baseline viral load, serum ALT level, HBV genotype, and activity score on liver biopsy [2, 4]. Ali and colleagues [8] analysed data from 1,353 patients in two RCTs of entecavir and found that higher baseline viral load was associated with reduced odds of response to treatment: when baseline viral load (by PCR) was treated as a continuous variable, the odds of achieving a response were multiplied by 0.38 (a 62% reduction) for every one-unit increase in log10 HBV DNA (copies/ml) above a threshold of 400 copies/ml.
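As a rough illustration of what this estimate implies, the sketch below assumes the log-linear relationship described above (odds multiplied by 0.38 per 1 log10 copies/ml increase) and applies it to a hypothetical reference response probability. The function names and the 70% reference rate are illustrative assumptions, not part of the analysis by Ali and colleagues.

```python
# Odds multiplier per 1 log10 copies/ml increase in baseline viral load
# (above 400 copies/ml), as reported by Ali and colleagues [8].
ODDS_RATIO_PER_LOG10 = 0.38


def odds_multiplier(delta_log10: float) -> float:
    """Change in the odds of response for a baseline viral load that is
    delta_log10 log10 copies/ml higher, assuming a log-linear effect."""
    return ODDS_RATIO_PER_LOG10 ** delta_log10


def shifted_probability(p_ref: float, delta_log10: float) -> float:
    """Response probability after raising baseline viral load by delta_log10,
    starting from a (hypothetical) reference probability p_ref."""
    odds = p_ref / (1.0 - p_ref) * odds_multiplier(delta_log10)
    return odds / (1.0 + odds)


for delta in (0.5, 1.0, 2.0):
    print(f"+{delta} log10 copies/ml: odds x {odds_multiplier(delta):.3f}, "
          f"P(response) {shifted_probability(0.70, delta):.3f} (hypothetical 70% reference)")
```

Under these assumptions, a patient group whose baseline viral load is 1 log10 copies/ml higher would see a 70% response probability fall to roughly 47%, which shows why trials with different baseline viral loads may not be directly comparable.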

Given the absence of head-to-head RCTs for all interventions, the objective of the current study was to generate estimates of the relative efficacy of achieving undetectable viral load (UVL) that account for the potential of baseline viral load to act as a treatment effect modifier, thereby providing like-for-like comparisons between CHB treatments despite heterogeneity in baseline viral load across the patient populations of different trials.

In order to compare the results with previously published NMAs, as well as demonstrate the implications for clinical and reimbursement decisions of using such estimates, we also generated unadjusted relative efficacy estimates using similar methodologies to those used in these previous analyses.

Methods

We carried out adjusted and unadjusted analyses, using the same trial data for each, to explore the impact of baseline viral load on treatment response at 1 year. The interventions analysed, at licensed doses, were interferon alfa, peginterferon alfa-2a/2b, lamivudine, adefovir dipivoxil, entecavir, tenofovir and telbivudine, together with placebo. Trials for inclusion in the NMAs were identified through a systematic review of the literature.

The efficacy endpoints analysed in the unadjusted analysis were ALT normalization, histological improvement, HBeAg seroconversion and achievement of UVL at 1 year. Because Ali and colleagues [8] generated results for only one endpoint (achievement of UVL at 1 year), the adjusted analysis was necessarily restricted to this endpoint and timepoint. In all analyses, UVL was defined as a reduction in HBV DNA level (by PCR assay) to below the trial-specific lower limit of quantification (LLOQ).
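Because UVL is defined relative to each trial's own LLOQ, the same HBV DNA measurement can be classified differently across trials. A minimal sketch, with hypothetical function names and a hypothetical patient value; the two LLOQs used (200 and 1,000 copies/ml) are the bounds of the range reported in the Results, and treating a value equal to the LLOQ as detectable is an assumption:

```python
import math


def is_uvl(hbv_dna_copies_per_ml: float, lloq_copies_per_ml: float) -> bool:
    """Undetectable viral load: HBV DNA strictly below the trial's LLOQ
    (treating a value equal to the LLOQ as detectable is an assumption)."""
    return hbv_dna_copies_per_ml < lloq_copies_per_ml


def to_log10(copies_per_ml: float) -> float:
    """Express an HBV DNA level in log10 copies/ml."""
    return math.log10(copies_per_ml)


# The same hypothetical 1-year result of 500 copies/ml (about 2.7 log10)
# counts as UVL under a 1,000 copies/ml LLOQ but not under a 200 copies/ml LLOQ.
value = 500.0
for lloq in (200.0, 1000.0):
    print(f"LLOQ {lloq:.0f} copies/ml ({to_log10(lloq):.2f} log10): UVL = {is_uvl(value, lloq)}")
```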

Systematic review

We carried out a systematic review of RCTs of the interventions listed above. The inclusion criteria were RCTs (phase II or III) of monotherapy interventions at licensed dose, adults with CHB, reporting any of the endpoints of interest, and published in English. Papers (full or otherwise) reporting interim results and studies using the interventions of interest at non-licensed doses were excluded.

Searches were carried out in the Embase, Medline, Medline in Process and Cochrane CENTRAL databases between March and April 2011. No restriction was placed on the earliest date of publication, and all databases were searched from date of inception. Search strategies comprised CHB disease and drug terms (a combination of controlled vocabulary and free text terms) and a bespoke RCT filter. Abstracts from the 2010 and 2011 annual conferences of the European Association for the Study of the Liver and the American Association for the Study of Liver Diseases were also searched. Search syntax for all databases is available on request from the authors. The search strategy used for the Embase database is presented in Additional file 1.

The studies were separated into four clinically distinct patient groups: treatment-naïve HBeAg-positive, treatment-naïve HBeAg-negative, lamivudine-refractory and ‘other’. Abstracts were screened by two authors, and an abstract was included in the full paper review if at least one reviewer considered it relevant. Formal full paper review was undertaken by two reviewers against the pre-specified inclusion criteria, with a third acting as mediator in cases of disagreement. Three authors independently extracted study characteristics and the outcome data required for the NMA using a standard form. Discrepancies were resolved by one of two other authors. Outcome data from weeks 48 and 52 were assumed to refer to 1 year. A risk of bias assessment was carried out using Cochrane methodology for those RCTs reported as full papers [9]. No formal protocol was created for this review.

Statistical analyses

Given the lack of head-to-head trial evidence on the relative efficacy of all licensed interventions, we used an NMA approach to synthesise the evidence. In an NMA, the relative effect of treatment A versus treatment B can be derived through a common comparator C: on a linear scale such as the log odds ratio, the effect of A versus B equals the effect of A versus C minus the effect of B versus C. The analysis can be expanded to more complex networks of evidence and can produce estimates of both mean effect and uncertainty [10].
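The additive principle can be illustrated with the simplest indirect comparison (Bucher's adjusted indirect comparison), a frequentist analogue of the Bayesian models used here rather than those models themselves. Effects are on the log odds ratio scale, and the inputs are hypothetical:

```python
import math


def indirect_comparison(lor_ac: float, se_ac: float,
                        lor_bc: float, se_bc: float) -> tuple[float, float]:
    """Indirect log odds ratio of A versus B through a common comparator C.

    lor_ac: log odds ratio of A versus C (from A-vs-C trials)
    lor_bc: log odds ratio of B versus C (from B-vs-C trials)
    The standard errors combine as those of independent estimates."""
    lor_ab = lor_ac - lor_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return lor_ab, se_ab


# Hypothetical inputs: A vs C odds ratio 4.0 (SE 0.20), B vs C odds ratio 2.0 (SE 0.25).
lor, se = indirect_comparison(math.log(4.0), 0.20, math.log(2.0), 0.25)
low, high = lor - 1.96 * se, lor + 1.96 * se
print(f"OR A vs B = {math.exp(lor):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

The indirect estimate is less precise than either input because the two variances add, which is one reason why sparse networks give wide intervals.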

Fixed effect models were used in the unadjusted analysis. For the adjusted analyses we used both fixed and random effects models, and the final model was chosen on the basis of the deviance information criterion (DIC) [11].
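DIC rewards fit (a low posterior mean deviance, Dbar) while penalising complexity (the effective number of parameters, pD); the model with the lower DIC is preferred. A minimal sketch for binomial response data, assuming posterior draws of each arm's response probability are already available (for example, exported from an MCMC run). The data are hypothetical, and the plug-in point used for pD here (the posterior mean of the arm probabilities) is an assumption that may differ from the one WinBUGS uses:

```python
import math


def binomial_deviance(r: list[int], n: list[int], p: list[float]) -> float:
    """Deviance (-2 x log-likelihood) of binomial arm data r/n given probabilities p."""
    total = 0.0
    for r_i, n_i, p_i in zip(r, n, p):
        log_choose = math.lgamma(n_i + 1) - math.lgamma(r_i + 1) - math.lgamma(n_i - r_i + 1)
        total -= 2.0 * (log_choose + r_i * math.log(p_i) + (n_i - r_i) * math.log(1.0 - p_i))
    return total


def dic(r: list[int], n: list[int], p_draws: list[list[float]]) -> tuple[float, float]:
    """Return (DIC, pD) from posterior draws of the arm-level response probabilities.

    Dbar is the posterior mean deviance; pD = Dbar - D(p_bar), where p_bar is the
    posterior mean of each arm's probability."""
    d_bar = sum(binomial_deviance(r, n, p) for p in p_draws) / len(p_draws)
    p_bar = [sum(arm) / len(arm) for arm in zip(*p_draws)]
    p_d = d_bar - binomial_deviance(r, n, p_bar)
    return d_bar + p_d, p_d


# Hypothetical data: two arms with 30/50 and 40/50 responders, three posterior draws.
r, n = [30, 40], [50, 50]
draws = [[0.58, 0.79], [0.61, 0.81], [0.60, 0.80]]
value, p_d = dic(r, n, draws)
print(f"DIC = {value:.2f}, pD = {p_d:.3f}")
```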

The choice of prior distributions for the parameters of NMA models is an important consideration, especially when networks of evidence are sparse. Uninformative priors were used for all model parameters in the unadjusted analyses. In all covariate-adjusted analyses, the results from Ali and colleagues [8] were used to inform the prior distribution of the regression coefficient associated with baseline viral load. Where baseline viral load differed between the arms of an RCT, the average baseline HBV DNA value across arms was used as that trial's covariate value in the adjusted analyses.
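The two modelling choices in this paragraph can be sketched as follows, assuming a normal prior on the log odds scale. The prior mean is taken from the 0.38 odds multiplier reported by Ali and colleagues; the prior standard deviation and the unweighted averaging of arm means are illustrative assumptions, not details confirmed by the text:

```python
import math
import statistics

# Prior mean for the baseline viral load coefficient on the log odds scale,
# centred on the odds multiplier of 0.38 per log10 copies/ml from [8].
BETA_PRIOR_MEAN = math.log(0.38)  # about -0.97
# Hypothetical prior SD: the value used in the published analysis is not stated here.
BETA_PRIOR_SD = 0.2


def beta_prior_interval(z: float = 1.96) -> tuple[float, float]:
    """Central ~95% interval of the normal prior, back-transformed to odds ratios."""
    return (math.exp(BETA_PRIOR_MEAN - z * BETA_PRIOR_SD),
            math.exp(BETA_PRIOR_MEAN + z * BETA_PRIOR_SD))


def trial_covariate(arm_means_log10: list[float]) -> float:
    """Trial-level covariate: unweighted mean of the arms' mean baseline HBV DNA
    (log10 copies/ml). Weighting by arm size would be an alternative assumption."""
    return statistics.fmean(arm_means_log10)


# Hypothetical two-arm trial with arm means of 9.6 and 9.8 log10 copies/ml.
print(f"{trial_covariate([9.6, 9.8]):.2f}", beta_prior_interval())
```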

Baseline viral load was assumed to modify treatment effects relative to entecavir (0.5 mg), which was also the reference treatment against which all relative efficacy estimates were calculated. To make the results easier for a non-statistical audience to interpret, we expressed relative efficacy as a relative risk (RR) of response rather than as the odds ratio, the scale on which the models are estimated. We report the mean of the posterior distribution and the 95% credible interval (CrI) for each RR. When the 95% CrI did not include one, the treatment was considered significantly different from entecavir.
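Converting from the odds ratio scale to RRs requires a reference (entecavir) response probability, so the conversion can be done draw by draw on the posterior samples before summarising. A sketch under stated assumptions: a logit-linear model in which baseline viral load enters as an interaction with treatment, hypothetical posterior draws generated with `random.gauss`, and an equal-tailed interval. It is not the authors' WinBUGS code, and the function names are hypothetical:

```python
import math
import random
import statistics


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def rr_vs_entecavir(mu_ent: float, d_k: float, beta: float, x: float, x_ref: float) -> float:
    """RR of response for treatment k versus entecavir for one posterior draw.

    mu_ent: log odds of response on entecavir in the target population
    d_k:    log odds ratio of k versus entecavir at baseline viral load x_ref
    beta:   change in that log odds ratio per log10 copies/ml (effect modification)"""
    p_ent = expit(mu_ent)
    p_k = expit(mu_ent + d_k + beta * (x - x_ref))
    return p_k / p_ent


def summarise(draws: list[float]) -> tuple[float, float, float]:
    """Posterior mean and equal-tailed 95% credible interval."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(draws), cuts[0], cuts[-1]


rng = random.Random(2011)
# Hypothetical posterior draws, evaluated at a common reference viral load.
draws = [rr_vs_entecavir(rng.gauss(0.7, 0.1), rng.gauss(0.3, 0.2),
                         rng.gauss(-0.2, 0.1), x=9.0, x_ref=9.0)
         for _ in range(4000)]
mean, low, high = summarise(draws)
print(f"RR {mean:.2f} (95% CrI {low:.2f} to {high:.2f}); "
      f"significant: {not low <= 1.0 <= high}")
```

Computing the RR per draw, rather than converting the posterior mean odds ratio once, carries the uncertainty in the entecavir response rate through to the RR interval.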

To compare the results of the analyses with the input data, and to present the output in an intuitive manner, we also generated the absolute predicted posterior probabilities of response for each combination of clinical outcome and treatment. In the adjusted NMA we also undertook a range of sensitivity analyses in which, in addition to comparing fixed and random effects models, we assessed the impact of adding or removing individual studies because of heterogeneity. Caterpillar, density and Brooks-Gelman-Rubin plots were examined in all analyses to check model convergence.
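Brooks-Gelman-Rubin plots compare between-chain and within-chain variability across several MCMC chains started from dispersed initial values. The sketch below computes the original variance-based potential scale reduction factor (R-hat); the BGR plot in WinBUGS uses an interval-based variant, so this is an analogue rather than a reproduction of that output. The chains are hypothetical:

```python
import random
import statistics


def gelman_rubin(chains: list[list[float]]) -> float:
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: m >= 2 chains of equal length n >= 2, burn-in already removed."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # between-chain variance
    w = statistics.fmean([statistics.variance(c) for c in chains])  # mean within-chain variance
    var_plus = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return (var_plus / w) ** 0.5


rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 0.0, 3.0)]
print(f"mixed chains: R-hat {gelman_rubin(mixed):.3f}")
print(f"one chain elsewhere: R-hat {gelman_rubin(stuck):.3f}")
```

Values close to 1 (a commonly used cut-off is about 1.1) are taken as consistent with convergence; values well above 1 indicate that the chains have not yet mixed.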

The analyses were conducted in WinBUGS Version 1.4 (MRC Biostatistics Unit, Cambridge, UK) [12] using Bayesian Markov chain Monte Carlo (MCMC) Gibbs sampling methods.

Results

Search results and summary of studies

Our search of clinical databases identified 3,000 abstracts; 179 articles, including clinical study reports (CSRs), were ordered or requested for review, of which 35 (six of them CSRs) met the inclusion criteria [13–48]. The contents of five of the CSRs had already been reported in peer-reviewed publications captured by the search, so the published data were used [18, 26, 37, 42, 44]. One CSR (BMS study AI463023) [13] and the Summary of Product Characteristics for telbivudine [22] were included in the review. In total, the review identified 29 unique trials. Of these, 19 contained information on HBeAg-positive patients; 14 of these 19 reported enough information to warrant inclusion in an NMA, and 13 reported information on UVL at 1 year [13–15, 18, 20–24, 26, 28, 30, 32, 48].

The study selection process is presented as a PRISMA diagram in Figure 1. The PRISMA 2009 checklist is reported in Additional file 2. Study characteristics and reported UVL at 1 year (defined as either 48 or 52 weeks) are shown in Table 1. The assessment of study quality undertaken as part of the systematic review is reported in Additional file 1: Table S1. Studies identified by the systematic review used LLOQ values ranging from 200 to 1,000 copies/ml.

Figure 1

PRISMA diagram of studies included in the systematic review. AASLD, American Association for the Study of Liver Diseases; CHB, chronic hepatitis B; EASL, European Association for the Study of the Liver; HBeAg, hepatitis B e antigen; NMA, network meta-analysis.

Table 1 Study characteristics and 1-year outcomes of studies included in the network meta-analysis (HBeAg-positive patients only)

Unadjusted network meta-analysis

The network of evidence used to generate all results is presented in Figure 2. The results of the fixed effects analysis are presented as relative risks in Table 2 and as absolute probabilities of response in Additional file 1: Table S2. There was only one instance in which a treatment performed significantly better than entecavir: the RR for tenofovir achieving UVL was 1.43 (95% CrI 1.30 to 1.54). With the exception of telbivudine, which showed no statistically significant difference from entecavir (RR 0.88, 95% CrI 0.76 to 1.00), all other interventions performed significantly less well than entecavir.

Figure 2

Evidence networks of studies used to generate unadjusted results for the undetectable viral load endpoint.

Table 2 Unadjusted efficacy estimates relative to treatment with entecavir

Adjusted network meta-analysis

The primary adjusted analysis of achieving UVL at 1 year, accounting for baseline viral load (the “base case”), used only material available in the public domain; data extracted from the CSR were therefore excluded. In addition, the baseline rates in two studies differed markedly from those of the remaining studies because they were assessed using a different assay with very different LLOQ definitions, suggesting that their baseline data had been collected differently from those of all other studies [16, 24]. These studies were also excluded from the base case analysis. One study, TBVIG, reported median rather than mean baseline viral load and was therefore also excluded from the adjusted base case analysis [30]. Information on this study is provided in Table 1. Data from the ten studies that reported baseline viral load were used in the adjusted analyses [14, 15, 18, 20–23, 26–28, 31, 48].
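The base-case and sensitivity study sets amount to simple inclusion rules applied to study-level flags. A hypothetical sketch: the `Study` record, its field names and the placeholder labels "ref16" and "ref24" for the two studies using a different assay are illustrative, and only the exclusion reason stated in the text is encoded for each named study:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    name: str
    csr_only: bool               # data available only from a clinical study report
    comparable_assay: bool       # baseline HBV DNA measured comparably to other studies
    reports_mean_baseline: bool  # mean (rather than median) baseline viral load reported


def base_case(studies: list[Study]) -> list[Study]:
    """Keep studies that pass all three base-case rules described in the text."""
    return [s for s in studies
            if not s.csr_only and s.comparable_assay and s.reports_mean_baseline]


def sensitivity_set(studies: list[Study], drop: set[str], add: set[str]) -> list[Study]:
    """Base case with named studies dropped and otherwise-excluded studies added back."""
    kept = [s for s in base_case(studies) if s.name not in drop]
    return kept + [s for s in studies if s.name in add and s not in kept]


# Flags for the named studies follow the text; the other included studies are omitted.
studies = [
    Study("018", False, True, True),
    Study("AI463023", True, True, True),
    Study("TBVIG", False, True, False),
    Study("ref16", False, False, True),
    Study("ref24", False, False, True),
]
print([s.name for s in base_case(studies)])
print([s.name for s in sensitivity_set(studies, drop={"018"}, add={"AI463023", "TBVIG"})])
```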

The results are presented as relative efficacy estimates in the second column of Table 3 and as absolute probabilities of UVL at 1 year in Figure 3 (fixed effects) and Figure 4 (random effects).

Figure 3

Absolute probability of undetectable viral load at 1 year (fixed effects). “Basecase” refers to the adjusted analysis based on the ten studies that reported suitable baseline viral load information.

Figure 4

Absolute probability of undetectable viral load at 1 year (random effects). “Basecase” refers to the adjusted analysis based on the ten studies that reported suitable baseline viral load information.

Table 3 Adjusted relative risk estimates for virologic response, expressed as relative risk of achieving undetectable viral load

The relative risk estimates produced by the fixed and random effects base case analyses were very similar. In particular, entecavir was associated with a significantly higher likelihood of UVL at 1 year than all other interventions except telbivudine and tenofovir, for which the likelihood was similar. In contrast to the unadjusted analysis, the relative efficacy of entecavir and tenofovir for achieving UVL was not significantly different. This indicates that baseline viral load is a significant moderator of the effects of monotherapies for CHB. The DIC for the fixed effect model was lower than that for the random effects model, so the fixed effect model was preferred.

Adjusted network meta-analysis: sensitivity analyses

Exclusion of data from one adefovir study

The 1-year UVL rate for adefovir reported by the 018 Study Group is approximately two to three times higher than that reported for adefovir in all other studies (Table 1). In contrast, the absolute response rate for patients receiving telbivudine in this study was in line with that observed in the other studies. The impact of removing this study is presented in column three of Table 3 and in Figures 3 and 4.

The DIC statistics for both fixed and random effects models were similar, with the random effects analysis representing the best fitting model. While the results overall are similar to those in the base case analyses, the greatest impact is observed in the tenofovir results, with a relative risk value of 1.08 (95% CrI 0.22 to 1.52). Of note, the derived absolute response probabilities for entecavir and tenofovir in this scenario were 65.9% and 71.4%, respectively (random effects model). The corresponding values in the key regulatory trials were 66.7% and 76.5%, respectively.

Exclusion of data from the 018 Study Group, and inclusion of data from AI463023 and TBVIG

The systematic review identified two additional sources containing information of potential interest: as yet unpublished data from BMS study AI463023 and median baseline values from the TBVIG study [13, 29]. The results obtained when these data were included, with the 018 Study Group data still excluded, are presented in column four of Table 3 and in Figures 3 and 4.

The random effects analysis generated the lowest DIC and is therefore the preferred model. There was no significant difference between the relative efficacy of entecavir, telbivudine and tenofovir, but entecavir performed significantly better than all other interventions. These results are in contrast to the unadjusted results. The absolute probabilities derived using a random effects model for tenofovir, telbivudine and entecavir were similar to those observed in the landmark RCTs.

Discussion

NMA can be used to generate relative efficacy estimates of competing treatments in situations where more than two treatment options are available and direct head-to-head evidence from RCTs does not exist for all comparators. The NMA approach allows all relevant evidence to be considered and addresses research questions in the absence of direct comparative evidence, improving the precision of estimates by combining direct and indirect evidence.
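The precision gain from combining direct and indirect evidence can be seen in the simplest fixed effect case, in which the two estimates are averaged with inverse-variance weights. This is a frequentist sketch with hypothetical numbers, not the Bayesian models used in this paper, and it assumes that the direct and indirect evidence are consistent:

```python
import math


def pool_direct_indirect(est_dir: float, se_dir: float,
                         est_ind: float, se_ind: float) -> tuple[float, float]:
    """Inverse-variance weighted average of a direct and an indirect estimate
    (e.g. log odds ratios), assuming the two sources are consistent."""
    w_dir, w_ind = 1.0 / se_dir ** 2, 1.0 / se_ind ** 2
    pooled = (w_dir * est_dir + w_ind * est_ind) / (w_dir + w_ind)
    return pooled, math.sqrt(1.0 / (w_dir + w_ind))


# Hypothetical log odds ratios with equal standard errors of 0.30.
est, se = pool_direct_indirect(0.20, 0.30, 0.40, 0.30)
print(f"pooled log OR {est:.2f}, SE {se:.3f}")
```

With equal precision, the pooled standard error is smaller by a factor of √2 (from 0.30 to about 0.21); if the direct and indirect estimates disagree markedly, pooling them in this way is not appropriate.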

One of the key assumptions underpinning this method is that the studies included in the analysis are homogeneous (that is, the trials are sufficiently similar on study and patient characteristics). The similarity assumption is violated if one or more study-level covariates act as modifiers of the relative treatment effects and their distribution is not balanced across the studies being compared [49, 50]. In this case, NMA may be affected by confounding bias, unless one explicitly controls for these covariates in the statistical analyses.

Controlling for covariates is particularly important when response to treatment is defined in terms of the post-treatment level of a measure and the baseline level of this measure is known to vary across studies. If one study recruits patients with worse levels of a variable that is known to modify the relative impact of treatment, then the response achieved is likely to be smaller than in another study that primarily includes patients with better baseline levels, other things being equal.

The motivation for our work was the belief that such baseline covariate imbalances had occurred for patients recruited into studies looking at interventions for CHB. In particular, it was noted that there were differences in mean baseline viral load (expressed in terms of log10 copies/ml when measured using the PCR assay) with values for entecavir and tenofovir differing by approximately 1 log10 copies/ml (Table 1). We hypothesised that failure to account for these differences in previous analyses may have led to biased estimates of relative efficacy.

The work contained in this paper supports this hypothesis. When no adjustment was made to account for differences in baseline viral load among trials, tenofovir was shown to be significantly better than entecavir in terms of achieving UVL at 1 year (fixed effects RR 1.43, 95% CrI 1.30 to 1.54). However, when we accounted for the impact of baseline viral load the difference between the two treatments was not significant (fixed effects RR 1.27, 95% CrI 0.96 to 1.47; random effects RR 1.21, 95% CrI 0.48 to 1.51). The fixed effects adjusted model best fitted the underlying data, although the difference was minor (fixed effects DIC, 35.56; random effects DIC, 35.86).

Sensitivity analyses highlighted that the relative efficacy of tenofovir versus entecavir was contingent on the choice of studies included in the meta-analysis, and in particular on whether the data reported by the 018 Study Group [14] were used. When these data were excluded, there was no significant difference between the two interventions (RR 1.08, 95% CrI 0.22 to 1.52). A subsequent sensitivity analysis, in which this study was removed but two other studies were included (AI463023 and TBVIG), generated similar non-significant results (RR 1.15, 95% CrI 0.39 to 1.50). In both sensitivity analyses the most appropriate model, based on DIC, was the random effects model. Close examination of the published paper [14] identified no reason for its unusually high adefovir response rate, so some other form of study-level heterogeneity, as yet unaccounted for, may be influencing the results.

Our paper is the first to generate baseline viral load adjusted and unadjusted NMA results using data from the same set of studies, and the results from the unadjusted analyses are very similar to those generated by other research groups [51, 52]. Accepting that NMA is based on relative efficacy, the results from all three unadjusted analyses for UVL appear to be at odds with those provided by the clinical trials included in the NMA. The systematic review identified one study of tenofovir [27] and the observed response rate was 76%. The corresponding value arising from our NMA was 93.2% (95% CrI 85.6% to 97.6%). Similar values were generated by two other research groups [51, 52]. One other NMA has been recently published [53]. This analysis, however, contains a number of methodological flaws, the most notable being the pooling of data from HBeAg-positive and -negative individuals. We have therefore not extracted results from this paper for the purposes of discussion.

In contrast, with the exception of placebo and interferon-based therapies, the CrIs for the response probabilities derived in the adjusted analyses all contain the observed trial values, and the estimates are close to the trial values once the 018 Study Group data are removed (Figures 3 and 4). Hence, we would argue that the adjusted results are of greater clinical relevance than the unadjusted results.

Generating ‘like-for-like’ estimates of relative efficacy by controlling for covariates believed to modify relative treatment effects is not only of clinical interest but is essential for reimbursement decisions. Such estimates are used by agencies such as the National Institute for Health and Care Excellence in their appraisal processes when assessing clinical efficacy in a given disease area [54]. They are also used in economic models to evaluate the cost-effectiveness of interventions. A number of such models have been developed in CHB [55–59], of which one [57] used the results of its own unadjusted analysis directly as model inputs. Another [59] used UVL as a surrogate for the risk of cirrhosis, drawing on the REVEAL-HBV study [60], which quantified the relationship between HBV DNA and the likelihood of being diagnosed with cirrhosis. Overestimation of virologic response would therefore lead to underestimation of the likelihood of cirrhosis, which has been identified as a key driver of cost-effectiveness.
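The direction of this bias can be illustrated with a deliberately simplified one-cycle model in which the probability of cirrhosis depends only on whether UVL is achieved. The model structure, function name and cirrhosis risks below are hypothetical placeholders, not REVEAL-HBV estimates; the two response probabilities are the observed tenofovir trial rate (76%) and the unadjusted NMA estimate (93.2%) quoted earlier in this Discussion:

```python
def expected_cirrhosis_risk(p_uvl: float, risk_if_uvl: float, risk_if_detectable: float) -> float:
    """One-cycle probability of cirrhosis when risk depends only on UVL status
    (a hypothetical two-state mixture, not a published model)."""
    return p_uvl * risk_if_uvl + (1.0 - p_uvl) * risk_if_detectable


# Hypothetical annual cirrhosis risks, for illustration only.
RISK_UVL, RISK_DETECTABLE = 0.005, 0.03

observed = expected_cirrhosis_risk(0.76, RISK_UVL, RISK_DETECTABLE)     # trial response rate
unadjusted = expected_cirrhosis_risk(0.932, RISK_UVL, RISK_DETECTABLE)  # unadjusted NMA estimate
print(f"expected cirrhosis risk: {observed:.4f} (observed) vs {unadjusted:.4f} (unadjusted)")
```

In this toy example, the inflated response rate lowers the expected one-cycle cirrhosis probability from about 1.1% to about 0.67%, which would make the intervention look more favourable in a cost-effectiveness model.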

Although the review identified a reasonable number of studies overall, Figure 2 shows that, because there are many treatment options, most branches of the network are informed by the findings of a single study. This increases the uncertainty surrounding all results and means that baseline imbalances in other potential treatment effect modifiers may have influenced the results.

Further work is needed to complement this paper's analysis of achieving UVL at 1 year by exploring the impact of other potentially clinically relevant covariates on the relative effects of comparators and on the probability of achieving UVL. Exploring other sources of potential heterogeneity (for example, study design and different LLOQ definitions) is also important. In addition, Ali and colleagues [8] identified the time of assessment as a treatment effect modifier alongside baseline viral load. The studies included in this analysis were very similar in terms of assessment times, so excluding this variable is likely to have had only a modest effect. Nonetheless, it would be interesting to replicate the analyses in this paper while controlling for these slight differences. Expanding this type of analysis to other clinically relevant endpoints would also be worthwhile.

Conclusions

The analysis showed that baseline viral load is a treatment effect modifier in CHB and that failure to correct for this variable inflates the relative efficacy estimates for some interventions. Because these estimates are often used in economic models to generate cost-effectiveness estimates, failure to adjust for baseline viral load will generate erroneous incremental cost-effectiveness ratios (ICERs), resulting in poor use of scarce healthcare resources. Reimbursement agencies should therefore use only covariate-adjusted relative efficacy estimates in their decision making on treatments for CHB.