Background

Treatment decisions in medicine, whether at the patient- or policy level, should consider all relevant alternative health care technologies potentially capable of delivering the change being sought. Such informed decision making on the use of competing treatments requires evidence of relative effects from randomised controlled trials (RCTs) (which may be further used to inform estimates of cost-effectiveness). However, trying to synthesise evidence from several different pair-wise comparisons relevant to a decision problem can be challenging. A piecemeal approach can be avoided if all RCTs evaluating interventions relevant to the treatment decision are considered collectively, for example, with the use of network meta-analysis (NMA). NMA is a well-established statistical technique that extends standard pairwise meta-analysis framework to allow simultaneous comparison of multiple interventions in a single statistical model [1, 2]. This approach then produces relative effect estimates (and associated descriptions of uncertainty) for all treatments connected by the network of evidence [3] – even where head-to-head trials for comparisons do not exist.

NMA using individual patient data

Published work on NMA mainly focuses on the synthesis of aggregate data (AD) (sometimes called summary data, e.g. group means and standard errors available from study reports) [4, 5]; however, methods have been developed that allow use of individual patient-level data (IPD) in NMA, specifically for binary and continuous data [68]. The appeal of including IPD in a NMA is that it is likely to reduce statistical heterogeneity across the network (and in this way help resolve possible inconsistencies); it may also allow subgroup effects to be estimated which in turn could guide more personalised treatment decisions [6]. The use of IPD, alone or in combination with AD, has been shown to improve inference in NMAs where the outcome of interest is binary (dichotomous) by aiding convergence, and by providing unbiased treatment–covariate interactions (that would otherwise be affected by ecological bias [9]). For continuous outcomes, IPD is likely to produce more precise estimates of treatment effects, even in the absence of treatment–covariate interactions [10].

NMA using time to event related outcome data

When undertaking an NMA, aggregate time to event data presented as hazard ratios (HR), and some measure of uncertainty around these estimates, can be pooled directly using standard methods (analogous to pooling odds ratios or relative risks) [11]. However, other AD outputs such as median/mean time to event [12] and cumulative counts of patients having the outcome event in a period of time [13] have also been meta-analysed in a network. In modelling these data, an underlying time to event distribution can be specified which will then allow HRs to be generated from the original AD [5, 14, 15]. To date, evidence synthesis that includes IPD on time to event has been limited to pairwise analysis; additionally there has been little methodological exploration how outcomes in the form of IPD and AD might be considered together in the same NMA [13].

Developing a NMA combining AD and IPD data to synthesise time to event related outcomes

This paper describes a modelling framework that combines AD and IPD in the synthesis of time to event related outcomes within an NMA. It extends the work of Saramago et al. [6] and Sutton et al. [16] on the synthesis of AD and IPD for binary data to consider time to event outcomes. And extends the work on NMAs for time to event AD outcomes by Soares et al. [15] and Woods et al. [14] (the general modelling framework is also described in Dias et al. [5]). Our work was motivated by an NMA on treatments for healing venous leg ulcers, for which we had data from multiple RCTs (more information provide in the next section). A proportion of the RCT data on ulcer healing was available in IPD format (time to healing and time to censoring), with the remaining data available as AD, i.e. count data (proportion of patients healed at end of follow-up which differs across studies). To maximally draw from available data we aimed to jointly synthesise the available AD and IPD, and, in this way, generating better estimates and providing fuller characterisations of uncertainty to best inform decisions on the use of the treatments of interest.

Motivating example: high compression treatments for venous leg ulcers

The case study relates to compression systems aiming to deliver high compression (classed as ≥40 mmHg compression at the ankle) to promote venous leg ulcer healing. Available standardised systems are: two layer hosiery (HH), the four layer bandage (4LB), the short stretch bandage (SSB), the zinc paste bandage (ZINC), and the two layer bandage system (2LB). A detailed description is provided in Additional file 1 with further details of these systems presented elsewhere [17].

Effectiveness evidence from RCTs was obtained from the most recent update of the relevant Cochrane review available to us, and from a recent multicentre RCT which compared 4LB with HH. All available RCT evidence was assessed for inclusion in the current NMA: a detailed process that has been reported elsewhere [17]. The final NMA contained data from 16 RCTs on the relative effectiveness of high compression systems for the treatment of venous leg ulcers. Data for two of the 16 included RCTs (VenUS I and VenUS IV, hereby denominated studies 1 and 2) had full IPD data available (841 participants) which included time to healing or censoring for each participant, together with other individual-level characteristics such as treatment centre, ulcer duration and size and also patient mobility. For the remaining RCTs (1105 participants), aggregate data on the number of healed ulcers were extracted from the source review alongside information regarding treatment type, number of participants allocated to each treatment group, mean duration of follow-up (if this was not stated, trial duration was used), mean ulcer duration and size.

The 16 included RCTs described nine unique high compression treatments: the five standard treatments (4LB, SSB, ZINC, HH and the 2LB) and four ad hoc systems [17]. The ad hoc group consisted of treatments deemed irrelevant to current clinical practice, and are not reported further (results can be provided upon request). These studies were, however, included in the NMA as their data was potentially relevant, for example, in describing determinants of healing.

Table 1 describes the data available and Figure 1 presents the treatment network formed by the evidence [1732]. The most populated comparison was the 4LB vs. SSB, informed by seven RCTs: six with healing data available as AD [1822, 31] and one as IPD [32]. The link between the 2LB and 4LB was informed by two RCTs and each of the remaining six comparisons in the NMA were informed by AD extracted from one RCT for each comparison (Table 1).

Table 1 Analytic dataset
Figure 1
figure 1

Network of RCTs. In the network, a unique treatment category is indicated by a circle. Arrows between circles indicate that these treatments had been compared in a trial (trials are identified using ‘[]’, numbered as in column ‘ID’ in Table 1. (4LB, SSB, HH, Ba, Zinc Paste, BHeH, BzeaH, HV and 2LB as described in Additional file 1).

Methods

We first describe the modelling framework for our main analysis, model A. This model brings together IPD and AD into the same synthesis model, with both types of evidence contributing to the estimation of all key models parameters (i.e. absolute and relative effectiveness estimates). We then detail the evaluation of alternative assumptions applied to model A, thus highlighting and challenging specific assumptions of the modelling framework proposed. All synthesis was conducted in a Bayesian framework.

Suppose there are two sets of studies, one set for which IPD are available and one set for which AD are available. Consider the set of studies to be T = {1,…, number of IPD studies (NSIPD),…, NSIPD + number of AD studies (NSAD)} where the total number of studies is J; consider the set of treatments to be {1,2,3,…}, where the total number of treatments is K, and treatment 1 the reference.

Statistical model for the data

We describe model A in two interrelated parts: part I describes the modelling of the IPD and part II the modelling of the AD.

Model A , part I - modelling the IPD studies , controlling for baseline covariates t _ ijk Weilbull ; s , λ ijk I t ijk c log λ ijk = μ j IPD + γ j c + β 0 j x ijk if k = b μ j IPD + d bk + γ j c + β 0 x ijk if k > b x _ ijk N m , p γ j c N 0 , π
(A1)

Time to ulcer healing (t ijk ) of the i th participant in the j th study (where j =1,…, NS IPD ) and in the k th treatment arm was assumed to be Weibull distributed [33] with shapea. parameter, s, and scale, λ ijk . For some participants, time to event was not observed, and these observations were censored at the time the participant last had trial data recorded, t ijk c . The linear predictor log(λ ijk ), was modelled as a function of the log-hazard of an event for the baseline treatment b, μ j IPD , of the log of the treatment effects, d bk and of a study-specific individual-level regression term, β 0j x ijk , where β 0j exemplifies a covariate effect, i.e. the difference in the log hazard ratio per unit increase in the covariate x ijk a patient-level covariate for the i th patient in the j th trial on treatment k available in the IPD data sets [17]). The effect of each covariate on the hazard of healing was assumed to be equal in both IPD studies. Due to the possibility of missing covariate information existing for some individuals, x ijk was represented as a Normally distributed random variable with mean m and precision p, common across all IPD studies. This is essentially a multiple imputation technique through MCMCb. and assumes that the missingness mechanism was at randomc.. Additionally, to account for centre variability within each IPD study, γ j c was defined for each centre, c, in the j th study, these were combined using a common frailty effect described by a normal distribution with mean zero and precision τ.

The treatment effects, d bk , were log-hazard ratios for treatment k relative to the study-specific baseline treatment b, partitioned further as d 1k  - d 1b . This equation defines the set of functional parameters, that is the log effects of each treatment in relation to the reference treatment 1. Note that d 11 = 0, where treatment 1 was 4LB, arbitrarily chosen as reference treatment.

Vague prior distributions were specified for μ b IPD N 0 , 10 6 for the regression coefficient β 0j (∼N(0, 106)) for m(∼N(0, 106)) and p(∼Gamma(0.01, 0.01)) for the shared centre effect τ(∼Gamma(0.01, 0.01)), for s(∼Gamma(0.01, 0.01)) and for d 1k  ∼ N(0, 106).

Model A , part II - modelling the AD studies r _ jk Bin p jk , n jk p jk = 1 - exp - λ jk AD . t jk AD s log λ jk AD = μ j AD if k = b μ j AD + d bk if k > b
(A2)

Within the AD studies, the observed number of participants with a healed ulcer within each study, r jk , out of the total number of individuals in the j th trial [(where j = NS IPD +1,…, (NS IPD + NS AD )] and k th treatment (intention to treat), n jk , was assumed to be Binomially distributed. The underlying probabilities of an event for each arm and in each trial were represented by p jk . In turn, p jk was expressed as a function of the scale parameter, λ jk AD , of follow-up time, t jk AD , and the shape parameter, s, of the Weibull distribution. The linear predictor, log λ jk AD was a function of the baseline log-hazard of an event for treatment b in study j, μ jb AD , and by the log-hazard ratio for treatment k and baseline treatment b, d bk (=d 1k  - d 1b ) Note that there are parameters common to both model parts (equations A1 and A2), namely the log-hazard ratios and the shape parameter of the time to healing distribution. Vague prior distributions were specified for μ jb AD N 0 , 10 6 .

Alternative modelling assumptions

Underlying model A are a set of assumptions that are detailed below. Such assumptions have been relaxed in models B to D.

Exploring between-study variation

Model A assumed that each included RCT aimed to measure a common treatment effect (fixed-effect); however, it is likely that there was between-study variation. Model B included a random effect to characterise between-study heterogeneity, where d bk was replaced by a study specific δ jbk , that is then described by a Normal distribution with mean d bk and precision, prec – this is common to both parts I (eq. A1) and II (eq. A2).

Time to healing distributions

Model A used the Weibull distribution to describe time to healing. Our choice of survival distribution was limited as distributions such as the Log-Logistic or the Log-Normal do not allow the probability of healing over time to be expressed in a closed form, and hence impede the approach proposed here for the joint synthesis of IPD and AD. Other distributions, such as the Gompertz, were not readily defined within the software used in this work (WinBugs/OpenBugs), specifically under censoring. Nonetheless, the goodness of fit could still be assessed in each IPD data source individually. To do this, we applied parametric regression survival-time models [33] to both IPD data sources [17, 32] independently (covariates and frailty effect considered, as in model A).

Distributional shape parameter

Model A assumes that the Weibull shape parameter of the hazard of healing was common to both IPD data sources. It is possible that this parameter differed between studies, in which case HRs could be affected. Thus, we implemented two alternative NMA models to ascertain the impact of this assumption on the relative effectiveness estimates: model C1 used the shape parameter from the first IPD study to describe the AD studies and model C2 used the shape parameter from the second IPD study to describe these same studies. Because models C1 and C2 represent simple modifications of model A we do not present these algebraically.

Treatment-covariate associations

Model A uses baseline covariates to adjust for clinical heterogeneity in the IPD. To further explore the impact of covariates on the relative treatment effects (i.e. whether they were effect modifiers), and potentially help explain between-study heterogeneity, we also included interaction terms between alternative treatments and baseline ulcer area and duration— as described by Cooper et al. 2009 [34] and Saramago et al. 2012 [6]. Model D assumed a regression (slope) coefficient for the interaction terms, this effect was common across treatments and thus common to parts I (eq. D1) and II (eq. D2). This assumption was data driven, as this was the only option we were able to implement with the data available (i.e. compared to assuming ‘exchangeability’ or ‘independence’). Note that interaction estimates obtained were influenced by the full evidence base for which study mean covariate(s) values were available, including trials considering ad hoc treatments.

Model D, part I- modelling the IPD studies, considering common treatment-by-covariate interactions

Model D , part I - modelling the IPD studies , considering common treatment - by - covariate interactions t _ ijk Weibull ; s , λ ijk I t ijk c log λ ijk = μ b IPD + γ j c + β 0 j x ijk if k = b μ b IPD + γ j c + β 0 j x ijk + d bk + β x ijk if k > b x _ ijk N m , p γ j c N 0 , τ
(D1)

Time to ulcer healing was modelled in the same way as in model A (equation A1). A treatment-by-covariate interaction regression term, βx ijk , was defined, where β is the association effect, assumed common across studies and the same regardless of treatment (excluding control). We included no interaction term for each comparison of k versus b when b ≠ 1 because the common regression coefficient cancels out; but, as β 11 = 0, we included an interaction term for each comparison of k versus treatment 1. For the remainder of the parameters of interest, vague prior distributions were assigned as in model A.

Model D , part II - modelling the AD studies , considering common treatment - by - covariate interactions r _ jk Bin p jk , n jk p jk = 1 - exp - λ jk AD . t jk AD s log λ jk AD = μ jb AD if k = b μ jb AD + d bk + β x ¯ j if k > b
(D2)

In part two of model D, x ¯ j represented the mean covariate value for the j th trial ((where j = NS IPD +1,…, (NS IPD + NS AD )). Both the IPD and AD contributed to the estimation of this interaction term (i.e. same β term in equations D1 and D2). All other components of the model were as described for model A.

Model selection and implementation

The NMA analyses were undertaken in the WinBUGs software [35]. In all models the MCMC sampler was run for 10 000 iterations and these were discarded as ‘burn-in’. Models were run for a further 5000 iterations, on which inferences were based. Chain convergence was assessed by running the model for two disparate sets of initials values, and by then checking that the Brooks-Gelman-Rubin diagnostic converges to 1.0. The WinBUGS code is included for reference in Additional file 2. Within the NMA, goodness of fit was assessed using the deviance information criterion (DIC) [36]. Results were presented using hazard ratio estimates (and associated credibility intervals, CrIs) and also using the probability of each compression system being the ‘best’ treatment in terms of being the most clinically effective [37].

The statistical software STATA [38] was used to fit alternative time to event distributions to the IPD datasets individually. Goodness of fit was assessed with the Akaike Information Criteria (AIC) statistic [39].

Results

Table 2 shows parameter estimates obtained for model A (first column) and alternative models relaxing its assumptions (models B to D, second to fifth columns). Model A considers the baseline information for the following covariates: the log of the ulcer area (in cm2) and duration (in months) - both centred to its sample mean; and dummies generated from a categorical variable on participants’ mobility with three categories (‘immobile’, ‘walks with difficulty’ and ‘walks freely’, the latter being the reference category). The results for model A highlight that the modelling framework proposed is feasible. A graphical representation of model A results is shown in Figure 2. This plots the probability of healing over time for the five main high compression ulcer treatments, but considers only the uncertainty over relative treatment effects. The results of testing the assumptions are described next.

Table 2 Parameter estimates from the alternative MTC synthesis models
Figure 2
figure 2

Graphical representation of model A results reflecting uncertainty over relative treatment effects in the probability of healing over time for the five main high compression ulcer treatments. The main figure (a) shows the expected probabilities of healing (point estimates) across time (25 months); figures (b), (c), (d) and (e) compare the expected values for four layer bandage with the healing probability (point estimates and uncertainty) of each of the other four high compression treatments. Estimates reflect the average participant in the trial data from VenUS IV (IPD study 2) (mean ulcer area at baseline of 9.4cm2 and ulcer duration at baseline of 11.5 months). (4LB, SSB, HH, Zinc Paste and 2LB as described in Additional file 1).

Exploring between-study variation

Despite estimates of HRs from the random effect model (model B) being associated with wider CrIs than those from model A (as expected), point estimates were found to be fairly similar except for the comparison between HH vs. 4LB: HH is estimated to be more effective in model B (HR 1.63, 95% CrI 0.76-3.53) compared to model A (HR: 1.05, 95% CrI 0.85 to 1.29), although the CrI of the former includes the later. The treatment with the greatest estimated probability of healing was HH in model B (59%), rather than 2LB (72%) as in model A. Differences may be explained by any existing variation between studies of SSB vs. 4LB indirectly impacting on the evidence loop 4LB vs. SSB vs. HH. Baseline covariate effect estimates remained similar. However, note that the gain in quality fitting of the random-effects model compared to the fixed-effects is null (DIC: 5396.21 and 5396.22, respectively). Previous published work assessing evidence on the SSB vs. 4LB comparison [40] similarly found no evidence of between-study heterogeneity.

Time to healing distributions

The Weibull was used to describe time to healing in model A. Whilst we were limited in the use of other distributions, goodness of fit was explored by applying alternative time to event distributions to the IPD studies individually. Table 3 shows results of such analysis (AIC statistic) and Additional file 3 graphically presents the fitted curves and the Kaplan Meier counterpart for control arms. The best fitting distributions for both studies were the Log-Logistic and Log-Normal. Of the remaining, the Weibull and Gompertz distributions provided better fit than the Exponential; this was expected given the flexibility of these distributions in assuming increasing, decreasing or constant hazards over time. The Weibull was best in IPD study 2 and the Gompertz best in IPD study 1.

Table 3 Goodness of fit (AIC statistics) of alternative time to ulcer healing models for IPD studies 1 and 2

Distributional Weibull shape parameter of healing hazard

The Weibull shape parameters estimated within models C1 and C2 indicate that in IPD study 1 [32] the hazard of healing was expected to decrease over time (s 1 = 0.93, 95% CrI 0.86-1.01), while in IPD study 2 it is expected to increase (s 2 = 1.27, 95% CrI 1.17-1.38). Note that there is no overlap in the CrIs. However, results show that relative effectiveness estimates are robust to the range of assumptions tested: the estimated HRs did not differ between models C1 and C2, and did not substantially differ from model A.

Treatment-covariate associations

Model D tested the inclusion of interaction terms. Two covariates on ulceration area (in cm2) and duration (in months) were considered and their association with treatment evaluated. Results (column 5 of Table 2) show that the covariates included did not appear to be treatment effect modifiers in this case study. However, estimating these two additional regression terms increased uncertainty in relative treatment effects estimates, specifically for ZINC and 2LB.

Discussion

This paper introduces a novel NMA modelling approach that allows IPD on time to event (with censoring) and AD on event count (for a given follow-up time) to be synthesised jointly, by assuming an underlying, common, distribution of time to healing. Available IPD is used directly to inform this distribution (likelihood). Studies reporting the number of participants healed (AD) are used to inform a probability parameter, and a Binomial likelihood was defined for this subset of the evidence-set. The probability of healing was then related (algebraically) to the common distribution of time to healing, by taking the duration of follow-up in each AD study into account. This modelling framework extends approaches in the literature [5, 14, 15] and is also a natural extension of previously published methodologies of synthesising IPD and AD jointly [68, 16]. This work was motivated by a real data example looking at the effectiveness of high compression treatments on the healing of venous leg ulcers.

We found that the key strength of the use of IPD in this context (additional to the known advantages described in the introduction) was the flexibility in modelling these data allowed. For example, had all evidence been available as AD, the modelling process would have been limited to the specification of uniparametric distributions for time to healing i.e. the Exponential, with constant healing hazard over time [5, 14, 15]. In our motivating example, the Exponential distribution was shown to be less adequate than other distributions in describing the time to event data in the studies for which IPD were available. The availability of IPD allowed a more complex distribution for time to event outcomes to be implemented, in this case the Weibull. This may be of particular importance when absolute effectiveness estimates, and not just relative effect estimates, are of interest – and especially where results may need to be extrapolated beyond the follow-up time horizon.

We note that even with the flexibility offered by the use of IPD we were, in practice, limited to using the Weibull distribution. Semi-parametric methods are also unmanageable as the evidence in aggregate form consists of a single data point and thus does not allow defining a non-parametric distribution of hazard of healing. Given this limitation, the synthesis of time to event data will still often require the use of potentially suboptimal distributional assumptions, in which case estimates obtained may be biased. We suggest further research could focuses on using numerical analysis techniques within NMA, to try and resolve this issue.

This work was also relevant in again highlighting the importance of including IPD when wanting to consider patient characteristics in the model. In the presence of heterogeneity, incorporating information on either a) patients’ baseline characteristics or b) control for treatment-effect modifiers within the synthesis model has been shown to improve estimates and, by doing so, possibly resolve evidence inconsistencies [6, 34]. The first relates to potential heterogeneity in the baseline hazard, which cannot be explicitly explored with AD alone (this is important when analyses aim to explore determinants of baseline hazard, for example). In this study we assumed a common effect of baseline covariates on the hazard of healing across IPD studies (as commonly assumed in IPD meta-analysis). The second relates to treatment-covariate interactions, which are generally acknowledged to be best estimated using IPD, as ecological bias can be avoided [9]. For the proportion of evidence only available as AD, the model here implemented considered study level mean covariate values. Nonetheless, not all studies provided information for these, and imputation was undertaken (imputation is naturally done through the MCMC, and assumes values are ‘missing at random’).

In our work we assumed the shape parameter of the Weibull time to event distribution to be common across studies. However, testing proved this assumption was not valid, thus highlighting the importance of evaluating any assumptions of similarity imposed across studies. Despite relative effectiveness estimates being mainly unaffected, such potential heterogeneity between studies should be explored and accounted for in analyses. Such assumptions of commonality also mean that information may be shared throughout the network, in which case evidence on treatments other than those on our decision set (the five treatments of interest for which results were reported). This is the case of model D that makes use of all evidence (including ad hoc treatments) to estimate treatment-covariate interaction, which may indirectly affect the relative effectiveness estimates of interest.

In the network of evidence there was one closed loop where both direct and indirect data informed relative treatment effect estimates. The existence of inconsistency was explored elsewhere [17], showing no evidence of statistically significant discrepancies between the direct and the indirect data. We note, however, that given the fairly high uncertainty in the evidence base, only large differences in direct and indirect data within the loop would have returned a statistically significant result.

The evidence included as AD in our case study was limited to the primary outcome data on proportion of patients healed at the end of follow-up. However, we could have extended the modelling framework to use any other summary data reported, such as number of patients healed at different time points or Kaplan-Meier (KM) curves. Information conveyed in Kaplan Meier plots can be reconstructed [41], and further research could focus on including this information in our NMA modelling framework to strengthen inferences.

Conclusions

The use of IPD for a time to event outcome is particularly useful in guiding HTA decision making by allowing flexibility in the specification of more appropriate survival distributions and in dealing with potential existing study heterogeneity [42]. Increasingly, data sharing is promoted in research [43, 44] and this example highlights how use of IPD allows the development of more informative and flexible models that are better able to summarise existing evidence. However, it is important to acknowledge that accessing and analysing IPD can be time consuming and may cause delay [4]. The process needs to be well planned and implemented. In our case both sources of IPD were easily accessed and that directly facilitated the conduct of these analyses and the associated methodological work presented here.

Endnotes

a.The shape parameter of the Weibull distribution, s, can be interpreted directly as follows: i) if 0 < s < 1, hazard rate decreases over time; ii) if s = 1, hazard rate is constant over time (hazard exponentially distributed); and if s > 1, it indicates that the hazard increases with time.

b.When imputing missing information, MCMC generates independent draws of the missing data from its predictive distribution. Multiple imputation through MCMC techniques is attractive for exploratory or multi-purpose analyses involving a large number of estimands.

c.The missing-at-random assumption (sometimes called the ignorability assumption) considers that the probability that an observation is missing may depend on the observed values but not the missing values, as sufficient data has already been collected.