Key Points for Decision Makers

The ability of Bayesian methods to incorporate prior information, handle heterogeneity between studies explicitly and provide intuitive probabilistic interpretations makes them a powerful framework for pooling utility data.

This tutorial provides step-by-step guidance on how to conduct a Bayesian meta-analysis to pool health state utility values using a ready-to-use R script.

1 Introduction

Utility is a measure of the value an individual places on the quality of life associated with a particular health state [1]. It is expressed on a scale anchored at one (representing full health) and zero (representing death), although it may take a value less than zero for extreme health conditions viewed as worse than death [1]. Estimates of quality-adjusted life years (QALYs) are directly informed by both health state utility values and the length of time spent in the health states of interest. In the context of cost-effectiveness analysis and health technology assessment (HTA), utility values and QALYs are used to quantify the effect of health interventions or technologies on an individual’s quality of life, and offer a standardized measure to compare alternative interventions and allocate resources efficiently [2]. Thus, utility holds significant importance in economic evaluations.

Researchers employ utility values as part of the inputs to inform economic models. However, for a particular health state or condition, multiple utility values derived from different studies typically exist, making the selection of the most suitable utility value challenging. This complexity arises due to the differences in study characteristics, for example, in terms of the health-utility instrument used, the timing and frequency of collection, the variables collected, and the heterogeneity of the study population [3]. Additionally, the lack of methodological harmonization in utility measurement methods, along with the deliberate selection of a utility value to be incorporated into decision-support economic models, can contribute to discrepancies in utility outcomes and add to the complexity in selecting the appropriate utility value. For these reasons, a single study is often insufficient to represent the best available source of utility values needed for informing a policy decision [4], and researchers rely on systematic literature reviews of utilities to assess and harmonize disparities in the estimated values across studies [4,5,6].

Meta-analysis of utilities has been recommended to generate a single parameter input for economic evaluation within a clinical domain or for a specific health state or condition [4]. The studies conducted so far on meta-analytic approaches for pooling utility values have shown that such methods can generate reliable utility estimates suitable for incorporation into economic models [4, 7, 8]. Petrou et al. [4] described three approaches (i.e. fixed-effect meta-analysis, random-effects meta-analysis and mixed-effects meta-regression) for combining utility data from various studies, but cautioned about possible methodological issues associated with the application of these approaches [4]. Hatswell et al. [7] conducted a comparison between the frequentist and Bayesian meta-regression models for the synthesis of utility data. The authors found comparable outcomes between the two approaches, and recommended the Bayesian analysis as the preferred approach, due to its capacity to incorporate prior information into the analysis, making it possible to utilize utility values identified from previous reviews [7]. A subsequent study by Hatswell [8] introduced the use of the Bayesian power prior [9] to adjust prior knowledge according to perceived relevance of the study source. The use of this prior produced comparable results with random-effects meta-analysis, but not with fixed-effect meta-analysis, which yielded very narrow confidence intervals. The authors noted clear benefits associated with the method, and suggested various avenues for development [8]. Overall, these studies offered valuable guidance for conducting meta-analysis of utilities, contributing to the progress of evidence synthesis in HTA to inform health policy decisions.

Bayesian meta-analysis involves the combination of information from multiple studies, while incorporating prior knowledge or beliefs about the parameters of interest, to generate updated estimates and quantify uncertainty. As such, it provides a powerful framework for the synthesis of health state utility values. Firstly, the inclusion of prior knowledge into the analysis is relevant in the case of utility values, given the increasing number of primary studies and systematic reviews of utility values across various health states and population groups [4]. The findings from these studies can serve as valuable prior information in the meta-analysis. Secondly, Bayesian meta-analysis handles the degree of heterogeneity among studies more explicitly, which is crucial since utility values can be highly variable across studies [3]. The integration of prior information for the effect size and between-study heterogeneity helps in pooling diverse sources of data to produce more robust and precise estimates [10]. Lastly, Bayesian meta-analysis generates posterior distributions that encompass a range of possible values for the parameters of interest, allowing for better assessment of uncertainties around estimates and enabling direct calculation of posterior probabilities [11]. This facilitates clear interpretation of results and allows researchers to make informed inferences about plausible values for utility.

Effectively applying Bayesian inference to real-world problems requires a blend of statistical and programming skills, domain expertise and an understanding of the decision-making process within data analysis [12]. Together, these components form a sophisticated Bayesian workflow, which encompasses several tasks, including pre-specifying the analysis plan, incorporating diverse priors, providing scientific rationale for the priors and comparing different models [13]. Against this backdrop, this tutorial aims to provide step-by-step guidance on how to perform Bayesian meta-analysis of utilities using R, with a view to empowering meta-analysts to utilize statistical modelling more effectively, enhancing confidence in the inferences and decisions derived, and going beyond the simple pooling of results. To this end, we provide code to aid practitioners in understanding and applying Bayesian methods, using data from a systematic review of health state utilities of patients with heart failure [14].

2 Data and Software

2.1 Summary of the Systematic Review

2.1.1 Systematic Review Methods

The data used in this tutorial have been reproduced with permission from the study authors [14]. The objective of the systematic review was to identify and summarize utility values of patients with heart failure. The search strategy included a search of peer-reviewed databases from their inception until June 2019, supplemented by a grey literature search including HTA websites and by relevant publications from a parallel review on cost-effectiveness models for pharmacological interventions in heart failure led by the same primary author [15]. Studies were included if they reported health state utility values for adults aged 18 years and above with heart failure, regardless of study design. Details on the study design, the instrument used to elicit utility, the value set used to produce utility values, the health state for which the utility data were reported (i.e. chronic heart failure, hospitalized, and other acute heart failure) and the utility value and its measure of variability were extracted from the eligible studies. Studies with a sample size of ≥ 100 were included in the calculation of the interquartile limits (25th and 75th percentile) for health states and heart failure subgroups. Meta-analysis was not carried out.

2.1.2 Findings from the Systematic Review

The review identified 161 publications with primary utility data of patients with heart failure elicited from 142 studies. The studies varied in design and study population. The EuroQol-5D (EQ-5D) (3L or 5L) was the most common instrument used to elicit utility (n = 104) although several studies did not specify which version was used (n = 37). The majority of publications did not report the value set used to calculate utility (n = 88), although the UK value set was the most commonly reported (n = 33). Utility values were reported in 128 publications for chronic heart failure, 39 publications for hospitalized patients with heart failure and three for other acute heart failure. Of the publications that reported EQ-5D utility values and met the criteria for calculating the interquartile limits, the calculated limits for chronic heart failure (n = 35) were 0.64–0.72, with a trend of decreasing utility with increasing disease severity. The limits for hospitalized patients with heart failure were 0.54–0.63 during hospital admission (n = 4) and 0.64–0.73 at hospital discharge (n = 6).

2.2 Description of the Dataset Used in the Meta-Analysis

For the purpose of this tutorial, the Bayesian meta-analysis only includes studies that reported heart failure utility values using EQ-5D (either 3L or 5L). This was an attempt to establish a reasonable degree of comparability between the studies and utility values included in the meta-analysis, to enhance the validity of the results. However, it is important to note that there were still some variations in the design (e.g. randomized controlled trials, non-randomized trial, observational), diagnosis or health state, and source of data (e.g. some were obtained from conference abstracts) across the studies included in the meta-analysis.

For studies with more than two treatment arms and those that reported utility values by subgroups (e.g. by the New York Heart Association class), where appropriate and possible, groups were combined to produce a single weighted average of utility values. For studies with multiple time points and intervention studies, only baseline data were utilized to avoid the introduction of any confounding effect of the intervention into the analysis [4]. For studies that reported multiple utility values derived from different value sets, the utility value based on the UK value set was preferred [4]. The final dataset includes 21 studies, of which six did not report a measure of variability (e.g. standard deviation) (Table 1). The dataset and R script (hereinafter referred to as the “script”) can be found in the electronic supplementary material.
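As a minimal sketch of how subgroup values might be combined into one study-level value, the snippet below computes a sample-size-weighted mean. The function name `pool_subgroups` and all numbers are hypothetical, and weighting by sample size is an assumption, since the text does not state which weights were used.

```r
# Hypothetical helper: sample-size-weighted mean utility across subgroups.
pool_subgroups <- function(utility, n) {
  stopifnot(length(utility) == length(n), all(n > 0))
  sum(n * utility) / sum(n)
}

# Made-up values for three NYHA subgroups reported by a single study
utility_by_class <- c(0.80, 0.70, 0.55)
n_by_class       <- c(50, 100, 50)

pooled_utility <- pool_subgroups(utility_by_class, n_by_class)
pooled_utility
```

A pooled standard deviation would need a separate calculation that also accounts for the spread between subgroup means; that step is not shown here.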

Table 1 Characteristics of studies included in the meta-analysis

2.3 Setting Up R and RStudio

The open-source R software [35], along with its popular integrated development environment RStudio [36], is used in this tutorial. Several other software tools and packages are available for carrying out Bayesian meta-analysis, which allow researchers to implement and customize models according to their needs and preferences. Some examples include JAGS [37], BUGS [38] and JASP [39].

Many introductory courses and workshops on data manipulation, analysis and visualization for R and RStudio are available online. Throughout the tutorial code, existing packages and user-defined functions in R are used. Packages can be easily installed using the R function ‘install.packages()’.
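Before running any analysis code, it can help to check which of the required packages are already installed. The sketch below uses the package names mentioned in this tutorial; the object names `pkgs`, `installed` and `not_installed` are illustrative. Note that brms compiles Stan models, so it also needs a working C++ toolchain, which this check does not cover.

```r
# Packages named in this tutorial (metafor is used for the frequentist comparison)
pkgs <- c("brms", "mfp", "mice", "bayesplot", "shinystan",
          "tidybayes", "ggplot2", "metafor")

# requireNamespace() returns FALSE, rather than stopping, if a package is absent
installed     <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
}
```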

3 An Example of Bayesian Meta-Analysis of Health State Utilities in R

The tutorial adheres to the following structure: (1) set up the data in R; (2) employ methods to impute missing standard deviations; (3) define the priors; (4) fit the model; (5) diagnose model convergence; (6) interpret the results; and (7) perform sensitivity analyses.

3.1 Set Up the Data in R

All the packages needed to run our model are loaded using ‘library()’ (Box 1). The brms (Bayesian regression models using Stan) package is used for the implementation of the Bayesian meta-analysis [40]. Stan is a probabilistic programming language for statistical modelling [41], and brms extends the functionality of Stan by offering an interface (similar to the traditional regression modelling syntax in R) to fit Bayesian models. The mfp and mice packages are utilized to impute missing standard deviations through fractional polynomial regression and multiple imputation using chained equations, respectively [42, 43]. Plots are generated either through the built-in functions in the brms package or through the bayesplot [44], shinystan [45], tidybayes [46] or the ggplot2 [47] packages.


Box 1 Set up the data in R

We load the dataset using the ‘read.csv()’ function. The dataset contains four variables: studyid, which refers to the first author and year of publication; n, the study sample size; utility, the reported or calculated mean health state utility per study; and sd, the standard deviation.
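To make the expected layout concrete, the sketch below builds a toy data frame with the same four columns. The study labels and all values are made up for illustration and are not taken from the review; the file path in the final comment is likewise hypothetical.

```r
# Toy stand-in for the tutorial dataset: same four columns, made-up values.
dat <- data.frame(
  studyid = c("Author A 2010", "Author B 2014", "Author C 2018"),
  n       = c(120, 85, 300),
  utility = c(0.62, 0.71, 0.58),
  sd      = c(0.25, NA, 0.30),   # NA marks a study with no reported SD
  stringsAsFactors = FALSE
)

str(dat)
sum(is.na(dat$sd))   # number of studies that need an imputed SD

# With the real file, the call would be along the lines of:
# dat <- read.csv("path/to/dataset.csv")   # hypothetical file name
```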

3.2 Employ Methods to Impute Standard Deviations

Commonly, meta-analyses exclude studies that lack a measure of variability. However, there are several approaches to deal with missing standard deviations in meta-analysis [48], and the choice of imputation method usually depends on the nature of the outcome variable or data structures. For the purpose of demonstration, two approaches are shown in this tutorial to impute missing standard deviations of utility values. The first method involves fitting a regression model using fractional polynomials based on the methods of Royston and Altman [49]. This method has been applied in a previous meta-analysis of utilities of chronic kidney disease patients [50]. The second method involves multiple imputation using chained equations. The first approach is applied in the main analysis, while the second approach is used in the sensitivity analysis.

To fit a fractional polynomial regression model of the observed standard deviations against the utility estimates, the mfp function from the mfp package is used [42] (Box 2). The advantage of using fractional polynomial regression is that it allows for non-linear modelling of relationships between variables [49]. Briefly, the mfp function works by fitting several models with different combinations of fractional polynomial transformations of the predictor variable, and then selecting the model that best fits the data through a stepwise model selection approach. In our model, sd is the dependent variable, while utility is the predictor. The term ‘fp(utility)’ tells R that we want to investigate different fractional polynomials of utility. The results are stored in an object called sd.model1.


Box 2 Employ methods to impute missing standard deviations

The output shows that given our data, the best model for predicting the standard deviation of a utility estimate is a simple linear model with the following equation: sd = 0.558 − 0.474 × utility. We also calculate the standard error of each utility estimate since this is required as an input for fitting the Bayesian meta-analysis model using the brms package in the subsequent step.
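The two calculations described above can be sketched in base R by applying the reported equation directly, then dividing each standard deviation by the square root of the sample size. The rows of `dat` are made up, and the column name `sd.imp1` is an assumption; only `se.imp1` appears in the model formula later in the tutorial.

```r
# Coefficients reported in the text for the selected (linear) model
predict_sd <- function(utility) 0.558 - 0.474 * utility

# Made-up rows standing in for the real dataset
dat <- data.frame(
  n       = c(100, 250),
  utility = c(0.60, 0.70),
  sd      = c(NA, 0.21)
)

# Keep reported SDs; fill in the missing ones from the regression equation
dat$sd.imp1 <- ifelse(is.na(dat$sd), predict_sd(dat$utility), dat$sd)

# Standard error of a mean: SD divided by the square root of the sample size
dat$se.imp1 <- dat$sd.imp1 / sqrt(dat$n)
dat
```

With the fitted mfp object available, the same predictions would normally be obtained from the model itself rather than from hard-coded coefficients.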

3.3 Define the Priors

One of the advantages of employing Bayesian methods lies in their capacity to integrate prior knowledge or beliefs into the analytical process, and combine this prior knowledge with the observed data to update parameter estimates. Prior distributions are specified for each parameter in the model, and in the context of Bayesian meta-analysis, this means that we can directly model our assumptions about two parameters of interest: (1) the effect size (i.e. mean health state utility) and (2) the between-study heterogeneity tau. The inclusion of prior information for these parameters can help improve the precision of the estimates, particularly when dealing with a limited number of studies or highly variable data, by shrinking estimates towards more plausible values [51].
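The role of the two parameters can be made explicit by writing out the standard normal-normal random-effects model. This is a sketch under the assumption that the model fitted later in the tutorial takes this conventional form; the symbols are introduced here for illustration, with y_i the mean utility reported by study i, s_i its standard error (treated as known), μ the pooled mean utility and τ the between-study standard deviation.

```latex
\begin{aligned}
y_i \mid \theta_i &\sim \mathcal{N}\!\left(\theta_i,\; s_i^{2}\right), \qquad i = 1, \dots, k, \\
\theta_i \mid \mu, \tau &\sim \mathcal{N}\!\left(\mu,\; \tau^{2}\right), \\
\mu &\sim p(\mu), \qquad \tau \sim p(\tau), \quad \tau > 0 .
\end{aligned}
```

The priors p(μ) and p(τ) are the two distributions that the analyst must choose.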

In general, priors can be classified into three types: flat, informative and weakly informative. Flat priors are typically used when one wants to input as little information as possible about the parameters of interest, thus assigning equal probability to all possible parameter values. In contrast, informative priors incorporate specific prior knowledge, for instance, from previous research or literature, which can influence the plausibility of some parameter values. Lastly, weakly informative priors fall between flat and informative priors, and provide some information to guide the analysis without strongly influencing the results. These are usually employed when there is some prior knowledge about the parameters of interest, but not strong enough to justify a more constrained prior [11]. While Bayesian meta-analysis offers flexibility in including priors in the model, it is important to carry out sensitivity analysis to assess whether specifying different prior information affects the results.

We define the priors using the ‘prior()’ function from the brms package (Box 3). Here, we pass two arguments to this function: the prior distribution and the class. For illustration purposes, we use a Normal prior centred at 0.5 with a standard deviation of 0.05 for the mean health state utility, but other priors can be used [7, 8]. We set the class to Intercept since the pooled mean is a population-level (fixed) effect. For the between-study heterogeneity, we use a half-Cauchy prior to restrict values to positive numbers (since standard deviations cannot be negative), with a location of 0 and a scale of 0.5. We set the class to sd since it is a measure of variability. The priors are saved into an object called priors.model1.
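A quick way to see what these priors imply is to compute a few quantiles in base R. The sketch below does this and then builds the corresponding brms prior object only if brms is installed. The helper `q_half_cauchy` is a hypothetical convenience function, not part of any package.

```r
# Central 95% range implied by the Normal(0.5, 0.05) prior on the pooled mean
mean_prior_95 <- qnorm(c(0.025, 0.975), mean = 0.5, sd = 0.05)
round(mean_prior_95, 3)

# Half-Cauchy(0, scale) quantile function, derived from the Cauchy quantile:
# P(tau <= x) = 2 * pcauchy(x, 0, scale) - 1 for x >= 0
q_half_cauchy <- function(p, scale) {
  qcauchy((1 + p) / 2, location = 0, scale = scale)
}

tau_prior_median <- q_half_cauchy(0.50, scale = 0.5)  # equals the scale
tau_prior_95     <- q_half_cauchy(0.95, scale = 0.5)  # heavy upper tail

# The corresponding brms specification, built only if brms is installed
if (requireNamespace("brms", quietly = TRUE)) {
  priors.model1 <- c(
    brms::prior(normal(0.5, 0.05), class = Intercept),
    brms::prior(cauchy(0, 0.5), class = sd)  # sd-class priors are bounded at 0
  )
  print(priors.model1)
}
```

The upper quantile of the half-Cauchy shows how much prior mass sits on heterogeneity values far larger than any plausible spread of utilities, which is one reason a prior predictive check is worthwhile.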


Box 3 Define the priors and perform prior predictive check

The adequacy of the priors can be checked by performing prior predictive checks [44]. A prior predictive check involves generating simulated data based on the chosen prior distribution and comparing it to the observed data. In brms, this is carried out by fitting the model (the arguments of the brm function are explained in the next section) and including the ‘sample_prior = “only”’ argument [40]. The ‘pp_check()’ function is used to plot the simulated data points and the observed data. In our example, the prior predictive check showed that the simulated data align with our expectation (that the utility value is around 0.5 and has a standard deviation drawn from the half-Cauchy prior) and cover the range of the observed data (Fig. 1). Note that if the simulated data from the prior predictive distribution consistently diverge from the observed data, it implies that the suggested priors are at odds with the data and likely need to be reconsidered.
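The idea behind a prior predictive check can be illustrated without fitting anything, by simulating from the priors directly. The sketch below is a hand-rolled analogue in base R, not the brms procedure itself; the common standard error `se_study` and the number of simulations are assumptions chosen for illustration.

```r
set.seed(2024)
n_sims    <- 1000   # number of prior draws
k_studies <- 21     # number of studies, as in the tutorial dataset
se_study  <- 0.03   # assumed common standard error, for illustration only

# Draw the pooled mean and the between-study SD from their priors
mu  <- rnorm(n_sims, mean = 0.5, sd = 0.05)
tau <- abs(rcauchy(n_sims, location = 0, scale = 0.5))  # half-Cauchy draws

# For each prior draw, simulate one dataset of k study-level mean utilities
sim_utilities <- t(vapply(seq_len(n_sims), function(s) {
  theta <- rnorm(k_studies, mean = mu[s], sd = tau[s])  # true study means
  rnorm(k_studies, mean = theta, sd = se_study)         # observed study means
}, numeric(k_studies)))

dim(sim_utilities)                            # n_sims rows, k_studies columns
quantile(sim_utilities, c(0.05, 0.50, 0.95))
```

Because the half-Cauchy prior has a heavy tail, some simulated values fall well outside the plausible range of utilities; how much of this is acceptable is a judgement for the analyst.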

Fig. 1 Prior predictive check. The black line represents the observed data, while the blue lines represent the simulated data based on the prior distributions specified for the model parameters.

3.4 Fit the Model

The brm function from the brms package is used to fit the Bayesian meta-analysis model. The brms package uses the No-U-Turn sampler (NUTS) to draw samples from the posterior distribution [40, 52]. NUTS is itself a Markov Chain Monte Carlo (MCMC) algorithm, an adaptive form of Hamiltonian Monte Carlo, and is considered better than traditional MCMC samplers in terms of efficiency, adaptability and scalability [11, 52]. The NUTS algorithm is implemented in Stan [11], which the brms package uses for fitting Bayesian multilevel models.

For our example, we define the model for our meta-analysis by specifying the formula, data, prior and iter arguments (Box 4). The formula argument follows the standard regression notation, with some modifications since we are doing a meta-analysis. The part ‘utility | se(se.imp1) ~ 1’ indicates that our outcome is the utility value weighted according to the standard error of each study and that we do not have any predictors in the model. However, if one wishes to perform a meta-regression to account for factors that could potentially influence the pooled utility value (for instance, the year of study, study design, or the instrument used to elicit utility), then the syntax would be replaced by ‘utility | se(se.imp1) ~ covariates’. Note that including covariates in the model requires specifying priors for those parameters as well. The part ‘+ (1 | studyid)’ indicates that the utility values are assumed to be nested within studies and as such we want to use a random-effects model. We specify our dataset in the data argument, the priors for the effect size and between-study heterogeneity in the prior argument (which we have already set in the previous step), and the number of iterations per chain in the iter argument. By default, the brm function runs four chains. We save the fitted model into an object called fit.model1.
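Putting the pieces of the formula together, the call might look like the sketch below. It is wrapped in a function that is defined but not run, because fitting requires brms and a working Stan toolchain. The function name `fit_utility_meta` is hypothetical, and the default of 2000 iterations is the brms default rather than a value stated in the text.

```r
# Sketch of the model call described above; defined here but not executed.
fit_utility_meta <- function(dat, priors, iter = 2000) {
  brms::brm(
    utility | se(se.imp1) ~ 1 + (1 | studyid),  # random-effects meta-analysis
    data  = dat,      # needs columns utility, se.imp1 and studyid
    prior = priors,   # e.g. the priors.model1 object defined earlier
    iter  = iter      # iterations per chain, including warm-up
  )
}

# Usage (not run):
# fit.model1 <- fit_utility_meta(dat, priors.model1)
```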


Box 4 Fit the model

3.5 Diagnose Model Convergence

Several tools are available to evaluate model convergence. By default, the brm output offers two convergence metrics: the Gelman–Rubin convergence diagnostic (i.e. Rhat) [53] and the effective sample size (i.e. bulk_ESS and tail_ESS). Rhat serves as a numerical summary for evaluating convergence. In practical applications, many researchers treat an Rhat value greater than 1.1 as an indication of non-convergence [54]. The effective sample size refers to the number of independent samples from the posterior distribution after taking into account autocorrelation of chains [11]. A low effective sample size indicates high autocorrelation, which means that sequential samples are closely related to the previous ones, rendering the chains inefficient. As a rough guide, both bulk_ESS and tail_ESS should be at least 100 per chain for the estimates to be considered reliable [55]. Additionally, graphical diagnostics can also be used to assess model convergence, for example, using a trace plot and a posterior predictive check plot. If the model has converged well, we can expect a trace plot with a stable path and good mixing, and a posterior predictive check plot where the density of the generated effect size aligns with the observed data.
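To show what Rhat measures, the sketch below computes the classic Gelman–Rubin statistic by hand for simulated chains. This is the original, non-split form of the diagnostic and is given only for intuition; the values reported by brms come from a more robust variant, so the numbers will not match exactly. The function name `gelman_rubin` and the simulated chains are illustrative.

```r
# Classic (non-split) Gelman-Rubin statistic for a matrix of draws,
# with one column per chain.
gelman_rubin <- function(draws) {
  n <- nrow(draws)                       # iterations per chain
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  B <- n * var(chain_means)              # between-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# Four well-mixed chains sampling the same distribution
mixed <- matrix(rnorm(4000, mean = 0.66, sd = 0.03), ncol = 4)
# Four chains stuck in two different regions
stuck <- sweep(mixed, 2, c(0, 0, 0.2, 0.2), "+")

rhat_mixed <- gelman_rubin(mixed)
rhat_stuck <- gelman_rubin(stuck)
c(mixed = rhat_mixed, stuck = rhat_stuck)
```

When the chains agree, the pooled and within-chain variances are nearly equal and Rhat is close to 1; when chains sit in different regions, the between-chain term inflates the ratio.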

We evaluate the Rhat, bulk_ESS and tail_ESS using the ‘summary()’ function (Box 5). The ‘plot()’ function from the brms package displays both the density and trace plots for the parameters of interest (in our case, the effect size and between-study heterogeneity), whereas the ‘mcmc_trace()’ function from the bayesplot package displays just the trace plot. The ‘pp_check()’ function is used to display the posterior predictive check plot. This function works by drawing samples of model parameters from the posterior distribution and generating simulated data points that match the structure of the observed data. Lastly, the ‘launch_shinystan()’ function from the shinystan package opens an interactive window where we can further examine diagnostic plots and assess the performance of the model.
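The calls named above could be collected as in the sketch below. The function is defined but not run, since it needs a fitted brmsfit object. The wrapper name `diagnose_fit` is hypothetical, and the parameter names passed to `mcmc_trace()` assume the grouping variable is called studyid, as in the model formula.

```r
# Sketch of the diagnostic calls, wrapped in a function that is not executed.
diagnose_fit <- function(fit, interactive_app = FALSE) {
  print(summary(fit))          # Rhat and effective sample sizes per parameter
  plot(fit)                    # density and trace plots
  print(bayesplot::mcmc_trace(
    as.array(fit),
    pars = c("b_Intercept", "sd_studyid__Intercept")  # pooled mean and tau
  ))
  print(brms::pp_check(fit))   # posterior predictive check
  if (interactive_app) shinystan::launch_shinystan(fit)
  invisible(fit)
}

# Usage (not run):
# diagnose_fit(fit.model1)
```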


Box 5 Diagnose model convergence

Model diagnostics showed that our model achieved convergence. Neither the effect size nor the between-study heterogeneity parameter had an Rhat above 1.1 or a bulk_ESS or tail_ESS of less than 400 (100 × 4 chains). There were no divergent transitions recorded. The trace plot showed stationarity and good mixing (see Figure S1 in the electronic supplementary material). The posterior predictive check plot showed that the simulated effect sizes aligned with the observed effect size, particularly at the tails of the distribution (Fig. 2).

Fig. 2 Posterior predictive check. The black line represents the observed data, while the blue lines show the simulated data based on the posterior distribution of utility (which takes into account both the observed data and the prior distributions of the model parameters).

3.6 Interpret the Results

Guidance about interpreting the results from Bayesian analysis is available [11, 56]. In our example, we interpret the results by looking at the pooled effect size and between-study heterogeneity in the summary output (Box 6). The pooled effect size, in our case the pooled utility value, is 0.66 with a 95% credible interval (CrI) of 0.60–0.70, given the data, priors and model used. The between-study heterogeneity tau is 0.12 (95% CrI 0.08–0.18). Since we fitted a random-effects model under the assumption that each study has its unique effect size, we can also look at the study-specific effect sizes (by summing up the pooled effect size and the deviations from each study) using the ‘ranef()’ function.
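The summation described above might be sketched as follows. The function is defined but not run because it needs a fitted model, and its name `study_specific_utilities` is hypothetical. It adds posterior means only; the `coef()` method in brms is understood to return the combined study-level estimates with their intervals directly, which would be the more complete route.

```r
# Sketch, not executed: requires a fitted brmsfit with a (1 | studyid) term.
study_specific_utilities <- function(fit) {
  pooled     <- brms::fixef(fit)["Intercept", "Estimate"]
  deviations <- brms::ranef(fit)$studyid[, "Estimate", "Intercept"]
  pooled + deviations   # named vector: one posterior-mean utility per study
}

# Usage (not run):
# study_specific_utilities(fit.model1)
```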


Box 6 Interpret the results

Bayesian meta-analysis naturally provides the posterior distribution of the pooled effect, which we can examine and use to make explicit probability statements regarding our parameters of interest [10]. We can extract the parameters of interest from the fitted model using the ‘posterior_samples()’ function, and then perform manual calculations or use the ‘ecdf()’ function to calculate posterior probabilities (Box 7). The ecdf function takes a value or set of values as input and returns the cumulative probabilities associated with those values. In our example, the probability that the pooled utility value is less than 0.80 is 100.00%, while the probability that it is less than 0.70 is 96.06%. Figure S2 displays the cumulative posterior distribution plot, which shows the cumulative probability associated with values less than or equal to each utility value on the x-axis (see the electronic supplementary material).
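The probability calculation can be illustrated in base R with stand-in draws. In the sketch below, `pooled_draws` is simulated from a normal distribution with made-up parameters, so the probabilities will not reproduce those reported above; with a fitted model, the draws would instead come from the intercept column of the extracted posterior samples. Recent brms releases deprecate ‘posterior_samples()’ in favour of ‘as_draws_df()’, so the extraction step may need adapting.

```r
set.seed(42)
# Stand-in for posterior draws of the pooled utility (made-up parameters)
pooled_draws <- rnorm(4000, mean = 0.66, sd = 0.025)

post_cdf <- ecdf(pooled_draws)   # empirical cumulative distribution function

p_below_080 <- post_cdf(0.80)                   # P(pooled utility <= 0.80)
p_below_070 <- mean(pooled_draws < 0.70)        # same idea, computed by hand
p_between   <- post_cdf(0.70) - post_cdf(0.60)  # P(0.60 < utility <= 0.70)

round(c(p_below_080, p_below_070, p_between), 3)
```

Any threshold that matters for the decision problem can be substituted for 0.70 or 0.80.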


Box 7 Calculate posterior probabilities

Lastly, we can also generate a forest plot by following the step-by-step guide from the tidybayes package [46] (Fig. 3). Note that this requires installation of other packages. The full code is provided in the script.

Fig. 3 Forest plot showing the posterior distribution of effect sizes for each study and the pooled effect size

3.7 Perform Sensitivity Analysis

For this tutorial, we explored the results of using a more informative prior for the mean health state utility, employing multiple imputation using chained equations to impute missing standard deviations, and excluding studies with missing standard deviations. For the first sensitivity analysis, a Normal prior centred at 0.6 with a standard deviation of 0.03 is used (Box 8). This is regarded as being more informative than the prior used in the main analysis since it is above the midpoint of possible utility values with a smaller standard deviation. Given that mean utility values are generally bounded between 0 and 1 (although, at the individual participant level, they may occasionally take negative values), it is important to choose a prior distribution that will not violate these bounds. When the number of observations is large, the likelihood will be more important than the prior, and the posterior distribution will approach a normal distribution. In such scenarios, the normal distribution is a justifiable choice for the prior, and is easy for non-statistical collaborators to understand. However, when utility values are likely to be close to the boundaries of possible values (0 or 1), skewness will be introduced that only an extremely large number of observations will overcome. In these cases, a different prior distribution is justified. Options include a truncated normal distribution, a truncated log-normal distribution (which is skewed and has a lower limit) or a truncated log-normal distribution reversed to have an upper limit. Using a beta prior distribution might also be an appropriate choice, as it aligns well with the above constraint and could potentially reflect the uncertainty around utility values better [57]. For example, in the script, this could be implemented by using ‘prior(beta(1,1), class = Intercept)’ rather than ‘prior(normal(0.5,0.05), class = Intercept)’. Lastly, if the analysis allows for the possibility of negative utilities, has a relatively small number of observations and anticipates results around zero, it may be necessary to define a lower boundary to the utilities. The prior for the between-study heterogeneity can also be changed, but is kept the same as the main analysis in this example for simplicity. Guidance on the selection of prior distributions for the between-study heterogeneity parameter is available [58, 59].
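If a beta prior is preferred, one way to choose its shape parameters is to match a target mean and standard deviation by the method of moments. The sketch below does this for the mean of 0.6 and standard deviation of 0.03 used in this sensitivity analysis. The function name `beta_from_mean_sd` is hypothetical, and whether brms needs explicit bounds on the parameter when a beta prior is used should be checked against its documentation.

```r
# Method-of-moments conversion from (mean, sd) to Beta(shape1, shape2)
beta_from_mean_sd <- function(m, s) {
  stopifnot(m > 0, m < 1, s^2 < m * (1 - m))
  nu <- m * (1 - m) / s^2 - 1     # equals shape1 + shape2
  c(shape1 = m * nu, shape2 = (1 - m) * nu)
}

shapes <- beta_from_mean_sd(m = 0.6, s = 0.03)
shapes

# Compare the central 95% range of the matched beta with the normal prior
beta_95   <- qbeta(c(0.025, 0.975), shapes[["shape1"]], shapes[["shape2"]])
normal_95 <- qnorm(c(0.025, 0.975), mean = 0.6, sd = 0.03)
rbind(beta = beta_95, normal = normal_95)
```

Far from the boundaries the two priors are nearly indistinguishable; the beta form matters most when the target mean is close to 0 or 1.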


Box 8 Sensitivity analysis 1: using a more informative prior

For the second sensitivity analysis, we use the ‘mice()’ function from the mice package. Multiple imputation using chained equations involves generating several datasets with imputed values for the missing data. Within each dataset, missing values are filled in one at a time, using the observed values and other variables in the dataset. The process iterates, refining the imputed values based on the results of the previous round, until all missing data have been imputed. In our example, we generate five sets of imputed standard errors (Box 9). We use the brm_multiple function (rather than the brm function) from the brms package to fit the model since it is compatible with fitting multiple imputed datasets generated by mice and pooling the posterior distributions of those imputed datasets [40].
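One possible arrangement of these two steps is sketched below. The function is defined but not run, because it needs mice, brms and a Stan toolchain. Everything specific here is an assumption: the function name `fit_with_mice`, the column name `se.mi` for the standard error with missing values, the seed, and the choice to impute from the numeric columns only and re-attach the study labels afterwards.

```r
# Sketch, not executed. Assumes `dat` has columns studyid, n, utility and
# se.mi, where se.mi is NA for studies without a reported measure of variability.
fit_with_mice <- function(dat, priors, m = 5, seed = 123) {
  # Impute using the numeric columns only
  imp <- mice::mice(dat[, c("n", "utility", "se.mi")],
                    m = m, seed = seed, printFlag = FALSE)

  # Rebuild the m completed datasets and re-attach the study labels
  completed <- lapply(seq_len(m), function(i) {
    cbind(studyid = dat$studyid, mice::complete(imp, i))
  })

  # Fit the same model to each dataset and combine the posterior draws
  brms::brm_multiple(
    utility | se(se.mi) ~ 1 + (1 | studyid),
    data  = completed,
    prior = priors
  )
}

# Usage (not run):
# fit.model3 <- fit_with_mice(dat, priors.model1)
```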


Box 9 Sensitivity analysis 2: using multiple imputation using chained equations

For the last sensitivity analysis, only studies with available standard deviations were included (Box 10). The results are summarized in Table 2. The sensitivity analyses conducted did not materially change the pooled utility value and the 95% CrI. The model excluding the studies with missing SDs (model 4) generated less precise estimates compared to the other models, while the model with more informative priors for the effect size (model 2) yielded slightly narrower CrIs. All models achieved convergence. Model 3 has the highest bulk_ESS and tail_ESS since five imputed datasets were used and pooled to produce the estimates.

Table 2 Summary of results and convergence diagnostics

Box 10 Sensitivity analysis 3: excluding studies with missing SDs

3.8 Comparison to Frequentist Approach

Random-effects meta-analyses using the DerSimonian and Laird method were also carried out using the metafor package [60] to compare the results between the frequentist and Bayesian approaches. The code for the frequentist meta-analyses is included in the script. The three frequentist models (i.e. with imputed SDs using fractional polynomial regression, with imputed SDs using mice, and excluding studies with missing SDs) produced identical pooled utility values (0.70), which were slightly higher than the values from their Bayesian counterparts (Table 3). As expected, the confidence intervals were narrower than the CrIs, and the taus were lower in the frequentist models than in the Bayesian models. Lastly, all p values produced from the frequentist models were statistically significant (p < 0.05).
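For readers who want to see what the DerSimonian and Laird estimator does, the sketch below implements it by hand in base R. It is an illustration of the standard formulas rather than the metafor code used in the script, and the study means and standard errors are made up. The function name `dersimonian_laird` is hypothetical.

```r
dersimonian_laird <- function(y, se) {
  v     <- se^2
  w     <- 1 / v                                # fixed-effect weights
  mu_fe <- sum(w * y) / sum(w)
  Q     <- sum(w * (y - mu_fe)^2)               # Cochran's Q
  C     <- sum(w) - sum(w^2) / sum(w)
  tau2  <- max(0, (Q - (length(y) - 1)) / C)    # method-of-moments estimate
  w_re  <- 1 / (v + tau2)                       # random-effects weights
  mu_re <- sum(w_re * y) / sum(w_re)
  se_re <- sqrt(1 / sum(w_re))
  list(estimate = mu_re,
       ci       = mu_re + c(-1, 1) * qnorm(0.975) * se_re,
       tau      = sqrt(tau2))
}

# Made-up study means and standard errors, for illustration only
y  <- c(0.55, 0.62, 0.70, 0.78, 0.66)
se <- c(0.02, 0.03, 0.025, 0.04, 0.03)

dl <- dersimonian_laird(y, se)
dl
```

The heterogeneity estimate is a single plug-in value, so the resulting interval does not reflect uncertainty in tau. This is one reason frequentist intervals can be narrower than their Bayesian counterparts.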

Table 3 Comparison between Bayesian and frequentist meta-analysis results

4 Discussion

This paper outlines the fundamental steps in conducting Bayesian meta-analysis of utilities in R. By providing an illustrative example with data and code, the paper highlights the applicability of Bayesian modelling in synthesizing utility values.

The tutorial benefits from following a clear workflow for conducting Bayesian meta-analysis. This workflow is adaptable to a wide range of statistical problems, and encourages the clear and transparent communication of assumptions, prior beliefs, and data, to increase rigor and replicability in research. The tutorial also benefits from using the brms package [40], a powerful and versatile tool for fitting Bayesian models using Stan. The brms package greatly simplifies the model specification process since its formula syntax mirrors that of other widely used R packages (e.g. the lme4 package [62]). This makes Bayesian modelling more accessible and approachable to individuals without a deep understanding of Stan, and also eases the transition to Bayesian modelling for individuals who are already familiar with R. Nonetheless, users have the flexibility to select from a range of available software and packages for conducting Bayesian meta-analysis based on their preferences and specific requirements.

The sensitivity analysis was an insightful exercise, showcasing some approaches that can be implemented to deal with missing standard deviations. This is essential in the context of economic evaluations, given that reporting of measures of variability around utility values is poor [8], despite being promoted as good practice by HTA agencies [63]. It has been shown that imputing missing standard deviations in meta-analyses is generally better than excluding studies [48, 64]. It is worth noting that the methods presented here (i.e. fractional polynomial regression and multiple imputation using chained equations) are both executed prior to model fitting, meaning that missing data are filled in before running the model. Alternatively, imputation can be built into the Bayesian meta-analysis itself rather than carried out as a separate step, which is considered a superior approach since imputation is integrated into the model fitting process [48, 65]. This integrated method offers extra flexibility by allowing the inclusion of prior information and uncertainties related to the missing data, updating these uncertainties through the sharing of information within the hierarchical structure of the model, and ultimately generating a posterior distribution for each missing data point [11]. However, it is computationally demanding, requires programming in the underlying Stan software, and falls outside the scope of this tutorial.

The sensitivity analysis also showed that using a more informative prior did not change the results, suggesting that the data had more influence on the analysis than the prior in our example. In Bayesian analysis, the choice of priors is a crucial aspect since priors can affect the final results and the conclusions drawn from the analyses. It is therefore essential to carefully select and specify priors for each parameter of interest, and to perform sensitivity analyses to check how different prior specifications affect the estimates [11].
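The intuition for why the data can dominate the prior can be sketched with a simplified conjugate normal-normal update, in which the posterior mean is a precision-weighted average of the prior mean and the data estimate. This is a deliberately simplified illustration, not the hierarchical brms model used in the tutorial; the function name `posterior_normal` and all numbers below are hypothetical.

```r
# Conjugate normal-normal update for a mean with known sampling SE
# (simplified illustration; not the hierarchical model fitted with brms).
posterior_normal <- function(prior_mean, prior_sd, est, se) {
  w_prior  <- 1 / prior_sd^2              # prior precision
  w_data   <- 1 / se^2                    # data precision
  post_var <- 1 / (w_prior + w_data)
  c(mean = post_var * (w_prior * prior_mean + w_data * est),
    sd   = sqrt(post_var))
}

# Hypothetical pooled estimate and standard error (illustrative only).
est <- 0.68
se  <- 0.02

# Two hypothetical priors: weakly informative versus more informative.
weak        <- posterior_normal(prior_mean = 0.5, prior_sd = 1.0, est = est, se = se)
informative <- posterior_normal(prior_mean = 0.6, prior_sd = 0.1, est = est, se = se)

print(rbind(weak, informative))
```

Because the data precision is much larger than either prior precision in this example, both posterior means remain close to the data estimate. With fewer or less precise studies, the same calculation would show the informative prior pulling the posterior further towards the prior mean.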

The Bayesian and frequentist meta-analytic approaches produced broadly similar results, yet their interpretation differs. For the Bayesian approach, the 95% CrI is easier to interpret: it indicates a 95% probability that the pooled utility value lies between the lower and upper limits of the interval. In contrast, the 95% confidence interval is more challenging to interpret, as it refers to repeated sampling: if the analysis were repeated many times, 95% of the resulting confidence intervals would be expected to contain the true value. The significant p value from the frequentist models is uninformative in the context of pooling utility values, since it only tests whether the pooled utility differs from zero and therefore provides no meaningful information for interpreting the findings. The posterior distribution, on the other hand, can be used and explored to estimate parameters, calculate posterior probabilities and quantify uncertainties. In terms of tau, the frequentist models appeared to underestimate the level of heterogeneity between studies. Although this was not to a great extent in our example, frequentist meta-analysis has been shown to perform poorly when there is high between-study heterogeneity and a small number of studies included in the meta-analysis [66]. These issues are addressed in the Bayesian model by the inclusion of priors for tau and the effect size, which helps in handling the uncertainty around these parameters. Thus, while the frequentist approach is easier to implement in practice (in terms of speed and simplicity, as it can be implemented with a few lines of code), the Bayesian approach offers several advantages that make it suitable for the meta-analysis of health state utility values.
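As a brief illustration of how a posterior distribution can be explored, the sketch below summarizes a set of posterior draws for the pooled utility. The draws are simulated stand-ins and the thresholds (0.60, 0.65, 0.75) are hypothetical; with a fitted brms model object (assumed here to be called `fit`), the draws could instead be taken from `as_draws_df(fit)$b_Intercept`.

```r
set.seed(123)

# Stand-in for posterior draws of the pooled utility (simulated for illustration;
# in practice these would come from the fitted Bayesian model).
draws <- rnorm(4000, mean = 0.68, sd = 0.03)

# 95% equal-tailed credible interval from the draws
cri <- quantile(draws, probs = c(0.025, 0.975))

# Posterior probability that the pooled utility exceeds a hypothetical threshold
p_above <- mean(draws > 0.65)

# Posterior probability that the pooled utility lies within a hypothetical range
p_range <- mean(draws > 0.60 & draws < 0.75)

print(cri)
print(p_above)
print(p_range)
```

Direct probability statements of this kind are available because the draws represent the distribution of the parameter given the data; a frequentist confidence interval does not support the same reading.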

5 Conclusion

In conclusion, Bayesian methods offer several advantages when conducting meta-analyses of utility values. Their ability to incorporate prior information, handle heterogeneity between studies explicitly and provide intuitive probabilistic interpretations makes them a valuable tool for synthesizing utility data. In this tutorial, we provided a pooled utility value (and its CrI) for patients with heart failure, which can be used as an input for economic evaluations. We hope that this fosters an interest in Bayesian methods and their applications in the meta-analysis of utilities.