Background

Hepatitis C virus (HCV) was first described as non-A, non-B hepatitis in patients who presented with acute hepatitis after transfusion of blood products [1]. HCV is an enveloped RNA virus that targets hepatocytes, leading to liver damage [2]. Parenteral transmission through intravenous drug use, followed by transfusion of blood products before HCV screening, have been described as the most frequent routes of infection. However, HCV can also be transmitted sexually or vertically [3, 4]. Among patients exposed to HCV, a minority can spontaneously clear the virus, and around 66–82% of patients, who still have detectable serum HCV RNA after six months, should be considered chronically infected (chronic hepatitis C [CHC]) [5].

Chronic hepatitis C is a major global health burden, but it is treatable [6]. However, for economic reasons treatment is still restricted or out of reach in several settings [7, 8]. New direct-acting antivirals (DAAs) are highly effective for HCV treatment [9] but are still relatively expensive in most countries. The decision about whether, and which subgroup of, CHC patients should be treated with DAAs therefore has economic importance. Nevertheless, there are also costs associated with surveillance of CHC to determine the stage of liver disease so that only a smaller group of patients with significant but reversible impairment is treated. Recently, surveillance of liver disease with transient elastography (TE) was found to be an economically attractive alternative to liver biopsy [10]. All these options for monitoring and management of CHC make the decision process much more complex. Furthermore, treat-all DAA strategies for CHC have gained acceptance despite the high acquisition costs of DAA drugs in most countries [11]. However, the cost-effectiveness of these alternative strategies for deploying DAAs has not been examined in low- and middle-income countries, possibly because of the high budget impact.

Understanding the methods used in identified models and how they influence results is important. A previously published review [12] described and systematically reviewed the methodological approaches in published cost-effectiveness analyses (CEAs) of CHC treatment with DAAs. The current systematic review aims to extend that analysis to: (1) explore and discuss the variation in model characteristics, and (2) summarize the incremental cost-effectiveness ratios found in intervention studies for CHC and in studies of liver disease surveillance.

Methods

The systematic review was designed to answer the following research question:

What model structures and parameters have been used to estimate the cost-effectiveness or effectiveness of surveillance or treatment of people living with chronic hepatitis C and what are their conclusions?

The systematic review was carried out following the principles published by the National Health Service (NHS) Centre for Reviews and Dissemination [13].

Eligibility criteria

Eligible studies included mathematical or simulation models predicting costs and/or outcomes applied for interventions, surveillance, or clinical management of people living with CHC. Ultimately, only comparative studies evaluating an intervention that included a DAA were eligible for inclusion in the review. Economic evaluations alongside clinical trials and isolated statistical models fitted to observed data were excluded.

Eligible studies included models used to evaluate DAAs as an intervention compared with established treatment strategies. The comparator conditions for including a model were limited to “no treatment” or regimens with pegylated interferon. Eligible surveillance studies were those that used biological markers, elasticity imaging techniques, or liver biopsy. Studies evaluating screening of blood donations to reduce exposure to HCV were not included.

Search strategy

The search strategy was developed in conjunction with an experienced Information Specialist (CC) and is provided in Additional file 1.

The database searches were conducted from inception to May 2015. The following bibliographic resources were searched: MEDLINE, EMBASE, NHS EED (The Cochrane Library), HTA Library (The Cochrane Library) and LILACS. No limits were used. Citation chasing was conducted on publications included in the review, and the reference lists of identified systematic reviews were also scrutinized.

Study selection, data extraction and quality assessment

Titles and abstracts were screened by nine researchers (RC, RA, HP, JVC, DM, MH, LC, JCALS, CH). Each pair of researchers was allocated ~600 titles/abstracts, which were screened for relevance against the inclusion criteria; disagreements were resolved by discussion. Papers selected for full-text review were reviewed and screened by six researchers (RC, HP, JCALS, CH, JVC, LC).

Data extraction was carried out by six researchers (RC, HP, JCALS, CH, LC, JVC) using a template. Data were extracted from included studies by one researcher and checked by another.

The following aspects of the included studies’ methodology were reviewed: model type, HCV population, regimens, perspective, time horizon, discount rate, cycle length, and sponsor.

Studies were critically appraised using the Philips checklist for assessing the quality of model-based economic evaluations [14]. In line with the instructions accompanying the final checklist, where there was insufficient information available in the article to assess quality the item was marked ‘No’. Included studies were also quality assessed using the CHEERS checklist (for reporting quality) [15].

Data synthesis

The results of included studies were analysed on the basis of visual inspection of the tabulated extracted data. When applicable, the mean and confidence interval of incremental cost-effectiveness ratios (ICERs), deflated to 2015 international dollars at purchasing power parity (PPP), were calculated. Quantitative data synthesis was conducted using the R environment [16].
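The conversion and summary step described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the code used in this review; the function names, the price-index and PPP inputs, and the choice of a Student-t interval are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Two-sided 95% Student-t critical values for small samples (df = n - 1).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def to_intl_dollars_2015(icer_local, index_study_year, index_2015, ppp_2015):
    """Re-price a local-currency ICER to 2015, then convert it with a PPP factor.

    index_study_year and index_2015 are price-index values for the same country;
    ppp_2015 is local currency units per international dollar.
    All inputs are hypothetical and must come from an external source.
    """
    return icer_local * (index_2015 / index_study_year) / ppp_2015


def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% confidence interval of the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two ICER estimates")
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error from the sample SD
    # Assumption: t quantile for df <= 10, normal approximation otherwise
    # (the normal quantile is slightly too narrow for moderate samples).
    crit = T_975.get(n - 1, NormalDist().inv_cdf(0.975))
    return m, m - crit * se, m + crit * se
```

With only a handful of estimates per comparison, the interval is wide and can include negative values, which is one reason the synthesized ICERs should be read with caution.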

Changes to protocol

Prior to full-text screening the criteria for the review of intervention CEAs were revised to specify that only studies evaluating an intervention that included a DAA were eligible for inclusion in the review. Although data extraction was conducted for all eligible studies, only studies that enabled comparative analysis were critically appraised and included in the quantitative synthesis. This selection was made to focus on the studies which evaluated at least one DAA with another treatment protocol for CHC and reported results by Genotype.

Results

The initial searches identified 2403 titles and abstracts after deduplication. Following screening 348 papers were requested for full-text review. Of these, seven further studies were identified when screening the reference lists of systematic reviews. A total of 307 publications were excluded at full text (see Additional file 2 for more detail). A total of 41 publications were eligible for inclusion in the review (see Tables 1 and 2): 37 publications [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53] evaluated the cost-effectiveness of a DAA and four publications [54,55,56,57] reported three models evaluating the cost-effectiveness of surveillance methods. Of the 37 CEAs identified, eight were eligible for inclusion in the analysis [28, 29, 37, 39, 41, 44, 46, 47] as they evaluated at least one DAA with another treatment protocol for CHC. One study (Leleu et al., 2015) [35] did not report results by genotype and as such was not eligible for inclusion in the comparative analysis. The main countries in which the analyses were conducted were: United States of America (USA), United Kingdom (UK), Italy, Switzerland, and Spain. The study selection process is summarized in Fig. 1, Additional files 1 and 2.

Table 1 Summary characteristics of included models evaluating DAAs
Table 2 Summary characteristics of included models evaluating surveillance strategies
Fig. 1 PRISMA flow diagram

Model characteristics

Therapeutic intervention

Model type and structure

A total of eight economic models evaluating the cost-effectiveness of interventions (including a DAA) for the treatment of chronic hepatitis C were included in the comparative analysis [28, 29, 37, 39, 41, 44, 46, 47]. Model characteristics are summarized in Table 1.

Most models were Markov-based and included the METAVIR stages and presence of cirrhosis as health states, with a lifetime time horizon and cycle lengths ranging from 30 days to 12 months. Studies that used Markov models preceded by a decision tree were classified as Markov models. One study used a different modelling approach, discrete event simulation [41]. The discount rates used ranged from 2.0 to 3.5% per year, and sensitivity analysis was performed by deterministic and/or probabilistic methods.
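To make the shared structure concrete, the following is a minimal sketch of a cohort Markov model with METAVIR-style health states, annual cycles and discounting. It does not reproduce any of the included models: the state list, the single progression and mortality probabilities, and all costs and utilities passed in are hypothetical placeholders.

```python
# Hypothetical health states: METAVIR fibrosis stages F0-F4 plus death.
STATES = ["F0", "F1", "F2", "F3", "F4", "Dead"]


def make_transition_matrix(p_progress, p_die):
    """Build a row-stochastic matrix: each alive stage may stay, progress or die.

    Assumption: one progression probability and one mortality probability are
    applied to every stage, which real models would not do.
    """
    n = len(STATES)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        last_alive = (i == n - 2)          # F4 cannot progress further
        prog = 0.0 if last_alive else p_progress
        matrix[i][i] = 1.0 - prog - p_die  # remain in the same stage
        if not last_alive:
            matrix[i][i + 1] = prog        # move to the next stage
        matrix[i][n - 1] = p_die           # die from any alive stage
    matrix[n - 1][n - 1] = 1.0             # death is absorbing
    return matrix


def run_cohort(matrix, start, costs, utilities, n_cycles, discount_rate):
    """Return discounted (cost, QALYs) per patient over n_cycles annual cycles."""
    dist = list(start)
    total_cost = total_qaly = 0.0
    for cycle in range(n_cycles):
        factor = 1.0 / (1.0 + discount_rate) ** cycle
        total_cost += factor * sum(d * c for d, c in zip(dist, costs))
        total_qaly += factor * sum(d * u for d, u in zip(dist, utilities))
        # Advance the cohort one cycle: new distribution = dist x matrix.
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return total_cost, total_qaly
```

Running the same function for a treated and an untreated cohort (with different transition probabilities and costs) yields the cost and QALY pairs from which an ICER is computed. Shorter cycle lengths, such as the 30-day cycles used in one study, would require rescaling the probabilities and the discount factor accordingly.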

HCV population

Concerning population characteristics, studies frequently involved separate analyses for treatment-naïve and treatment-experienced patients. Some evaluated the effects of treatment for specific age groups. The most frequent HCV genotype was 1, but some studies included different ranges of genotypes from 1 to 6.

Perspective and sponsor

The most commonly adopted perspectives were those of national health systems and third-party payers, and some studies adopted a societal perspective. Regarding funding, a considerable proportion of studies were funded by pharmaceutical companies.

Regimens

A total of 126 different combinations of intervention, comparison and population were described across the eight studies, considering the following features: (i) Unique Combination of Intervention versus (vs) Comparator with Time in weeks of treatment duration (UCICT); (ii) HCV genotype; (iii) prior treatment status (naive versus treatment-experienced); and (iv) presence of cirrhosis (with versus without) (Table 3). In most comparisons, the population was treatment naive (n = 79 vs n = 47); 65 combinations were stratified by presence (n = 39) or absence (n = 26) of cirrhosis, and 61 combinations evaluated all patients (cirrhotic and non-cirrhotic) in the same group. Comparisons evaluating HCV Genotypes 1 and 3 (38.8 and 40.4%) were more prevalent in the studies than those evaluating Genotype 2 (20.6%).

Table 3 Data available for analysis of all treatment comparisons used in intervention studies by population and study characteristics

Considering the comparative interventions in the included studies, the articles evaluated a total of 11 UCICTs. The UCICT “sofosbuvir (SOF) + ribavirin (RBV) 24 weeks (wks) versus (vs) No treatment (Tx)” was the most frequently evaluated in the included studies (n = 22; 6 for Genotype 1 and 16 for Genotype 3). The UCICTs were specific to genotypes, which makes comparisons encompassing more than one genotype difficult (Table 3).

Surveillance

Model type and structure

A total of three economic models (reported in four publications) were identified evaluating surveillance strategies in chronic hepatitis C [54,55,56,57]. The models used for surveillance evaluation were similar to those used for DAA interventions, using a Markov modelling approach, one of them preceded by a decision tree [54]. In one study, all patients started without fibrosis (METAVIR stage 0); further states were METAVIR stages 1–3, with separate states constructed for diagnosed, undiagnosed, and misdiagnosed patients, followed by hepatocellular carcinoma (HCC) and radiofrequency thermal ablation [56]. The other two models were based on METAVIR stages together with HCC, liver transplantation, and death [57], or on all of these states plus an additional post-liver transplantation state [54]. The time horizon used was lifetime for all studies. Model characteristics are summarised in Table 2.

HCV population

The HCV populations for surveillance studies were: newly diagnosed with chronic HCV and no fibrosis [56]; HBV, HCV genotypes 1–4, with suspected fibrosis, who usually present for liver biopsy [54]; and, treatment naïve, HCV genotypes 1–3 [57].

Perspective and sponsor

None of the surveillance studies were funded by industry [54,55,56,57]. The perspectives adopted were the National Health Systems [54, 56] or payer [57].

Regimens

Several alternatives were considered as surveillance regimens. The technologies used were: TE, FibroTest®, ARFI, DwMRI, FibroIndex, contrast-enhanced ultrasound, Type IV collagen, and liver biopsy. One study included immediate treatment as an alternative [57].

Quality appraisal

Therapeutic intervention

For the eight included intervention studies including DAAs [28, 29, 37, 39, 41, 44, 46, 47], the quality appraisal showed a considerable number of problems. Studies were quality assessed using both the Philips checklist [14] (see Table 4) and the CHEERS checklist [15] (see Table 5).

Table 4 Quality appraisal: Philips checklist (intervention and surveillance models)
Table 5 Quality appraisal: CHEERS checklist (intervention and surveillance models)

Considering the Philips checklist [14], only three of the items (“S7 - Time horizon”; “S8 - Disease states/pathways”; “S9 - Cycle length”) were fully accomplished by all included studies. The items which showed a higher frequency of problems were: “D1 - Data identification”; “D2 - Pre-model data analysis”; “D3 - Data incorporation”; “D4 - Assessment of uncertainty”; and, “C1- Internal consistency”. In summary, all [28, 29, 37, 41, 44, 46, 47] but one [39] of the intervention studies did not describe sufficiently or did not use systematic reviews to estimate parameters; all eight included studies were rated as “Unclear” or “No” for pre-model analysis [28, 29, 37, 39, 41, 44, 46, 47]; four studies were rated unclear or did not provide distributions for data incorporated [28, 29, 37, 47]; and, half of the studies failed in terms of assessment of uncertainty [29, 37, 46, 47].

Using the CHEERS checklist [15], slightly improved results in terms of study reporting quality were found, particularly when considering the number of checklist criteria for which problems were identified. However, the included studies failed to meet acceptable criteria for the following four questions: “Q11b - Synthesis-based estimates”; “Q16 - Describe all structural or other assumptions underpinning the decision-analytic model”; “Q17 - Describe all analytic methods supporting the evaluation”, and “Q18 - Report the values, ranges, references, and if used, probability distributions for all parameters”. The biggest problems identified in the CHEERS evaluation were: only two studies [39, 47] did not fail and five studies [29, 37, 41, 44, 46] were rated “Unclear” for the use of synthesis-based estimates; only two of the eight included studies described structural or other assumptions underpinning the decision models [39, 41]; all the studies had an unclear description of the analytic methods that supported the evaluations [28, 29, 37, 39, 41, 44, 46, 47]; and four studies had important issues, such as not reporting probabilities or ranges of estimates, or not reporting the sensitivity analysis in sufficient detail [28, 29, 39, 46]. One positive aspect was that all the studies employed sensitivity analysis.

Regarding the quality of the parameters used to calculate ICERs, QALYs and costs, some studies did not sufficiently describe uncertainty (Table 5, Q18). Among the papers that evaluated DAAs, the source of utility parameters was the scientific literature, as it was for the transition probability parameters. However, costs were parameterized exclusively with data from the scientific literature in four papers [37, 39, 41, 46]. The same number of studies [28, 29, 44, 47] used cost data from the payer, mostly from the local health systems. It is also worth noting the use of expert opinion for the parameterization of costs, an approach used by two studies [28, 29].

Hence, structural or other assumptions, analytic methods supporting the evaluation, and even justification, validation and calibration of the decision-analytic model were unclear for a considerable part of the included studies. Regarding calibration or validation, four studies [28, 29, 44, 46] reported validation of the model structure, three studies [41, 44, 47] reported validation of the model outputs, and one study [46] also reported validation of inputs.

Surveillance

Four publications reported decision analytic models evaluating surveillance strategies [54,55,56,57]; however, two publications reported the same model and were quality appraised as one [54, 55]. Studies were also quality assessed using both the Philips checklist (see Table 4) and the CHEERS checklist (see Table 5).

Using the Philips checklist [14], the results for surveillance studies were very positive, with just one study [54] evaluated as “Unclear” for Question D2 because of reporting omissions for pre-model data analysis (Table 4). The same pattern was observed with CHEERS (Table 5): most items of this checklist were accomplished by the three included surveillance studies. The exception was Question Q11b, for which two studies [54, 57] were evaluated as “Unclear” in terms of the description of the methods used to identify included studies and synthesize clinical effectiveness data.

Synthesis results

Therapeutic intervention

In summary, 62 different, non-dominated comparisons were described in the eight included studies (n = 9 combinations had negative ICERs), one for each UCICT applied to the same patient profile (stratified by genotype, treatment-naive or treatment-experienced status, and presence of cirrhosis). A total of 29 comparisons were evaluated only once in the eight included papers. In addition, the UCICTs “SOF + pegylated interferon (PEG) + ribavirin (RBV) 12 wks vs boceprevir (BOC) PEG RBV 48 wks” and “SOF PEG RBV 12 weeks vs PEG RBV 48 wks” were the most frequently described (n = 5 studies). For those, the mean and 95% confidence interval (CI) of the ICER in 2015 international dollars (PPP) were calculated (Table 6).

Table 6 Synthesis of ICERs from the included intervention studies when available with more than one comparison

As we took into consideration the different comparisons (intervention and comparator) for the calculation of ICERs, the synthesized values are all based on unique comparisons. All comparisons are shown in the column labelled “Treatment” in Table 6.

The outcome of all studies was cost per quality-adjusted life-year (QALY), a measure that represents the cost incurred for gaining one year of life adjusted for quality of life. Because of the many comparisons and the small sample, great variability was found in the ICERs (large coefficient of variation). Half (55) of the combinations resulted in mean ICERs above $30,000. The other half (with ICERs below $30,000) was tested 62 times overall. Approximately 27% of the ICER estimates suggested that DAAs were not cost-effective at a threshold of $50,000 (see Table 6).
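The way a single comparison is judged against a willingness-to-pay threshold can be sketched as follows. This is an illustrative Python example only, not the procedure of any included study; the function names and the handling of the four quadrants of the cost-effectiveness plane are assumptions, and the $50,000 default simply mirrors the threshold quoted above.

```python
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    d_qaly = qaly_new - qaly_old
    if d_qaly == 0:
        raise ValueError("ICER is undefined when QALYs are equal")
    return (cost_new - cost_old) / d_qaly


def classify(cost_new, qaly_new, cost_old, qaly_old, threshold=50_000):
    """Label a comparison relative to a cost-per-QALY threshold (hypothetical rule)."""
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_cost == 0 and d_qaly == 0:
        return "equivalent"
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"        # no more costly and no less effective
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"       # no cheaper and no more effective
    ratio = d_cost / d_qaly
    if d_qaly > 0:
        # More costly and more effective: accept if cost per QALY gained is low enough.
        return "cost-effective" if ratio <= threshold else "not cost-effective"
    # Cheaper and less effective: accept only if savings per QALY lost are high enough.
    return "cost-effective" if ratio >= threshold else "not cost-effective"
```

The sketch also shows why a negative ICER is ambiguous on its own: it arises both for dominant and for dominated options, so the signs of the cost and QALY differences have to be inspected separately.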

Surveillance

A meta-analysis of the results from these models was not possible because of the different surveillance strategies compared and the small number of included studies. We therefore present a narrative summary of the results from the models identified.

In the model presented by Liu et al. (2011), results indicated that early treatment of CHC can be the cost-effective strategy compared to the implementation of testing approaches [57]. However, for clinical settings where testing is required prior to treatment, FibroTest® only was more effective and also less costly than liver biopsy [57].

The model by Canavan et al. (2013) demonstrated that a strategy of annual definitive Fibroscan® TE diagnosed 20% more cirrhosis cases than the current strategy, with 549 extra patients per 10,000 accessing screening over a lifetime and, consequently, 76 additional HCC cases diagnosed [56].

In the third model identified (Crossan et al., 2015), the authors concluded that when applying the standard UK cost-effectiveness threshold range, the cost-effective strategy was a “treat all” approach resulting in an ICER of £9204 [54]. Similarly, a study published in 2011 from the USA payer perspective proposed a “shift towards strategies that initiate immediate treatment without fibrosis screening” [57].

In summary, the findings of two [54, 57] of the three models evaluating surveillance strategies suggest that treating all CHC patients, regardless of the stage of liver disease, could be cost-effective. These analyses were conducted from the perspectives of a USA third-party payer (direct healthcare costs only) and the UK National Health Service.

Discussion

This systematic review was able to identify and analyse studies and model structures and parameters used to estimate the cost-effectiveness of surveillance or treatment of people living with CHC. The review demonstrated that of the eight intervention studies that we evaluated in detail, very similar model structures were used to investigate the cost-effectiveness of DAA treatments for CHC.

Models

The majority of included models adopted a Markov approach [28, 29, 39, 44, 46, 47] with METAVIR-based classification used as health states [39, 41, 46]. The presence of cirrhosis (with/without) was another important factor for model structures, with some models including this characteristic as a separate health state [28, 29, 37, 44, 47]. A lifetime time horizon was used in the majority of cases [28, 29, 37, 39, 41, 44, 47], but cycle length ranged from 30 days [37] to 12 months [44, 46]. The differences found in model structures and cycle length can limit the comparability of the studies and their usefulness for stakeholders’ decisions. The variation found in cycle length might have clinical implications for the results, as HCV treatment time is being shortened with the use of new drugs. Populations with different characteristics were analysed (e.g. HCV genotype; prior treatment status [naïve or experienced]; cirrhotic or non-cirrhotic; HIV coinfection). Moreover, a number of different treatment comparisons (several drugs and treatment durations) were used in the included studies.

Quality appraisal

The included studies were quality appraised using two checklists (CHEERS and Philips [14, 15]). This assessment identified a number of issues, largely a result of reporting omissions. Accordingly, the major finding of our study is that the modelling process should be better described, especially with regard to model validation and calibration. The implications of these unclear descriptions are that the results may be biased and that decisions made on their basis may not achieve what was expected. Consequently, real-life outcomes might differ considerably from modelling results, producing unexpected additional budget impact for the health system. Studies should better report their modelling process.

We do not have enough data to state whether the results obtained by microsimulation models or by cohort Markov structures were better. Although discrete event simulation can be considered more powerful in its capacity to reflect real-world changes, the memoryless assumption of the Markov model is not a critical issue for CHC.

A good reason to explain the variation in the results is the use of different data from different settings and perspectives; this issue is as critical as the model structures in terms of producing variability. We argue that model structures were relatively common among the different studies, and it was not possible to identify any study with an insufficient modelling structure.

Data synthesis

The ICERs of treatment were quantitatively synthesized. The cost-effectiveness results for treatment and surveillance indicated important differences. This heterogeneity needs to be contextualized in relation to the different populations, interventions, settings and perspectives of the studies. However, we could only undertake a quantitative synthesis of the models evaluating treatments. The synthesis was also limited by the number of studies that could be combined.

CEAs of HCV treatments should be discussed in relation to the considerably high variability in their ICER estimates. This analysis suggests that in most circumstances DAAs were cost-effective (when using an ICER threshold of $50,000 per quality-adjusted life-year [QALY]). Considering that new treatments with DAAs have demonstrated high effectiveness [9, 58], the cost dimension is the main challenge for implementation worldwide. Although the cost of the new CHC drugs shows global variation, some have suggested they are unaffordable [59]. However, in some countries negotiation with pharmaceutical companies has been successful in providing discounts [60]. This strategy could therefore be adopted in other settings with universal health systems, and has the potential not only to improve cost-effectiveness but also to increase patients’ access to the highly effective DAAs.

Previous systematic reviews

A previously published systematic review of CEAs that evaluated DAAs found that the modelling structures were similar [12]. In that review, the quality of the included studies was reported as acceptable by the authors, who used the CHEERS (reporting quality) checklist only [12]. In our systematic review, we included a second checklist and synthesized ICERs when possible. However, these synthesis results are limited by the date of the last literature search update. Regarding surveillance of CHC, a recent systematic review compared TE with liver biopsy and found it cost-effective, especially for patients with a higher degree of liver fibrosis. In that review, high variability in methodological quality was found using the Drummond 10-item checklist [10].

Surveillance studies

Focusing on the included surveillance studies, a treat-all strategy was suggested as cost-effective by two studies [54, 57]. However, those findings may be limited to the local settings, thresholds, and perspectives used (USA payer and UK NHS). These conclusions may not apply to low- and middle-income countries. Surveillance and treatment prioritization for the subgroup of CHC patients with higher risk of liver disease progression can be an option. Moreover, the presence of different surveillance strategies in the included studies complicates the analysis of this systematic review. Thus, clinicians and policy makers might face similar difficulties in reaching the most appropriate treatment decision because of the number of alternatives to be considered.

Clinical implications

Treatment of chronic hepatitis C was revolutionized by the high efficacy of direct-acting antiviral drugs (DAAs). However, the decrease in the burden of liver disease in CHC patients achieved by DAA treatment has been associated with high costs to health authorities worldwide [61]. The analysis of studies that evaluated the cost-effectiveness of HCV eradication by DAAs is essential for the elaboration of public health strategies to promote broad primary care access to DAA regimens, especially in low- to middle-income countries with high CHC prevalence.

Limitations

This paper has several limitations. Our findings, especially those related to clinical implications and ICER synthesis, represent only the circumstances at the time of the last search update (May 2015), and study selection and data extraction can introduce a risk of bias, even after training of the review team. Considering the limitations of the present review, the variability of the studies included is certainly a factor that should be addressed.

Conclusions

CEAs of CHC treatments presented variability in their cost-effectiveness estimates. Our analysis suggests that there were still some circumstances where DAAs were not cost-effective. Thus, surveillance, as opposed to a treat-all strategy, may still need to be considered as an option for deploying DAAs, particularly where acquisition cost is at the limit of affordability for a health service. We identified existing models that could be used to compare surveillance and treat-all strategies. Future studies should compare the cost-effectiveness of the surveillance of liver disease with a treat-all strategy for CHC patients, considering different settings and perspectives.