Key Points for Decision Makers

Evidence about the health economic outcomes of a diagnostic test is often lacking and has been mentioned as a common reason for diagnostics failing to obtain appropriate coverage.

Evaluating the cost effectiveness of diagnostic biomarkers is challenging because diagnostics themselves do not influence long-term outcomes directly, but rather impact on therapeutic decisions and the subsequent care process.

Economic evaluations on diagnostic biomarkers typically require comprehensive models to deal with all possible test–treatment combinations in various populations to assess their value in terms of health economic outcomes.

More effort should be made to align the choice of health economic evaluation designs and outcomes with the actual information needs of the various public and private payers and care provider decision makers.

Incorporating the results of non-health outcomes and patient preferences and improving the evidence base of other input parameters is crucial to fully capture the potential value of diagnostic biomarkers.

1 Background

The role and potential value of biomarkers have received increasing attention over the last 2 decades [1]. Biomarkers represent a wide variety of technologies that are used at various stages of the disease process [2]. For example, they may serve as surrogate or clinical endpoints to predict clinical benefit and to monitor patients during and after treatment. Biomarkers are also used at earlier stages of the disease process, such as biomarkers that aid early disease detection, biomarkers for staging a disease, and companion diagnostics that guide the selection of therapeutic strategies. Accordingly, a biomarker is broadly defined as “an indicator of a normal biological process, a pathogenic process or a pharmacologic response to a therapeutic intervention” [3]. Owing to their various applications, biomarkers have considerable potential for optimizing the therapeutic approach or the timing of treatment [4]. This review focuses on biomarkers for diagnosing, staging, and guiding the selection of therapeutic strategies for non-communicable diseases, because research activity in this field is increasing rapidly [5]. Despite the large number of diagnostic biomarkers becoming available, only a small number are implemented in clinical practice. This is owing to a lack of robust evidence supporting their clinical utility or costs, and to the tension between technical possibilities and resource constraints [6].

To prioritize between competing innovations, decision makers require information about the health economic impact of interventions, which is assessed by weighing their benefits against their costs. In this context, payers have become more critical and place more weight on the clinical utility of a diagnostic test, including biomarkers, when making coverage decisions [7]. Clinical utility reflects the link between the accuracy of a diagnostic test and the associated health outcomes, and thus provides insight into the benefits of diagnostics [8]. Cohen et al. [9] emphasized the importance of analyzing the health impact of diagnostics: they concluded from a stakeholder analysis that the cost of a biomarker matters less for implementation than its impact on longer-term health outcomes. When examining the health impact of a diagnostic test, patient and societal outcomes are considered the most important outcome measures [10]. Patient outcomes include the effects on morbidity, mortality, and quality-adjusted life-years (QALYs), and societal outcomes include cost effectiveness and net benefit from a societal perspective [10]. However, evidence about these health and economic outcomes of a diagnostic test is often lacking and has been cited as a common reason for failing to obtain appropriate coverage [11].

Evaluating the cost effectiveness of diagnostic biomarkers is challenging because diagnostics themselves do not influence long-term outcomes directly. Instead, they affect the subsequent care process [12], whose effectiveness may vary. In examining the effectiveness of a diagnostic biomarker, one needs to take into account (1) the accuracy of the diagnostic test; (2) the impact of the diagnostic on therapeutic decisions; and (3) the effectiveness of the therapies selected [9, 13]. Another issue in evaluating the health economic impact of a diagnostic biomarker is the choice of comparators. The number of comparators can become (very) large because diagnostic biomarkers are often combined with other (biomarker) tests, resulting in an unwieldy number of realistic test strategies. It is not yet clear how economic evaluations of diagnostic biomarkers handle these methodological challenges, or how such evaluations are (and should be) designed in this regard. The primary objective of this study is to systematically review the current literature on the (methodological) characteristics of cost-effectiveness evaluations of diagnostic biomarkers. The secondary aim is to explore the extent to which studies deal with a range of specific issues related to the economic evaluation of diagnostic-based personalized medicine, such as different payer perspectives, preference heterogeneity, and multiple applications in subpopulations.
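To illustrate how quickly the set of comparators can grow, the sketch below counts candidate strategies under two simplifying assumptions. First, each strategy is an ordered sequence of distinct tests. Order matters because a later test may be applied only after an earlier one. Second, "no test" also counts as a strategy. The test names are hypothetical. In practice, clinical plausibility removes some sequences, and choices such as positivity thresholds or "believe the positive" rules add others. This is therefore an illustration of scale, not a count taken from any reviewed study.

```python
from itertools import permutations


def enumerate_test_sequences(tests):
    """List every ordered sequence of distinct tests, including 'no test'.

    Order matters because a later test is assumed to follow an earlier one.
    """
    sequences = []
    for length in range(len(tests) + 1):
        # permutations(tests, 0) yields one empty tuple: the 'no test' strategy
        sequences.extend(permutations(tests, length))
    return sequences


# Hypothetical candidate tests; names are illustrative only.
for k in range(1, 6):
    candidate_tests = [f"test_{i}" for i in range(1, k + 1)]
    print(k, len(enumerate_test_sequences(candidate_tests)))
```

Under these assumptions, three tests already yield 16 candidate strategies and five tests yield 326. This shows why evaluations usually restrict themselves to clinically realistic sequences.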

2 Methods

2.1 Search Strategy

A systematic literature search was performed in PubMed and the National Health Service Economic Evaluation Database (NHS EED) (until 5 February 2015) to identify the most relevant published economic evaluations of diagnostic biomarkers [14]. In both databases, free-text words and MeSH terms related to “biomarkers” and “diagnosis” were combined (for full search queries see “Appendix”). The PubMed search was complemented with the economic MeSH terms “cost-effectiveness” and “cost-benefit analysis” from the sensitivity filter of the Canadian Agency for Drugs and Technologies in Health (CADTH) [15]. The searches were restricted to English-language papers published from 2010 onwards, to limit the scope of this review to recent, state-of-the-art economic evaluations. We considered this 5-year retrospective horizon sufficient.

2.2 Study Selection

After identification of publications by the electronic searches, duplicate records were removed. Selection of papers was based on the following eligibility criteria:

  1. Patients: the intervention is applied to human subjects for the diagnosis of one of the five main non-communicable diseases. These are cardiovascular or circulatory diseases, cancer, chronic respiratory diseases, diabetes, and intermediate risk factors associated with obesity [16].

  2. Intervention: a diagnostic biomarker for the aforementioned diseases. Diagnostic biomarkers included risk biomarkers for the diagnosis and staging of diseases, and companion diagnostics, defined as diagnostic tools that guide treatment. Universal screening tools, triage procedures, and severity or progression analyses were excluded.

  3. Study type: a health economic evaluation is reported, including methods, input data, and results. The evaluation is a model- or trial-based cost-minimization analysis (CMA), cost-effectiveness analysis (CEA), cost-utility analysis (CUA), or cost-benefit analysis (CBA). Publications reporting only on methodological issues, reviews, comment letters, and editorials were excluded.

  4. Setting: the intervention is evaluated in a country defined as an upper-middle-income or high-income economy by the World Bank [17].

Papers were first screened on title and abstract and were excluded when one or more of the eligibility criteria were not met. Subsequently, two reviewers (LS and MO) independently assessed the remaining full texts to make a final decision on inclusion in this review. Disagreements between the reviewers were resolved by discussion.

2.3 Data Extraction

Data on empirical and methodological study characteristics were extracted using an adapted version of the format reported by Pham et al. [2] and the CHEERS statement [18]. The items reflected various study characteristics, including the decision problem, the target population and target audience, the evaluated biomarkers, methodological aspects such as the model type and perspective, and the presence of uncertainty analyses. In addition, we assessed whether studies handled a number of specific issues that Husereau et al. [19] previously described as being of particular importance to economic evaluations of personalized medicine. These include variable framing of research questions, the evaluation of multiple subpopulations or strategies, testing the sensitivity of effects to compliance, preference heterogeneity, variable opportunity costs, and the inclusion of cost-sharing arrangements. Two reviewers (MO and MVDM) independently performed data extraction and resolved any disagreements by discussion.

3 Results

3.1 Study Selection

The literature search located 319 publications in the PubMed and NHS EED databases, and two further papers were identified through hand-searching. After removal of 39 duplicates, 282 unique papers remained. After the titles were screened against the eligibility criteria, 159 papers were selected for abstract and full-text examination. Of these, 68 articles were excluded because they did not describe a health economic evaluation. Another 42 papers were excluded because they examined diagnostic biomarkers for diseases outside the main disease groups we focused on (cancer, obesity, diabetes, cardiovascular, and respiratory diseases). Eight papers were excluded because they assessed biomarkers for universal screening or examined an entire triage procedure. Furthermore, eight articles were excluded because they did not apply to upper-middle-income or high-income countries. A final set of 33 economic evaluations of diagnostic biomarkers was included in this review (Fig. 1).

Fig. 1 Flow diagram of the paper selection process. NHS EED National Health Service Economic Evaluation Database

3.2 General Study Characteristics

Table 1 presents an overview of the general study characteristics. Most health economic evaluations considered biomarkers for confirming a diagnosis (N = 16) [20–35], and some considered biomarkers for staging a disease (N = 4) [36–39]. Genetic testing was employed for treatment selection in patients (N = 7) [40–46] or to test for a familial disease in patients and their relatives (N = 6) [47–52]. Biomarkers were applied to diagnose several types of cancer (N = 23) [20, 21, 24–30, 32–40, 44–48], cardiovascular/circulatory diseases (N = 9) [22, 23, 31, 41, 43, 49–52], and respiratory disease (N = 1) [42]. Economic evaluations were most often performed in colorectal cancer and evaluated genetic testing strategies such as testing for BRAF and KRAS mutations. The colorectal cancer evaluations assessed genetic testing techniques. In contrast, evaluations of diagnostic biomarkers for lung and thyroid cancer assessed techniques for needle aspiration and pre-/intra-operative molecular classification. In the field of cardiovascular diseases, economic evaluations most often assessed biomarkers primarily used for diagnosis in patients (e.g., troponin for diagnosing myocardial infarction). The number of strategies assessed differed between studies and ranged from two to 17 (median: three). Fourteen studies reported a single comparison of two strategies [21, 23, 26, 27, 30–32, 34, 36–39, 42, 44]. These were either an evaluation of a biomarker strategy compared with no biomarker (N = 6) [21, 30, 31, 34, 36, 42] or a head-to-head comparison of two specific biomarkers (N = 8) [23, 26, 27, 32, 37–39, 44].

Table 1 General study characteristics of the included studies

The maximum number of comparisons was 16, in a single study in which 17 strategies were constructed on the basis of clinical criteria, prediction algorithms, tumor testing, and upfront germline mutation testing [47]. Economic evaluations of genetic testing for a disease in patients and their relatives defined and compared more strategies than evaluations of tests for diagnosing or staging a disease in patients. The mean numbers of strategies were six, three, and two, respectively. About half of the studies explicitly stated an aim to inform national decision makers (N = 13) [25, 26, 28, 29, 38, 40–43, 46–48, 51] or clinicians (N = 3) [31, 32, 39]. For the remaining publications (N = 17), the target audience was not clearly stated.

3.3 Methodological Characteristics

Table 2 displays a summary of the methodological characteristics of the studies.

Table 2 Methodological characteristics of the included studies

3.4 Type of Health Economic Evaluation

Most studies were model-based economic evaluations (N = 25) [20, 22, 26–29, 31–35, 38–41, 43–52]. Overall, 20 studies were CUAs. Of these, 11 reported the incremental costs per QALY only [22, 23, 27, 31, 35, 38, 41, 43, 44, 49, 51]. Seven reported the incremental costs per QALY together with a cost-effectiveness estimate such as costs per life-year gained [26, 28, 42, 45, 46, 48, 50]. Verry et al. [39] reported the incremental costs and incremental QALYs but not the incremental cost-utility ratio, and Steinfort et al. [32] reported incremental costs per QALY and cost savings. Nine studies performed a CEA, of which two were trial based [25, 30]. These CEAs reported the incremental costs per life-year gained (N = 5) [29, 33, 40, 47, 52], the incremental costs per additional case detected (N = 3) [25, 30, 34], or the incremental costs per extra patient surviving at 5 years (N = 1) [20]. Eight trial-based studies were identified: two CUAs [23, 42] and two CEAs [25, 30]. The remaining four trial-based studies presented incremental costs only and were classified as CMAs [21, 24, 36, 37]. Two of these CMAs provided evidence to support the equivalence of effects between the intervention and its comparator(s) [24, 37].

3.5 Perspective and Time Horizon of the Analysis

Cost effectiveness was mostly assessed from a healthcare system perspective (N = 15) [20, 22, 23, 25, 31, 33, 38–40, 42–44, 48, 49, 51]. Other studies adopted a hospital perspective (N = 7) [21, 24, 30, 32, 34, 36, 52], a third-party payer perspective (N = 5) [26, 27, 41, 46, 47], or the perspective of the operating room only (N = 1) [37]. Five studies adopted a societal perspective [28, 29, 35, 45, 50], of which three reported the productivity losses that were included [28, 35, 45].

The four studies classified as CMAs were trial-based evaluations. They used a short time horizon (the duration of the diagnostic process or of the hospital stay) and adopted a local (hospital) perspective. Among CEAs, the time horizon varied substantially. Three studies used a time horizon covering the duration of the diagnostic process or of the hospital stay [25, 30, 34], whereas four studies adopted a lifetime horizon [29, 33, 47, 52]. The majority of CUAs used a lifetime horizon (N = 10) [22, 26–28, 31, 35, 43, 48, 49, 51] and adopted a healthcare system perspective (N = 11) [22, 23, 31, 38, 39, 42–44, 48, 49, 51]. Only one CUA used a hospital perspective [32]. The time horizon in model-based studies ranged from the duration of the diagnostic process (N = 2) [32, 34] to a lifetime horizon (N = 14) [22, 26–29, 31, 33, 35, 43, 47–49, 51, 52]. Panattoni et al. [43] incorporated both a short-term and a long-term (lifetime) horizon to distinguish direct effects from long-term health economic outcomes.

3.6 Decision Model

Most model-based evaluations used a Markov model to link the direct effects of biomarkers to long-term costs and effects (N = 19) [26–29, 33, 35, 38–41, 43–47, 49–52]. Nine of these studies also used a decision tree to capture differences in accuracy (sensitivity/specificity) between diagnostic strategies and the resulting (changes in) treatment recommendations [26, 27, 41, 43, 47, 49–52]. They fed these results into a Markov model to estimate the effects on resource use and health outcomes associated with the recommended treatment(s). Nine other economic evaluations used only a decision tree to model outcomes [20–22, 25, 31, 32, 34, 37, 48].
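The decision tree plus Markov structure described above can be sketched as follows. This is a minimal illustration, not the model of any included study. It assumes a two-state (alive/dead) Markov model with annual cycles, costs and utilities counted at the start of each cycle without half-cycle correction, and a 3.5% annual discount rate (used here only as an illustrative default). The names `Branch`, `run_markov`, and `evaluate_test_strategy` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """Hypothetical annual inputs for one end node of the decision tree."""
    annual_death_prob: float
    annual_cost: float
    utility: float


def run_markov(branch, cycles, discount=0.035):
    """Two-state (alive/dead) Markov cohort trace with annual cycles.

    Costs and utilities accrue at the start of each cycle (no half-cycle
    correction). Returns discounted (cost, QALYs) per person in the branch.
    """
    alive = 1.0
    cost = qalys = 0.0
    for t in range(cycles):
        factor = 1.0 / (1.0 + discount) ** t
        cost += alive * branch.annual_cost * factor
        qalys += alive * branch.utility * factor
        alive *= 1.0 - branch.annual_death_prob  # transition alive -> dead
    return cost, qalys


def evaluate_test_strategy(prevalence, sensitivity, specificity, test_cost,
                           branches, cycles, discount=0.035):
    """Decision tree: split the cohort by disease status and test result,
    run each end node through the Markov model, and weight by probability."""
    weights = {
        "TP": prevalence * sensitivity,
        "FN": prevalence * (1.0 - sensitivity),
        "FP": (1.0 - prevalence) * (1.0 - specificity),
        "TN": (1.0 - prevalence) * specificity,
    }
    total_cost, total_qalys = test_cost, 0.0
    for node, weight in weights.items():
        cost, qalys = run_markov(branches[node], cycles, discount)
        total_cost += weight * cost
        total_qalys += weight * qalys
    return total_cost, total_qalys


# Illustrative (made-up) inputs: TP and FP are treated, FN and TN are not.
branches = {
    "TP": Branch(0.05, 4000.0, 0.75),
    "FN": Branch(0.15, 1500.0, 0.60),
    "FP": Branch(0.01, 3000.0, 0.85),
    "TN": Branch(0.01, 0.0, 0.90),
}
print(evaluate_test_strategy(0.2, 0.85, 0.90, 250.0, branches, cycles=40))
```

Keeping the tree and the Markov model as separate functions mirrors the two-step structure in the text. The tree translates accuracy into the proportions of patients who receive each treatment, and the Markov model translates each treatment path into long-term costs and QALYs.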

3.7 Thresholds

The thresholds used to determine cost effectiveness varied among CEAs. Five studies assessed the cost per life-year gained. Of these, three considered a threshold between US$50,000 and US$100,000 [29, 40, 47], and one used a maximum willingness-to-pay threshold of €35,000 per life-year gained [52]. Another employed a threshold of US$25,876 [33], extrapolated from recommendations of the World Health Organization (WHO) Commission on Macroeconomics and Health. The CEAs assessing the cost per additional case detected did not define a cost-effectiveness threshold. Most CUAs that mentioned a cost-effectiveness threshold adopted either the National Institute for Health and Care Excellence (NICE) recommendation of £20,000–£30,000 per QALY (N = 8) [22, 23, 31, 44, 46, 48, 49, 51] or the US recommendation of US$50,000 per QALY (N = 7) [26–28, 35, 38, 41, 46]. One study used three different thresholds (NZ$10,000, NZ$30,000, and NZ$50,000) [43]. Another applied the WHO recommendation that an incremental cost-effectiveness ratio (ICER) below three times the gross domestic product (GDP) per capita is cost effective [45]. Three CUAs did not mention a threshold [32, 39, 50]. The studies differed in comparators, outcomes and costs considered, time horizons, and cost-effectiveness thresholds. As a result, the cost-effectiveness results are only weakly comparable across CEAs and CUAs.
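The dependence of the verdict on the chosen threshold can be made concrete with a small sketch. It assumes results have already been summarized as an incremental cost and an incremental effect (e.g. QALYs) relative to a comparator. It also computes the net monetary benefit (NMB), which, when the incremental effect is positive, is positive exactly when the ICER lies below the threshold. The figures are hypothetical and currency-neutral. The `who_gdp_threshold` helper simply encodes the "three times GDP per capita" rule quoted above.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ZeroDivisionError("ICER undefined when strategies are equally effective")
    return delta_cost / delta_effect


def net_monetary_benefit(delta_cost, delta_effect, threshold):
    """NMB = threshold * delta_effect - delta_cost; positive means cost effective."""
    return threshold * delta_effect - delta_cost


def who_gdp_threshold(gdp_per_capita, multiple=3.0):
    """Threshold implied by the WHO rule of three times GDP per capita."""
    return multiple * gdp_per_capita


# Hypothetical result: a biomarker strategy costs 6000 more and yields 0.25 more QALYs.
d_cost, d_qaly = 6000.0, 0.25
print("ICER:", icer(d_cost, d_qaly))  # 24000.0 per QALY
for threshold in (20000.0, 30000.0, 50000.0):
    nmb = net_monetary_benefit(d_cost, d_qaly, threshold)
    print(threshold, nmb, "cost effective" if nmb > 0 else "not cost effective")
```

With an ICER of 24,000 per QALY, the same hypothetical strategy fails a 20,000 threshold but passes 30,000 and 50,000. This illustrates why results judged against different thresholds are hard to compare.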

3.8 Sensitivity Analyses

Nine studies employed only deterministic sensitivity analyses to assess the robustness of model outcomes to parameter uncertainty and model assumptions [23, 25, 26, 30, 34, 39, 45, 48, 50]. Three studies performed only probabilistic sensitivity analyses [42–44]. Most studies performed both probabilistic and deterministic sensitivity analyses (N = 16) [22, 27–29, 31–33, 35, 38, 40, 41, 46, 47, 49, 51, 52]. Scenario analyses were presented in seven publications [26, 28, 29, 34, 35, 39, 41]. These mainly involved two-way sensitivity analyses, although Kwon et al. [28] investigated the cost effectiveness and cost utility of BRCA mutation testing in both a realistic and an ideal scenario. Across the included studies, the cost-effectiveness results were most sensitive to three sets of assumptions. These were the test accuracy and costs of the biomarker (N = 7) [22, 27, 33, 35, 38, 41, 51], the relative risk of an event (N = 4) [31, 38, 41, 43], and the proportion of people accepting genetic testing (N = 2) [29, 47]. One study reported that its cost-effectiveness result was most sensitive to the discount rate [52].
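The difference between deterministic and probabilistic sensitivity analysis can be sketched on a deliberately simplified toy model. In this toy model, testing detects a share of true cases, each of whom is treated and gains QALYs. The model structure, all parameter values, and the chosen distributions are hypothetical. The distributions are a Beta for the sensitivity and a Gamma for the test cost, which are conventional choices for probabilities and costs but are assumed here. The one-way analysis varies a single input, which here is test sensitivity or test cost, the drivers most often reported above. The probabilistic analysis samples both inputs jointly and reports the share of draws that are cost effective.

```python
import random


def incremental_outcomes(sens, test_cost, prev=0.2, qaly_gain=0.5, treat_cost=3000.0):
    """Toy model (hypothetical): testing detects prev * sens true cases, each
    treated at treat_cost and gaining qaly_gain QALYs versus no testing."""
    detected = prev * sens
    delta_cost = test_cost + detected * treat_cost
    delta_qaly = detected * qaly_gain
    return delta_cost, delta_qaly


def nmb(inputs, wtp):
    """Net monetary benefit of testing versus no testing at threshold wtp."""
    delta_cost, delta_qaly = incremental_outcomes(**inputs)
    return wtp * delta_qaly - delta_cost


def one_way_sa(base, param, values, wtp):
    """Deterministic one-way analysis: vary one input, hold the rest at base case."""
    return {v: nmb(dict(base, **{param: v}), wtp) for v in values}


def probabilistic_sa(base, wtp, n=5000, seed=1):
    """PSA: sample uncertain inputs jointly; return the share of draws with NMB > 0."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(n):
        draw = dict(base)
        draw["sens"] = rng.betavariate(80, 20)  # assumed Beta(80, 20), mean 0.8
        # assumed Gamma with shape 25 and mean equal to the base-case test cost
        draw["test_cost"] = rng.gammavariate(25, base["test_cost"] / 25)
        if nmb(draw, wtp) > 0:
            positive += 1
    return positive / n


base_case = {"sens": 0.8, "test_cost": 200.0}
print(one_way_sa(base_case, "sens", [0.6, 0.8, 0.95], wtp=20000.0))
print(one_way_sa(base_case, "test_cost", [100.0, 200.0, 800.0], wtp=20000.0))
print(probabilistic_sa(base_case, wtp=20000.0))
```

In this toy model the NMB is linear in each input, so the one-way results are easy to read. In realistic models, inputs interact, and this is one motivation for running a probabilistic analysis alongside deterministic ones.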

3.9 Handling of Specific Issues

Several economic evaluations assessed the application of diagnostics in multiple subpopulations, either by comparing multiple scenarios or by testing assumptions in deterministic sensitivity and scenario analyses (N = 7) [36, 43, 47, 48, 50–52]. This was particularly the case in genetic cascade testing, where the number of relatives tested or the age of patients at the start of genetic testing was varied. Over half of the studies (N = 19) [20, 22, 24, 25, 28, 29, 33, 35, 40, 41, 43, 45–52] (Table 3) modeled multiple realistic strategies arising from various test sequences or combinations of tests. At most, 16 strategies were compared with the reference strategy [47].
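The review does not report how the studies ranked their multiple strategies. When many mutually exclusive strategies are compared, a standard approach in health economics is a full incremental analysis. Strategies are sorted by cost, strictly dominated ones (more costly and no more effective) are removed, and extendedly dominated ones (a higher ICER than a more effective alternative) are removed. ICERs are then computed between adjacent strategies on the remaining efficiency frontier. The sketch below shows that general technique. The function name `incremental_analysis`, the strategy names, and all values are hypothetical.

```python
def incremental_analysis(strategies):
    """Full incremental analysis of mutually exclusive strategies.

    strategies: list of (name, cost, effect) tuples.
    Returns [(name, icer_vs_previous)] for non-dominated strategies ordered
    by increasing cost; the cheapest frontier strategy has ICER None.
    """
    # Sort by cost; among equal costs, put the more effective strategy first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Strict dominance: drop any strategy that costs at least as much as a
    # kept strategy but is not more effective.
    frontier = []
    for s in ordered:
        if not frontier or s[2] > frontier[-1][2]:
            frontier.append(s)

    def ratio(a, b):
        return (b[1] - a[1]) / (b[2] - a[2])

    # Extended dominance: ICERs must rise along the frontier; otherwise the
    # middle strategy is removed and the check restarts from the beginning.
    i = 1
    while i < len(frontier) - 1:
        if ratio(frontier[i - 1], frontier[i]) > ratio(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = 1
        else:
            i += 1

    result = [(frontier[0][0], None)]
    for a, b in zip(frontier, frontier[1:]):
        result.append((b[0], ratio(a, b)))
    return result


# Hypothetical strategies: (name, incremental cost, incremental QALYs vs no test).
candidates = [
    ("no test", 0.0, 0.0),
    ("biomarker", 1000.0, 0.10),
    ("biomarker + imaging", 1500.0, 0.05),
    ("imaging", 5000.0, 0.15),
    ("biomarker, then imaging if positive", 6000.0, 0.30),
]
for name, value in incremental_analysis(candidates):
    print(name, value)
```

In this example, "biomarker + imaging" is strictly dominated and "imaging" is extendedly dominated. The frontier is therefore "no test", "biomarker", and the sequential strategy, with ICERs of about 10,000 and 25,000 per QALY.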

Table 3 Specific issues regarding the economic evaluation of diagnostics

Another commonly reported issue in the economic evaluation of diagnostics concerned the assumptions made about compliance with, or acceptance of, genetic testing among relatives (N = 7) [28, 29, 44, 47, 49, 50, 52]. The robustness of these assumptions was evaluated in deterministic or probabilistic sensitivity analyses. Most of these studies found that varying the compliance or acceptance rate either did not change the ICER substantially [28, 29, 50] or improved the ICER [44, 47, 49]. The studies that found no change in the ICER after varying compliance rates were CEA studies that adopted a societal perspective and assessed the incremental costs per life-year gained. By contrast, the studies that found an improvement in the ICER adopted a healthcare or third-party payer perspective and assessed the incremental costs per QALY (N = 2) [44, 49]. Wordsworth et al. [52] incorporated the acceptance rate of testing but did not examine the effect of changing this value on the outcomes. Patient preferences were incorporated in only one study [31]. This study used the ‘wait-trade-off’ method to quantify the disutility associated with the discomfort of undergoing a test. Incorporating the wait-trade-off did not affect the cost effectiveness of the investigated procedure [32]. None of the studies considered different payer perspectives, cost-sharing arrangements, or variable opportunity costs due to variability in population density.

4 Discussion

Economic evaluations of biomarkers cover a wide clinical spectrum, including both markers for diagnosing or staging a disease and markers for genetic testing to guide treatment. One of the more significant findings of this study is that the design and methodology of an economic evaluation are, and should be, tailored to the diagnostic test being evaluated. This tailoring is needed to reflect the multiple applications of diagnostic biomarkers appropriately and to assess their added value comprehensively. However, the evidence base underlying economic models, and the methods for measuring outcomes beyond health, both need further development. Only then can the potential real-world value of diagnostic biomarkers be fully captured and used to inform value-based payments.

This study identified three methodological approaches that were frequently employed in evaluating diagnostic biomarkers. First, multiple diagnostic strategies were compared to reflect the potential applications of biomarkers. The large number of comparators resulted from the combination of biomarkers and their relation to subsequent treatment strategies. Second, several scenarios were defined in the base-case analyses of model-based studies to represent the relevant applications in (sub)populations. Model complexity was particularly associated with studies assessing genetic testing strategies. These require the evaluation of diagnostic strategies in subpopulations depending on the number of relatives and the age of the persons being tested. Trial-based studies had a less complex design because they often used short time horizons, evaluated only costs (CMA), and adopted a local perspective. Third, model-based studies made extensive use of deterministic sensitivity and scenario analyses to assess the robustness of the results with regard to test acceptance and compliance.

With regard to the cost-effectiveness results, this study showed that three factors were the main drivers of cost effectiveness: the accuracy of a diagnostic test, the unit costs, and the proportion of people (relatives) accepting genetic testing. These results are in accordance with those of Frank and Mittendorf [53], who reported the same key drivers of uncertainty in the cost effectiveness of pharmacogenomic profiling in metastatic colorectal cancer. Overall, these findings indicate that specific methods are used in economic evaluations to reflect the various applications of diagnostic biomarkers.

Previous studies have indicated that the evidence needed to populate economic models of diagnostics is lacking [54–56]. An example is the paucity of data on prescribing behavior and adherence to treatment [54], which means that assumptions have to be made. Likewise, this review found that sensitivity analyses were used extensively, especially with regard to compliance with and acceptance of a diagnostic test. The poor evidence base leads to a high level of uncertainty in cost-effectiveness results and is a likely cause of the limited implementation of diagnostics. Supportive efficacy data from randomized clinical trials are sometimes available. Even then, biomarkers are likely to be used differently in actual practice than in trials or academic environments that closely follow clinical guidelines. Furthermore, real-world populations include multiple cohorts with different rates of disease, biomarker prevalence, and behaviors. The population impact is therefore likely to differ from that estimated for hypothetical cohorts or trial participants.

Another concern regarding health economic analyses of diagnostic biomarkers is that the conventional outcome measures used in economic evaluations do not fully capture their potential value. Beyond health outcomes, diagnostic biomarkers may affect ‘personal utility’, assessed through non-health outcomes [57]. This may stem from the utility of the diagnostic information itself (‘value of knowing’), for example in genetic testing of relatives. It may also stem from the discomfort experienced when undergoing a test (process utility). However, only one of the included studies incorporated the personal utility associated with the use of diagnostic biomarkers. This finding supports the view of Veenstra and Brooks [58] that the role of patient-centered value in test assessments and reimbursement policies needs to be explored. Buchanan et al. [57] suggested that non-health outcomes still play a limited role in economic evaluations because methods such as discrete choice experiments are rarely used to inform them. This is consistent with our finding that only one paper incorporated a direct disutility. Moreover, the input for this parameter was based on assumptions rather than on evidence-based methods for eliciting patient preferences. Valuing these outcomes in economic evaluations is also important because they might affect adherence and thereby patient outcomes [56]. Further research is required to provide guidance on quantifying and incorporating non-health outcomes in economic evaluations.

A limitation of the included studies was that in many cases the target audience was not clearly stated. A good understanding of the target audience and perspectives is useful to identify their evidence needs and incentives to adopt a new technology when proven valuable [55]. Crucially, the outcomes assessed in an economic evaluation should clearly resonate with the target audience. Cost-effectiveness outcomes from a healthcare system perspective (public sector) or societal perspective are typically recommended for decision making, but often fail to effectively inform decisions as made by specific private payers and providers that operate in financial silos within the healthcare system. Researchers should provide a clear statement on who their target audience is, to substantiate the framing of the study design. Taken together, future research on the cost effectiveness of diagnostic biomarkers should focus on the evidence of model input, outcomes beyond health, and the perspective of the analysis to fully examine the value of diagnostic biomarkers.

In the current study, a comprehensive search strategy was used. The literature was searched in the PubMed and NHS EED databases, which was considered an effective strategy for capturing the most relevant economic evaluations of diagnostic biomarkers. Nevertheless, because no other databases were searched, some relevant publications may not have been captured. The search was restricted to papers published from 2010 onwards in order to review recent economic evaluations of diagnostic biomarkers. The largest increase in the number of publications on diagnostic biomarkers was observed during the last 5 years (from 2010 onwards) and was thus captured in this review. Furthermore, only studies published in English were included, so some potentially relevant publications may have been missed. The scope of the review was limited to non-communicable diseases in upper-middle-income and high-income countries. Economic evaluations of diagnostic biomarkers for communicable diseases in low-resource settings were not reviewed in this study.

The present study contributes to existing knowledge about methods for examining the health economic impact of diagnostic biomarkers. A key strength of this review was its broad perspective on the potential use of biomarkers across the whole spectrum of diagnostics. There was, however, a bias towards biomarkers for cancer diagnosis, because the use of biomarkers in this field is expanding rapidly [56, 59] and is overrepresented in the literature. That said, no major differences in study design and outcomes were observed between disease groups.

5 Conclusions

Published health economic evaluations of biomarkers used for diagnosing and staging diseases are characterized by a large number of comparators and potential clinical applications, which need to be modeled in order to determine the biomarkers’ value. Improving the evidence base, as well as the methods for incorporating non-health outcomes and patient preferences, is crucial to fully capture the value of diagnostic biomarkers and to inform value-based reimbursement.