Key Points for Decision Makers

During ten rounds of screening in Norway, digital breast tomosynthesis was simulated to result in around 500 deaths averted and 2300 life-years gained at an additional screening cost of €29 million.

Results suggest that digital breast tomosynthesis may be cost effective in a national screening program, being cost effective in over 80% of the simulations at willingness-to-pay thresholds per life-year gained above €22,000.

1 Introduction

Digital breast tomosynthesis (DBT) is an imaging technology providing three-dimensional reconstructions of the breast from a series of low-dose mammographic exposures, over a limited angular range. Standard digital mammography (DM) provides two-dimensional images, which could impede breast cancer detectability, particularly in females with dense breasts. Several observational studies in Europe and the US have shown promising results for tomosynthesis as a screening tool, compared with DM [1,2,3,4,5,6,7,8]. These include higher cancer detection rates as well as lower false-positive screening results, while the rates of interval cancer (cancers detected between scheduled screenings) seem to remain similar. In contrast, early results from the randomized To-Be trial in Norway showed mixed results regarding the benefits of DBT including synthetic mammograms [9,10,11].

Further studies from the US suggested DBT to be cost effective relative to DM, either in combination with DM, or as a stand-alone screening modality [12,13,14,15]. This first wave of literature focused on short-term outcomes available from observational data, such as comparing screening costs and recall, biopsy and detection rates with DM. A female is called back for further assessment with or without a biopsy if suspicious findings are identified on the screening mammograms. If the result of the recall examination is negative, it is defined as a false-positive screening.

During the past 5 years, the literature has been complemented by studies using more detailed cost-effectiveness models for simulation of long-term outcomes in routine screening with DBT in the US [16, 17], Canada [18] and Europe [19, 20]. The results are varied. A Canadian study found DBT to be cost effective with an incremental cost-effectiveness ratio (ICER) of Canadian $17,149 per quality-adjusted life-year (QALY) [18]. A US study found a similar result with an ICER of $20,300/QALY [16]; while the opposite was found in a second US study with ICERs ranging from US$195,026 to US$270,135/QALY depending on the model [17], all assuming a willingness-to-pay (WTP) of US$100,000/QALY. A Dutch study found DBT was not cost effective at a WTP of €20,000 per life-year gained, but was cost effective at a WTP of €35,000 (ICER = €27,023) [19]. As discussed in the studies, results depend on the additional cost of DBT per screening [17, 19, 20] and the reduction in recall rates [18]. Differences in sources for input parameters and modelling approaches, in considering DBT as a stand-alone modality or as an addition to DM and in screening routines and programs across countries, could lead to conflicting results. However, the lack of data from randomized trials in all phases of follow-up after DBT is also apparent; for instance, how recall and detection rates differ between prevalent and subsequent screenings with DBT, and how this may affect the procedures and costs of recalls and treatment compared with DM, has not been reported. The need for more evidence on DBT from randomized trials is clear in two recent systematic reviews [21, 22]. From historical data on DM, detection rates at prevalent screenings tend to be higher than those for subsequent screenings [23]. Further, interval cancer rates after screenings with DBT may differ compared with DM.

Starting in four counties in 1996 and gradually expanding, BreastScreen Norway became nationwide in 2004. The Cancer Registry of Norway is responsible for administration of the program, which offers biennial screening for all females aged 50–69 years [23]. A randomized controlled trial started in January 2016 at Haukeland University Hospital in Bergen (the To-Be1 trial, [9]). All females participating in the screening program and who agreed to take part in the study through December 2017 (one screening round) were randomized into two arms, either DM or DBT. Some of the participants were screened for the first time, while others had previous screenings. To-Be1 was immediately followed by To-Be2, offering all females screening with DBT in the following screening round. In essence, this means that participants were randomized either to two rounds of DBT, or to DM followed by DBT. This was done to increase the number of first (DBT after DM) and subsequent DBT screenings in the trial, and thus the screening and follow-up data for DBT mentioned above. In addition, one could study whether the results differ depending on screening method in the previous round, and the interval cancer rates in the two arms in the intervening period. New participants in To-Be2 were screened with DBT, and were not part of the randomization. The study ended in January 2020, and more than 37,000 females were included in total across the two rounds. Data from all procedures used and tumors detected were collected and registered at the Cancer Registry of Norway.

As To-Be was a randomized trial within the population-based screening program, the recall and detection rates by cancer stage for prevalent and subsequent screenings, and interval cancer rates in the intervening period can be estimated directly from the data. The same applies to additional costs of DBT screening, recall costs and short-term costs of treatment (within a year). Hence, many parameters of interest in a cost-effectiveness analysis can be estimated from a single source of empirical data, instead of using estimates from different sources, observational data or expert opinion. However, long-term costs and survival still need to be estimated from other sources. We used recent estimates for 10-year costs [25] and survival from participants in BreastScreen Norway.

Due to the lack of data from randomized trials described above, the first objective was to present estimates for recall and detection rates, medical procedures and resources used at screening, recalls and treatment, for the combinations of prevalent and subsequent DM and DBT screening as observed in the trial. Second, by using the estimates in a decision tree model we simulated and compared the number of recalls, false positives, cancer cases detected at screening or as interval cancer, costs at different stages of follow-up, deaths and life-years gained, if switching from DM to DBT in BreastScreen Norway in 2018. If DBT is implemented nationwide, one would observe two combinations at the initial screening: Being screened with DBT and no previous screenings within the program (prevalent screening with DBT), or being screened with DBT and having a DM screening in the previous round. In the next screening round, there would once again be two possible combinations: Having a prevalent DBT screening, or being screened with DBT following a DBT screening in the previous round (subsequent screening with DBT). As mentioned above, recall and screening cancer detection, stage distribution of the tumors, as well as the procedures used in each stage, may differ between the three combinations—prevalent DBT, DBT following a DM screening in the previous round and subsequent DBT screenings. There may further be a difference in the corresponding estimates for prevalent DM and subsequent DM screenings. In addition, interval cancers may differ between DBT and DM screening programs. Any difference between the two technologies affecting the stage distribution should thus lead to an effect on mortality [23, 24]. Interval cancers have higher mortality and are more costly to treat than screen-detected cancers [25, 26].

2 Methods

2.1 Description of the Model, the To-Be Sample and General Assumptions in Prediction Analysis

A diagram of the decision tree model is given in Fig. 1. We do not consider any difference in screen detection of TNM4 (Tumor-Node-Metastasis [27]) cancers, due to only one case being detected within To-Be. Cancers of unknown TNM stage are also relatively rare in the program, and not considered. As the goal is to simulate costs and effects of implementing DBT in BreastScreen Norway and compare these with the current DM screening practice, the model runs in calendar time starting from 2018, not in age of participants. The cycle length is one screening round, corresponding to 2 years. We simulated ten rounds of screening, corresponding to 20 years. Under the DBT scenario, we distinguished between the first and subsequent cycles in order to capture the different combinations of prevalent and subsequent DBT screens and DBT screens after DM screens as described in the Introduction. The transition probabilities, and references for these, are given in Tables 1 and 2. It is apparent that the detection rate (number of screen-detected cancers per stage per 1000 screenings) for TNM1 cancers is somewhat higher for DBT than for DM (Table 2), while the opposite is true for TNM3 (Table 2) and interval cancers (Table 1). Interval cancer was defined as breast cancer diagnosed within 24 months after negative screening or 6–24 months after false-positive screening [28]. Detailed information on the trial and clinical results is given elsewhere [10, 28].

Fig. 1

Structure of the model. The transition probabilities are given in Tables 1 and 2. The death state is death from breast cancer. Background mortality, that is mortality from the “no cancer” state, is not considered and thus assumed equal. DBT digital breast tomosynthesis and synthetic mammography, DCIS ductal carcinoma in situ, DM standard digital mammography, TNM Tumor-Node-Metastasis

Table 1 Input parameters, sources, point estimates, standard errors and distributions for parameters not depending on TNM stage in the model in Fig. 1
Table 2 Input parameters, sources, point estimates, standard errors, and distributions for the TNM stage-specific parameters in the model

The size of the screening population, and the distribution of prevalent and subsequent screenings, are assumed to change over time according to Statistics Norway’s main alternative for population growth by sex and age (footnote 1; Supplementary Table 1, see electronic supplementary material [ESM]). We assumed that the participation rate in BreastScreen Norway would be the same under both scenarios, at 75% [23]. The screening should be run as effectively as at Haukeland, meaning that the equipment is used at full capacity during the year. As all transition probabilities are estimated from the To-Be trial data (Tables 1 and 2), we assumed that the sample is representative for Norway. Among the 32,976 females attending BreastScreen Norway in Bergen between Jan 2016 and Dec 2017, around 91% agreed to be randomized as part of To-Be1. Further, 87% of those eligible for re-attendance (below the upper age limit of 69 years) agreed to be screened with DBT in To-Be2. Hordaland County (incl. Bergen) has historically had detection rates at screening very similar to the Norwegian average (Tables 4.9 and 4.10 in [23]). In total, the sample included 10,184 prevalent screenings with DBT (1913 randomized in To-Be1), 24,181 screenings with DBT after DM and 11,461 screenings with DBT after DBT. In addition, there were 1976 prevalent screenings with DM and 12,752 screenings with DM after DM in To-Be1. The transition probabilities for prevalent DBT in Table 2 are based on the 1913 females who were randomized to DBT in To-Be1. For the remainder of the methods, we assumed parameters were equal in the DM and DBT scenarios in cases where empirical data were not available. Cancer stage-dependent transition probabilities were only used for screening cancers, as any difference in stage distribution for interval cancers between the two scenarios was unknown. We assumed that the interval cancer rates were the same for prevalent and subsequent screenings within each scenario.

2.2 Cost Estimation

The model inputs relating to costs are given in Tables 1 and 2. The year of valuation is set to 2018 for all unit costs (except for hourly wages for radiographers and radiologists, 2017) to simulate results of implementing DBT nationwide instead of just in Hordaland county at the time of the To-Be trial. The study only considers direct health care costs, that is, costs of procedures related to recalls and treatment, and additional costs of switching from DM to DBT for screening examinations. The analysis is not based on total unit cost estimates for DM and DBT screening, but rather on all factors differing between DM and DBT screening (reading times, cost of new machines, increased IT storage and connectivity needs etc.; see [11] for a detailed description of the estimation). In short, we assumed an average additional examination time in the laboratory of one minute per DBT screening based on registrations performed in To-Be1 [9] and an average of 2.5 radiographers to be present at the examinations. For the screening and consensus readings, the time used was registered in seconds for each female in the data. Consensus readings are performed to determine whether a recall examination is required. Hourly wages for radiographers and radiologists for 2017 were acquired from Haukeland University Hospital. Investment costs for the DBT were acquired from the project budget (additional IT storage and connectivity costs for DBT, additional cost of tomosynthesis-equipped mammographs). The annual investment cost was estimated assuming a life expectancy of 10 years for the equipment, and a constant depreciation rate of 4%. Multiplying the extra time spent per examination by the wage of staff, and adding investment costs split equally across females in the DBT arm, yielded an estimate for the additional cost of DBT compared with DM. Requirements for all other infrastructure were assumed equal for the screening methods. Hence, the cost difference between DBT and DM of €8.10 should be representative (€0.40 lower than reported in [11], as the results have been updated with more data). Procedures performed at recall assessments were registered using national radiologic procedure codes, and costs were estimated by mapping procedure codes to associated reimbursement tariffs. Further details on the estimation of all costs are given elsewhere [11].
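The cost build-up described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the annuity formula is one common way to annualize an investment and is an assumption here, since the exact depreciation formula is not stated, and all function names and input values are hypothetical. Differences in radiologist reading time would enter in the same way as the examination time.

```python
def equivalent_annual_cost(investment, rate=0.04, lifetime_years=10):
    """Annualize an up-front investment with a standard annuity factor.

    Assumed method: the text only states a 10-year lifetime and a 4% rate.
    """
    if rate == 0:
        return investment / lifetime_years
    annuity_factor = (1 - (1 + rate) ** -lifetime_years) / rate
    return investment / annuity_factor


def additional_cost_per_screening(extra_minutes, hourly_wage, staff_present,
                                  annual_investment, screenings_per_year):
    """Extra staff time valued at the hourly wage, plus investment per screening."""
    staff_cost = extra_minutes / 60 * hourly_wage * staff_present
    return staff_cost + annual_investment / screenings_per_year


# Hypothetical inputs for illustration only; these are not values from the trial.
annual = equivalent_annual_cost(investment=500_000)
print(round(additional_cost_per_screening(1, 40.0, 2.5, annual, 15_000), 2))
```

Splitting the annualized investment across the number of screenings per year is what makes the estimate sensitive to how fully the equipment is used.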

It would be preferable to use separate treatment cost estimates for cancers detected at DBT and DM screenings in order to take into account savings due to earlier cancer detection within TNM stages in one of the scenarios. However, the To-Be trial is too small for precise treatment cost estimation [11] and only includes costs incurred during the first year of treatment. Consequently, in the simulation we instead used 10-year treatment costs from Moger et al. [25], adjusted for inflation from 2008 to 2018 levels by the consumer price index using an inflation of 25% (footnote 2). The price of a diagnosis related group (DRG) increased by approximately the same percentage from 2008 to 2018. The treatment costs estimated in Moger et al. [25] were based on the reimbursement paid to the hospital by DRG category of some 16,000 cancer cases, split by TNM stage for screen-detected cancers, and with separate estimates for interval cancers. Accordingly, treatment costs within each TNM stage for screen-detected cancers, and overall for interval cancers, are assumed to be the same in both the DBT and DM scenarios. A discount rate of 4% per year was applied to the costs when running the model through the ten cycles, as recommended by the Norwegian Agency for Public and Financial Management [29]. As in Moger et al. [11], an exchange rate of €1.00 = NOK9.75 was used throughout.
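The price-level adjustment, currency conversion and discounting described above can be sketched as below. The constants are taken from the text; the function names are hypothetical, and treating the first screening round as undiscounted with all costs of a round discounted at its start is an assumption, as the timing convention is not stated.

```python
INFLATION_2008_TO_2018 = 0.25  # consumer price index adjustment, from the text
NOK_PER_EUR = 9.75             # exchange rate, from the text
DISCOUNT_RATE = 0.04           # per year, from the text
YEARS_PER_CYCLE = 2            # one screening round


def to_2018_eur(cost_2008_nok):
    """Inflate a 2008 NOK amount to 2018 levels and convert to euros."""
    return cost_2008_nok * (1 + INFLATION_2008_TO_2018) / NOK_PER_EUR


def discount_factor(cycle, rate=DISCOUNT_RATE, years_per_cycle=YEARS_PER_CYCLE):
    """Discount factor for a cycle; cycle 0 is the first round (assumed undiscounted)."""
    return 1 / (1 + rate) ** (cycle * years_per_cycle)


def discounted_total(costs_per_cycle):
    """Sum per-cycle costs after discounting each to the start of the model."""
    return sum(cost * discount_factor(k) for k, cost in enumerate(costs_per_cycle))


# Hypothetical example: the same cost in each of ten screening rounds.
print(round(discounted_total([1000.0] * 10), 2))
```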

2.3 Effect Estimation

Model inputs related to effects are also given in Tables 1 and 2. Life-years gained in each scenario are based on estimates for 10-year overall survival from the Cancer Registry of Norway. This approach is used mainly as a trade-off between the fact that breast cancer patients have increased mortality risk for a period up to 20 years following detection, the wish to use as recent data as possible, and limiting mortality from other causes. The average age at death was around 82 years in 2019 (footnote 3). We assumed that the 10-year survival probability is equal in the two scenarios for both TNM stage-specific screen-detected cancer and for interval cancer. Estimating life-years gained in each scenario is based on (1) the product of the TNM stage and interval cancer-specific probabilities of death within 10 years from diagnosis and the mean survival times for females dying within 10 years, and (2) the corresponding probabilities of surviving 10 years multiplied by 10 for the survivors. The life-years are discounted at a rate of 4% per year, as recommended by the Norwegian Agency for Public and Financial Management [29]. Important aspects like the rate of over-diagnosis and the background mortality due to causes other than detected breast cancer are assumed equal under both scenarios. Based on different models for estimating over-diagnoses [30,31,32], we assumed rates of 20% for ductal carcinoma in situ (DCIS) and 15% for TNM1 tumors. Benign tumors are assumed not to affect mortality, and are not considered in the analysis.
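The two-part life-year calculation, components (1) and (2) above, can be written compactly for a single cancer state. The sketch below is illustrative only: the function name and the example inputs are hypothetical, and the result is undiscounted, so discounting would have to be applied separately.

```python
HORIZON_YEARS = 10  # survival is censored at 10 years in the model


def expected_life_years(p_death_within_horizon, mean_survival_if_dying,
                        horizon=HORIZON_YEARS):
    """Expected (undiscounted) life-years per case in one cancer state.

    Part (1): probability of death within the horizon times the mean
    survival time among those who die.
    Part (2): probability of surviving the horizon times the horizon length.
    """
    if not 0 <= p_death_within_horizon <= 1:
        raise ValueError("probability must lie between 0 and 1")
    part_dying = p_death_within_horizon * mean_survival_if_dying
    part_surviving = (1 - p_death_within_horizon) * horizon
    return part_dying + part_surviving


# Hypothetical inputs: 10% die within 10 years, after 4 years on average.
print(expected_life_years(0.10, 4.0))
```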

2.4 Statistical Methods

Descriptive results for medical procedure use and associated costs at screening, recall and treatment are presented as means and standard deviations for continuous variables and as percentages for categorical variables. In addition to the three combinations (prevalent DBT, DBT following DM and subsequent DBT screenings), we also present results from the DM arm in To-Be1 for comparison. p-Values for difference across screening combinations are reported from Kruskal–Wallis tests for continuous variables and chi-square tests for categorical variables. In the descriptive comparison, procedure frequency and associated costs in the treatment stage are estimated from the To-Be trial by mapping procedures into DRG categories and using associated DRG weights and prices for 2018 (as in [11]). Thus, these are only describing the short-term costs within a year following screening, and not used in the simulation analysis as described above.

We estimated the difference in false positives (recall examinations minus screen-detected cancers), and, both by cancer stage and in total, the difference in cancers detected, overall deaths averted within 10 years, life-years gained censored at 10 years and costs censored at 10 years across the ten screening rounds. Estimates are given both for the total population and standardized per 100,000 females invited to screening. To estimate the difference in overall deaths, life-years and costs, some additional assumptions are needed. Given an additional number of cancers detected within a cancer state (TNM stage at screening or interval cancer in Fig. 1) in one scenario (DBT or DM), these would not go undetected forever in the other scenario. We assumed that additional cases detected in a state in one scenario also would be detected in the other scenario no later than by the next screening, but possibly in a different state (screen-detected DCIS-TNM3 or interval cancer, and corresponding survival estimates). We used two complementary approaches. The first approach assumed that the additional cancers detected in a state in one scenario would be detected according to the corresponding distribution of screen-detected cancers by TNM stage and interval cancer under the other scenario (adjusting for 20% over-diagnosis for additional cases in stage DCIS and 15% in TNM1). This is the base case. The scenario analysis assumed that additional cancers detected in a state other than TNM1 in one scenario would be screen-detected in stage TNM1 in the other scenario. TNM1 is by far the most common cancer state (Table 2). Assuming screen-detected survival for many of the non-detected cases in a state and censoring survival at 10 years should yield conservative estimates for the effect difference between the two scenarios. Deterministic results for difference between scenarios are shown for the base case in the paper, with corresponding mean values under each scenario reported in Supplementary Table 2 (see ESM).
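The base-case bookkeeping described above can be sketched as follows. This is one possible reading of the approach, not the code used in the study: the function names, state labels and example numbers are hypothetical, and the only values taken from the text are the over-diagnosis rates of 20% (DCIS) and 15% (TNM1).

```python
OVERDIAGNOSIS = {"DCIS": 0.20, "TNM1": 0.15}  # rates assumed in the text


def false_positives(recalls, screen_detected_cancers):
    """False positives are recall examinations minus screen-detected cancers."""
    return recalls - screen_detected_cancers


def per_100k_invited(count, invited):
    """Standardize a count per 100,000 females invited to screening."""
    return count / invited * 100_000


def redistribute_extra_cases(extra_cases, state, other_distribution):
    """Spread additional cases found in `state` under one scenario over the
    states of the other scenario, after removing over-diagnosed cases.

    `other_distribution` maps state names to weights (need not sum to 1).
    """
    real_cases = extra_cases * (1.0 - OVERDIAGNOSIS.get(state, 0.0))
    total_weight = sum(other_distribution.values())
    return {s: real_cases * w / total_weight
            for s, w in other_distribution.items()}


# Hypothetical example: 100 extra TNM1 cases spread over two states.
print(redistribute_extra_cases(100, "TNM1", {"TNM1": 0.5, "interval": 0.5}))
```

Under the alternative scenario analysis, the distribution passed in would instead put all weight on TNM1.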

Quality-of-life estimates are difficult to implement in the model. Despite this limitation, we discuss results in terms of willingness-to-pay (WTP) for thresholds up to €35,000, an approximate upper limit for quality-adjusted life-years used by the National Institute for Health and Care Excellence [33]. There is no officially recommended limit for WTP per life-year in Norway.

2.5 Sensitivity Analyses

In probabilistic sensitivity analyses, we applied the distributions given in Tables 1 and 2 to the parameters, using the corresponding point estimates and standard errors as input. Parameters related to population size and distribution, participation rate, prevalent/subsequent screenings and over-diagnosis were assumed invariant. To assess uncertainty, results from 10,000 simulations were plotted in the cost-effect plane, and cost-effectiveness acceptability curves (CEAC) were estimated. CEACs are presented assuming both the €8.10 additional cost of DBT estimated in To-Be and a maximum additional cost of DBT to reach cost effectiveness in at least 50% of the simulations at a WTP of €35,000. Results from the scenario analysis and from using undiscounted costs and life-years gained are reported.
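A cost-effectiveness acceptability curve can be computed from the simulated cost and effect differences as the share of simulations with a positive net monetary benefit at each WTP threshold. The sketch below assumes the differences are defined as DBT minus DM; the function name and the toy inputs are hypothetical and do not reproduce the study's simulations.

```python
def ceac(delta_costs, delta_life_years, wtp_thresholds):
    """Share of simulations in which the new strategy is cost effective.

    A simulation counts as cost effective at threshold `wtp` when its net
    monetary benefit, wtp * delta_life_years - delta_cost, is positive.
    """
    if not delta_costs or len(delta_costs) != len(delta_life_years):
        raise ValueError("need equally long, non-empty lists of simulations")
    n = len(delta_costs)
    curve = []
    for wtp in wtp_thresholds:
        favourable = sum(
            1 for dc, de in zip(delta_costs, delta_life_years)
            if wtp * de - dc > 0
        )
        curve.append(favourable / n)
    return curve


# Toy example with four hypothetical simulations and two thresholds.
print(ceac([100, -50, 200, 0], [0.01, 0.02, 0.0, -0.01], [0, 20_000]))
```

The same function can be rerun with a larger per-screening cost added to each simulated cost difference to search for the maximum additional cost that keeps the curve at or above 50% for a given threshold.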

As one-way sensitivity analyses, we report the effect of equal or lower recall rates in the DBT compared with the DM scenario. Further results using cancer detection rates from BreastScreen Norway excluding Hordaland 2016–2019 in the DM scenario (deterministic results and CEAC), or using a 20-year horizon for survival (CEAC only) are included in the ESM. The first indicates to which extent the results depend on the cancer detection rates from the To-Be data, while the second indicates the impact of extending the time horizon.

All input parameters and standard errors from the To-Be trial were estimated using Stata version 16. The analysis was run in R version 3.6.2.

3 Results

3.1 Descriptive Analysis

The results from the descriptive analysis comparing medical procedure use and associated costs at screening, recall and treatment for the different screening groups are presented in Table 3. The reading times for DBT were significantly higher than for DM. The consensus rate was also significantly higher for prevalent screenings than for subsequent screenings, regardless of study arm. The time used at consensus, on the other hand, was longer for subsequent compared with prevalent screenings. In the treatment stage, there are few indications of any differences across the DBT groups, but the statistical power is limited. There are indications that radiotherapy is given more frequently to patients in the DBT arm, similar to what was found in the previous analysis using To-Be1 data only [11].

Table 3 Descriptive statistics at different stages of follow-up, where n denotes the number of observations in each stage (i.e. a subset of the previous stage). Treatment stage includes only malignant tumors. Costs are given per observation in the respective stage

3.2 Deterministic Analysis

Estimates from the deterministic prediction analysis using the base case are given in Table 4. During ten rounds, approximately 5 million screenings are conducted. According to the point estimates, DBT is expected to result in an additional 750 false positives, but also 5200 more screen-detected cancer cases compared with DM. As seen from the state distributions and number of cases detected, the main difference between DBT and DM is expected for stage TNM1 with more cancers for DBT versus DM, while there are fewer cancers detected in stages TNM2 and 3, as well as fewer interval cancers, for DBT versus DM. The latter, combined with the higher treatment costs for later stage and interval cancers and the modest additional cost of DBT screening, results in slightly higher total costs for DM compared with DBT, and DM being dominated in the deterministic analysis.
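The notion of dominance used here follows from the signs of the cost and effect differences: a strategy that costs less and yields more life-years is dominant, and a ratio is only informative when there is a trade-off. A minimal sketch, with a hypothetical function name and illustrative inputs that are not the values in Table 4:

```python
def icer_summary(delta_cost, delta_effect):
    """Classify a comparison (new minus current strategy) and return the
    incremental cost-effectiveness ratio where it is meaningful."""
    if delta_effect > 0 and delta_cost <= 0:
        return ("dominant", None)       # more effect at no extra cost
    if delta_effect <= 0 and delta_cost >= 0:
        return ("dominated", None)      # no more effect, no cheaper
    if delta_effect == 0:
        return ("cheaper, equal effect", None)
    return ("trade-off", delta_cost / delta_effect)


# Illustrative values only.
print(icer_summary(-10.0, 2.0))
print(icer_summary(100.0, 4.0))
```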

Table 4 Point estimates for false positives, cancers detected, overall deaths averted, discounted life-years gained, cost difference and incremental cost-effectiveness ratio (ICER)

3.3 Sensitivity Analyses

The corresponding graphs presenting the simulation results on the cost-effectiveness plane and the CEAC are shown in Fig. 2. More cases are detected for DBT in state TNM1 in all simulations, while this varies across the simulations for all other cancer states. A gain in life-years with DBT was shown in 98% of the simulations, while 54% of the simulations showed lower costs for DBT. The CEAC indicates that DBT is cost effective in over 80% of the simulations for WTP thresholds above €22,000. If the additional cost of DBT were twice as high as estimated (€16.20), DBT would be cost effective in over 50% of the simulations at a WTP of €11,000. At a WTP of €35,000, additional costs of DBT screening up to €32 yield cost effectiveness for DBT in at least 50% of the simulations.

Fig. 2

Results of 10,000 simulations plotted in the cost-effectiveness plane per 100,000 females invited to screening (top). Cost-effectiveness acceptability curves based on the simulations (bottom). Black curves show the result for an additional cost of DBT of €8.10 estimated from To-Be, red curves show the result for an additional cost of DBT of €32, the maximum additional cost for cost effectiveness at a WTP of €35,000

The structural change of the model by using the scenario analysis did not substantially alter the results compared with the base case. The point estimate for the ICER was negative at €2800 per life-year gained, DBT being dominant. DBT was cost effective in over 50% of the simulations at all WTP thresholds, and in 80% of the simulations at WTPs above €23,000. Without discounting costs and effects, DBT was cost effective in over 50% of the simulations at all WTP thresholds, and in over 80% of the simulations for WTPs above €25,000. Finally, when using identical recall rates for DBT as estimated for DM, DBT was cost effective in over 80% of the simulations for WTPs above €21,000. The result improved further in favor of DBT for lower recall rates, for example to €15,000 for 0.5% lower recall rates in the DBT scenario.

Supplementary Table 3 and Supplementary Fig. 1 show the results using transition probabilities to cancer in the DM scenario estimated from BreastScreen Norway excluding Hordaland, 2016–2019 (see ESM). The transition probabilities are somewhat lower than those estimated for DM from the To-Be trial, resulting in a greater difference in the total number of cancers detected, and a smaller difference in the number of interval cancers. DBT was cost effective in > 50% of the simulations for WTP thresholds above €4000, and at a WTP of €35,000 for additional costs of DBT up to €24. Similarly, Supplementary Fig. 2 shows results using a 20-year time horizon for survival. The point estimate for the ICER was negative at €768, DBT being dominant. The CEAC for the estimated additional cost of €8.10 is similar to that of the base-case analysis using a 10-year horizon, although increasing more steeply in favor of DBT. DBT was cost effective in > 80% of the simulations at WTP thresholds above €14,000.

4 Discussion

To our knowledge, this is the first study comparing diagnostic procedures, treatment, short-term cost and effect estimates for DBT versus DM based mainly on data from a single randomized trial. The descriptive analyses showed that biopsies and DBT imaging were more frequent in the DBT arm in recall assessments. This could be due to the higher screening detection rate for DBT estimated in the trial. This could further be due to the technology, but also a learning effect by the radiologists [34]. DBT is available to the hospital for clinical examinations, regardless of whether DBT is implemented in the screening program or not, and is typically performed in different locations to screenings. This is also reflected in the higher recall costs for DBT seen in Table 3, as recall costs were based on reimbursement tariffs. In the treatment stage, there were few indications of different procedure use across the three groups (prevalent DBT, DBT following DM and subsequent DBT). Any differences between the groups could be related to the TNM stage distribution. For example, this is indicated for total mastectomy in the prevalent DBT group, given the higher relative frequency of TNM2 cancers in this group compared with the groups DBT following DM and subsequent DBT. The prevalent DM group was too small for comparison in the treatment stage.

In the base case, we estimated that during ten rounds of screening, and with censoring effects at 10 years post-screening detection, DBT would result in around 500 all-cause deaths averted and 2300 life-years gained at a relatively modest additional screening cost of around €29 million. In the sensitivity analyses, the results indicated that DBT was cost effective at a wide range of WTP thresholds per life-year gained. When the additional cost of DBT screening was more than €23–€42 higher than DM, DBT became cost effective in <50% of the simulations at the WTP threshold of €35,000. The time horizon for survival was censored at 10 years due to the simple model structure, but the analysis in the ESM indicates results more in favor of DBT if the time horizon is increased. The results indicate cost effectiveness for higher additional costs of DBT, for instance at a WTP of €11,000 when the additional cost was twice as high as estimated from the To-Be trial data. This implies that the results should hold even though the costs of implementing DBT in mobile screening units, and the possibility that screening is run less effectively in rural parts (fewer females screened per day), were not considered in the analysis. In summary, the results in the paper follow to a large extent from comparing two screening technologies estimated to have a modest difference in price, where one has a slightly higher screening detection rate than the other, and where the validity of detection is assumed to be the same for both technologies.

A recent meta-analysis of observational data for DBT compared with DM in breast cancer screening showed no reduction in recall rates for the European studies (pooled average increase 0.5%, recall rates for DM 2.6–4.9%), while cancer detection rates were higher (pooled average increase 2.4 per 1000 screenings, detection rates for DM 5.3–7.1 per 1000) [35]. This is in line with the To-Be sample used here, as the overall recall rate was 4.3% for DM and 0.1% higher for DBT, while the detection rate was 5.8 per 1000 for DM with an increase of 1.8 per 1000 for DBT (not shown in tables). Generally, the recall rates are significantly lower in European screening programs than is seen in the US and Canada, while the opposite is true for the detection rates. The former means that substantial gains in cost effectiveness for DBT from lower recall rates are unrealistic in Norway, and the importance of reducing recall rates when evaluating cost effectiveness, as pointed out in [18], is less of an issue.

Due to differences noted in the Introduction and above, it is difficult to compare results both across the previous cost-effectiveness studies [17,18,19,20] and with this study. Some overall considerations can be made. Comparing the base-case ICER estimated here with those mentioned in the Introduction, the latter indicate cost effectiveness for DBT at higher values of WTP per QALY or life-year gained. However, the additional costs assumed for DBT screenings were higher, ranging from around €25 to €50 (US$56). As previously mentioned, the cost-effectiveness conclusion is considerably influenced by the additional cost of DBT. In a Dutch study that had life-years gained as the outcome [19], an additional cost of around €25 was applied in the MISCAN-model (€91 for DBT vs €66.37 for DM). As indicated in Fig. 2, applying an additional cost of €25 instead of €8.10 for DBT in the base-case analysis results in a CEAC very similar to that reported in the Dutch paper. Although this is perhaps somewhat coincidental given the vast differences between the analyses (decision tree vs microsimulation), the time horizons and target populations (simulations across calendar time vs age of a screening population), the effects (10-year overall survival probabilities and associated life-years vs actually modelling survival) and cost estimates, it could indicate some generalizability of the results at least within a European setting. Estimating the additional cost of DBT compared with DM will probably still be debated, but the approach taken earlier in To-Be [11] should be a valid approach.

Some pragmatic model assumptions are applied compared with microsimulation models, such as the shorter time horizon and the assumption that all additional cases detected per state in one scenario (minus over-diagnosed cancers) will also be detected in the other scenario. Given the higher number of screen-detected cancers in the DBT scenario, many of these will be in states with higher treatment costs in the DM scenario in the model. This is a factor driving the modest total cost difference for DBT versus DM observed in Table 4. A similar mechanism limiting the difference between the two scenarios also applies to the life-years, as most non-detected cases in a state in the DM scenario will get screen-detected survival in the model. Another limitation partly resulting from the above assumptions is that it is difficult to model quality of life and lead-time effects. Some parameters estimated from To-Be, such as the recall and screening detection rates, may be affected by the fact that screenings and follow-up were done within the controlled setting of a trial, but assuming that this affected both arms, it should affect the results for the differences only to a limited extent. A similar argument could be made for the estimated difference in total costs and life-years gained over ten screening rounds, but one would expect these to be uncertain in any case. A further caveat is that the costs in Moger et al. [25] include treatments that were likely related to the breast cancer diagnosis, that is, treatment for breast cancer or treatments for common metastases of breast cancer, as well as radiation, chemo- and hormonal therapy. Immunotherapy and other medication treatments were not captured in the estimates. Any extra costs of implementing DBT in mobile screening units (of which there are four in Norway, used continuously through the screening rounds in less densely populated areas) were disregarded in the analysis.
Personalized screening and higher quality treatment during the next 20 years might lead to further improved survival for females diagnosed with interval and screen-detected cancers in stage TNM2 and higher. However, as seen in Supplementary Table 4 (see ESM), there have been substantial gains in overall survival in these states in Norway already, with survival approaching that of DCIS.

Some aspects remain for which there exists no information, including from the To-Be trial, when comparing DBT with DM. Examples are over-diagnosis and effects on longer-term outcomes. This is reflected in the analysis by assuming that there is no difference between the two technologies. The To-Be trial may be too small to provide any definite answers regarding all aspects of costs and effects for DBT screening. To shed further light on these issues, several large randomized controlled trials are currently underway in Europe and the USA, such as the PROSPECTS trial in the UK (Footnote 4), the TOSYMA trial in Germany [36] and the TMIST trial in Alabama, USA [37].

5 Conclusion

The results from the To-Be trial suggest that DBT may be cost effective if implemented in BreastScreen Norway. Similar to what is found in previous studies, the conclusion is dependent on the additional cost of DBT screening. In To-Be, this was estimated using a micro-costing approach, resulting in a lower estimate than commonly seen if reimbursement tariffs are used. The estimate reflects the economies of scale when using DBT for a vast number of examinations in public mass screening. For the additional cost of €8.10 estimated in the To-Be trial, DBT was cost effective in over 50% of the simulations at all levels of WTP per life-year gained, and for WTPs above €11,000 when the additional cost was twice as high. The results from one between-screening interval in To-Be further suggest fewer interval cancers after DBT screening compared with DM. However, any difference from DM in long-term outcomes is still unknown, and conclusions on the cost effectiveness of breast cancer screening programs in the future could change if the trend of improved overall survival following diagnosis continues.