Key Points for Decision Makers

Literature regarding the health economic benefits of simultaneous low-dose computed tomography (LDCT) screening for multiple diseases is absent.

Evidence on the cost-effectiveness of LDCT screening for cardiovascular disease is very limited and is absent for chronic obstructive pulmonary disease.

More evidence on LDCT for diseases other than lung cancer is needed to support further cost-effectiveness analyses.

1 Introduction

Population-based screening could avoid severe consequences of diseases through early detection and early treatment, thereby improving health outcomes and potentially reducing healthcare costs in the long term [1]. However, large-scale screening requires major upfront investments. Given that healthcare budgets are limited, assessment of the long-term impact on health outcomes and costs is required to assist public policy decision-making toward the implementation of screening [2].

Low-dose computed tomography (LDCT) is a promising technology for use in population-based screening programs because it is non-invasive, has a low radiation dose and is relatively inexpensive compared to other imaging modalities [3]. Chest LDCT for lung cancer, specifically, has high sensitivity, has been investigated extensively and is recommended for population-based screening by the US Preventive Services Task Force (USPSTF) [4]. In many European countries, nationwide lung cancer screening is currently being investigated but not yet implemented. Previously, two reviews investigated methodological differences between health economic evaluations of LDCT lung cancer screening [5, 6]. Both concluded that there is no consensus on whether LDCT lung cancer screening is cost-effective and that reported conclusions on cost-effectiveness depend heavily on the screening strategy evaluated (e.g. which population was targeted) as well as methodological decisions (e.g. whether overdiagnosis was included in the analysis).

The European Society of Radiology and European Respiratory Society (ESR/ERS) position statement on lung cancer screening additionally stated that researchers still disagree on what the drivers of a cost-effective lung cancer screening program would be [7]. Drivers could include the selected target population, the frequency of screening and the inclusion of smoking cessation programs. According to data from the US National Lung Screening Trial (NLST), the main driver is the number of computed tomography (CT) scans, which implies that the number of additional CT scans for follow-up after a positive screen should be minimised. Furthermore, the position statement mentions that lung cancer screening might have even more value if it were broadened to also detect chronic obstructive pulmonary disease (COPD), cardiovascular disease (CVD) and other smoking-related diseases. This suggests that the cost-effectiveness of lung cancer screening with LDCT may be improved by screening for more diseases on a single scan, thereby increasing screening yield at minor additional cost.
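To make the point about follow-up scans concrete, the sketch below computes the expected number of CT scans, and the resulting imaging cost, per participant when each positive screen triggers additional follow-up scans. It is a minimal illustration under stated assumptions: the function names, the positive-screen rate, the number of follow-up scans and the cost per scan are all hypothetical and are not taken from the NLST or any reviewed study, and every scan is assumed to cost the same.

```python
# Hypothetical illustration: all parameter values are assumptions,
# not figures from the NLST or from any reviewed study.


def expected_scans_per_participant(rounds: int, positive_rate: float,
                                   followup_scans_per_positive: float) -> float:
    """Expected number of CT scans per participant over a screening programme.

    Each round contributes one screening scan plus, with probability
    `positive_rate`, a number of additional follow-up scans.
    """
    return rounds * (1.0 + positive_rate * followup_scans_per_positive)


def imaging_cost_per_participant(rounds: int, positive_rate: float,
                                 followup_scans_per_positive: float,
                                 cost_per_scan: float) -> float:
    """Imaging cost per participant, assuming every scan has the same unit cost."""
    scans = expected_scans_per_participant(rounds, positive_rate,
                                           followup_scans_per_positive)
    return scans * cost_per_scan


if __name__ == "__main__":
    # Compare a protocol with more follow-up scans per positive screen
    # against one with fewer, keeping everything else fixed.
    for followups in (2.0, 1.0):
        cost = imaging_cost_per_participant(rounds=3, positive_rate=0.25,
                                            followup_scans_per_positive=followups,
                                            cost_per_scan=200.0)
        print(f"follow-up scans per positive = {followups}: ${cost:.0f}")
```

Because the follow-up term is multiplied by the positive-screen rate in every round, reducing the number of follow-up scans lowers the imaging cost in proportion to how often screens are positive.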

Clinical evidence indeed shows that LDCT biomarkers can also be used to detect, amongst others, subclinical COPD and coronary artery disease [8, 9], or even a wider range of cardiovascular, respiratory and oncological diseases [10]. In terms of health economic evidence, an early health economic evaluation conducted by our group estimated the potential of screening for lung cancer, COPD and CVD in a multi-disease screening program [11], concluding that there is potential value in simultaneously screening for these diseases.

This review aims to analyse the evidence for the cost-effectiveness of population-based screening programs using chest LDCT across a range of diseases, with a specific focus on full health economic evaluations.

2 Methods

2.1 Search Strategy

The systematic review (Prospective Register of Ongoing Systematic Reviews registration CRD42021290228) was conducted by searching Scopus and PubMed, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12]. The search strategy aimed to identify articles and scientific reports describing full health economic evaluations of CT-based population screening programs. Search terms were kept broad to ensure that all relevant literature was captured; for example, the search used “computed tomography” even though the focus of this paper is chest LDCT. In addition, to ensure that important studies were not missed, a second set of search strategies targeted the cost-effectiveness of screening for diseases known to have screening potential using CT (lung cancer, COPD and CVD) without specifying CT. Only studies from the 10 years before the start of this systematic review were included because of significant changes in treatment, especially for lung cancer. The search strategy was reviewed by an information specialist and included articles in English published from 1 January 2011 to 22 July 2022.

Health economic evaluations were identified using search terms similar to those used by Degeling et al. [13], such as cost-effectiveness analysis, cost-utility analysis, cost-benefit analysis, decision analytical model and economic evaluation. Overall, a variety of terminology was included. To verify that the search strategy identified all relevant papers, its results were checked against studies known to the authors before the search and against the publications cited in the reference lists of systematic reviews identified by the search. For this reason, reviews were included in the search strategy but not in the final set of included studies. Articles, conference papers and reviews were included; editorial letters, notes, book chapters, short surveys and special reports were excluded. The search strategies were discussed with an information specialist and can be found in Online Resource Table 1 (see the electronic supplementary material).

2.2 Inclusion and Exclusion

Studies were screened to include full health economic evaluations of population-based screening using chest CT. Full health economic evaluations required the reporting of incremental costs and health outcomes of screening strategies, or of total costs and health outcomes from which the incremental values could be derived. Population-based screening was defined as screening focusing on target groups that are not limited to a patient group but consist of individuals at risk of a certain disease. Therefore, studies investigating a screening policy within an already existing patient group were excluded. All abstracts were screened independently by the first reviewer (CB), and a random sample of 10% by a second reviewer (MOW). If it was not clear from the title and abstract whether a study should be included, the study was labelled as relevant for full-text assessment. Studies on which the reviewers did not agree were discussed until consensus was reached. The first reviewer assessed all studies included in the full-text assessment. The second reviewer assessed a random sample of 10% and a further 10% that the first reviewer identified as challenging cases. All disagreements within this 20% sample were discussed with a third reviewer (HK) until consensus was reached.
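The selection of full texts for the second reviewer can be sketched as follows. This is an illustrative helper only: `second_reviewer_sample`, the study identifiers and the seed are hypothetical, and the sketch assumes that the additional “challenging” 10% is taken, in order, from a list flagged by the first reviewer, skipping studies already drawn at random.

```python
import random
from typing import List, Optional, Sequence


def second_reviewer_sample(study_ids: Sequence[str],
                           challenging_ids: Sequence[str],
                           fraction: float = 0.10,
                           seed: Optional[int] = None) -> List[str]:
    """Return the studies a second reviewer would assess (hypothetical helper).

    A random `fraction` of all studies is drawn first; then up to the same
    number of studies flagged as challenging by the first reviewer are added,
    skipping any that are already in the random sample.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    n = round(fraction * len(study_ids))
    random_part = rng.sample(list(study_ids), n)  # sampling without replacement
    already_chosen = set(random_part)
    flagged_part = [s for s in challenging_ids if s not in already_chosen][:n]
    return random_part + flagged_part


if __name__ == "__main__":
    # 62 hypothetical full texts; the first 20 are flagged as challenging.
    ids = [f"S{i:02d}" for i in range(1, 63)]
    print(second_reviewer_sample(ids, ids[:20], seed=1))
```

Seeding the generator is a design choice that lets the double-screened sample be reproduced and audited later.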

2.3 Data Extraction

Information was extracted from included papers by one reviewer (CB) using Covidence online software for systematic reviews [14], and all unclear extractions were discussed between all reviewers (CB, MOW, HK). Extractions included general information such as the authors, journal name, year, country and aim of the study. Information extracted about screening included the diseases screened for, recurrence of screening, starting and stopping age of screening and other disease-specific population descriptions. Modelling decisions were extracted, such as model type, study perspective, comparator strategy, time horizon, discount rate, treatments in the care pathway and which co-occurring diseases and incidental findings were included. Model input parameters extracted included screening costs, sensitivity, specificity and participation rates. Finally, results were extracted, including (1) the subgroup(s) within the target population being considered, (2) incremental health outcomes and costs, (3) the incremental cost-effectiveness ratio (ICER) of the most cost-effective strategy, (4) the comparator, (5) the willingness-to-pay (WTP) threshold the results were compared with and (6) the conclusion on whether or not the best-performing strategy was considered cost-effective. The reported ICER values are either the final reported base-case value or the most cost-effective result when multiple strategies are compared in the main analysis (not considering results from the sensitivity analysis). Incremental costs and effects were reported as found in the respective studies or calculated as the difference between the total costs and effects of the intervention and those of no screening or another comparator. If only total incremental costs and effects were reported, they were converted to per-person costs and effects to allow comparison of outcomes across countries with different population sizes.
The cost-effectiveness conclusions were reported as found in the respective studies and, if not reported, the ICERs were compared with the reported WTP threshold. The reported incremental costs associated with the reported ICERs were expressed in 2021 values of the local currency using local inflation rates and subsequently converted to US dollars using the average exchange rate from November 2021. US dollars were used as the reporting currency because they are easily interpretable by readers from different countries.
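The extraction and conversion steps described above can be sketched as a few small functions: per-person values from population totals, the ICER, inflation to 2021 followed by conversion at a fixed exchange rate, and a comparison against a WTP threshold. This is a minimal sketch under assumptions: all function names and example numbers are hypothetical, local inflation is approximated by a ratio of price-index values, and the threshold check is written in net-benefit form, which agrees with “ICER ≤ WTP” whenever the incremental effect is positive.

```python
def per_person(total: float, population: int) -> float:
    """Convert a population-level total into a per-person value."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population


def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio: incremental cost per unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def to_2021_usd(amount_local: float, price_index_price_year: float,
                price_index_2021: float, usd_per_local_unit: float) -> float:
    """Inflate a local-currency amount to 2021 prices, then convert to US dollars.

    Assumes inflation is captured by the ratio of a local price index and that
    `usd_per_local_unit` is the average November 2021 exchange rate.
    """
    amount_2021_local = amount_local * price_index_2021 / price_index_price_year
    return amount_2021_local * usd_per_local_unit


def is_cost_effective(delta_cost: float, delta_effect: float, wtp: float) -> bool:
    """Net-monetary-benefit test; equivalent to icer <= wtp when delta_effect > 0."""
    return wtp * delta_effect - delta_cost >= 0


if __name__ == "__main__":
    # Hypothetical study reporting totals for 100,000 screened individuals.
    total_cost, total_qalys, population = 5_000_000.0, 250.0, 100_000
    cost_pp = per_person(total_cost, population)
    qaly_pp = per_person(total_qalys, population)
    # Dividing totals or per-person values gives the same ratio.
    ratio = icer(total_cost, total_qalys)
    ratio_usd = to_2021_usd(ratio, 100.0, 110.0, 0.5)
    cost_usd = to_2021_usd(total_cost, 100.0, 110.0, 0.5)
    print(f"per person: {cost_pp:.2f} local units, {qaly_pp:.4f} QALYs")
    print(f"ICER: {ratio_usd:,.0f} 2021 US$/QALY")
    print("cost-effective at US$50,000/QALY:",
          is_cost_effective(cost_usd, total_qalys, 50_000.0))
```

Converting to per-person values leaves the ICER unchanged, because numerator and denominator are divided by the same population size; the conversion matters only for comparing incremental costs and effects across studies.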

Heterogeneity and bias in the results from different studies were not evaluated in this paper, due to expected variability in methods used. This paper focuses on providing a narrative synthesis and comparison of studies.

2.4 Quality Assessment

The reporting quality of all papers included in the final set was assessed using the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist [15]. This checklist contains 28 reporting items, each of which was scored 0 (not reported), 1 (reported) or not applicable. Based on these scores, the reporting quality of each study was classified as high (> 85%), medium (60–85%) or low (< 60%).
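The scoring rule can be sketched as follows. The sketch assumes that the percentage is computed over applicable items only (items marked not applicable are excluded from the denominator) and that a score of exactly 85% or 60% falls in the medium band; the function names are hypothetical.

```python
from typing import Optional, Sequence


def cheers_score(item_scores: Sequence[Optional[int]]) -> float:
    """Percentage of applicable CHEERS items reported.

    Each entry is 1 (reported), 0 (not reported) or None (not applicable).
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100.0 * sum(applicable) / len(applicable)


def reporting_quality(score: float) -> str:
    """Map a percentage score to the high/medium/low bands used in this review."""
    if score > 85:
        return "high"
    if score >= 60:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical study: 22 items reported, 4 not reported, 2 not applicable.
    scores = [1] * 22 + [0] * 4 + [None] * 2
    pct = cheers_score(scores)
    print(f"{pct:.1f}% -> {reporting_quality(pct)}")
```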

3 Results

3.1 Search Results

The search strategy yielded 1799 unique studies, as shown in the PRISMA flowchart in Online Resource Fig. 1 (see the electronic supplementary material). Screening titles and abstracts resulted in 62 studies for full-text assessment. During this assessment, studies were excluded if they did not present a full health economic evaluation (n = 7), if the full text was not available in English (n = 9) or if they neither reported nor allowed the calculation of incremental health outcomes and costs of screening (n = 4). The remaining 43 studies were included for data extraction; the most important information is given in Table 1.

Table 1 Summary of included studies

3.2 General Characteristics

The general characteristics of all studies are summarised in Online Resource Table 2 (see the electronic supplementary material). Forty studies (93%) focused on lung cancer screening and the remaining three (7%) on CVD [16, 17]. One included study used CT calcium scoring but focused its health economic analysis only on incidental pulmonary nodules to detect lung cancer [18]; this study was therefore considered a lung cancer screening study. Studies focused on the healthcare context of the US (n = 15, 35%), Canada (n = 4, 9%), the UK (n = 3, 7%) and 13 other countries, with some countries investigated more than once. The CHEERS score for reporting quality ranged from 37% to 100%, with 20 studies (47%) classified as high quality, 20 (47%) as medium quality and three (7%) as low quality. Online Resource Fig. 2 displays the proportion of studies that reported each item on the CHEERS checklist. No clear relationship was found between the reporting quality of the studies and their methods or conclusions. However, it is striking that no study included a health economic analysis plan, an item added in the 2022 version of the checklist. Studies considered target screening populations aged from 45 up to 100 years. The minimum number of pack-years of smoking required for eligibility for lung cancer screening ranged from 20 to 40. However, some studies had no specific requirement on the minimum number of pack-years smoked (n = 10), of which two used risk assessment to identify the eligible population. The maximum number of years since smoking cessation for former smokers to be eligible for screening was 10 or 15 years, while some studies had no smoking cessation requirements (n = 19, 44%) and some screened only current smokers (n = 4, 9%). Two studies investigated the cost-effectiveness of screening in China and Japan, respectively, and included non-smokers because of the lower proportion of lung cancer cases attributed to smoking in these settings [19, 20].
Two of the three CVD studies included asymptomatic individuals aged 40–85 with intermediate risk based on the Framingham risk score, and the third included individuals aged 40–70 with a family history of coronary artery disease events.

Figure 1 illustrates the screening strategies that were considered the most cost-effective, ordered first alphabetically by the country of the study context and then by the surname of the first author. The vertical lines indicate the ages at which screening takes place. The figure shows a clear consensus on the age range (55–74) and frequency (annual) of screening. However, this consensus is likely partially artificial, given that the majority of studies (n = 25, 58%) used NLST data (in combination with national survival data or other, smaller trials) and therefore reflect the NLST inclusion criteria in the definition of their screening strategy. The eligibility criteria of this trial were current and former smokers aged 55–74 with ≥ 30 pack-years smoked, where former smokers had to have quit within 15 years of the start of screening. One study investigating the cost-effectiveness of lung cancer screening separately for men and women concluded that men should be screened annually between the ages of 55 and 80 and women biennially between the ages of 50 and 80 [21], and another concluded that screening is only cost-effective in men [19].
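The NLST eligibility rule that many of the reviewed models adopted can be written as a simple predicate. This is a minimal sketch: the function name and argument names are hypothetical, and treating a former smoker who quit exactly 15 years ago as eligible is an assumption about how the boundary is handled.

```python
from typing import Optional


def meets_nlst_criteria(age: float, pack_years: float, current_smoker: bool,
                        years_since_quit: Optional[float] = None) -> bool:
    """Hypothetical check of NLST-style eligibility for lung cancer screening.

    Eligible: aged 55-74, at least 30 pack-years, and either a current smoker
    or a former smoker who quit within the last 15 years (boundary inclusive,
    which is an assumption).
    """
    if not 55 <= age <= 74:
        return False
    if pack_years < 30:
        return False
    if current_smoker:
        return True
    return years_since_quit is not None and years_since_quit <= 15


if __name__ == "__main__":
    print(meets_nlst_criteria(60, 35, current_smoker=True))
    print(meets_nlst_criteria(70, 40, current_smoker=False, years_since_quit=20))
```

Studies that deviated from the NLST criteria effectively changed one of these three conditions, for example the age band, the pack-year minimum or the maximum years since cessation.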

Fig. 1

Visual representation of the age of the target screening population for lung cancer screening. A continuous line represents once-off screening as each black vertical line indicates a screening round. At the end of each study's line, the smoking requirements are shown. “30;10” indicates > 30 pack-years smoked and < 10 years since cessation. “0;0” indicates that there was no minimum number of pack-years and a maximum of 0 years since smoking cessation (i.e. only current smokers). “NA;NA” indicates that a minimum number of pack-years is not applicable because smoking was not a requirement (in studies that use a risk calculator to identify eligible individuals) and that there was no maximum number of years since smoking cessation in the eligible population. Whether a screening strategy is classified as cost-effective was based primarily on the reported study conclusion or, if no conclusion was presented, on comparing the base-case results in the paper with the reported willingness-to-pay threshold. NLST National Lung Screening Trial. NA not applicable. *Study based (partially) on NLST data

3.3 Modelling Choices

Modelling choices are summarised in Online Resource Table 3 (see the electronic supplementary material). Microsimulation was the most common model type (n = 16, 37%), followed by life table analysis (n = 10, 23%), Markov cohort models (n = 10, 23%), decision trees (n = 4, 9%), discrete event simulation (n = 2, 5%) and a multistate risk model (n = 1, 2%). In general, the publications gave limited reasoning for the choice of model type; only ten studies discussed it. Most studies reportedly applied a public health or public payer perspective (n = 25, 58%), followed by a societal perspective (n = 10, 23%) and a commercial health insurer perspective (n = 6, 14%); two studies (5%) did not report a perspective. The time horizons applied were lifetime (until death) (n = 23, 53%) or 5, 10, 15 or 20 years (n = 10, 23%). In the remaining studies (n = 10, 23%), the time horizon was not reported explicitly.
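Cohort models, such as the Markov cohort models counted above, follow the proportion of a cohort in each health state over discrete cycles, whereas microsimulations simulate individuals one at a time. The sketch below shows a minimal cohort Markov calculation of discounted life years with annual cycles and a 3% discount rate. The health states, transition probabilities, horizon and function name are hypothetical, and life years are counted at the start of each cycle without a half-cycle correction.

```python
from typing import List, Sequence


def discounted_life_years(transition: Sequence[Sequence[float]],
                          start: Sequence[float],
                          alive_states: Sequence[int],
                          cycles: int,
                          discount_rate: float = 0.03) -> float:
    """Discounted life years per cohort member in a cohort Markov model.

    `transition[i][j]` is the probability of moving from state i to state j in
    one annual cycle. Life years are counted at the start of each cycle and
    discounted at `discount_rate` per year.
    """
    n = len(start)
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    dist: List[float] = list(start)
    total = 0.0
    for t in range(cycles):
        alive = sum(dist[s] for s in alive_states)
        total += alive / (1.0 + discount_rate) ** t
        # Move the cohort one cycle forward: new[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return total


if __name__ == "__main__":
    # Hypothetical states: 0 = no cancer, 1 = lung cancer, 2 = dead.
    no_screening = [[0.97, 0.01, 0.02], [0.0, 0.70, 0.30], [0.0, 0.0, 1.0]]
    # Assumption: earlier detection lowers annual mortality in the cancer state.
    screening = [[0.97, 0.01, 0.02], [0.0, 0.80, 0.20], [0.0, 0.0, 1.0]]
    start, alive = [1.0, 0.0, 0.0], [0, 1]
    ly_no = discounted_life_years(no_screening, start, alive, cycles=30)
    ly_scr = discounted_life_years(screening, start, alive, cycles=30)
    print(f"incremental discounted life years: {ly_scr - ly_no:.4f}")
```

A microsimulation would instead draw each individual's transitions at random, which makes it easier to carry individual histories such as smoking status or pack-years, at the cost of more computation.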

The majority of studies did not include incidental findings, co-occurring diseases or comorbidities (n = 35, 81%) in the cost-effectiveness model. One study [22] noted that incidental findings with potential clinical implications reported in the NELSON (Dutch-Belgian Lung cancer screening) trial [23] had no significant impact on morbidity and mortality, and therefore did not include such findings in its model. Four studies [24,25,26,27] included only the additional (diagnostic) costs of incidental findings. Only one study [18] included both the health effects and the costs of incidental findings in its model. This study evaluated the cost-effectiveness of screening for incidental pulmonary nodules in the part of the lung that was scanned during cardiac CT calcium scoring [18]. Although this study defined the pulmonary nodules as incidental because the scan was intended for calcium scoring, only the results and outcomes of lung cancer screening were included in the model. In general, the stated reason for not including the effects of incidental findings or comorbidities was a lack of evidence.

3.4 Modelling Evidence and Parameters

The parameters used for modelling, including their values, are given in Online Resource Table 4 (see the electronic supplementary material). Only 21 studies (49%) included screening participation rates below 100% in the base case or the sensitivity analysis. This means that about half of the studies assumed 100% participation, which is unrealistic for any screening program, or, less likely, implemented a lower participation rate without reporting it.
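Why the participation assumption matters can be illustrated with a minimal sketch. If a program has costs that do not scale with uptake, such as invitations, database management and overhead, the ICER rises as participation falls; without such costs, the ICER is unaffected by participation. The function name, the cost structure and all numbers below are hypothetical assumptions.

```python
def programme_icer(invited: int, participation: float, fixed_cost: float,
                   invitation_cost: float, net_cost_per_participant: float,
                   effect_per_participant: float) -> float:
    """ICER of a hypothetical screening program at a given participation rate.

    Fixed program costs and per-invitee invitation costs are incurred
    regardless of uptake, while scan-related costs and health gains scale
    with the number of participants.
    """
    if not 0 < participation <= 1:
        raise ValueError("participation must be in (0, 1]")
    participants = invited * participation
    cost = (fixed_cost + invited * invitation_cost
            + participants * net_cost_per_participant)
    effect = participants * effect_per_participant
    return cost / effect


if __name__ == "__main__":
    # Compare full participation with two lower, hypothetical uptake levels.
    for rate in (1.0, 0.4, 0.073):
        value = programme_icer(invited=1000, participation=rate,
                               fixed_cost=50_000.0, invitation_cost=5.0,
                               net_cost_per_participant=100.0,
                               effect_per_participant=0.01)
        print(f"participation {rate:.1%}: ICER {value:,.0f} per unit of effect")
```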

The cost of screening varied greatly, from $37 to $1232 after conversion to 2021 US$, because some models only included the cost of an LDCT scan whereas others included additional costs such as invitations to screening, database management and screening overhead. All costs are reported in their original currency in Online Resource Table 4. Sensitivity and specificity values for LDCT used in the model were not reported in 14 studies (33%); 18 studies (42%) explicitly stated the sensitivity and specificity values used, of which five reportedly based these values on the NLST. Nine studies (21%) reported only a false positive rate, and two studies (5%) reported the sensitivity but not the specificity. Five of the studies included sensitivity values depending on the size of the nodule and the stage of disease, respectively [21, 28]. Reported sensitivity values ranged between 43% and 100% and specificity values between 62% and 99%.
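The link between sensitivity, specificity and the false positive rate (which equals 1 minus specificity) can be shown with the expected outcomes of a single screening round. In this minimal sketch the function name, the screened population and the prevalence are hypothetical; the two specificity values are the ends of the range reported above.

```python
from typing import Dict


def screening_outcomes(n_screened: float, prevalence: float,
                       sensitivity: float, specificity: float) -> Dict[str, float]:
    """Expected true/false positives and negatives for one screening round."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    true_pos = diseased * sensitivity
    # False positive rate = 1 - specificity, applied to people without disease.
    false_pos = healthy * (1.0 - specificity)
    return {
        "true_positive": true_pos,
        "false_negative": diseased - true_pos,
        "false_positive": false_pos,
        "true_negative": healthy - false_pos,
    }


if __name__ == "__main__":
    # Hypothetical: 1000 people screened, 1% prevalence, 90% sensitivity.
    for spec in (0.62, 0.99):
        out = screening_outcomes(1000, 0.01, 0.90, spec)
        print(f"specificity {spec:.0%}: {out['false_positive']:.0f} false positives")
```

Because false positives lead to follow-up scans and work-up, the specificity assumption feeds directly into the number of CT scans, and therefore the cost, in each model.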

3.5 Economic Evaluation Results

The results and conclusions of all included studies are summarised in Online Resource Table 5 (see the electronic supplementary material). Incremental costs and effects per screened individual were used as reported or calculated from the reported total costs and effects. Most of the studies (n = 33, 77%) concluded that the evaluated screening program would be cost-effective. For lung cancer screening, the ICERs ranged from −$4019 per life year gained (LYG) to maxima of $113,800/LYG and $190,500/quality-adjusted life year (QALY), compared with no screening and six other comparators (2021 US$). The ICERs for cardiac CT calcium scoring were $15,900/QALY, $37,400/QALY and $45,300/QALY compared with no screening (2021 US$). Studies evaluating lung cancer screening compared with no screening are plotted in Fig. 2 for studies reporting LYG and in Fig. 3 for studies reporting QALYs. Studies with negative ICERs were excluded from the figures to improve readability. The negative ICERs were driven by negative incremental costs, i.e. screening was estimated to be cost-saving. The country label on each point indicates the study context. In these figures, the colour indicates whether a study is cost-effective, based either on the reported conclusion or, if none was reported, on comparing the reported results with the study-specific WTP threshold.
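A negative ICER on its own is ambiguous: it arises both when screening is more effective and cheaper (dominant) and when it is less effective and more costly (dominated). The sketch below classifies a result by quadrant of the incremental cost-effectiveness plane (incremental effect on the horizontal axis, incremental cost on the vertical axis) and computes the net monetary benefit, which gives a threshold decision in every quadrant. The function names and example values are hypothetical.

```python
def ce_plane_quadrant(delta_cost: float, delta_effect: float) -> str:
    """Quadrant of the incremental cost-effectiveness plane.

    NE: more effective and more costly (compare the ICER with the WTP threshold)
    SE: more effective and not more costly (dominant)
    NW: less effective and more costly (dominated)
    SW: less effective and not more costly
    """
    if delta_effect == 0:
        return "axis"  # no effect difference, so the ICER is undefined
    if delta_effect > 0:
        return "NE" if delta_cost > 0 else "SE"
    return "NW" if delta_cost > 0 else "SW"


def net_monetary_benefit(delta_cost: float, delta_effect: float, wtp: float) -> float:
    """Positive values mean the strategy is cost-effective at threshold `wtp`."""
    return wtp * delta_effect - delta_cost


if __name__ == "__main__":
    # Both hypothetical results have an ICER of -100 per unit of effect.
    for d_cost, d_effect in ((-10.0, 0.1), (10.0, -0.1)):
        print(ce_plane_quadrant(d_cost, d_effect),
              net_monetary_benefit(d_cost, d_effect, 50_000.0))
```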

Fig. 2

Incremental cost-effectiveness plane for lung cancer screening, plotting incremental costs against incremental LYG per person for each study and indicating the country context. Whether a screening strategy is classified as cost-effective was based primarily on the reported study conclusion or, if no conclusion was presented, on comparing the base-case results in the paper with the reported willingness-to-pay threshold. LYG life years gained

Fig. 3

Incremental cost-effectiveness plane for lung cancer screening, plotting incremental costs against incremental QALYs per person for each study and indicating the country context. Whether a study is classified as cost-effective was based primarily on the reported study conclusion or, if no conclusion was presented, on comparing the base-case results with the reported willingness-to-pay threshold. QALY quality-adjusted life year

4 Discussion

This systematic review presents and analyses the reported evidence from full health economic evaluations on the cost-effectiveness of screening programs using chest LDCT, without limiting the evidence to a specific disease. All but three studies evaluated lung cancer screening; the other three evaluated coronary calcium scoring using LDCT for CVD. Only a few studies included comorbidities, co-occurring diseases or incidental findings in their cost-effectiveness models. One study evaluated screening for the incidental detection of pulmonary nodules using full chest CT calcium scoring [18]; it is the only study that suggests multi-disease screening, even though its cost-effectiveness model covers only the costs and benefits of detecting pulmonary nodules. Thus, while a recent early health economic analysis with limited data suggested that extending LDCT lung cancer screening with screening for emphysema and CVD may be valuable [11], full health economic evidence on the impact of multi-disease screening with LDCT on cost-effectiveness is absent. Multi-disease modelling is also of increasing interest in the field of blood-based (genomic) biomarkers, as explored in recent studies [29, 30]. Additionally, the effects of incidental findings were rarely modelled because of a lack of evidence. Increasing attention to COPD and coronary calcium, and especially to the effect of smoking cessation programs included in lung cancer screening programs, such as the Yorkshire Enhanced Stop Smoking (YESS) study, will likely provide more evidence in this regard [31].

The ESR/ERS position statement also suggests broadening screening to include COPD, CVD and other smoking-related diseases, but this systematic review shows that such multi-disease screening programs are not yet reflected in the current health economic evidence. This lack of health economic evidence is most likely due to a lack of clinical evidence for multi-disease screening. Apart from the limited evidence on the cost-effectiveness of screening for CVD using LDCT, no health economic evaluation was found investigating the value of COPD screening using LDCT. The current standard for detecting COPD is spirometry, which is cheap, simple and gives an immediate result [32]. Although this explains the absence of health economic evaluations of stand-alone COPD screening using LDCT, health economic evidence on the potential value of adding COPD screening to lung cancer screening is still lacking, even though clinical evidence is increasing [33,34,35].

There was substantial variation in study outcomes caused by, among other factors, the different healthcare contexts in which the analyses were conducted, the different modelling methods used and differences in inputs and assumptions. A microsimulation with a public health perspective, a 3% discount rate and a lifetime horizon was most commonly used. Analyses were conducted in the healthcare contexts of 15 different countries, and results are not easily transferable between healthcare contexts because of differences in clinical guidelines and in the costs of healthcare procedures. Only some studies included screening costs beyond the LDCT scan itself, such as invitation costs and the costs of storing and securing personal data, even though these costs are unavoidable. Moreover, only a few studies acknowledged that a screening participation rate of 100% is not achievable. The limited evidence on real-world participation indicates that perfect participation is unrealistic: for example, screening participation among eligible smokers in the US was 7.3%, whereas at least 40% participation would be required for a cost-effective screening program [36]. Since only three studies evaluated diseases other than lung cancer, the results of this review overlap with those of previous systematic reviews focusing purely on health economic evaluations of LDCT lung cancer screening [5, 6]. This review confirms the conclusions of these previous reviews that studies are heterogeneous in healthcare setting, screening intervals and analytical methods. In contrast with these previous reviews, this review highlights the large overlap in target screening populations between health economic evaluations, although the most cost-effective target screening population is not the same across all evaluations.
This review does not focus on screening and modelling consequences such as overdiagnosis, lead-time bias and length bias. Rather, it adds to the literature by highlighting the importance of incorporating screening participation rates in cost-effectiveness studies, by providing a more up-to-date review and by offering an insightful comparison of the target populations considered. Most importantly, because it reviewed the literature on screening for any disease, this review highlights the gap in the literature on health economic evaluations of multi-disease screening using CT.

In contrast to the large variation in health economic outcomes, researchers largely agree on the target screening population for LDCT lung cancer screening. However, this agreement could be partially artificial because of the limited availability of data for health economic models evaluating lung cancer screening. Most of the studies used data from the NLST, and the broad agreement on the target screening population therefore also corresponds with the NLST criteria of current and former smokers aged 55–74 with a minimum of 30 pack-years of smoking history [24]. Some studies aimed to identify the target population within the NLST criteria in which screening is most cost-effective, and concluded that stricter criteria did not improve cost-effectiveness. Even studies that did not use NLST data as input adopted the NLST screening criteria to define the target population.

This review has several limitations. First, the CHEERS checklist is primarily intended as a reporting checklist and was not developed to assess the methodological quality of evaluations. However, as reporting quality is likely correlated with the quality of the evaluation itself, it has been used in multiple similar reviews to gain some insight into evaluation quality. Second, the search was limited to the last 10 years because of rapid changes in lung cancer treatment in recent years, to keep results comparable. A longer search period could have resulted in the inclusion of more studies. Third, we searched only PubMed and Scopus and included only studies in English. Including more databases could have yielded more inclusions, and including all languages would have extended our results, as we found a few non-English studies. Finally, this review is at risk of publication bias, as not all health economic evaluations performed might have been published, especially when results show an intervention to be unfavourable.

5 Conclusion

With limited healthcare resources and increasing technical capabilities to detect multiple different diseases on a single CT scan, exploring the potential benefits of multi-disease screening strategies is increasingly valuable. This applies directly to LDCT, given its low cost, low radiation dose, non-invasive nature and broad range of applications. However, this review shows that health economic evidence for multi-disease screening using CT does not yet exist. Further research on multi-disease screening using LDCT should first focus on gathering additional clinical evidence; only then will estimating the health economic impact of multi-disease screening be feasible. In such an analysis, it is important to consider the complexity of competing risks and the heterogeneity of diseases within the target population. In particular, it would be important to investigate how the optimal target screening population may shift when moving from lung cancer screening to multi-disease screening.