Background

Cancer is a leading cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018 [1]. Lung cancer is the most common form of cancer (2.09 million cases) as well as the main cause of cancer-related mortality (1.76 million deaths) [1]. Because lung cancer is often asymptomatic, it is frequently diagnosed at an advanced stage, when the prognosis is poor. In recent years, low-dose computed tomography (LDCT) has been demonstrated to be a sensitive tool for the detection of early stage lung cancer [2]. However, studies have also indicated that LDCT is associated with high false-positive rates in the diagnosis of lung cancer, resulting in unnecessary invasive procedures and patient anxiety [3,4,5]. In 2011, a high quality trial, the National Lung Screening Trial (NLST) [6], compared LDCT with chest radiography (CXR) in a large sample of high risk adults and showed a 20% relative reduction in lung cancer mortality for LDCT over 6.5 years. Since then, lung cancer screening using LDCT for high risk groups has been recommended by many organizations. However, the most recent meta-analyses did not demonstrate superiority of LDCT screening over usual care in lung cancer mortality [7, 8].

Systematic reviews of randomized controlled trials (RCTs) are well recognized as the most reliable and appropriate reference standard for addressing questions about medical interventions. In September 2018, new data from the largest European trial (NELSON) showed an even greater reduction in deaths from lung cancer than was seen in NLST [9]. Four other RCTs [10,11,12,13] also reported mortality results in 2018 and 2019. Moreover, trial sequential analyses (TSA) of LDCT for lung cancer screening have not been reported previously. We therefore aimed to assess the updated evidence regarding the ability of LDCT to reduce lung cancer mortality and to evaluate the possible harms associated with LDCT screening.

Methods

This review and meta-analysis is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14]. The protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO at www.crd.york.ac.uk under the following ID: CRD42018111630).

Search strategy

The following databases were searched: Medline (Ovid), EMBASE, CENTRAL (Cochrane Database of Systematic Reviews), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Index to Taiwan Periodical Literature System, TRIP Database and Google Scholar (all from inception until June 17, 2019). Reference lists of the selected studies and systematic reviews were further reviewed for additional citations of published or unpublished reports. Automatic e-mail updates for saved searches were set up to identify new search results from the databases.

The search strategy consisted of subject headings, keywords and related terms for these topics. Language restrictions were not used. The MEDLINE (Ovid) search strategy can be found in Additional file 1: Table S1.

Eligibility criteria

We included studies that met all of the following criteria: (1) randomized controlled trials; (2) comparison of LDCT with any other type of lung cancer screening; (3) asymptomatic adults aged ≥ 18 years with risk factors for lung cancer (current or former smoking, family history of lung cancer, underlying lung disease, or environmental exposure to toxins); (4) benefits of interest: lung cancer mortality, all-cause mortality and early detection (stage I) rates; (5) harms of interest: death and major complications after invasive procedures (30–60 days post procedure). Major complications were defined as: death, anaphylaxis, cardiac arrest, cerebral vascular accident/stroke, congestive heart failure, myocardial infarction, thromboembolic complications requiring intervention, acute respiratory failure, respiratory arrest, bronchial stump leak requiring tube thoracostomy or other drainage for > 4 days, bronchopulmonary fistula, empyema, prolonged mechanical ventilation > 48 h postoperatively, hemothorax requiring tube placement, brachial plexopathy, lung collapse, chylous fistula, injury to a vital organ or vessel, wound dehiscence, and infarcted sigmoid colon. Invasive procedures included surgery, biopsy, bronchoscopy and fine needle aspiration cytology.

Two authors (KLH and SYW) independently screened the trials based on the above criteria, and disagreements were resolved by consultation with a third author (WCL). Included studies were then assessed for methodological quality using the revised Cochrane risk-of-bias tool for randomized trials (RoB 2) [15]. The domains assessed were risk of bias arising from the randomization process, risk of bias due to deviations from the intended interventions, risk of bias due to missing outcome data, risk of bias in measurement of the outcome and risk of bias in selection of the reported result.
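The way five domain-level judgements combine into an overall rating can be sketched as below. This is a simplified illustration rather than the procedure the review authors necessarily followed: the function name and domain labels are hypothetical, and the RoB 2 guidance also allows a trial with several "some concerns" domains to be rated high overall, a judgement call that this sketch does not attempt to encode.

```python
from typing import Dict

# Domain labels are illustrative names for the five RoB 2 domains.
DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)
ALLOWED = {"low", "some concerns", "high"}


def overall_risk_of_bias(judgements: Dict[str, str]) -> str:
    """Combine five domain ratings into one overall rating (simplified rule).

    Any 'high' domain gives 'high'; otherwise any 'some concerns' domain
    gives 'some concerns'; otherwise the overall rating is 'low'.
    """
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain judgements: {missing}")
    ratings = [judgements[d] for d in DOMAINS]
    invalid = [r for r in ratings if r not in ALLOWED]
    if invalid:
        raise ValueError(f"unknown ratings: {invalid}")
    if "high" in ratings:
        return "high"
    if "some concerns" in ratings:
        return "some concerns"
    return "low"
```

A trial rated low on every domain is low overall, and a single high-risk domain is enough to make the overall rating high.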

Data extraction

KLH and SYW extracted the data independently, with disagreements resolved by consultation with other team members. Data related to the study characteristics and outcomes were collected from the included trials. We extracted the following data: study name, country, number of participants, characteristics of the population, screening type and interval, definition of positive results and outcome measures.

Statistical analysis

We carried out the analyses using Review Manager (RevMan) Version 5.3 (Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014). Data were combined using inverse variance meta-analysis. If clinical or methodological heterogeneity between the study results was suspected, a random-effects meta-analysis was used. Results for dichotomous data are presented as summary risk ratios (RR) with 95% confidence intervals (CI). Statistical heterogeneity was assessed using the Tau², I² and Chi² statistics. We considered heterogeneity substantial if Tau² was greater than zero or if there was a low P value (less than 0.10) in the Chi² test for heterogeneity. For the I² metric, values of 30–60%, 50–90% and 75–100% were considered to represent moderate, substantial and considerable heterogeneity, respectively. Subgroup analyses were conducted based on: (1) type of control group (CXR screening, usual care or no screening); (2) quality of studies; (3) sample size; (4) sex. Sensitivity analyses were performed to examine the robustness of the effect size. A funnel plot was to be used to investigate publication bias if more than 10 studies were included in the analysis of the outcome in question.

Trial sequential analysis was conducted using TSA Viewer (version 0.9.5.10 Beta). TSA is a type of cumulative meta-analysis that controls the risks of type I and type II errors arising from repetitive statistical testing. It provides the required sample size for a meta-analysis and boundaries that determine whether the evidence in the meta-analysis is reliable and conclusive [16, 17]. The required information size was calculated, and the trial sequential monitoring boundaries were computed using the O’Brien-Fleming approach. The required information size was based on a 2-sided 5% risk of a type I error, a 20% risk of a type II error (power of 80%), a relative risk reduction of 20%, and the pooled control group event rate across the included studies.
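The information size calculation described above can be sketched as follows, using a standard two-group sample size formula for a dichotomous outcome with an optional adjustment for heterogeneity (diversity). This is an illustration only, not the output of the TSA software: the function name is hypothetical, and the 2% control group event rate in the example is an assumed placeholder, since the pooled control event rate is not stated in the text.

```python
from statistics import NormalDist


def required_information_size(
    control_event_rate: float,
    relative_risk_reduction: float = 0.20,
    alpha: float = 0.05,
    power: float = 0.80,
    diversity: float = 0.0,
) -> float:
    """Total participants (both arms) needed to detect the given effect.

    Uses IS = 4 * (z_alpha/2 + z_beta)^2 * p(1 - p) / delta^2, where p is
    the mean of the two event rates and delta is their difference, then
    divides by (1 - diversity) to inflate for between-trial heterogeneity.
    """
    if not 0 < control_event_rate < 1:
        raise ValueError("control_event_rate must be between 0 and 1")
    if not 0 <= diversity < 1:
        raise ValueError("diversity must be in [0, 1)")
    p_control = control_event_rate
    p_screen = p_control * (1 - relative_risk_reduction)
    delta = p_control - p_screen
    p_mean = (p_control + p_screen) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # 1 - type II error
    size = 4 * (z_alpha + z_beta) ** 2 * p_mean * (1 - p_mean) / delta ** 2
    return size / (1 - diversity)


# Hypothetical control event rate of 2%; not a value taken from the review.
print(round(required_information_size(0.02)))
```

Because the event rate enters through both the variance term and the absolute risk difference, rarer outcomes require substantially more participants for the same relative risk reduction.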

Results

Study selection

Our literature search identified 2180 potentially relevant articles. Once duplicates had been removed, 1896 citations were screened, of which 36 full-text manuscripts were assessed for eligibility. Twenty-seven studies were excluded for the following reasons: eleven because they were review articles; two because they reported no relevant outcomes (mortality data); two because they were guidelines; three because they were protocol designs; three because they were smoking cessation programs; three because the screening groups did not include LDCT; two because they focused on doctors’ behavior or the impact of a new technique; and one because it was an actuarial study. A list of the excluded full-text manuscripts, along with the reasons for their exclusion, is given in Additional file 2: Table S2. Finally, nine RCTs (with multiple publications) met our inclusion criteria. Figure 1 summarizes the literature search flow.

Fig. 1 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram of study flow

Study characteristics

Tables 1 and 2 present the characteristics of the selected studies. Seven trials [9, 10, 12, 13, 18,19,20] compared LDCT screening with no screening or usual care, and two trials [11, 21] compared LDCT with CXR. The trials were conducted in Italy, Denmark, Germany, the USA, the Netherlands, Belgium and China and started between 2001 and 2013. One study [20] recruited only men. One study [10] had published preliminary mortality data, with a duration of follow-up of less than 5 years. The sample size of the included trials ranged from 2472 to 53,454. Most trials adopted a screening interval of 1 to 2 years. All included trials recruited high risk populations, with ages ranging from 45 to 75 years. The definition of high risk varied but was usually based on age and current and past smoking. The overall risk of bias for mortality outcomes was rated as high in two RCTs [12, 20], as some concerns in six RCTs [9,10,11, 13, 18, 19] and as low in one trial [21]. Domain ratings for each RCT are shown in Fig. 2.

Table 1 Summary screening criteria of included randomized controlled trials
Table 2 Results of included randomized controlled trials
Fig. 2 Risk of bias summary for included studies reporting mortality (red shading denotes high risk of bias, yellow shading denotes some concerns and green shading denotes low risk of bias)

Benefits and adverse outcomes

Nine included studies [9,10,11,12,13, 18,19,20,21] contributed to the lung cancer mortality outcome. Compared with controls (no screening or CXR), LDCT screening was associated with a statistically significant reduction in lung cancer mortality (RR 0.83, 95% CI 0.76–0.90), with no important heterogeneity observed (p = 0.43, I² = 1%; see Fig. 3a). Trial sequential analysis confirmed that the evidence for lung cancer mortality was sufficient and that no more trials were needed (Additional file 3: Figure S1). Seven included trials [11,12,13, 18,19,20,21] contributed information on all-cause mortality. In contrast, LDCT screening showed no statistically significant difference in all-cause mortality (RR 0.95, 95% CI 0.90–1.00) (Fig. 3b). There was no heterogeneity for this outcome (I² = 0%). Pooled analysis of seven RCTs showed a significantly greater proportion of early stage cancers in the LDCT groups than in controls (RR 2.08, 95% CI 1.43–3.03).
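Pooled estimates and heterogeneity statistics of the kind reported above come from inverse-variance weighting of per-trial log risk ratios. The sketch below shows one common way to compute them, using the DerSimonian-Laird random-effects estimator. It is not a reproduction of the RevMan analysis: the function name is hypothetical, and the trial counts in the example are invented for illustration because per-trial event counts are not given in the text.

```python
import math
from typing import Dict, List, Tuple

# (events_ldct, n_ldct, events_control, n_control)
Trial = Tuple[int, int, int, int]


def pool_risk_ratios(trials: List[Trial], z: float = 1.96) -> Dict[str, float]:
    """Random-effects (DerSimonian-Laird) pooling of log risk ratios."""
    log_rr, var = [], []
    for a, n1, c, n2 in trials:
        # Assumes no zero cells; real analyses need a continuity correction.
        log_rr.append(math.log((a / n1) / (c / n2)))
        var.append(1 / a - 1 / n1 + 1 / c - 1 / n2)

    # Fixed-effect inverse-variance estimate, needed for Cochran's Q (Chi²).
    w = [1 / v for v in var]
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = len(trials) - 1

    # Between-trial variance (Tau²) and I², both truncated at zero.
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add Tau² to each within-trial variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts for three trials; not data from the included RCTs.
example = [(40, 2000, 50, 2000), (100, 5000, 120, 5000), (30, 1500, 33, 1500)]
print(pool_risk_ratios(example))
```

When Tau² is estimated as zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so the two models give the same pooled estimate.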

Fig. 3 Forest plots of comparisons between low-dose computed tomography (LDCT) and no screening or chest radiography (CXR) for (a) lung cancer mortality and (b) all-cause mortality

As for the harms of screening, two studies reported the number of deaths after invasive diagnostic procedures [6, 20]. Nineteen deaths were reported after 2129 invasive procedures in persons screened by LDCT, and 11 deaths were reported after 792 invasive procedures in the control groups. The difference was not significant (RR 0.64, 95% CI 0.30–1.33, I² = 0%). Only one study (NLST) reported major complication rates following invasive procedures for the LDCT and CXR groups. The risk was higher among persons who underwent LDCT than among those who underwent CXR screening (4.1 vs 3.2 per 10,000 screened) [6].
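As a rough check on the post-procedure death counts quoted above, a crude risk ratio can be computed directly from the combined totals. This is a sketch under a stated simplification: it treats the two studies as a single 2×2 table, whereas the reported estimate pools the studies separately, so the crude confidence interval is close to the reported one but need not match it exactly. The function name is hypothetical.

```python
import math
from typing import Tuple


def risk_ratio(
    events_a: int, total_a: int, events_b: int, total_b: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Crude risk ratio of group A versus group B with a log-scale 95% CI."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    low = math.exp(math.log(rr) - z * se)
    high = math.exp(math.log(rr) + z * se)
    return rr, low, high


# Counts quoted in the text: 19 deaths after 2129 procedures (LDCT) and
# 11 deaths after 792 procedures (control).
rr, low, high = risk_ratio(19, 2129, 11, 792)
print(f"RR {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The crude ratio comes out at about 0.64, in line with the pooled estimate, and its interval spans 1, which is consistent with the absence of a significant difference.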

Subgroup and sensitivity analyses

Subgroup and sensitivity analyses were performed to explore potential sources of heterogeneity (Table 3). Among the nine RCTs, the DANTE and MILD trials were judged to be of low quality (high risk of bias), whereas the remaining trials were judged to be of moderate to high quality (some concerns or low risk of bias; see Fig. 2). In the subgroup analysis according to study quality, LDCT screening showed a statistically significant reduction in lung cancer mortality compared with controls among higher quality studies (RR 0.82, 95% CI 0.73–0.91). However, no significant reduction was observed in low quality studies (RR 0.87, 95% CI 0.64–1.20, I² = 23%). These findings suggest that trial quality might be a potential source of heterogeneity. We further explored heterogeneity with a subgroup analysis based on sample size, because a sample size that is too small reduces the power of a trial and increases the margin of error, which can render the trial uninformative. Pooled analysis of findings from seven [10,11,12,13, 18,19,20] fairly small trials (total n = 27,968) comparing LDCT with controls showed no significant difference in lung cancer mortality, whereas the pooled data from two [9, 21] large trials (NELSON, NLST; total n = 69,276) yielded an RR of 0.80 (95% CI 0.71–0.91). In addition, LDCT showed a reduction in lung cancer mortality in both men and women. The results were robust in sensitivity analyses: the positive association was consistent across all of these analyses, which further supports the reliability and stability of our conclusions.
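One way to examine whether two subgroup estimates such as the quality subgroups above differ from each other, rather than only from the null, is a test for subgroup differences on the log risk ratio scale. The sketch below is an approximation and not part of the reported analysis: it recovers standard errors from the rounded published confidence intervals, and the function names are hypothetical. With two subgroups, the squared z statistic corresponds to the Chi² test for subgroup differences with one degree of freedom.

```python
import math
from statistics import NormalDist
from typing import Tuple


def se_from_ci(low: float, high: float, z: float = 1.96) -> float:
    """Approximate the standard error of log(RR) from a 95% CI."""
    return (math.log(high) - math.log(low)) / (2 * z)


def subgroup_difference(
    rr_a: float, ci_a: Tuple[float, float],
    rr_b: float, ci_b: Tuple[float, float],
) -> Tuple[float, float]:
    """Return (z, two-sided p) for the difference between two log RRs."""
    diff = math.log(rr_a) - math.log(rr_b)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z_stat = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Subgroup estimates quoted in the text (higher vs low quality trials).
z_stat, p_value = subgroup_difference(0.82, (0.73, 0.91), 0.87, (0.64, 1.20))
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

A subgroup whose interval excludes 1 alongside one whose interval includes 1 does not by itself show that the subgroups differ; the wider interval may simply reflect fewer events.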

Table 3 Exploration of heterogeneity on the LDCT versus control for lung cancer mortality

Discussion

This is the first meta-analysis of LDCT for lung cancer screening that is based on sufficient evidence, as demonstrated by TSA, and that includes the latest NELSON, MILD and LUSI mortality results [9, 12, 13]. The NELSON trial is the only fully powered European RCT; its 10-year mortality findings were presented in September 2018 at the International Association for the Study of Lung Cancer (IASLC) 19th World Conference on Lung Cancer (WCLC). In total, nine RCTs are included. Most RCTs (DANTE, DLCST, ITALUNG, LUSI, MILD, NELSON) were conducted in European countries; the others were conducted in the USA (LSS, NLST) and China (Yang 2018). The majority of included studies are judged to be of moderate to high quality (some concerns or low risk of bias for mortality outcomes), but two studies (DANTE, MILD) are judged to be of low quality (high risk of bias for mortality outcomes). Pooled results comparing LDCT with no screening or CXR show a reduction in lung cancer mortality and an increase in the detection of stage I cancers. As for the harms of lung cancer screening, LDCT leads to an increase in the frequency of invasive procedures but does not lead to more deaths soon after an invasive procedure compared with the control arms. Our results are similar to those of previous meta-analyses [7, 8, 22, 23], but we identified more studies [9,10,11], more participants and more events, which enhances the precision of the results. We also conducted trial sequential analyses, which provide estimates of the reliability of current evidence and help prevent premature conclusions from meta-analyses.

A range of potential sources of heterogeneity is investigated. The effect on lung cancer mortality differs between subgroups of higher versus lower quality trials: it is statistically significant in higher quality trials (RR 0.82 [95% CI 0.73–0.91]) but not in lower quality trials (RR 0.87 [95% CI 0.64–1.20]). When the poor quality trials (DANTE and MILD) are removed, the analysis reveals a significant decrease in lung cancer mortality in favor of LDCT compared with controls. High risk of bias ratings are applied to the DANTE and MILD trials in our study. The MILD trial used two different LDCT screening strategies (annual and biennial). However, recruitment (4099 subjects) was low, well below the sample size announced in the power calculation (10,000 subjects, 30% mortality reduction at 10 years), and pronounced imbalances were found in three important baseline characteristics (sex, current smoking status and predicted FEV1). As for DANTE, the uneven numbers in the LDCT (n = 1264) and control (n = 1186) groups do not seem compatible with the 1:1 randomization in blocks of four described in the methods. In addition, the different numbers of randomized participants given in the publications from 2009 (n = 1276, 1196) [24] and 2015 (n = 1264, 1186) [20] raise questions about the quality of the trial. The lack of precision in the methods of the DANTE and MILD trials makes it difficult to interpret their results. A considerable reduction in heterogeneity is observed after we exclude the poor quality trials, suggesting that variation in trial quality may be a potential source of heterogeneity.

Sample size (the statistical power of the trial) may also be a potential source of heterogeneity. The effect on lung cancer mortality differs between subgroups of larger versus smaller trials: it is statistically significant in larger trials (RR 0.80 [95% CI 0.71–0.91]) but not in smaller trials (RR 0.87 [95% CI 0.73–1.04]). The two larger trials (NLST and NELSON; 15,822–53,454 subjects) dominate the positive screening effect in the meta-analysis. NLST and NELSON are the only two trials that are sufficiently powered for the outcome of lung cancer mortality, and they are also the only two trials that reported a significant decrease in lung cancer mortality. The seven smaller trials (DANTE, DLCST, ITALUNG, LUSI, MILD, Yang 2018 and LSS; 2472–6717 subjects) are not sufficiently powered to detect statistically significant differences in mortality and, with their wider 95% CIs, found no significant difference between the screening modalities.

In addition, there are major geographic differences, particularly in Asia, where 60 to 80% of women with lung cancer are never-smokers [25]. The Yang 2018 study [10] enrolled fewer active smokers (21.5%) and fewer males (46.8%) than the USA and European trials. Although smoking is the primary etiologic factor responsible for lung cancer, racial/ethnic and sex differences may exist. According to WHO data [26], the age-standardized rate of current tobacco smoking among the population aged ≥ 15 years was estimated at 2.2% for females in South-East Asia, whereas the rates for females in the Americas and Europe were 12.4 and 20.7%, respectively. Previous studies [27] have also indicated that lung cancer is significantly associated with Asian non-smoking women. This group of lung cancers may be caused by carcinogens other than those contained in cigarettes. Only 7.1% of participants in the LDCT group of the Yang 2018 study met the NLST criteria. If only subjects eligible under Western criteria receive LDCT screening, non-smoking-related lung cancer, which is highly prevalent in Asian women, will not be successfully identified. There is insufficient evidence to ascertain whether USA and European criteria are appropriate for lung cancer screening programs outside the USA and Europe. There are also some concerns about the quality of the Yang 2018 trial. The uneven numbers in the LDCT (n = 3550) and control (n = 3167) groups do not seem compatible with the 1:1 randomization in blocks of 12 described in the methods. Moreover, the duration of follow-up (2 years) is not yet sufficient, and the trial probably does not have enough power to test its hypothesis.

Several biases arise in the evaluation of screening studies, including lead-time bias, length-time bias and overdiagnosis, which should be taken into account when interpreting these data. Firstly, when lead time is short, as is true of lung cancer, it is difficult to demonstrate that treatment of a medical condition found on screening is more effective than treatment after symptoms appear. Secondly, screening is more likely to detect slow-growing tumors, which have a better prognosis, including longer survival. However, most types of cancer demonstrate a wide range of growth rates. Thirdly, although LDCT detects a significantly greater proportion of early stage lung cancers than controls, further evaluation is required to determine which patients with positive screening results have cancer. The higher early stage detection rates of LDCT result not only in excess follow-up testing but also in psychological distress. Various trials [3, 28,29,30] suggest that LDCT screening has the potential to cause short-term (< 6 months after screening) psychosocial impact on high-risk participants but that the effects do not appear to persist in the long term (> 6 months after screening). Overdiagnosis also results in unnecessary diagnostic procedures and leads to unnecessary treatment. The magnitude of overdiagnosis with LDCT was 18.5% (95% CI 5.4–30.6%) in NLST [28], 67.2% (95% CI 37.1–95.4%) in DLCST [29] and zero in ITALUNG [18]. It is important to note that the definition of overdiagnosis varied across studies. Finally, we pooled individual trials into groups although population, smoking history, number of screening rounds, duration of follow-up, definition of positive lung nodules and radiologists’ skill may differ between trials. Caution is therefore needed in interpreting our findings.

Conclusion

The present meta-analysis, based on sufficient evidence as demonstrated by TSA, indicates that LDCT screening significantly reduces lung cancer mortality compared with control groups. Moreover, the subgroup analyses indicate that LDCT screening shows statistically significant mortality benefits in high-quality trials, whereas low-quality trials found no significant difference. It is essential to identify lung cancer risk factors among the Asian population and to establish appropriate eligibility criteria for screening programs in different racial groups. The benefit of LDCT is expected to be heavily influenced by the risk of lung cancer in the target group being screened (defined by smoking status, female sex and Asian ethnicity). Owing to the tenuous balance of benefits and harms, medical decision making is recommended for individuals who are considering LDCT screening. More studies are warranted to optimize the approach to LDCT screening.