Lung cancer screening by nodule volume in Lung-RADS v1.1: negative baseline CT yields potential for increased screening interval

Objectives The 2019 Lung CT Screening Reporting & Data System version 1.1 (Lung-RADS v1.1) introduced volumetric categories for nodule management. The aims of this study were to report the distribution of Lung-RADS v1.1 volumetric categories and to analyse lung cancer (LC) outcomes within 3 years to explore a personalized algorithm for lung cancer screening (LCS).
Methods Subjects from the Multicentric Italian Lung Detection (MILD) trial were retrospectively selected by National Lung Screening Trial (NLST) criteria. Baseline characteristics included selected pre-test metrics and nodule characterization according to the volume-based categories of Lung-RADS v1.1. Nodule volume was obtained by segmentation with dedicated semi-automatic software. Primary outcome was diagnosis of LC, tested by univariate and multivariable models. Secondary outcome was stage of LC. Increased interval algorithms were simulated for testing rate of delayed diagnosis (RDD) and reduction of low-dose computed tomography (LDCT) burden.
Results In 1248 NLST-eligible subjects, LC frequency was 1.2% at 1 year, 1.8% at 2 years and 2.6% at 3 years. Nodule volume in Lung-RADS v1.1 was a strong predictor of LC: positive LDCT showed an odds ratio (OR) of 75.60 at 1 year (p < 0.0001), and indeterminate LDCT showed an OR of 9.16 at 2 years (p = 0.0068) and an OR of 6.35 at 3 years (p = 0.0042). In the first 2 years after negative LDCT, 100% of resected LC was stage I. The simulations of low-frequency screening showed an RDD of 13.6–21.9% and a potential reduction of LDCT burden of 25.5–41%.
Conclusions Nodule volume by semi-automatic software allowed stratification of LC risk across Lung-RADS v1.1 categories. A personalized screening algorithm with increased interval seems feasible in 80% of NLST-eligible subjects.
Key Points
• Using semi-automatic segmentation of nodule volume, Lung-RADS v1.1 selected 10.8% of subjects with positive CT and 96.87 relative risk of lung cancer at 1 year, compared to negative CT.
• Negative low-dose CT by Lung-RADS v1.1 was found in 80.6% of NLST-eligible subjects and yielded 40 times lower relative risk of lung cancer at 2 years, compared to positive low-dose CT; annual screening could be preference sensitive in this group.
• Semi-automatic segmentation of nodule volume and increased screening interval by volumetric Lung-RADS v1.1 could retrospectively suggest a 25.5–41% reduction of LDCT burden, at the cost of 13.6–21.9% rate of delayed diagnosis.
Electronic supplementary material The online version of this article (10.1007/s00330-020-07275-w) contains supplementary material, which is available to authorized users.

The efficiency of annual screening depends on the individual risk of LC, which varies substantially even among subjects with risk factors. In one analysis of the NLST population, Kovalchik et al showed that the number needed to screen increased by 60-fold for the lowest risk quintile compared with that for the highest risk quintile [7]. The efficiency of annual screening scans was substantially reduced in the lowest-risk subgroup. To improve cost-effectiveness, a Canadian cost-effectiveness modelling study [8] and post hoc analyses of NLST data [9,10] suggested increasing the screening interval after negative LDCT. Two positive European trials showed reduction of LC mortality by protocols that used longer screening intervals and nodule volumetry, but had populations with lower risk than NLST [2,3].
To date, studies addressing LC risk by LDCT result in NLST-eligible subjects are based on linear measurement of nodule diameter [6]. Nodule characteristics were the strongest predictors of LC in several retrospective analyses of the NLST population [9,10], thus representing an option for systematic improvement of screening efficiency. Risk stratification might vary between linear measurements and nodule volumetry, especially when different definitions of diameter are applied (e.g. mean diameter or maximum diameter) [11]. Nodule volumetry is deemed the most accurate predictor of LC risk [12], and it is proposed by updated quality assurance initiatives and guidelines, for both optimal baseline classification and longitudinal characterization of growth (e.g. volume doubling time) [13-19].
The linear algorithm of the American College of Radiology (ACR) Lung CT Screening Reporting & Data System (Lung-RADS) [20] was updated in 2019 to include conversion of diameter into volume of a sphere (version 1.1) [21]. The distribution and LC risk of Lung-RADS v1.1 categories by semi-automatic nodule volume have not been published in the peer-reviewed literature. Also, it is unknown whether these categories might allow for post-test risk stratification for referral to longer intervals between screening rounds.
The aims of this study were to report the distribution of Lung-RADS v1.1 volumetric categories in a population of NLST-eligible subjects and to analyse LC outcomes within 3 years to explore a personalized algorithm for LCS.

Materials and methods
The Multicentric Italian Lung Detection (MILD) study was a prospective randomized controlled LCS trial launched in 2005 (ethics committee approval INT 53/05; ClinicalTrials.gov NCT02837809). Informed consent was obtained from eligible subjects before participation in the MILD study. The sponsors had no role in conducting and interpreting the study [22].
MILD eligibility criteria were as follows: age > 49 years, smoking history ≥ 20 pack-years, current or former smoker with smoking cessation within 10 years prior to study enrolment and no oncologic history in 5 years before recruitment. The overall population of MILD study included 4099 participants, with 2376 subjects in the intervention arm (median age 58 years, median 39 pack-years, men 68%) [23]. MILD study population showed a relatively lower-risk profile than the NLST population, as a consequence of lower threshold in age and smoking eligibility criteria [1]. In this study, we extracted a higher-risk population from the MILD study by applying the NLST eligibility criteria to the intervention arm: age ≥ 55 years and pack-years ≥ 30 [1]. Fifty-three percent (1248/2376) of MILD subjects were NLST eligible.
Selected pre-test metrics were gender, smoking status, percentage of predicted forced expiratory volume in 1 s (FEV1% pred, with threshold at 0.9) and the ratio between forced expiratory volume in 1 s and forced vital capacity (FEV1/FVC, with threshold at 0.7) [24].
Baseline LDCT result was defined as follows.

Baseline LDCT result
Baseline LDCT results of the 1248 NLST-eligible subjects were retrospectively analysed by an advanced workstation for LCS (CIRRUS Lung Screening), featuring computer-aided detection (CAD) and semi-automated segmentation for direct measurement of nodule volume, for both solid and subsolid nodules. All CAD marks were jointly reviewed by two experienced thoracic radiologists (8 years and 11 years of experience in LC screening, each with > 10,000 LDCTs) [25]. The segmentation of nodule volume was checked and adjusted, if necessary, using semi-automatic parameter tuning (density threshold and irregularity metrics; manual contouring not allowed).

Statistical analyses
The distribution of each variable was described in the whole population and, according to the diagnosis of LC, at each predefined time interval. Frequencies were reported as percentage.
The primary outcome was the number of LC diagnosed within 1 year, 2 years and 3 years after baseline. The secondary outcome was the stage of LC. The outcomes were collected during the MILD trial, which included systematic prospective registration of LC stage, type of resection and histology.
In this analysis, the cumulative frequency of LC was calculated at the three pre-defined time intervals. The cumulative frequency of LC was tested against pre-test metrics or baseline LDCT result. The relative risk (RR) was calculated by the ratio between the risks of LC in each LDCT group for each parameter, at each time interval. The RR of pre-test parameters was calculated using the category at lower risk as reference. The RR of indeterminate and positive LDCT results was calculated using negative LDCT result as reference.
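The RR calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the SAS or MedCalc procedure used in the study; the function name and the counts are hypothetical and were chosen only to show the arithmetic.

```python
def relative_risk(events_group, n_group, events_ref, n_ref):
    """Ratio of the cumulative LC risk in one group to the risk in the reference group."""
    risk_group = events_group / n_group
    risk_ref = events_ref / n_ref
    return risk_group / risk_ref


# Hypothetical counts (assumptions for illustration, NOT study data):
# (number of LC diagnoses within the interval, number of subjects in the group)
negative = (2, 1000)       # reference category
indeterminate = (3, 100)
positive = (12, 120)

rr_indeterminate = relative_risk(*indeterminate, *negative)
rr_positive = relative_risk(*positive, *negative)
print(f"RR indeterminate vs negative: {rr_indeterminate:.2f}")
print(f"RR positive vs negative: {rr_positive:.2f}")
```

The same function applies to pre-test parameters, with the lower-risk category passed as the reference group.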
Kaplan-Meier curves were used to test the probability of LC for each time interval by unadjusted hazard ratio (HR). The odds ratio (OR) by univariate and multivariable models was calculated at each time interval.
The reduction of LDCT burden was calculated by simulation of 4 screening algorithms for increased round interval by post-test risk stratification with volumetric nodule categories of Lung-RADS v1.1 (Table 1). The simulated algorithms stemmed from the annual algorithm of Lung-RADS v1.1 and were developed to encompass the following:
- Biennial or triennial frequency for negative LDCT
- Six-month recall or annual recall for indeterminate LDCT
Positive LDCT was always intended for short-term recall within 3 months.
An ideal 100% adherence was assumed throughout the LCS period, and the recall rate at incidence rounds was calculated according to the literature (range 3-14%) [26]. Increasing screening interval is expected to result in some degree of delayed LC diagnosis; therefore, the rate of delayed diagnosis (RDD) was calculated as follows: ratio between the number of LC diagnosed within the proposed interval (6 months or 1 year for indeterminate, 2 years or 3 years for negative) and the total number of LC in the full cycle of screening (2 years or 3 years). The proportion of LC in stage I was calculated for each algorithm at each time interval.
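As a minimal sketch of the RDD definition above, the following Python snippet sums the LC diagnosed within the proposed interval in the negative and indeterminate groups and divides by the total number of LC in the full screening cycle. The function name, its parameters and the example counts are hypothetical and serve only to illustrate the ratio.

```python
def rate_of_delayed_diagnosis(lc_negative_within_interval,
                              lc_indeterminate_within_interval,
                              total_lc_in_cycle):
    """Return the RDD as a percentage of all LC diagnosed in the full cycle."""
    if total_lc_in_cycle <= 0:
        raise ValueError("total_lc_in_cycle must be positive")
    # LC that would have been found later under the longer interval
    delayed = lc_negative_within_interval + lc_indeterminate_within_interval
    return 100.0 * delayed / total_lc_in_cycle


# Hypothetical example (assumed counts, NOT study data):
# 2 LC after negative LDCT and 1 LC after indeterminate LDCT fall within
# the proposed intervals, out of 20 LC diagnosed in the full cycle.
rdd = rate_of_delayed_diagnosis(2, 1, 20)
print(f"RDD = {rdd:.1f}%")
```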
Statistical analysis was performed by Statistical Analysis System software (version 9.4; SAS Institute) and MedCalc statistical software (version 17.4; MedCalc Software Bvba).

Diagnosis of LC
The probability of LC at years 1, 2 and 3 was higher in subjects with either positive or indeterminate baseline LDCT (Table 2).
LDCT result was the strongest predictor of LC through 3 years (Table 2): positive baseline LDCT yielded a RR of 96.87 at year 1, with a progressive decrease through year 2 (RR 39.74) and year 3 (RR 27.32) (Fig. 3). Indeterminate LDCT was associated with a RR of 9.40 at years 1 and 2, slightly decreasing to 6.27 at year 3. The HR for LC was highest in year 1 after positive LDCT and decreased through years 2 and 3 (Fig. 4).

Univariate analysis of LC risk
Baseline LDCT result was a predictor of LC risk, with greater magnitude than any pre-test metric (Table 2): analysis of positive LDCT at 1-year interval showed an OR of 102.06 (95% CI 13.49 to 772.19; p < 0.0001), decreasing in analyses at 2-year and 3-year intervals. Indeterminate LDCT steadily yielded an OR above 9 at 1-year and 2-year intervals, while decreasing to 6.36 at 3-year interval.

Multivariable analysis of LC risk
In multivariable analysis, baseline LDCT result remained a predictor of LC risk (Table 3). Positive LDCT showed an OR of 75.60 (95% CI 9.86 to 579.83; p < 0.001) for analysis at 1-year interval.
A selective analysis of risk by pre-test metrics was performed in the group with negative LDCT: no outstanding pre-test variable was found for risk stratification (all p > 0.05, at any time point).

Stage of LC
Stage and histology of LC are detailed by year and LDCT category in Table 4. After a positive LDCT, 13 LC subjects were diagnosed in the first year with 30.8% of stage I LC (4 stage I, 2 stage II, 3 stage III, 4 stage IV) and further 9 were diagnosed in the second and third years with 55.6% stage I LC (5 stage I and 4 stage IV). Otherwise, early-stage LC was most common after negative baseline LDCT (83% stage I and 17% stage IV) or indeterminate LDCT (75% stage I and 25% stage II), through the 3 years of this analysis.

Personalized LCS algorithm
The overall RDD ranged from 13.6 to 21.9%, with a rate of stage I between 75 and 100% among the potentially delayed LC diagnoses (Table 5). Algorithm 1 (2-year interval for negative LDCT, 6-month follow-up for indeterminate LDCT) showed the lowest RDD (13.6%) with the potential 25.5-26.5% reduction of LDCT over 2 years, compared to annual benchmark. Algorithm 4 (3-year interval for negative LDCT, 1-year follow-up for indeterminate LDCT) showed not only the highest potential reduction of LDCT burden (40.6-41%) but also the highest RDD (21.9%) over 3 years, compared to annual benchmark.

Discussion
Nodule volume by semi-automatic software allowed stratification of LC risk across Lung-RADS v1.1 categories. This method classified 80.6% of LDCT as negative, with a 0.1% frequency of LC at 1 year, notably 10 times lower than the mean risk of LC in NLST-eligible subjects. Multivariable analysis showed that nodule volume stands out as a risk factor above pre-test metrics, especially for stratification of 1-year risk of LC. The proposed algorithms for increased interval screening showed a potential 25.5-41% reduction of LDCT burden, at the risk of 13.6-21.9% RDD.
To the best of our knowledge, this is the first analysis of Lung-RADS v1.1 using semi-automatic volume measurement.
The proportion of nodule categories and their LC risk estimated in the latest Lung-RADS v1.1 are different from the values measured in this study. The Lung-RADS v1.1 estimated a 90% frequency of categories 1 and 2 with < 1% probability of LC in 1 year [21]. We measured an 80.6% frequency for categories 1 and 2, with as low as 0.1% risk of LC at year 1. Furthermore, we also measured the risk at year 2 (0.3%) and year 3 (0.6%). Caverly et al [27] reported that LC risk ≤ 0.3% would make LCS a preference-sensitive practice. This practice was endorsed by Robbins et al [28] in a retrospective analysis of NLST data by diameter classification. Our study shows that a 0.3% risk is found 2 years after negative LDCT, suggesting that biennial screening might be proposed after negative baseline LDCT by Lung-RADS v1.1. This personalized approach might be assisted by counselling for decision aid and continuous individualized risk assessment (pre-test and post-test risks) [29,30].
The Lung-RADS v1.1 estimated a 5% frequency of category 3, with 1-2% probability of LC. We measured an 8.6% frequency of category 3 with a 0.9% risk of LC in year 1. A single cancer in this category was diagnosed at 10 months, which is consistent with the 6-month recall proposed by Lung-RADS v1.1. The 6-month recall should be preferred over shorter follow-up (e.g. 3 months) [31] because the relatively small volume of this category is prone to low accuracy in short-term volumetry for assessment of actual growth [32].
The Lung-RADS v1.1 estimated a 4% frequency of category 4, whereas we measured a 10.8% frequency of this category. Such discrepancy might be attributable to the direct measurement of nodule volume by semi-automatic software as opposed to the theoretical conversion of diameter into the volume of a geometrical sphere. We found a 9.6% risk of LC in category 4 compared with the expected > 5% reported in Lung-RADS v1.1. Baseline nodule volume by semi-automatic segmentation allowed remarkable post-test risk stratification in year 1, with a RR of 96.87 for category 4 compared with category 1 or 2.
The negative predictive value of Lung-RADS v1.1 in our selection was comparable to the 2-year performance of the NELSON volumetric protocol [33]. However, the 268 mm³ positive threshold of Lung-RADS v1.1 likely yields a lower positive predictive value compared with that of the NELSON protocol (500 mm³ threshold at baseline and subsequent characterization by volume doubling time) [34,35]. The geometrical volumetric threshold of Lung-RADS v1.1 could be further developed by comparison with the NELSON reference, which is based on the direct measurement of nodule volume by semi-automatic software and is currently validated in over 10 years of follow-up [36]. It is anticipated that future studies will test a blend of Lung-RADS v1.1 category and volume doubling time to improve its accuracy.
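The geometrical conversion behind the volumetric thresholds can be checked numerically. The Python sketch below computes the volume of a sphere from its diameter and shows that diameters of 6 mm and 8 mm correspond to approximately 113 mm³ and 268 mm³, the thresholds discussed in this paper. The function names and the handling of boundary values in `classify_solid_nodule` are illustrative assumptions; the classifier covers only solid nodules at baseline and is not the software used in this study.

```python
import math


def sphere_volume_mm3(diameter_mm):
    """Volume of a sphere (mm^3) from its diameter (mm): V = pi / 6 * d^3."""
    return math.pi / 6.0 * diameter_mm ** 3


def classify_solid_nodule(volume_mm3):
    """Map a baseline solid nodule volume to the LDCT result groups of this paper.

    Assumption: boundaries are treated as '< 113' negative, '>= 268' positive,
    indeterminate in between.
    """
    if volume_mm3 < 113:
        return "negative"
    if volume_mm3 < 268:
        return "indeterminate"
    return "positive"


for d in (6, 8):
    print(f"{d} mm diameter -> {sphere_volume_mm3(d):.1f} mm^3")
```

For example, under these assumptions `classify_solid_nodule(150)` returns `"indeterminate"`, whereas a directly segmented volume of 300 mm³ would be classified as positive.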

Simulation of increased screening interval
The standard of reference from NLST showed that efficiency of annual screening is variable even among subjects at high risk [7], for whom increased screening interval might be hypothesized in selected cases. Post hoc analysis of the NLST dataset brought a wealth of data for optimization of LCS [37], notably increased diameter thresholds [37] (featured in Lung-RADS v1.0) and evidence that biennial screening is safe and feasible [9,10]. Patz et al [9] retrospectively modelled biennial screening with a 0.48% frequency of LC in 2 years after negative baseline LDCT by linear measurement (< 4 mm). The same authors also reported a 1.1% frequency of LC in 3 years after negative baseline LDCT. Our results show that a semi-automatic volume in category 1 or 2 of Lung-RADS v1.1 (i.e. negative LDCT result) selects an even lower frequency of LC (0.3% at 2 years and 0.6% at 3 years). These figures were observed in the 80.6% of the population with negative baseline LDCT, which points to a potentially remarkable reduction of LDCT burden by longer-than-annual screening interval. For this purpose, we simulated several scenarios of increased screening interval and compared these with the reference standard of annual screening by Lung-RADS v1.1.
The proposed algorithms showed a reduction of LDCT burden ranging from 25 to 41% through years 2 and 3, reflecting a potential parallel reduction in both costs and cumulative radiation burden. Our biennial simulation is aligned with the prospective results from the interval randomization of MILD, which showed an overall 38% reduction of LDCT burden in the biennial arm, without detrimental effect [38]. The biennial strategy might also help with the growing demand of capacity for steadily increasing utilization of LCS [39]. Biennial follow-up is endorsed by the National Health Service (NHS) England in their protocol for implementation of targeted lung health check programme at population level [40]. The NHS England proposed a reference volume < 80 mm³ for biennial follow-up of solid nodules, which is smaller than the < 113 mm³ reference analysed in our paper. Notably, the performance of either threshold might vary significantly depending on the software for semi-automatic segmentation of volume [41]. Otherwise, the triennial simulations appeared quite hazardous because they were associated with up to 21.9% delayed diagnosis, with potential overlooking of a substantial proportion of stage I LC. Although this figure is quite close to the frequency of overdiagnosis modelled on the NLST data [42], we cannot assume that the two overlap. A NELSON report showed that a 2.5-year interval is associated with a significant increase of interval cancer and substantial stage shift [43]. We confirm that LDCT result alone does not allow safe selection for triennial screening interval. A complementary approach to LCS that integrates biological risk stratification might allow safe application of biennial or triennial screening interval [44].
Table note: Thirty-two LC subjects diagnosed in 3 years; estimated 5350 LDCT screenings by annual algorithm in 3 years.
Algorithm 4 of this paper was prospectively tested in the bioMILD trial (NCT02247453), where triennial LDCT was proposed along with personalized risk stratification by microRNA signature classifier (MSC). The bioMILD trial started in 2013 (4119 subjects, 50-75 years, ≥ 30 pack-years), with semi-automatic segmentation of nodules and volumetric thresholds closely overlapping the most recent Lung-RADS v1.1. The preliminary results of the bioMILD trial reported that a personalized LCS interval of 3 years is safe when both semi-automatic nodule volume and MSC render a negative result at baseline [45].
This study has some limitations. We grouped Lung-RADS category 4 under the definition of positive LDCT result and did not detail the granularity of categories 4A, 4B and 4X. We did not assess nodule-specific cancer rate but measured subject-specific cancer rate; this approach was intended to better serve screening practice. The small population might have hindered a significant association of pre-test metrics with the risk of LC; nonetheless, it could show robust post-test risk stratification by Lung-RADS v1.1. The retrospective nature of this study hampers the actual estimation of stage shift and interval cancers [43]. Nonetheless, such a retrospective approach provided a picture of the nodule distribution under cutting-edge technological conditions for semi-automatic nodule volume and recently updated Lung-RADS v1.1. The present study did not account for further imaging findings (e.g. emphysema, interstitial lung abnormalities, pleural plaques), which might confer an additional post-test risk beyond nodules [40,46-49]. Finally, the proposed algorithms might result in different outcomes when applied outside the present population and methods (e.g. volume LDCT analysed by CAD); therefore, caution is warranted before application of these findings. The number of LC is quite low in groups with negative or indeterminate LDCT, and this figure should be carefully interpreted because different absolute values might be observed in other settings. Prospective trials are needed to test whether the trade-off of longer screening intervals after negative LDCT will lead to a reduction in the number of lung cancer deaths avoided [50,51].
In conclusion, this study represents the first analysis of Lung-RADS v1.1 category distribution using semi-automatic software for segmentation of nodule volume on thin-slice LDCT. This volumetric approach led to personalized stratification of LC risk among NLST-eligible subjects, and notably, the risk of LC in Lung-RADS v1.1 category 1 or 2 was substantially lower than that in category 3. Lung-RADS v1.1 categories showed potential for a longer than 1-year screening interval in up to 80% of NLST-eligible subjects.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Mario Silva.

Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry One of the authors has significant statistical expertise (Federica Sabia, MSc).
Informed consent Written informed consent was obtained from each subject recruited in the MILD trial.
Ethical approval Institutional review board approval was obtained.
Study subjects or cohorts overlap This paper represents a retrospective analysis of the MILD trial, which was a prospective lung cancer screening trial funded in 2005. The trial was registered in a public trial registry (www.clinicaltrials.gov Identifier: NCT02837809). Previous publications from the MILD trial investigated different subjects of lung cancer screening by low-dose CT and circulating biomarkers. The subject and result of those publications do not overlap with the present manuscript submitted to European Radiology.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.