Introduction

Lung cancer is the leading cause of cancer incidence and cancer-related death worldwide [1]. The aggressive and heterogeneous nature of lung cancer has hampered efforts to improve early detection through screening with chest radiography or sputum evaluation [2]. Clinical trials have shown that screening with low-dose computed tomography (LDCT) for early lung cancer detection decreases mortality by 20% compared with chest X-rays [2]. LDCT provides a detailed representation of the lung parenchyma and notable sensitivity to findings associated with early lung cancer, primarily lung nodules [3,4,5]. However, LDCT screening generates 300–500 images per patient, imposing a heavy reading burden on radiologists. To reduce this workload, deep-learning techniques are being used to develop computer-aided detection (CAD) systems to screen for pulmonary nodules.

The main tasks of CAD systems for pulmonary nodule screening are nodule detection and nodule characterization to eliminate false positives [6]. Deep-learning models for lung nodule analysis are trained to detect and classify nodules using a large set of labeled CT scans and a convolutional neural network (CNN)-based algorithm [7, 8]. The performance of object detection systems has been improved by the addition of region proposal networks (RPNs), as in Faster R-CNN [9], which tell the CNN module where to look for objects.

Several lines of evidence indicate that 3D CNNs achieve higher competition performance metrics (CPMs) than their 2D counterparts for detecting lung nodules [10,11,12,13]. However, 3D CNNs are still at an early stage of development [6, 11]. Several deep-learning techniques for 2D object detection, including the Residual Network (ResNet) module [14], the ResNeXt module [15], the Feature Pyramid Network (FPN) [5], and anchor assignment [16, 17], have been adapted successfully to improve 3D object detection. The development of 3D CAD systems for lung nodule detection was further promoted by the LUNA16 Challenge, which supplied the research community with a framework for testing and comparing algorithms on a common large database with a standardized evaluation protocol [18].

Using module substitution and pruning experiments, this study aimed to develop a deep learning model for detecting pulmonary nodules in CT images with improved performance over existing systems by modifying a 3D RPN derived from RetinaNet, a 2D object detection model related to Faster R-CNN [5, 9, 19]. By training and testing the model on three datasets representing patients with different demographic backgrounds, the study also aimed to broaden the applicability of the modified 3D RPN.

Methods

Lung nodule datasets

In this secondary data analysis, three datasets of lung nodules acquired on LDCT were used to evaluate the performance of the modified 3D RPN. The Lung Nodule Analysis 2016 (LUNA16) dataset is the largest public dataset, comprising 1186 lung nodules from 888 patients [18], and has been used widely to evaluate a variety of deep-learning–based pulmonary nodule detection methods [7, 20,21,22]. In addition, two ongoing private pulmonary nodule datasets maintained by the Radiology Department at National Cheng Kung University Hospital (NCKUH) were used in this study: the NCKUH Lung Nodule received Operation (LNOP) dataset, which included patients who underwent surgical resection for lung nodules with histological confirmation, and the NCKUH Lung Nodule in Health Examination (LNHE) dataset, which included patients whose lung nodules were found by LDCT.

To minimize bias caused by variation in nodule number relative to the 1186 nodules in LUNA16, approximately 1000 pulmonary nodules were retrieved from each of the LNOP and LNHE datasets. Specifically, 1027 lung nodules from 708 patients, collected in the LNOP dataset from Dec. 2018 to Dec. 2021, and 1000 lung nodules from 420 patients, collected in the LNHE dataset from Jan. 2019 to Dec. 2020, were retrieved for training and testing the deep learning models.

Moreover, for temporal validation, all 1027 and 1000 lung nodules from LNOP and LNHE, respectively, were used as training sets. An additional 348 and 500 lung nodules more recently collected in the LNOP and LNHE datasets, respectively, were used as test sets.

Data annotation

The regions of interest (ROIs) of pulmonary nodules on axial images were labeled manually, slice by slice, by a thoracic radiologist (C.L.) and a thoracic surgeon (C.C.). After the two readers reached consensus, the 2D ROIs were stacked to form a 3D ROI, which was defined as the ground truth for each lung nodule in this study.
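As a minimal illustration of this step (the source does not describe the annotation tooling, so the function name and data layout below are hypothetical), slice-wise 2D ROI masks can be stacked into a 3D ROI and summarized as a bounding box that serves as the ground truth:

```python
import numpy as np

def roi_2d_to_3d(slice_masks):
    """Stack per-slice 2D ROI masks (a list of HxW boolean arrays) into a 3D ROI
    and return the ROI together with its bounding box (z, y, x, depth, height, width)."""
    roi_3d = np.stack(slice_masks, axis=0)   # shape: (Z, H, W)
    zs, ys, xs = np.nonzero(roi_3d)          # voxel coordinates inside the ROI
    z0, y0, x0 = zs.min(), ys.min(), xs.min()
    z1, y1, x1 = zs.max(), ys.max(), xs.max()
    # The 3D ROI (summarized here by its bounding box) is used as the ground truth.
    return roi_3d, (z0, y0, x0, z1 - z0 + 1, y1 - y0 + 1, x1 - x0 + 1)
```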

3D region proposal network

The architecture of the proposed 3D RPN consists of three blocks: backbone, neck, and head (Fig. 1A). The backbone network is used for feature extraction; the neck, for feature fusion; and the head, for dense prediction, which generates a prediction frame (anchor box) for each anchor point on the feature map. The training environment and training strategy are listed in Table 1.

Fig. 1
figure 1

The architecture of the deep learning model. (A) 3D RPN. Anchor boxes with sizes of 5, 10, and 20 voxels were used in each detector layer of the head block. Because the outputs comprised probability, x, y, z, and d, the output dimension of each layer was 3 × 5 = 15. (B) The complete pulmonary nodule detection system
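To make the output dimension concrete, a hypothetical dense-prediction head for one detector layer could emit 3 anchors × 5 values (probability, x, y, z, d) = 15 channels per anchor point. This is a sketch under those assumptions, not the authors' exact implementation; the class name and channel count of the input feature map are illustrative.

```python
import torch
import torch.nn as nn

NUM_ANCHORS = 3   # anchor sizes of 5, 10, and 20 voxels
NUM_OUTPUTS = 5   # probability, x, y, z, d

class DetectorHead(nn.Module):
    """Minimal sketch of one detector layer of the head block."""
    def __init__(self, in_channels: int):
        super().__init__()
        # A 3D convolution mapping neck features to 3 * 5 = 15 channels per anchor point.
        self.pred = nn.Conv3d(in_channels, NUM_ANCHORS * NUM_OUTPUTS, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        out = self.pred(feat)                                   # (N, 15, D, H, W)
        n, _, d, h, w = out.shape
        return out.view(n, NUM_ANCHORS, NUM_OUTPUTS, d, h, w)  # split per-anchor outputs

# Example: a 64-channel feature map of size 8^3 yields a (1, 3, 5, 8, 8, 8) prediction tensor.
head = DetectorHead(in_channels=64)
pred = head(torch.zeros(1, 64, 8, 8, 8))
```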

Table 1 The training environment and training strategy

Architecture and modification of the pulmonary nodule detection system

The 3D lung-nodule detection system is composed of three modules: pre-processing, the deep learning model (3D RPN), and post-processing (Fig. 1B). A 3D patch-based image input was adopted for pre-processing and post-processing. In the pre-processing module, the voxel spacing of all CT images was resampled to 1 × 1 × 1 mm so that all images shared the same scale. Each radiodensity value was converted from Hounsfield units (HU) (range, −1200 to 600 HU) to a decimal between 0 and 1 and stored as a single-precision floating-point number. In the post-processing module, the extrapulmonary region is removed to reduce false positives.
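A minimal sketch of the pre-processing described above (resampling to isotropic 1 mm spacing and rescaling the stated HU window to [0, 1] as float32); the function name and the use of scipy interpolation are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_ct(volume_hu: np.ndarray, spacing_mm: tuple) -> np.ndarray:
    """Resample a CT volume to 1 x 1 x 1 mm voxels and map HU in [-1200, 600] to [0, 1]."""
    # Resample so that every voxel covers 1 mm along each axis (spacing_mm is per-axis, in mm).
    volume_iso = zoom(volume_hu, zoom=spacing_mm, order=1)
    # Clip to the stated HU window and rescale to a [0, 1] float32 volume.
    clipped = np.clip(volume_iso, -1200, 600)
    return ((clipped + 1200) / 1800.0).astype(np.float32)
```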

Pruning experiments

During training, a series of pruning experiments was performed using the LUNA16 dataset to modify each block of the 3D RPN for better performance. Although the ResNet module is commonly used to construct the backbone network [9], we first replaced the ResNet module with the ResNeXt module in the training phase [15]. Subsequently, the design of the Cross Stage Partial Network (CSPNet) [23] was incorporated into the ResNeXt module to form the CSP-ResNeXt module (Fig. 2). The FPN design was then added to the neck and detector of the selected 3D RPN with the CSP-ResNeXt module, achieving feature fusion and multi-level outputs in the neck and detector. The next pruning experiment involved modification of the anchor assignment of the 3D RPN.
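The source does not give the exact block definition, but a minimal PyTorch sketch of how a CSP-style channel split might wrap a grouped (ResNeXt-style) 3D bottleneck is shown below; the layer widths, cardinality of 8, number of blocks, and class names are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResNeXtBlock3D(nn.Module):
    """3D ResNeXt-style bottleneck using a grouped convolution (cardinality = groups)."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 1, bias=False),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 1, bias=False),
            nn.BatchNorm3d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv(x))  # residual connection

class CSPResNeXtStage3D(nn.Module):
    """CSP-style stage: half of the channels pass through ResNeXt blocks, the other half
    bypass them; the two paths are concatenated and fused (assumes an even channel count)."""
    def __init__(self, channels: int, num_blocks: int = 2, groups: int = 8):
        super().__init__()
        half = channels // 2
        self.split_main = nn.Conv3d(channels, half, 1, bias=False)
        self.split_bypass = nn.Conv3d(channels, half, 1, bias=False)
        self.blocks = nn.Sequential(*[ResNeXtBlock3D(half, groups) for _ in range(num_blocks)])
        self.fuse = nn.Conv3d(channels, channels, 1, bias=False)

    def forward(self, x):
        main = self.blocks(self.split_main(x))
        bypass = self.split_bypass(x)
        return self.fuse(torch.cat([main, bypass], dim=1))

# Example: a 32-channel feature volume keeps its shape after the stage.
stage = CSPResNeXtStage3D(channels=32)
out = stage(torch.zeros(1, 32, 16, 16, 16))
```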

Fig. 2
figure 2

The CSPNet and ResNeXt modules are integrated into the design of the backbone network

Nearest anchor assignment

Anchor assignment, also called training sample selection, is the process by which an object detection model decides which anchor boxes on the input image patch are positive, negative, or ignored samples based on the ground truth during training [9]. Only positive and negative samples are used for calculating the loss function. Because most lung nodules are nearly spherical but vary in size, anchor boxes with sizes of 5, 10, and 20 voxels were used in each detector layer of the head block of the 3D RPN (Fig. 1A). Several studies of object detection have used fixed Intersection over Union (IoU) matching for anchor assignment [5, 24]; however, the IoU matching method often yields multiple positive samples (Fig. 3).

To find a more suitable anchor assignment method for 3D lung nodule detection, we applied the nearest anchor method in this study. The nearest anchor method assigns as the positive sample only the single anchor box whose anchor point is closest to the ground truth (Fig. 3). If multiple anchor boxes share that anchor point, only the anchor box closest to the ground truth in size is selected as the positive sample.

Fig. 3
figure 3

Illustration of the nearest anchor method. The IoU-based method could recognize both the blue and yellow anchor boxes as positive samples. In contrast, the nearest anchor method recognized only the blue anchor box as positive, because its anchor point was closest to the ground truth (green)
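A minimal sketch of the nearest-anchor rule described above, using hypothetical arrays of anchor-point coordinates and anchor sizes (the study's actual implementation is not shown in the source):

```python
import numpy as np

def nearest_anchor_assignment(anchor_centers, anchor_sizes, gt_center, gt_size):
    """Select the single positive anchor: the one whose anchor point is closest to the
    ground-truth center; anchors sharing that point are tie-broken by size similarity.
    anchor_centers: (N, 3) array, anchor_sizes: (N,) array, gt_center: (3,), gt_size: scalar."""
    dists = np.linalg.norm(anchor_centers - gt_center, axis=1)
    nearest = np.flatnonzero(dists == dists.min())       # anchors sharing the closest point
    size_gap = np.abs(anchor_sizes[nearest] - gt_size)   # compare box size to nodule size
    return nearest[np.argmin(size_gap)]                  # index of the only positive sample
```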

Performance evaluation measures

The modified 3D RPN was then trained on the LUNA16, LNOP, and LNHE datasets. Its performance was evaluated by 10-fold cross-validation using the free-response receiver operating characteristic (FROC) curve and the CPM. The FROC curve plots the model's true positive rate (sensitivity) against the number of false positives per scan at different confidence thresholds. Sensitivity was measured at 0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan, as previously described [25, 26]. The CPM, a metric derived from the FROC curve, is the average sensitivity at these seven false-positive rates. CPM and sensitivity are expressed as mean ± standard deviation (SD). After training, the modified 3D RPN was tested on the LUNA16, LNOP, and LNHE test sets, with sensitivity reported at 2 false positives per scan.
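As an illustration of how the CPM follows from the FROC curve (a sketch only; the evaluation scripts used in the study are not reproduced here, and the interpolation choice is an assumption):

```python
import numpy as np

# CPM is the mean sensitivity at these seven false-positives-per-scan operating points.
FP_PER_SCAN = (0.125, 0.25, 0.5, 1, 2, 4, 8)

def cpm_from_froc(froc_fp_per_scan, froc_sensitivity):
    """Interpolate the FROC curve at the seven reference false-positive rates and
    average the resulting sensitivities (inputs are 1D arrays sorted by FP rate)."""
    sens_at_refs = np.interp(FP_PER_SCAN, froc_fp_per_scan, froc_sensitivity)
    return float(np.mean(sens_at_refs))

# Example with hypothetical FROC points:
# cpm_from_froc([0.1, 0.5, 1, 2, 4, 8], [0.70, 0.82, 0.86, 0.89, 0.92, 0.94])
```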

Results

Comparison of pulmonary nodule characteristics between the three datasets

The distribution of the 3D maximum diameter of each nodule in the three datasets is shown in Fig. 4A. The LUNA16 dataset had a right-skewed distribution, with a largest 3D maximum diameter of 32 mm. Lung nodules in the LNOP dataset were larger than those in the LUNA16 dataset, with a largest 3D maximum diameter of 93 mm, whereas nodules in the LNHE dataset were smaller than those in the LNOP dataset, with a largest 3D maximum diameter of 43 mm.

The distribution of the solid-component percentage of nodules in the LNOP dataset was right-skewed, with nearly 40% of nodules having a 10% solid component (Fig. 4B). In contrast, the solid-component percentage was relatively evenly distributed in the LUNA16 dataset (Fig. 4B).

Modification of the 3D RPN

To improve the 3D RPN, a series of pruning experiments was conducted using the LUNA16 dataset. The performance evaluation revealed that the CPMs for the ResNet, ResNeXt, and CSP-ResNeXt modules were 86.8%, 88.2%, and 89.7%, respectively (Table 2). After adding the FPN design to the CSP-ResNeXt module, the CPM improved from 89.7% to 90.1% (Table 2). Although the IoU matching method has been widely used, the nearest anchor method achieved a slightly higher CPM (92.2% vs. 91.1%) (Table 2).

Table 2 Pruning experiments for modification of the 3D region proposal network

Performance of the modified 3D RPN

Of the three datasets, the modified 3D RPN trained on the LUNA16 dataset had the highest sensitivities at the various numbers of false positives per scan, whereas the modified 3D RPN trained on the LNHE dataset had the lowest (Table 3). In addition, the modified 3D RPN trained on the LUNA16 dataset had the highest CPM (90.1%), followed by the LNOP (CPM, 74.1%) and LNHE (CPM, 70.2%) datasets (Table 3).

Table 3 Performance comparison of the modified 3D RPN trained on three datasets

Furthermore, the modified 3D RPN trained and tested on the same datasets had sensitivities of 94.6%, 84.8%, and 79.7% for LUNA16, LNOP, and LNHE, respectively (Table 4). The sensitivity dropped substantially if the test set differed from the training set.

Table 4 Sensitivity comparison of the modified 3D RPN trained and tested on various combinations of datasets at 2 false positives per scan

Temporal validation

To confirm the predictive performance of the modified 3D RPN, temporal validation was performed. The modified 3D RPN achieved CPMs of 71.6% and 71.7% on the LNOP and LNHE test sets, respectively (Table 5). The CPM on the LNOP test set decreased slightly from 74.1% to 71.6%, while the CPM on the LNHE test set increased slightly from 70.2% to 71.7%. Under the most clinically acceptable condition (2 false positives per scan), the sensitivity on the LNOP test set increased from 84.8% to 85.7%, and the sensitivity on the LNHE test set increased from 79.7% to 83.5% (Table 5).

Table 5 Performance of the modified 3D RPN on test datasets

The influence of solid components of nodules

To assess the extent to which the percentage of solid components in lung nodules affects the predictive performance of the modified 3D RPN, stratification analyses were performed. Each of the three datasets (LUNA16, LNOP, and LNHE) was stratified by the percentage of solid components of nodules into three ranges: 0 to 10%, 10 to 50%, and 50 to 100%. The performance of the modified 3D RPN trained on each dataset was then examined within each stratum. Within each data source, performance increased with the percentage of solid components (Table 6). Across data sources, LUNA16 yielded higher CPMs than LNOP and LNHE.
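A minimal sketch of this stratification step, with hypothetical per-nodule solid-component values (boundary handling at exactly 10% and 50% is an assumption, since the source does not specify it):

```python
import numpy as np

# Bin nodules into the three stated solid-component ranges: 0-10%, 10-50%, 50-100%.
solid_pct = np.array([5.0, 12.5, 35.0, 80.0])            # hypothetical per-nodule values
stratum = np.digitize(solid_pct, [10, 50], right=True)   # 0 -> 0-10%, 1 -> 10-50%, 2 -> 50-100%
```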

Table 6 Performance comparison of the modified 3D RPN trained on three datasets stratified by the range of solid components of nodules

Discussion

Pruning experiments on the 3D RPN with module substitutions showed that the optimal 3D RPN contained the CSP-ResNeXt module, FPN, nearest anchor method, and post-processing masking, achieving a CPM of 91.1%. The modified 3D RPN trained on the LUNA16 dataset had the highest CPM (90.1%), followed by the LNOP (74.1%) and the LNHE (70.2%) datasets. The modified 3D RPN trained and tested on the same dataset had the highest sensitivity (LUNA16, 94.6%; LNOP, 84.8%; LNHE, 79.7%). Furthermore, the reliability of the modified 3D RPN was confirmed by temporal validation.

Comparison of pulmonary nodule characteristics between the three datasets showed that nodules in CT images from patients in the LNOP dataset (Taiwanese patients with histologically confirmed lung nodules) and the LNHE dataset (Taiwanese patients with nodules found during health examinations) were larger and had a greater non-solid component than those in the LUNA16 dataset (Western patients) (Fig. 4). This finding is consistent with reports that Asian patients tend to have a higher proportion of non-solid pulmonary nodules [27], which appear as ground-glass opacities on CT images [28]. This difference in nodule properties between populations may contribute to the differences in the performance of our model on the three datasets, suggesting the importance of considering ethnicity in the datasets used for training and testing diagnostic deep-learning models.

Fig. 4
figure 4

Characteristics of lung nodules in three datasets. (A) Distribution of 3D maximum diameter of each nodule. (B) Distribution of percentage of solid component in each nodule

Accurate classification of ground-glass nodules is of great therapeutic value, as they are associated with both benign inflammatory conditions and various types of malignancy [29, 30]. In addition, the consolidation-to-tumor ratio is positively associated with tumor invasiveness [31]. Because of their low attenuation, ground-glass nodules may go undetected on CT scans, and deep-learning algorithms are being developed to distinguish these nodules from surrounding tissue [23, 32]. Our finding that the LNOP dataset is enriched in non-solid nodules compared with the LUNA16 dataset suggests that LNOP may be useful for developing algorithms to detect and classify ground-glass nodules.

As shown in Supplementary Table S1, the CPM and sensitivity of our modified Faster R-CNN-based 3D RPN on the LUNA16 dataset surpassed those of other CAD models for lung nodule detection, including DeepLung [22], a 3D Faster R-CNN for nodule detection with 3D dual-path blocks and a U-Net-like encoder-decoder structure, combined with a gradient boosting machine using 3D dual-path network features for nodule classification; DeepSEED [20], which couples an encoder-decoder structure with an RPN and uses dynamically scaled cross-entropy loss to reduce false positives and combat the sample imbalance inherent in nodule detection; CPM-Net [7], an anchor-free 3D center-points matching detection network that automatically predicts the position, size, and aspect ratio of nodules; and SCPM-Net [21], an anchor-free 3D sphere-representation-based center-points matching detection network that automatically predicts the position, radius, and offset of nodules without manual design of nodule/anchor parameters.

The present secondary data analysis has several limitations. The modified 3D RPN model is complex, containing 1,284,508 parameters and requiring about 80 hours to perform 10-fold cross-validation on a dataset of 1000 lung nodules. In future studies, we aim to shorten the training time by simplifying the model without sacrificing specificity. In addition, it has been reported that the CT manufacturer did not affect the performance of a deep learning model for detecting lung nodules [33], whereas the reconstruction kernel affected the texture and wavelet features of CT images [34], and poor image quality resulted in more false positives per scan. Investigating the influence of CT hardware, reconstruction kernels, and image quality on the performance of the modified 3D RPN is another future research direction. Furthermore, the improved 3D RPN model will be trained on the updated LNOP and LNHE datasets with more lung nodule data. We will also try to access more powerful hardware to speed up the lung nodule detection process. To reduce false positives, we will add a false-positive reduction model to the modified 3D RPN.

Conclusion

The modified 3D RPN model trained on the LUNA16 dataset exhibited a sensitivity of 96.6% at 8 false positives per scan and a CPM of 90.1%, and may serve as a CAD tool to facilitate lung nodule detection and lung cancer diagnosis. In addition, the difference in performance between datasets comprising Western and Asian patients indicates the need to establish training and testing datasets specific to Asian patients. The LNOP dataset may be useful for training and testing CAD models to identify lung nodules with ground-glass opacity, which are associated with malignancy and tumor invasiveness.