Background

The lung is a very common site of metastasis from various malignancies. Detection of small lung nodules (LN), especially subcentimeter nodules, is a critical task during routine oncologic whole-body 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) evaluation, especially when these nodules are not FDG avid. Small lung nodules are often early signs of metastasis in cancer patients and can have an important therapeutic impact. FDG uptake is an important criterion for the diagnosis of lung metastases in patients with various malignancies, but it is not sensitive enough for the detection of small, subcentimeter, early metastatic lung nodules (Strobel et al. 2007; Volker et al. 2007; Sawicki et al. 2016). Failure to detect small, early cancerous lung lesions on imaging studies might be a reason for malpractice suits (Baker et al. 2013; Whang et al. 2013; Weikert et al. 2019). The reasons for misdiagnosis are multi-layered and include pattern recognition errors, incomplete or unsatisfactory search, data overload, and physician stress, among others (Del Ciello et al. 2017). Computer-aided detection (CAD) has been commercially available for LN detection since the early 2000s and has been studied extensively in the last decade on dedicated chest CT acquired with deep-inspiration breath-hold technique. Classical machine learning and radiomics have been used for lung nodule detection and segmentation, including nodule volumetry and characterization. The more recent rise of deep learning with convolutional neural networks (CNN) and the availability of large annotated lung nodule datasets have allowed the development of CAD tools with fewer false positives per scan (Chassagnon et al. 2023).

By definition, lung nodules are focal opacities, well- or poorly-defined, measuring less than 30 mm in diameter (Hansell et al. 2008). Lung nodule detection is an important task in oncologic PET/CT imaging for metastatic work-up, especially in tumors with a predilection for lung metastases, such as melanoma, sarcoma, colorectal, head and neck, and thyroid cancers.

Computed tomography (CT) is the current standard for detection of small lung nodules (LN), and dedicated post-processing methods have been established to further improve LN detection (Davis 1991). Besides the detection of LN on PET images through increased FDG uptake, dedicated interpretation of the CT component with lung window settings, an integral part of reading any PET/CT examination, increases the sensitivity for detecting lung metastases in cancer patients (Strobel et al. 2007; Volker et al. 2007; Sawicki et al. 2016). To detect these lung nodules precisely with PET/CT, low-dose thick-slice CT with shallow breathing, thin-slice full-inspiration breath-hold CT, and even respiratory-gated PET/CT to reduce respiratory motion artifacts have been implemented (Werner et al. 2009; Farid et al. 2015). The effective radiation dose of a low-dose chest CT scan is generally about 1.5 mSv (range: 1–5 mSv), while a conventional “normal-dose” diagnostic chest CT scan might result in an effective dose of approximately 8 mSv or more, depending on the specific equipment and protocol used (Coakley et al. 2011). Slice thickness usually ranges from 5 to 10 mm in thick-slice CT and from 1 to 2.5 mm in thin-slice CT. Additionally, advanced post-processing methods, such as thin-slice maximum intensity projection (MIP) images and computer-aided detection (CAD) systems, have demonstrated a benefit in the detection of lung nodules in chest CT data (Beyer et al. 2007; Peloschek et al. 2007; Kawel et al. 2009; Messay et al. 2010; Roos et al. 2010; Christe et al. 2013). CAD systems have been validated in both secondary reader and primary concurrent reader paradigms. To our knowledge, the incorporation of CAD systems for the detection of LN into the routine oncologic whole-body 18F-FDG PET/CT imaging protocol has not been validated or implemented.

The goal of this study was to compare the performance of 18F-FDG PET, low-dose thick-slice CT, diagnostic thin-slice CT, and CAD as a secondary reader for the detection of lung nodules in tumor patients.

Methods

The study was approved by the Ethics Committee, and the need for written informed consent was waived because of the retrospective design of the data analysis. Consecutive 18F-FDG PET/CT scans of 100 patients (56 male, 44 female; age range: 22–93 years; median age: 63 years), including low-dose CT and diagnostic thin-slice lung CT images, were retrospectively selected. Patients had various types of malignancies: melanoma (n = 49), head and neck cancer (n = 23), colorectal cancer (n = 8), and a mix of other tumors in the remaining patients (n = 20), such as carcinoma of the cervix/uterus, breast cancer, sarcoma, and cholangiocarcinoma. The inclusion criteria were the availability of the above-mentioned imaging datasets in an 18F-FDG PET/CT examination of these consecutive tumor patients with a predilection for lung metastases, and follow-up imaging with either 18F-FDG PET/CT or diagnostic chest CT.

18F-FDG PET/CT imaging

PET/CT scans were acquired on a Discovery 600 unit (GE Healthcare, USA) from vertex to mid-thigh after intravenous injection of 18F-FDG (mean activity 302.5 MBq; range 257–355 MBq). The 18F-FDG PET/CT imaging protocol comprised, first, a low-dose CT (lCT) with shallow breathing from vertex to mid-thigh (tube voltage 120 kV, tube current with automatic exposure control, pitch 0.88, reconstructed slice thickness 5 mm); second, a PET study (2 min acquisition time per bed position); and third, a diagnostic lung CT (dCT) with breath-hold technique in full inspiration (tube voltage 120 kV, tube current 180 mA, pitch 1.35, reconstructed slice thickness 1 mm).

Image interpretation

The read-out was performed by three independent readers with different levels of experience in reading CT and PET/CT images: a senior reader with > 15 years of experience (Reader 1), a mid-level reader with 10 years of experience (Reader 2), and a junior reader with 1 year of experience (Reader 3). Each study was retrieved from the picture archiving and communication system (PACS; Merlin PACS, Phönix-PACS, Freiburg, Germany) and loaded onto a GE ADW workstation (GE Healthcare, USA), on which the scans were analyzed. PET images were evaluated for the presence of LN, defined as focal visible FDG uptake in the lungs. On CT, lung nodules were defined on visual assessment as round opacities, well- or poorly-defined, measuring less than 3 cm in diameter. Triangular and calcified nodules were excluded from the analysis, as these often represent benign findings, such as intrapulmonary lymph nodes (Hansell et al. 2008). Thin-slice lung CT images were evaluated for the presence of LN by scrolling through maximum intensity projection (MIP) images. Thin-slice lung CT images in full inspiration were loaded into the CAD software (Lung VCAR, GE Healthcare, Chicago, IL, USA) for computer-aided detection of lung nodules (Chen et al. 2012). Lung VCAR uses a Digital Contrast Agent (DCA) feature, a 3D filter that automatically highlights spherical shapes to enhance the visualization of suspicious lung nodules. A threshold of 2 mm was used for the software evaluation. The number of suspicious nodules highlighted by the CAD software by default was noted (CAD primary reading, CADp). The lesions detected by CADp were then checked and filtered by the physician, and obvious false-positive markings due to vessel crossings, artefacts, or benign calcified nodules were not counted as positive findings (CAD secondary reading, CADs). The four image datasets were read in random order to reduce recall bias.

The time taken by each reader to evaluate lung nodules with each modality was recorded, along with the number of nodules identified. Follow-up scans with either 18F-FDG PET/CT or chest CT were available in all cases. Based on nodule morphology, FDG uptake (where applicable), follow-up imaging, and clinical information, a consensus among the three readers was reached on the probable benign or metastatic nature of the nodules, and this consensus served as the ‘reference standard’.

Statistics

The statistical analyses were performed using Stata (version 17.0, StataCorp, College Station, Texas, USA). Categorical variables were summarized by absolute and relative frequencies. Quantitative variables were analyzed using descriptive statistics. In order to assess the inter-reader agreement for the different techniques with regard to the judgment on whether a patient was considered to have metastases (yes/no), Cohen/Conger’s kappa coefficients and corresponding 95% confidence intervals were calculated. To compare the different techniques with regard to their potential to discriminate between patients with and without metastases, diagnostic metrics such as sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy were determined as well as ROC AUCs (area under the receiver operating characteristic curve) together with their 95% confidence intervals, overall and by each reader.
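The per-patient diagnostic metrics named above all follow from a 2 × 2 table of reader call versus reference standard. The following is a minimal Python sketch of those definitions, not the Stata code used in the study; the class name, the function name, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Per-patient 2 x 2 table: reader call vs. reference standard."""
    tp: int  # reader positive, reference positive
    fp: int  # reader positive, reference negative
    fn: int  # reader negative, reference positive
    tn: int  # reader negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for 100 patients (NOT the study data).
example = ConfusionCounts(tp=20, fp=10, fn=10, tn=60)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Confidence intervals, as reported in the study, would need an additional interval method and are left out of this sketch.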

Results

Nodule detection

The number of nodules detected by the three readers with the four different techniques in the 100 patients and the time taken for each LN reading are shown in Table 1. On average, 40 LN were detected in 17 patients using the 18F-FDG PET images only, 121 LN in 37 patients using lCT (Fig. 1), 283 LN in 60 patients with dCT, and 282 LN in 53 patients with dCT using the help of CADs (Fig. 2).

Table 1 Performance of three readers with regards to the reading time required and number of nodules detected using different imaging techniques
Fig. 1

18F-FDG PET/CT images for staging of a 30-year-old male patient after resection of melanoma (Breslow 3.7 mm) around the ear. On MIP (maximum intensity projection) (A), axial PET (B), and axial fused PET/CT (D) images FDG uptake (arrow) is visible in a solitary small nodule in the middle lobe. The nodule (arrow) was detected with low-dose CT (C), thin-slice diagnostic CT (E), and CAD (F). The nodule was resected and was metastatic on histopathology

Fig. 2

Images of the same patient showing a small nodule in low-dose CT (C) and thin-slice diagnostic CT (E) (arrow) in the upper left lobe without visible uptake in MIP (A), PET (B) and PET/CT (D) images. The nodule was missed by the readers in all CT images and only detected with CAD (F, arrow)

On average, CAD detected 49 additional LN that had been missed by the three readers in the 100 PET/CT examinations, whereas CAD missed approximately 53 LN that were detected with dCT. Common reasons for CAD missing LN were proximity to a vessel, subpleural location or attachment to the pleura, and very small size (2 mm).

The mean number of false-positive LN for CADp was 16.8 per scan. In the 100 patients, CADp marked a total of 1967 nodules, of which on average 282 were accepted by the three readers and the remaining 1685 on average were rejected as false markings.
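The false-positive rate per scan follows directly from these counts. A short Python sketch of the arithmetic is given here; the variable names are illustrative, and the numbers are the ones reported in this section.

```python
# Counts reported for CADp across the 100 examinations.
total_cad_marks = 1967   # all nodules marked by CADp
accepted_marks = 282     # mean number accepted by the three readers
n_scans = 100

# Marks rejected by the readers as false markings.
rejected_marks = total_cad_marks - accepted_marks

# Mean number of rejected (false-positive) marks per scan.
fp_per_scan = rejected_marks / n_scans

# Share of CADp marks that survived reader review.
acceptance_rate = accepted_marks / total_cad_marks

print(rejected_marks)
print(f"{fp_per_scan:.2f} false-positive marks per scan")
print(f"{acceptance_rate:.1%} of CADp marks accepted")
```

The computed value of 16.85 per scan agrees with the reported 16.8 up to rounding.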

The sizes of the nodules ranged from 2 to 26 mm with a mean size of 4.1 mm (no nodule larger than 30 mm). The nodules were randomly spread across various lobes of the lungs, with more nodules being in peripheral location (78% in peripheral location and 22% in central location) and in lower zones (67% in the lower zones and 33% in the upper zones) (Table 2).

Table 2 General distribution of 300 detected lung nodules

Time requirement for image analysis

The average time required by the three readers for the evaluation of PET, lCT, dCT, and CADs was 25, 31, 60, and 40 s, respectively. Thus, a reduction of nearly 33% in the time required for evaluation of lung nodules was achieved with the help of CAD compared to dCT. The maximum benefit was seen for the most junior reader, with a time reduction of approximately 39% (details are given in Table 1).
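The quoted time saving is a relative reduction against the dCT reading time. A minimal Python sketch of that calculation, using the mean reading times reported above, is given here; the function and variable names are illustrative.

```python
# Mean reading time per technique in seconds, as reported above.
mean_reading_time_s = {"PET": 25, "lCT": 31, "dCT": 60, "CADs": 40}


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


reduction = relative_reduction(
    mean_reading_time_s["dCT"], mean_reading_time_s["CADs"]
)
print(f"Time reduction with CADs vs. dCT: {reduction:.1%}")
```

The per-reader figures, such as the approximately 39% reduction for the junior reader, would be obtained the same way from the individual times in Table 1.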

Inter-reader agreement

There was very good inter-reader agreement (inter-rater reliability) with kappa ranging between 0.84 and 0.93 for four different techniques among the three readers (Table 3).

Table 3 Inter-reader agreement for different imaging techniques
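For readers less familiar with kappa statistics, the sketch below shows the two-rater Cohen's kappa for a yes/no judgment in Python. It is a simplified illustration only: the study used Cohen/Conger's kappa for three readers in Stata, and the reader calls below are hypothetical.

```python
def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )

    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-patient calls (1 = metastasis, 0 = no metastasis).
reader_1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the readers agree on 9 of 10 patients, but kappa is lower than 0.9 because part of that agreement is expected by chance.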

Performance for diagnosis of lung metastases

Follow-up scans (either 18F-FDG PET/CT or chest CT) were available in all patients. The average follow-up duration was 25.53 months (range: 1–72 months). The readers reached a consensus on which nodules were true or false. The reference standard for the diagnosis of metastasis was based on histopathology (n = 5) and/or follow-up imaging (n = 100; 18F-FDG PET/CT in 93 and chest CT in 7) and clinical information (all patients). For example, lung nodules with typical morphology that increased in size on follow-up imaging were rated as positive for metastasis.

AUC in ROC analysis, sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV) and accuracy of different readers and different techniques are given in Tables 4 and 5, and Fig. 3.

Table 4 Sensitivity, specificity, NPV, PPV, and Accuracy for the diagnosis of lung metastases per patient with different techniques
Table 5 ROC AUC Analysis for four different imaging techniques for diagnosis of lung metastases
Fig. 3

ROC AUC (area under the receiver operating characteristic curve) analysis for all readers regarding the diagnosis of lung metastases on a per patient basis [A: for all three readers combined, B: for reader 1 (R1), C: for reader 2 (R2), D: for reader 3 (R3)]

In summary, regarding the diagnosis of metastasis on a per-patient basis, the AUC of PET (0.72) was inferior to that of lCT, dCT, and CAD because of low sensitivity (48%) in all three readers, whereas lCT, dCT, and CAD showed comparably good results in all readers (AUC between 0.78 and 0.81). There was no significant difference among the three readers in the number of LN detected; however, the least experienced reader (Reader 3) required 133% more time than Reader 1 or 2 to evaluate the 18F-FDG PET images, 91% more time for lCT, 121% more time for dCT, and 74% more time for CADs. The advanced dCT and CAD techniques showed no superiority regarding the diagnosis of metastasis.
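One way to see why low sensitivity pulls down the PET AUC is the following. If each reading is a binary yes/no call (an assumption; the text does not state how the ROC curves were constructed), the ROC curve has a single operating point and its trapezoidal area reduces to the mean of sensitivity and specificity. The Python sketch below uses the reported PET sensitivity (48%) and AUC (0.72); the implied specificity is a back-calculation under that assumption, not a value taken from Table 4.

```python
def binary_auc(sensitivity: float, specificity: float) -> float:
    """Trapezoidal AUC of a ROC curve with one operating point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1),
    and its area simplifies to (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2.0


def implied_specificity(auc: float, sensitivity: float) -> float:
    """Specificity implied by an AUC and a sensitivity under the
    single-operating-point assumption (hypothetical helper)."""
    return 2.0 * auc - sensitivity


pet_sensitivity = 0.48   # reported in the text
pet_auc = 0.72           # reported in the text

spec = implied_specificity(pet_auc, pet_sensitivity)
print(f"Implied PET specificity under this assumption: {spec:.2f}")
```

Under this reading, a technique with high specificity can still have a modest AUC when roughly half of the positive patients are missed.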

Discussion

The present study demonstrates the feasibility of implementing CAD software in the routine clinical workflow for the detection of lung nodules and metastases in the CT part of 18F-FDG PET/CT studies of tumor patients. To our knowledge, no previous study has evaluated CAD for this specific purpose in a routine 18F-FDG PET/CT read-out protocol. A few studies successfully evaluated CAD-like software applications for the detection of pulmonary lesions in the PET part of 18F-FDG PET/CT images (Ballangan et al. 2011, 2013; Yang et al. 2014; Cui et al. 2015). In these studies, lung lesions were detected from PET images alone, since the image quality of the CT part used for attenuation correction was insufficient for diagnostic evaluation. A large variety of PET/CT protocols are currently used in tumor patients worldwide, ranging from ungated PET with low-dose CT as the simplest to gated PET/CT with diagnostic CT as the most advanced (Werner et al. 2009; Farid et al. 2015). It has been shown that thin-slice CT can improve the detection of lung nodules and metastases in tumor patients (Strobel et al. 2007). In our study, the interpretation of the additionally acquired dCT (diagnostic 1 mm thin-slice CT) images with breath-hold in full inspiration increased the detection rate of LN by 20% compared to lCT (low-dose 5 mm thick-slice CT) with shallow breathing. PET with CADs performed equally to dCT. Several studies have shown the potential role of CAD software in lung nodule detection in CT alone, without PET (Armato et al. 2002; Awai et al. 2004; Peldschus et al. 2005; Roos et al. 2010; Christe et al. 2013). Christe et al. found that the combination of a human observer with a CAD system provides optimal sensitivity for lung nodule detection (Christe et al. 2013). Peldschus et al. showed that radiologists missed clinically significant lung nodules in 33% of patients during routine interpretation of chest scans, which underlines the value of CAD (Peldschus et al. 2005). Reconstruction parameters such as slice thickness and slice increment can influence the performance of CAD software, which performs significantly better with thinner slices (Kim et al. 2005; Marten et al. 2005; Gurung et al. 2006). Hence, for the successful implementation of CAD in 18F-FDG PET/CT reading, a thin-slice (1 mm) breath-hold CT should be included in the PET/CT acquisition protocol. It has been shown that the sensitivity of a single reader plus CAD is higher than that of the combined reading of two radiologists (Rubin et al. 2005; White et al. 2008). CAD software performance is not influenced by caseload, fatigue, or similar factors. In whole-body PET/CT interpretation, there is a high chance of missing small lung nodules because of exhaustion and data overload. Incorporating CAD into the PET/CT read-out protocol facilitates the detection of lung nodules, as the software clearly highlights candidate nodules and thereby lowers the chance that a nodule is overlooked.

One limitation of the available CAD software algorithms is that they still generate many false-positive (FP) detections, which fall into two categories: (a) true nodules with a low probability of malignancy (pleural thickening, partially calcified granulomas, apical scars, thickened walls of emphysema bullae) and (b) false nodules (intersection of bronchial or vascular structures and peribronchial thickening). To maintain a diagnostically justifiable specificity, the number of FP results has to be reduced by human cross-checking and rejection. In the present study, CADp produced a mean of 16.8 false-positive nodules per scan.

Teramoto et al. proposed an improved ensemble method for the reduction of false positives using convolutional neural networks, a type of deep learning architecture, applied to both the CT and PET components (shape and metabolic feature analysis); this markedly reduced false positives while largely preserving true positives (LeCun et al. 2015; Teramoto et al. 2016). The initial sensitivity of nodule detection was 97.2% with 72.8 false positives (FP) per case. After incorporation of the proposed FP-reduction method, the false positives dropped to 4.9 FPs/case at a sensitivity of 90.1%. The information obtained from the PET component is equally important, and future studies with CAD and artificial intelligence (AI) should include CT as well as PET features to maximize the benefit and to achieve acceptable implementation and use in clinical practice.

Interestingly, Liang et al. found a higher probability (though not statistically significant) of detecting nodules in the lower lobes, whereas Weikert et al. did not find any such dependency of lesion detection on location within the lung (Liang et al. 2016; Weikert et al. 2019). In our study, nearly two thirds (67%) of the nodules were in the lower zones (bilateral lower lobes + right middle lobe + lingula), and the remaining 33% were in the upper zones (bilateral upper lobes).

Vassallo et al. (2019) compared unassisted and CAD-assisted detection and the time efficiency of radiologists in reporting lung nodules on CT scans of patients with extra-thoracic malignancies and found that CAD-assisted reading improved the detection of lung nodules while slightly increasing the reading time: the total scan reading time increased by 11% with CAD (296 s vs. 329 s). In our study, the average time required for the evaluation of lung nodules in 18F-FDG PET, lCT, dCT, and CADs was 25, 31, 60, and 40 s, respectively, and we observed a nearly 33% reduction in the time required for evaluation of lung nodules with the help of CAD compared to dCT. The maximum benefit was seen for the least experienced reader, with an approximately 39% reduction in the time required for assessment of LN.

Das et al. observed that CAD was especially helpful for detecting small lung nodules, improved the performance of the radiologists, and increased the agreement among radiologists (Das et al. 2006). In our study, there was very good interobserver agreement (inter-rater reliability), with kappa ranging between 0.842 and 0.929 for the four different techniques among the three readers (p < 0.001).

We found an improved detection rate with 1 mm thin-slice lung CT. The lCT (low-dose 5 mm lung CT) detected 121 LN in 37 patients, whereas dCT (diagnostic 1 mm thin-slice lung CT) detected 283 LN in 60 patients; that is, nearly 58% of the nodules found with dCT were not found with lCT. Detection of additional nodules without visible FDG uptake, even if the lack of uptake is related to the small size of the nodule, might result in the recommendation of a short-term follow-up scan to exclude or confirm metastatic disease. In our follow-up, the number and sizes of the nodules were essentially stable in 70 patients and progressed (metastatic nature) in 30 patients. Regarding the diagnosis of lung metastases on a per-patient basis, there was no significant difference in the performance of PET/lCT, PET/dCT, and PET/CAD despite variable reader experience. Although there was no significant difference among the three readers in the number of LN detected, the least experienced reader (Reader 3) required 133% more time than Reader 1 or 2 to evaluate the 18F-FDG PET images, 91% more time for lCT, 121% more time for dCT, and 74% more time for CADs. In other words, the least experienced reader took significantly more time to detect the same number of nodules and might have missed more nodules under a time limit.

PET combined with low-dose CT (PET/lCT) showed the best balance between sensitivity and specificity for the diagnosis of metastases per patient. Detection of additional small nodules without visible FDG uptake might prompt the recommendation of a short-term follow-up scan to exclude or confirm metastatic disease. The tiny nodules (less than 5 mm) detected with dCT and CAD are probably not always metastatic. PET and lCT may detect fewer nodules than dCT and CAD, but the nodules they detect are more likely to be metastatic. The clinical relevance of detecting small, subcentimeter, FDG non-avid LN and its impact on outcome remain to be shown in further studies.

We believe that implementing CAD, AI, and deep learning for the detection of LN by integrating PET, diagnostic CT data, and clinical information has considerable potential, especially in patients at high risk of pulmonary metastases, such as those with melanoma, sarcoma, head and neck cancer, or rectal cancer.

Conclusion

Implementation of CAD for the detection of lung nodules/metastases in routine 18F-FDG PET/CT read-out is feasible. The combination of diagnostic thin-slice CT and CAD significantly increases the detection rate of lung nodules in tumor patients compared to the standard 18F-FDG PET/CT read-out. PET combined with low-dose CT showed the best balance between sensitivity and specificity regarding the diagnosis of metastases per patient. CAD reduces the time required for lung nodule/metastasis detection, especially for less experienced readers.