Integrating manual diagnosis into radiomics for reducing the false positive rate of 18F-FDG PET/CT diagnosis in patients with suspected lung cancer
The high false positive rate (FPR) of 18F-FDG PET/CT in lung cancer screening represents a severe challenge for clinical decision-making. This study aimed to develop a clinically translatable radiomics nomogram for reducing the FPR of PET/CT in lung cancer diagnosis, and to determine the impact of integrating manual diagnosis on the performance of the radiomics nomogram.
Among 3,947 18F-FDG PET/CT-screened patients with lung lesion, 157 malignant and 111 benign patients were retrospectively enrolled and divided into training and test cohorts. The data of manual diagnosis were recorded. A total of 4,338 features were extracted from CT, thin-section CT, PET and PET/CT, and the four radiomics signatures (RS) were then generated by LASSO method. Radiomics prediction nomogram integrating imaging-based RS and manual diagnosis was developed using multivariable logistic regression. The performances of RS and prediction nomograms were independently validated through key discrimination index and clinical benefit.
The FPR of manual diagnosis was found to be 30.6%. Among the four RS, PET/CT RS exhibited the best performance. The hybrid nomogram integrating PET/CT RS and manual diagnosis demonstrated the lowest FPR and the highest area under the curve (AUC) and Youden index (YI) in both the training and test cohorts (FPR: 5.4% and 9.1%, AUC: 0.98 and 0.92, YI: 85.8% and 75.5%, respectively). This hybrid nomogram corrected 78.6% and 37.5% of the false-positive cases produced by PET/CT RS in the training and test cohorts, respectively, without significantly sacrificing sensitivity. The net benefit of the hybrid nomogram was highest at threshold probabilities below 85%.
The established hybrid nomogram integrating PET/CT RS and manual diagnosis can significantly reduce FPR, improve diagnostic accuracy and enhance clinical benefit compared to manual diagnosis. By integrating manual diagnosis, the performance of this hybrid nomogram is superior to PET/CT RS, indicating the importance of clinicians’ judgement as an essential information source for improving radiomics diagnostic approaches.
Keywords: Radiomics; 18F-FDG PET/CT; False positive rate; Manual diagnosis; Lung lesion differentiation
Lung cancer is a leading cause of cancer deaths globally. Medical imaging has been recognized as an indispensable tool for the early diagnosis of lung cancer. By combining the structural information of computed tomography (CT) with the metabolic information of positron emission tomography (PET), dual-modality 18F-FDG PET/CT imaging has been widely accepted as an advanced technique for lung cancer detection, with accuracy superior to that of either PET or CT alone [3, 4, 5].
Nonetheless, 18F-FDG PET/CT still faces a severe challenge in distinguishing benign lesions from malignant lung tumors. In a meta-analysis with a large sample size, the false-positive rate (FPR) of 18F-FDG PET/CT for suspicious lung lesions was 25% on average, and as high as 39% in regions with endemic infectious lung disease. In a multicenter study, the FPR in patients with high-risk lung nodules reached 60.2%. Additionally, due to the non-negligible error rate of biopsy histology (23%) and the co-existence of infection and cancer, negative biopsy results are not sufficient for clinicians to confidently rule out malignancy. The final confirmation of false-positive cases is limited by the lack of a definitive diagnostic approach. Only repeated biopsy and long-term follow-up, or imperfect diagnostic treatment, can be used to indirectly obtain a relatively accurate diagnosis. Such clinical dilemmas may result in additional follow-up costs and potentially inaccurate clinical decisions, which remain major obstacles preventing PET/CT from playing its due role in the early diagnosis of lung cancer.
The high FPR of manual diagnosis can be attributed to two main reasons. In terms of CT structural information, infectious lung lesions often mimic key radiographic features of malignancy, including the spicule sign and lobulation sign. In terms of PET metabolic information, activated immune cells at inflammatory sites (e.g. tuberculosis) can elevate glucose consumption, which also mimics the features of malignancy. These confusing macroscopic imaging manifestations may significantly mislead the experience-based differentiation of lung lesions. Although many efforts have been made, such as delayed image acquisition and new PET tracers [13, 14], their actual clinical effects are still uncertain. Hence, developing new methods that assist clinicians in reducing FPR is the key to solving the above dilemma.
As proposed in 2012, “radiomics” is a next-generation computer-aided diagnostic (CAD) technique that extracts a large number of image features from medical images via a high-throughput approach and constructs nomograms based on machine-learning algorithms [15, 16]. Since medical images are considered not merely as pictures for visual assessment but rather as mineable quantitative data, a radiomics approach has great potential to improve diagnostic accuracy [8, 17, 18, 19]. Compared to previously used lung texture analysis, radiomics analysis generates more multidimensional information, such as intensity histogram, tumor shape, texture patterns, multiscale wavelets, tumor location and quantitative clinical biomarkers [16, 20], and thus serves as an evolution of CAD with higher accuracy and a wider application range. In addition, radiological studies on CT have provided evidence that the discrimination capacity of the radiomics method for lung cancer is better than that of radiologists’ clinical experience [22, 23]. Thus, a radiomics nomogram may offer great potential to reduce the FPR of 18F-FDG PET/CT in the differential diagnosis of lung lesions.
In recent years, PET/CT-based radiomics studies have emerged in the field of lung cancer, focusing on algorithm methodology [24, 25, 26], survival prediction [27, 28] and tumor origin classification. However, to our knowledge, no study on the establishment and validation of a PET/CT radiomics nomogram for reducing the FPR of lung cancer diagnosis has yet been reported. Moreover, it remains unknown whether manual diagnosis based on clinical experience is a synergistic or misleading source of information for radiomics. Therefore, in this study, we established four radiomics signatures (RS) of CT, thin-section CT, PET and PET/CT, as well as hybrid radiomics nomograms integrating RS and manual diagnosis. In addition, the performances of these RS and hybrid nomograms were compared, and the potential value of integrating clinical experience was further determined.
Materials and methods
This retrospective analysis was approved by the Ethics Committee of Xijing Hospital (Approval No. KY20173008–1), and the informed consent was waived. A total of 3,947 18F-FDG PET/CT-screened cases with lung lesions at Xijing Hospital from 2007 to 2017 were involved. Firstly, after excluding patients (i) without an obvious lung lesion, (ii) without a documented pathological diagnosis, (iii) with a lesion diameter of <0.8 cm, (iv) with a mainly non-solid component within the lesion (e.g. exudates, fibrosis and cavity), or (v) with chemotherapy and/or radiotherapy before the PET/CT scan, 325 patients with suspicious solid lung lesions were preliminarily included for further pathological grouping. Secondly, according to the pathological diagnosis and follow-up findings (diagnostic treatment, medical imaging scans or secondary biopsy), 111 patients were categorized into a benign group after excluding potentially false-negative diagnoses caused by sampling error. Thirdly, 157 patients with primary lung cancer were classified into a malignant group based on biopsy and immunohistochemistry, after excluding those with pulmonary metastases from other origins. Finally, these patients were randomly divided into a training cohort (n = 135) and a test cohort (n = 133). The workflow diagram of patient selection and grouping is shown in Supplementary Fig. S1.
PET/CT and thin-section CT imaging
All the patients received 18F-FDG PET/CT and thin-section CT (TSCT) scans on the same scanner (Biograph 40, Siemens) following a standard clinical protocol. CT parameters were 100 kV, 110 mAs (CARE-Dose4D technology), 0.5-s rotation time, 5.0 mm (24 × 1.2 mm) slice thickness, 700 mm field of view, and 512 × 512 matrix. PET parameters were a tracer dose of 4.44~5.55 MBq/kg, acquisition 60 min after tracer administration, and 3 min per bed position. PET and CT images were reconstructed using an ordered-subsets expectation maximization algorithm with four iterations and eight subsets. TSCT parameters were 100 kV, 110 mAs, 0.5-s rotation time thickness, 700 mm field of view, and 512 × 512 matrix.
In this retrospective study, manual diagnoses were obtained from the first cancerous or benign diagnosis of structured reports in the medical information system, taking no account of the secondary or subordinate diagnosis. Following the routine clinical workflow, the diagnoses of these structured reports were made through consensus by three experienced PET/CT clinicians, referring to the medical history, symptoms and other examination results. Furthermore, the sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), false positive rate (FPR) and false negative rate (FNR) of manual diagnosis were calculated.
Radiomics feature extraction
Three-dimensional regions of interest (3D-ROI) on the CT and PET images were delineated separately using ITK-SNAP software (www.itksnap.org). All the delineation work was done under the guidance of experienced PET/CT clinicians blinded to the pathological grouping. The CT 3D-ROI was manually delineated slice-by-slice around the lesions in lung-window CT images. For PET segmentation, the 3D-ROI was semi-automatically drawn using the “adaptive brush” tool, with reference to the 3D-ROI obtained at a standardized uptake value (SUV) threshold of 40% in the TrueD tool kit of the Siemens MMWP workstation. For lesions with an anatomical boundary close to the mediastinum or bronchus, PET images were used as a reference to determine the actual boundary in CT delineation.
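The threshold-based reference ROI described above can be illustrated with a minimal sketch. It assumes that the 40% threshold means 40% of the maximum SUV within the lesion region, which is a common convention but is not stated explicitly in the text. The function name `threshold_mask` and the SUV values are hypothetical and are not study data.

```python
def threshold_mask(suv, fraction=0.40):
    """Return a binary mask of voxels with SUV >= fraction * SUVmax.

    `suv` is one 2D slice given as a list of rows of SUV values.
    Interpreting the 40% threshold as a fraction of SUVmax is an
    assumption made for this illustration.
    """
    suv_max = max(max(row) for row in suv)
    cutoff = fraction * suv_max
    return [[1 if value >= cutoff else 0 for value in row] for row in suv]


# Hypothetical SUV values for one small slice (not study data).
slice_suv = [
    [0.8, 1.1, 1.0, 0.9],
    [1.2, 4.5, 6.5, 1.3],
    [1.0, 5.2, 10.0, 2.1],
    [0.7, 1.4, 3.9, 1.0],
]

mask = threshold_mask(slice_suv)
for row in mask:
    print(row)
```

In practice such a mask would only serve as a starting reference, since the text states that the final PET ROI was drawn semi-automatically under clinician guidance.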
A total of 4,338 radiomics features were extracted from the PET and CT images using a MATLAB-based algorithm, as described in the supplementary materials.
Establishment of radiomics signatures (RS) and nomograms
After feature preselection according to classification ability and redundancy (shown in detail in the supplementary materials), we used a least absolute shrinkage and selection operator (LASSO) algorithm to select the most distinguishing features for each imaging modality (CT, TSCT, PET and PET/CT) as previously described [31, 32]. The selected features were weighted by their respective coefficients, and four different RS were then calculated as linear combinations of the features. Univariable logistic regression analysis was carried out on the four RS and the clinical factors in the training cohort. Only factors with a statistically significant odds ratio (OR) were used to construct a prediction nomogram with multivariable logistic regression analysis for validation.
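As a rough illustration of how a signature and a hybrid nomogram of this kind are evaluated once the coefficients are known, the sketch below computes an RS as a weighted sum of selected features and then feeds it, together with a binary manual diagnosis, into a logistic model. All feature names, coefficients and inputs are hypothetical placeholders; the fitted values of this study are not reproduced here.

```python
import math


def radiomics_signature(features, coefficients, intercept=0.0):
    """Linear combination of the selected features, weighted by their
    LASSO coefficients. Features without a coefficient are ignored."""
    return intercept + sum(
        weight * features[name] for name, weight in coefficients.items()
    )


def nomogram_probability(rs, manual_positive, beta0, beta_rs, beta_manual):
    """Predicted probability of malignancy from a logistic model that
    combines the RS with a binary manual diagnosis (1 = malignant)."""
    z = beta0 + beta_rs * rs + beta_manual * (1 if manual_positive else 0)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and feature values (placeholders only).
coefficients = {"feat_a": 0.8, "feat_b": -0.5}
features = {"feat_a": 1.5, "feat_b": 0.4, "feat_c": 9.9}

rs = radiomics_signature(features, coefficients)
p_manual_pos = nomogram_probability(rs, True, beta0=-1.0, beta_rs=1.0, beta_manual=1.5)
p_manual_neg = nomogram_probability(rs, False, beta0=-1.0, beta_rs=1.0, beta_manual=1.5)
print(round(rs, 3), round(p_manual_pos, 3), round(p_manual_neg, 3))
```

With a positive coefficient for manual diagnosis, the same RS yields a higher predicted probability when the clinicians' report is malignant, which is the mechanism by which the hybrid model can reclassify borderline RS results.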
Validation and comparison of RS and nomograms
Calibration curves were plotted to assess the linearity of each prediction nomogram, accompanied by the Akaike information criterion (AIC) value. A relatively corrected AUC value was calculated by bootstrap validation (1,000 bootstrap resamples). Using the cutoff value derived from receiver operating characteristic (ROC) curve analysis in the training cohort, the optimal diagnostic parameters [SEN, SPE, FPR, FNR and Youden’s index (YI = SEN + SPE − 1)] were calculated. The values of AUC and YI, which are positively correlated with diagnostic performance, were used to quantitatively compare the predictive accuracy of the established nomograms. Decision curve analysis was implemented to determine the clinical benefit of the RS and nomograms, following a previously reported protocol.
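The two comparison indices can be sketched as follows, under the assumption that the ROC-derived cutoff is the one maximizing YI (the text does not state the selection criterion explicitly). The AUC is computed here through its rank interpretation, i.e. the probability that a randomly chosen malignant case scores higher than a randomly chosen benign case. The scores and labels are toy values, not study data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, SEN, SPE, YI) for the cutoff that maximizes
    YI = SEN + SPE - 1, calling a case positive when score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sen, spe = tp / n_pos, tn / n_neg
        yi = sen + spe - 1
        if best is None or yi > best[3]:
            best = (cutoff, sen, spe, yi)
    return best


# Toy scores and labels (1 = malignant, 0 = benign); not study data.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 0, 1]

print(auc(scores, labels))
print(youden_cutoff(scores, labels))
```

When several cutoffs share the same YI, this sketch keeps the lowest one, which favors sensitivity; a different tie-breaking rule would be equally defensible.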
All statistical analyses were performed using R software version 3.0.1 (http://www.Rproject.org). LASSO-logistic regression was performed with the “glmnet” package. The “rms” package was used for multivariable binary logistic regression, nomograms and calibration. AUC calculation and decision curve analysis were carried out with the “Hmisc” and “dca.R” packages. To determine the agreement between the estimated probability and the actual malignant rate, calibration plots were obtained using the “calibrate” function from the “rms” package. Decision curves were derived using the “decision_curve” function from the “rmda” package to quantify the net benefits at different threshold probabilities. The chi-squared test was used to compare categorical variables, while the t-test was used to compare continuous variables between two groups. A P value less than 0.05 was considered statistically significant.
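For readers unfamiliar with decision curve analysis, the quantity plotted on a decision curve can be sketched in a few lines. The sketch uses the standard net benefit definition, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. It is written in Python for illustration only and does not reproduce the R packages named above; the predicted probabilities and labels are toy values.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating every case whose predicted probability
    is at or above the threshold probability."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every case is treated as malignant."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted probabilities and labels (1 = malignant); not study data.
probs = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for pt in (0.25, 0.5, 0.75):
    print(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
```

A model is considered useful at a given threshold probability when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.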
Characteristics of patients
                          Training cohort (n = 135)              Test cohort (n = 133)
Characteristic            Malignant (n = 79)   Benign (n = 56)   Malignant (n = 78)   Benign (n = 55)
Age, years (mean ± SD)    60.0 ± 12.6          51.3 ± 18.1       59.9 ± 10.2          51.4 ± 21.1
Small cell lung cancer
Diagnostic performance of manual diagnosis
Based on the data shown in Table 1, the overall sensitivity, specificity, PPV and NPV of manual diagnosis were 93.6%, 69.4%, 81.2% and 88.5%, respectively. The overall FPR was 30.6% (30.4% for the training cohort and 30.9% for the test cohort).
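The relationship between these indices can be made explicit with a short sketch. The confusion-matrix counts below are not reported in the text; they are back-calculated from the reported percentages and the group sizes (157 malignant, 111 benign) and should be read as an assumption for illustration.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic indices from a 2x2 confusion matrix."""
    return {
        "SEN": tp / (tp + fn),   # sensitivity
        "SPE": tn / (tn + fp),   # specificity
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "FPR": fp / (fp + tn),   # false positive rate = 1 - SPE
        "FNR": fn / (fn + tp),   # false negative rate = 1 - SEN
    }


# Counts back-calculated from the reported rates (an assumption):
# 157 malignant = 147 TP + 10 FN; 111 benign = 77 TN + 34 FP.
metrics = diagnostic_metrics(tp=147, fn=10, fp=34, tn=77)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts the sketch reproduces the reported overall values, which also shows that FPR is simply the complement of specificity.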
Establishment of radiomics signatures (RS) and nomograms
Performance comparison of RS and nomograms
Diagnostic performance of nomograms
Hybrid nomogram PET/CT RS + manual diagnosis
Compared to PET/CT RS without integration of manual diagnosis, the hybrid nomogram exhibited higher AUC (training / test: 0.98 / 0.92 vs. 0.96 / 0.89), higher YI (training / test: 85.8% / 75.5% vs. 69.9% / 72.6%) and lower FPR (training / test: 5.4% / 9.1% vs. 25.0% / 14.6%), as shown in Fig. 2c. The hybrid nomogram corrected 78.6% (11/14) and 37.5% (3/8) of the false-positive cases produced by PET/CT RS in the training and test cohorts, respectively (a representative case is presented in Fig. 3b). However, the incorporation of manual diagnosis into the hybrid nomogram generated four and two false-negative cases in the training and test cohorts, respectively, resulting in a slight reduction in sensitivity (94.9% vs. 91.1% for the training cohort; 87.2% vs. 84.6% for the test cohort). A representative case is presented in Supplementary Fig. S3B. Moreover, by integrating manual diagnosis in the test cohort, the AUC of the CT-, TSCT- and PET-based prediction nomograms could also be increased by 29.7%, 23.4% and 3.9%, respectively (data from Supplementary Tables S3 and S4).
Decision curve analysis
In this study, two major findings were highlighted. First, compared to manual diagnosis, the hybrid nomogram integrating PET/CT RS and manual diagnosis could reduce FPR from 30.6% to 5.4–9.1%, without affecting its sensitivity. Second, the hybrid nomogram achieved higher diagnostic accuracy, lower FPR and optimum clinical net benefit, compared to PET/CT RS without clinical judgment integration. These findings not only demonstrate the potential of radiomics technique in solving the clinical dilemma of high FPR in PET/CT-based lung cancer diagnosis, but also reveal the importance of clinical experience for improving radiomics performance.
Over the last two decades, researchers have made substantial efforts to apply prediction models in either a manual or computer-aided manner for lung lesion differentiation, mostly based on CT images [34, 35, 36, 37, 38]. A few studies have highlighted the great potential of PET-based texture analysis [39, 40, 41]. However, a limited number of texture features and lack of independent validation can influence the actual effectiveness and stability of CAD methods. In this study, we impartially selected features, including broader information of intensity, shape, texture and wavelet, from 4338 multi-dimensional features, following a radiomics protocol. In addition, the performances of RS and hybrid nomogram were evaluated by independent validations.
In addition, our study provides evidence that clinical experience may exert a remarkably synergistic effect on PET/CT RS with regard to FPR reduction and diagnostic accuracy enhancement. In previous studies, clinical diagnosis and CAD methods have often been set as competitors in the race for predictive accuracy. Nonetheless, clinical experience-based manual diagnosis represents an integration of clinical and imaging information through human reasoning. Our results imply that clinical experience is not likely to become obsolete or dispensable in the presence of artificial intelligence diagnosis, but remains a crucial contributor to establishing better diagnostic methods.
Analysis of the established RS formulas suggests an emerging link between the major differentiating PET features and clinical experience. The major differentiating PET features derived from the original plane in the formulas partly agree with the general principles of clinical experience, in which the probability of malignancy is positively correlated with non-uniformity and PET signal value. For instance, the PET features W1.VAR_11 and W1.LRHGLE_13, which are associated with the degree of variation and the proportion of high-value pixels, respectively, are positively correlated with malignancy. Meanwhile, the PET feature W1.RP_2, which is related to the degree of uniformity of equal-value pixels, is negatively correlated with malignancy. At present, one of the major barriers to the application of CAD methods is the limited theoretical interpretability of these prediction models [19, 42]. Our results suggest that radiomics models may share an underlying theoretical consistency with clinical experience. Although this claim is preliminary and the intuitive meaning of radiomics features (especially wavelet features) remains enigmatic, more studies are warranted to establish a theoretical connection between radiomics mechanisms and clinical experience, to provide a deeper understanding of radiomics and to discover new radiological principles.
In terms of clinical benefit, the hybrid nomogram exhibited the highest net benefit when the threshold probability was <85%, especially in the range of 60–85%. According to the principle of decision curve analysis, the threshold-probability value largely depends on the prevalence of positive results in a population. Previous meta-analyses have shown that the median prevalence of malignancy in lung nodule PET/CT cases ranged from 60 to 72.5% [6, 44], fitting the preferred range of the hybrid nomogram. These results support the practical clinical value of this hybrid nomogram for differentiating lung cancer patients.
This study has several limitations. Firstly, this retrospective study investigated patients from a single center, and thus multicenter prospective studies are warranted. Considering that the radiomics features are very sensitive to acquisition and reconstruction procedures [45, 46], a certain degree of standardization is a necessary precondition for multicenter studies. Secondly, this hybrid nomogram was solely based on RS and manual diagnosis. Therefore, additional clinical information, such as serum biomarkers, may further enhance the performance of radiomics nomogram. Thirdly, the performances of CT- and TSCT-based RS are relatively poor in this study, probably due to the sub-optimal CT and TSCT imaging quality of the standard PET/CT imaging protocol.
We would like to thank Zhe Wang, Zhiyong Quan, Ni Wang, Jingwei Yi, Qingju Zhang, Jin Zeng and Xiaohu Zhao for their technical assistance. Fei Kang thanks Prof. Yaochi Yang and Shuangqin Wu for providing the necessary impetus to conduct this study.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 81871379, 816771713, 81601521), the National Key R&D Program of China (Grant No. 2016YFC0103804), and the Young Elite Scientists Sponsorship Program of China Association for Science and Technology (Grant No. 2017QNRC001).
Compliance with ethical standards
Conflict of interest
No other potential conflict of interest relevant to this article was reported.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
This retrospective analysis was approved by the Ethics Committee of Xijing Hospital (Approval No. KY20173008–1), and the informed consent was waived.
2. Wood DE, Kazerooni EA, Baum SL, Eapen GA, Ettinger DS, Hou L, et al. Lung Cancer screening, version 3.2018, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw. 2018;16:412–41.
8. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278:563–77.
23. Ma JC, Wang Q, Ren YC, Hu HB, Zhao J. Automatic lung nodule classification with radiomics approach. SPIE; 2016; https://doi.org/10.1117/12.2220768.
24. Desseroit MC, Tixier F, Weber WA, Siegel BA, Cheze Le Rest C, Visvikis D, et al. Reliability of PET/CT shape and heterogeneity features in functional and morphologic components of non-small cell lung Cancer tumors: a repeatability analysis in a prospective multicenter cohort. J Nucl Med. 2017;58:406–11.
26. Hatt M, Laurent B, Fayad H, Jaouen V, Visvikis D, Le Rest CC. Tumour functional sphericity from PET images: prognostic value in NSCLC and impact of delineation method. Eur J Nucl Med Mol Imaging. 2018;45:630–41.
27. Arshad MA, Thornton A, Lu H, Tam H, Wallitt K, Rodgers N, et al. Discovery of pre-therapy 2-deoxy-2-(18)F-fluoro-D-glucose positron emission tomography-based radiomics classifiers of survival outcome in non-small-cell lung cancer patients. Eur J Nucl Med Mol Imaging. 2019;46:455–66.
28. Kirienko M, Cozzi L, Antunovic L, Lozza L, Fogliata A, Voulaz E, et al. Prediction of disease-free survival by the PET/CT radiomic signature in non-small cell lung cancer patients undergoing surgery. Eur J Nucl Med Mol Imaging. 2018;45:207–17.
29. Kirienko M, Cozzi L, Rossi A, Voulaz E, Antunovic L, Fogliata A, et al. Ability of FDG PET and CT radiomics features to differentiate between primary and metastatic lung lesions. Eur J Nucl Med Mol Imaging. 2018;45:1649–60.
31. Huang YQ, Liang CH, He L, Tian J, Liang CS, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016;34:2157–64.
37. Digumarthy SR, Padole AM, Lo Gullo R, Singh R, Shepard JO, Kalra MK. CT texture analysis of histologically proven benign and malignant lung lesions. Medicine (Baltimore). 2018;97:e11172.