Introduction

Pulmonary nodules (PNs) are small (≤ 3 cm), opaque lesions that are surrounded by lung parenchyma and are not associated with atelectasis, mediastinal lymphadenopathy, or pleural effusion [1,2,3]. Routine computed tomography (CT)-based follow-up is warranted for nodules > 6 mm [4], and the risk of PN malignancy increases 1.1-fold with each 1 mm increase in diameter [5]. Analyses of clinical data and CT imaging findings are the most commonly used approach to PN diagnosis [6,7,8].

CT features often indicative of PN malignancy include the CT bronchus sign, vascular convergence sign, pleural retraction, lobulation, and spiculation [6,7,8]. Clinical risk factors for PN malignancy include advanced age, elevated serum tumor marker levels, and a history of smoking [6, 9]. Researchers have devised an array of predictive models based on these clinical and imaging features with the goal of more reliably identifying malignant PNs [6,7,8]. However, most CT-derived imaging features are recorded as binary variables (present or absent), and their identification can vary with the experience of the interpreting physician. More reliable quantitative imaging strategies are thus needed to minimize this potential bias and improve the accuracy of PN diagnosis.

18F-fludeoxyglucose (18F-FDG) positron emission tomography (PET)/CT has emerged as a powerful approach to PN diagnosis, with the maximum standardized uptake value (SUVmax) serving as a semi-quantitative measure of radiotracer uptake [10]. Given this quantitative capability, researchers have also incorporated PET/CT imaging parameters into predictive models designed to diagnose PNs in an effort to achieve superior accuracy [11,12,13,14,15,16,17,18,19,20,21,22,23]. However, the reported diagnostic performance of these individual PET/CT-based models has varied substantially among studies [11,12,13,14,15,16,17,18,19,20,21,22,23]. There thus remains a pressing need for large-scale analyses capable of systematically clarifying the diagnostic utility of the models developed to date.

Accordingly, the present meta-analysis was conducted to clarify the diagnostic performance of PET/CT-based models when used for the differential diagnosis of potentially malignant PNs.

Materials and methods

Study selection

Studies of potential relevance were identified by searching the Web of Science, PubMed, and Wanfang databases for all articles published through September 2023 based on the following search strategy: (((((positron emission tomography) OR (PET/CT)) AND (model)) AND ((lung) OR (pulmonary))) AND (nodule)) AND ((((differential) OR (diagnosis)) OR (probability)) OR (predictive)). This meta-analysis was registered at https://inplasy.com/ (No. INPLASY2023100042).

To be eligible for inclusion, studies had to (1) focus on the differential diagnosis of malignant versus benign PNs, (2) develop or test PET/CT-based models that were described within the study, and (3) report the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values associated with the tested models. Case reports, non-human studies, and reviews were excluded.

Data extraction and quality analyses

Two investigators independently extracted pertinent data from the included studies, including baseline study characteristics, baseline patient characteristics, and the results of diagnostic analyses. Any discrepancies were resolved by a third investigator. The QUADAS-2 tool was used to assess the risk of bias [24].

Definitions

TP results were those for which both PET/CT-based models and final diagnoses were indicative of PN malignancy, whereas FP results were those for which PET/CT-based models predicted that a given lesion was malignant but it was ultimately found to be benign. Conversely, TN results were those for which both PET/CT-based models and final diagnoses indicated that a PN was benign, whereas FN results were those for which PET/CT-based models predicted that a given lesion was benign but it was ultimately found to be malignant.
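
As an illustration of these definitions, the minimal Python sketch below tallies the four outcome categories from paired binary calls (model prediction versus final diagnosis) and derives sensitivity and specificity from them. It assumes that each model's output has already been dichotomized at its cut-off; the names `TwoByTwo` and `tally` are hypothetical and were not used by the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 counts for one model: predictions versus final diagnoses."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # Proportion of malignant PNs that the model called malignant.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of benign PNs that the model called benign.
        return self.tn / (self.tn + self.fp)


def tally(predicted_malignant: list[bool], final_malignant: list[bool]) -> TwoByTwo:
    """Classify each nodule as TP, FP, TN, or FN using the definitions above."""
    if len(predicted_malignant) != len(final_malignant):
        raise ValueError("predictions and final diagnoses must be the same length")
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted_malignant, final_malignant):
        if pred and truth:
            tp += 1  # model: malignant, final diagnosis: malignant
        elif pred and not truth:
            fp += 1  # model: malignant, final diagnosis: benign
        elif not pred and not truth:
            tn += 1  # model: benign, final diagnosis: benign
        else:
            fn += 1  # model: benign, final diagnosis: malignant
    return TwoByTwo(tp, fp, tn, fn)
```

For example, `tally([True, True, False, False, True], [True, False, False, True, True])` yields 2 TP, 1 FP, 1 TN, and 1 FN.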

Meta-analysis

Stata v 12.0 (StataCorp, College Station, TX, USA) was used to compute the pooled sensitivity, specificity, diagnostic score, positive likelihood ratio (PLR), and negative likelihood ratio (NLR), and to construct summary receiver operating characteristic (SROC) curves. A predictive model was considered to exhibit high diagnostic performance if it had an NLR < 0.2 or a PLR > 5. An area under the SROC curve (AUC) greater than 0.8 was also considered to indicate high diagnostic utility [3]. RevMan v 5.3 was used to compare pooled SUVmax values between benign and malignant PNs. Heterogeneity was assessed with the I² statistic, with I² > 50% indicating significant heterogeneity. Publication bias was assessed with Deeks’ funnel plots, and P < 0.05 was considered statistically significant.
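
The per-study quantities behind these pooled statistics can be sketched as follows. This is a minimal Python illustration, not the Stata procedure used in this analysis: pooled estimates in diagnostic meta-analysis are usually obtained from a bivariate random-effects model, which this sketch does not reproduce. It assumes that the "diagnostic score" is the natural logarithm of the diagnostic odds ratio (DOR), applies an assumed 0.5 continuity correction only when a cell is zero, and computes I² as (Q − df)/Q, truncated at zero, with Cochran's Q based on inverse-variance weights. All function names are hypothetical.

```python
import math


def accuracy_measures(tp, fp, tn, fn, correction=0.5):
    """Per-study sensitivity, specificity, PLR, NLR, DOR and ln(DOR).

    Assumed convention: add `correction` to every cell only when a cell is zero.
    """
    if 0 in (tp, fp, tn, fn):
        tp, fp, tn, fn = (x + correction for x in (tp, fp, tn, fn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    dor = plr / nlr  # algebraically equal to (tp * tn) / (fp * fn)
    return {"sens": sens, "spec": spec, "plr": plr, "nlr": nlr,
            "dor": dor, "diagnostic_score": math.log(dor)}


def high_performance(plr, nlr):
    """Criterion stated in the Methods: NLR < 0.2 or PLR > 5."""
    return nlr < 0.2 or plr > 5


def i_squared(estimates, variances):
    """Higgins' I^2 (in %) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    return 0.0 if q <= df else 100 * (q - df) / q
```

For instance, hypothetical counts of TP = 90, FP = 20, TN = 80, and FN = 10 give a sensitivity of 0.90, a specificity of 0.80, a PLR of about 4.5, an NLR of about 0.125, and a DOR of about 36, which would satisfy the NLR < 0.2 criterion.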

Results

Study selection

The initial search returned 526 studies, of which 13 were found to be relevant and were included in the final analyses (Fig. 1). All 13 studies were retrospective; 11 were conducted in China and 2 in Spain. For further study-specific details, see Table 1.

Fig. 1

The study selection process for this meta-analysis

Table 1 Characteristics of studies included in meta-analysis

A total of 1,731 malignant and 693 benign PNs were included across these studies. The number of predictors in individual models ranged from 2 to 7 (Table 2). Apart from PET/CT parameters, age was included as a predictor in 12 of the 13 models, while CT features commonly associated with malignancy, namely lobulation, spiculation, and pleural retraction, were included in 6, 5, and 3 models, respectively. Because the models differed in composition, their diagnostic performance, and therefore their TP, FP, TN, and FN counts, also differed. For the raw TP, FP, TN, and FN data, see Table 3.

Table 2 The details of each predictive model
Table 3 Raw data on the diagnostic performance of the studies included in this meta-analysis

Bias analyses

Risk of bias was examined using the QUADAS-2 tool (Fig. 2). Nine of the 13 studies did not indicate whether patients were enrolled consecutively [12, 15,16,17,18,19,20, 22, 23], and a partially overlapping set of 9 studies did not clearly report blinding status [11,12,13,14, 16,17,18,19,20]. All studies used a reference standard for diagnostic confirmation.

Fig. 2

(A) The quality assessment of each included study. (B) The summary of the quality assessment

PET/CT-based model diagnostic performance

TP, FP, TN, and FN data for PET/CT-based models were provided in all 13 studies. The pooled sensitivity, specificity, PLR, and NLR of these models were 0.88 (95% CI: 0.86–0.91, Fig. 3a), 0.78 (95% CI: 0.71–0.85, Fig. 3b), 4.10 (95% CI: 2.98–5.64, Fig. 3c), and 0.15 (95% CI: 0.12–0.19, Fig. 3d), respectively, all with significant heterogeneity (I² = 69.25%, 78.44%, 71.42%, and 67.18%, respectively). The AUC was 0.91 (95% CI: 0.88–0.93, Fig. 3e), and the SROC curve did not show a shoulder-arm shape, suggesting that a threshold effect was unlikely to have influenced these results. At a pre-test probability of 20%, the Fagan plot yielded post-test probabilities of 51% after a positive model result and 4% after a negative result (Fig. 3f). There was no evidence of significant publication bias (P = 0.996).
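
Deeks’ test, the publication-bias approach named in the Methods, regresses each study's log DOR against the inverse square root of its effective sample size (ESS), weighting by ESS, and tests whether the slope differs from zero. The Python sketch below shows that calculation under stated assumptions (a 0.5 continuity correction for zero cells; the function name `deeks_test` is hypothetical). To stay within the standard library it returns the t statistic rather than a P value; a two-sided P value could then be obtained from a t distribution with the returned degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math


def deeks_test(studies, correction=0.5):
    """Deeks' funnel plot asymmetry test (sketch).

    studies: sequence of (tp, fp, tn, fn) tuples, one per study.
    Fits a weighted least-squares regression of ln(DOR) on 1/sqrt(ESS),
    with weights equal to ESS, where
    ESS = 4 * n_malignant * n_benign / (n_malignant + n_benign).
    Returns (slope, t statistic for slope = 0, residual degrees of freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fp, tn, fn in studies:
        n_mal, n_ben = tp + fn, tn + fp
        ess = 4 * n_mal * n_ben / (n_mal + n_ben)
        if 0 in (tp, fp, tn, fn):
            # Assumed convention: add 0.5 to every cell when any cell is zero.
            tp, fp, tn, fn = (c + correction for c in (tp, fp, tn, fn))
        ys.append(math.log(tp * tn / (fp * fn)))
        xs.append(1 / math.sqrt(ess))
        ws.append(ess)
    if len(xs) < 3:
        raise ValueError("at least three studies are needed")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(xs) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df
```

A positive slope means that smaller studies (larger 1/sqrt(ESS)) tend to report larger DORs, the pattern associated with small-study effects.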

Fig. 3

Results for (A) sensitivity, (B) specificity, (C) PLR, (D) NLR, (E) SROC curve, and (F) Fagan diagram for the PET/CT-based models

The diagnostic utility of PET results alone

Raw TP, FP, TN, and FN data for diagnoses based solely on PET-derived SUVmax values were provided by 6 studies [14, 16, 17, 20,21,22]. The pooled sensitivity, specificity, PLR, and NLR for these diagnoses were 0.92 (95% CI: 0.85–0.96, Fig. 4a), 0.51 (95% CI: 0.37–0.66, Fig. 4b), 1.89 (95% CI: 1.36–2.62, Fig. 4c), and 0.16 (95% CI: 0.07–0.35, Fig. 4d), respectively, again all with significant heterogeneity (I² = 88.08%, 82.63%, 80.19%, and 86.38%, respectively). The corresponding AUC was 0.82 (95% CI: 0.79–0.85, Fig. 4e), and the SROC curve showed no evidence of a threshold effect. At a pre-test probability of 20%, the Fagan plot yielded post-test probabilities of 32% after a positive result and 4% after a negative result (Fig. 4f). There was no evidence of significant publication bias (P = 0.566).
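
The Fagan plot applies Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The short Python sketch below (the function name `post_test_probability` is hypothetical) recovers the post-test probabilities reported above from the pooled PLR and NLR values at a 20% pre-test probability.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form, the calculation underlying a Fagan plot."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled PLR and NLR values reported in this meta-analysis.
for label, plr, nlr in [("PET/CT-based models", 4.10, 0.15), ("PET alone", 1.89, 0.16)]:
    positive = post_test_probability(0.20, plr)
    negative = post_test_probability(0.20, nlr)
    print(f"{label}: positive {positive:.0%}, negative {negative:.0%}")
# PET/CT-based models: positive 51%, negative 4%
# PET alone: positive 32%, negative 4%
```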

Fig. 4

Results for (A) sensitivity, (B) specificity, (C) PLR, (D) NLR, (E) SROC curve, and (F) Fagan diagram for PET alone

SUVmax values

Mean SUVmax values for benign and malignant PNs were reported in 4 studies [13, 15, 20, 21]. The pooled SUVmax was significantly higher for malignant than for benign PNs (P < 0.00001, Fig. 5a), although significant heterogeneity was detected (I² = 60%). Sensitivity analyses suggested that the study conducted by Liu et al. [16] was the greatest source of heterogeneity, but even after its removal the pooled SUVmax of malignant PNs remained higher than that of benign PNs (P < 0.00001). Funnel plots suggested a low risk of publication bias (Fig. 5b).
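
A minimal Python sketch of an inverse-variance pooled mean difference with a leave-one-out sensitivity analysis follows. It assumes a DerSimonian-Laird random-effects model and a normal-approximation 95% CI; the text does not state which model RevMan was configured to use, so this is illustrative rather than a reproduction of the reported analysis. Inputs are per-study means, standard deviations, and sample sizes for malignant and benign PNs; the function names are hypothetical.

```python
import math


def pooled_mean_difference(studies):
    """Inverse-variance random-effects (DerSimonian-Laird) mean difference.

    studies: list of (mean_mal, sd_mal, n_mal, mean_ben, sd_ben, n_ben).
    Returns (pooled MD, 95% CI lower, 95% CI upper, I^2 in %).
    """
    mds, variances = [], []
    for m1, s1, n1, m2, s2, n2 in studies:
        mds.append(m1 - m2)                        # malignant minus benign
        variances.append(s1 ** 2 / n1 + s2 ** 2 / n2)
    w = [1 / v for v in variances]                 # fixed-effect weights
    fixed = sum(wi * d for wi, d in zip(w, mds)) / sum(w)
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, mds))
    df = len(mds) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w_re, mds)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = 0.0 if q <= df else 100 * (q - df) / q
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2


def leave_one_out(studies):
    """Re-pool after omitting each study in turn (sensitivity analysis)."""
    return [pooled_mean_difference(studies[:i] + studies[i + 1:])
            for i in range(len(studies))]
```

Comparing each leave-one-out I² with the full-data value is one way to identify the study contributing most to heterogeneity.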

Fig. 5

(a) Forest plot of pooled SUVmax values for malignant versus benign PNs. (b) Funnel plot assessing publication bias for SUVmax values

Discussion

The present meta-analysis explored the performance of PET/CT-based models as tools for the differential diagnosis of PNs. The overall pooled AUC of 0.91 indicates excellent predictive performance, while the low pooled NLR (0.15) suggests that these models can reliably identify benign PNs when predictive scores fall below the established cut-off value. However, because the pooled PLR of 4.10 was below 5, the ability of these models to identify malignant PNs when predictive scores exceed the cut-off value appears only moderate.

PET/CT imaging yields both CT images that provide morphological information about a lesion and PET images that quantify glucose metabolism. Because malignant cells are typically highly metabolically active and take up glucose, and therefore 18F-FDG, at higher rates than benign cells, PET enables the detection of malignant lesions [25, 26]. Consistent with this, malignant PNs in the present meta-analysis exhibited a significantly higher pooled SUVmax than benign PNs.

The diagnostic utility of individual CT features is relatively limited when evaluating PNs. In prior meta-analyses assessing lobulation, calcification, and spiculation for the differential diagnosis of PNs, AUC values ranged from 0.65 to 0.76 [1,2,3]. In the present study, PET alone achieved an AUC of 0.82, but its pooled specificity was just 51%. High 18F-FDG uptake can also be observed in lesions associated with benign inflammatory, infectious, or granulomatous disease [27], which contributes to the relatively low PLR of 1.89. Comparing PET alone with the PET/CT-based models thus suggests that PET alone has limited diagnostic ability when evaluating PNs, underscoring the need to combine multiple features to improve the performance of diagnostic models.
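
The link between low specificity and a low PLR can be made explicit with the standard relations between likelihood ratios and sensitivity (Se) and specificity (Sp). Plugging in the pooled point estimates from this meta-analysis approximately reproduces the pooled ratios; the small discrepancies (1.88 versus 1.89, 4.00 versus 4.10) presumably arise because the pooled likelihood ratios are estimated by the meta-analytic model rather than computed directly from pooled Se and Sp.

```latex
\begin{aligned}
\mathrm{PLR} &= \frac{\mathrm{Se}}{1-\mathrm{Sp}}, &
\mathrm{NLR} &= \frac{1-\mathrm{Se}}{\mathrm{Sp}} \\
\text{PET alone:}\quad \mathrm{PLR} &\approx \frac{0.92}{1-0.51} \approx 1.88, &
\mathrm{NLR} &\approx \frac{1-0.92}{0.51} \approx 0.16 \\
\text{PET/CT-based models:}\quad \mathrm{PLR} &\approx \frac{0.88}{1-0.78} = 4.00, &
\mathrm{NLR} &\approx \frac{1-0.88}{0.78} \approx 0.15
\end{aligned}
```

Because sensitivity is similar in both cases, the roughly halved PLR of PET alone is driven almost entirely by its larger false-positive fraction (1 − Sp of 0.49 versus 0.22).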

There are many advantages to using mathematical models when diagnosing PNs. Notably, these models allow patients to be assessed more objectively, yielding a predictive score that reflects the likelihood of PN malignancy. In addition, these models provide a coefficient for each incorporated predictor, allowing researchers to directly quantify the risk associated with each factor.
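
To illustrate how such a model turns predictors into a score, the Python sketch below assumes a logistic regression form, which is common for PN malignancy models (the Mayo model, for example, was derived by logistic regression). A linear predictor is converted to a probability, and each exponentiated coefficient is the odds ratio for a one-unit increase in that predictor. The intercept, coefficients, and predictor names are hypothetical placeholders chosen for illustration only; they are not taken from the Mayo model or from any of the included studies.

```python
import math

# Hypothetical coefficients for illustration only; not taken from any
# published model or from the studies included in this meta-analysis.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_years": 0.04,
    "suvmax": 0.35,
    "spiculation": 1.0,  # 1 if present, 0 if absent
    "lobulation": 0.8,   # 1 if present, 0 if absent
}


def malignancy_probability(features, intercept=INTERCEPT, coefs=COEFFICIENTS):
    """Logistic model: linear predictor x, probability = e^x / (1 + e^x)."""
    x = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-x))


def odds_ratios(coefs=COEFFICIENTS):
    """exp(beta): multiplicative change in odds per one-unit increase in a predictor."""
    return {name: math.exp(beta) for name, beta in coefs.items()}
```

For a hypothetical patient, `malignancy_probability({"age_years": 65, "suvmax": 4.2, "spiculation": 1, "lobulation": 0})` returns the sketch's estimated probability of malignancy; because the coefficients are invented, the number itself carries no clinical meaning.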

The Mayo model was the first predictive model designed to distinguish between benign and malignant PNs [28]. Herder et al. [29] combined the Mayo model with PET results to establish the first PET/CT-based model, which exhibited an AUC of 0.92, in line with the pooled AUC in the present meta-analysis. This AUC was also higher than those of the Mayo model (0.79) and PET alone (0.88) [29].

In addition to imaging features, predictive models can also incorporate tumor marker levels or particular clinical features [3]. Advanced age and higher serum carcinoembryonic antigen concentrations have both been linked to a greater risk of PN malignancy [3, 9]. Although age was included in most of the models analyzed herein, none incorporated tumor markers. Additional research is thus warranted to develop new PET/CT-based predictive models that combine clinical characteristics, imaging features, and tumor marker levels to improve diagnostic accuracy.

This meta-analysis has certain limitations. First, because all included studies were retrospective, these findings are subject to a high risk of bias. Second, many of the included studies did not indicate whether patients were recruited consecutively, which may have influenced the diagnostic performance of the models developed in individual studies. Third, the models incorporated different predictive factors, so their diagnostic results reflected not only PET/CT findings but also these other factors. Nevertheless, all of the models share the ability to provide a comprehensive, quantitative analysis of PNs. Lastly, the included studies did not use consistent reference standards, which may also have affected the resulting diagnostic accuracy.

Conclusions

In summary, PET/CT-based models appear to exhibit promising diagnostic performance when used to distinguish between benign and malignant PNs, outperforming PET-derived SUVmax values alone when employed for the differential diagnosis of PNs.