Introduction

Because lung cancer carries high morbidity and mortality worldwide [1], differentiating malignant from benign pulmonary nodules is critical. Early detection of lung cancer can improve survival, and early tumor resection offers the possibility of complete treatment. In clinical practice, computed tomography (CT) has been widely used to discriminate malignant from benign pulmonary nodules. However, because morphological signs overlap, distinguishing malignant from benign nodules on CT findings alone remains a clinical challenge for radiologists and physicians.

Positron emission tomography/computed tomography (PET/CT), a widely used technique in clinical practice, provides combined metabolic and morphological information and has contributed greatly to staging lung cancer, detecting metastatic lymph nodes, and evaluating the response to treatment in lung cancer patients [2,3,4,5,6]. In addition, PET/CT has been used to differentiate malignant from benign pulmonary nodules, and it has shown higher accuracy than CT [5,6,7].

Recently, with the rapid development of magnetic resonance imaging (MRI) techniques, diffusion-weighted imaging (DWI) has gradually become an alternative for diagnosing pulmonary lesions. It provides a quantitative measure, the apparent diffusion coefficient (ADC), which reflects the cellularity of biological tissues [8]. According to previous studies, DWI has been shown to be advantageous not only for predicting lung cancer invasiveness and the pathological type of pulmonary tumors, but also for discriminating malignant from benign pulmonary nodules [9,10,11].

To date, only one meta-analysis has compared the diagnostic performance of DWI and PET/CT in differentiating malignant from benign nodules/masses [12]. However, it included only literature in English and did not conduct a subgroup analysis in its primary analysis because of the small number of included studies [12]. Dividing the studies into subgroups based on specific characteristics (e.g. lesion diameter, imaging modality, study design) is important and can help identify the sources of heterogeneity among published papers. Moreover, several relevant comparative studies have been published in English and Chinese during the past several years. Therefore, we aimed to perform an updated, comprehensive meta-analysis of comparative studies on this topic and to conduct subgroup analyses to further explore the factors that influence the performance of DWI and PET/CT in differentiating pulmonary nodules.

Materials and methods

Search strategy

The PubMed, EMBASE, Cochrane Library, Wanfang, China National Knowledge Infrastructure (CNKI), and VIP databases were systematically searched from inception until June 2021 for comparative studies of the diagnostic performance of DWI and PET/CT. Two radiologists independently assessed titles and abstracts to identify potentially eligible studies. The keywords used to search the above-mentioned databases are presented in Additional file 1.

Inclusion and exclusion criteria

Reviewers selected studies that met the following inclusion criteria: (1) patients with pulmonary nodules underwent both DWI and PET/CT in the same period; (2) the numbers of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) results were clearly reported; (3) the nature of the pulmonary nodules was confirmed by histological examination after surgical resection or by imaging follow-up of more than 2 years without change.
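The four counts required by criterion (2) are enough to derive the per-study accuracy measures that are pooled later. The following is a minimal Python sketch using the standard definitions; the class name, the function name, the continuity correction, and the example counts are assumptions for illustration and are not taken from the included studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. reference standard)."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy_measures(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Return sensitivity, specificity, PLR, NLR, and DOR for one study.

    Assumption of this sketch: a continuity correction is added to every
    cell only when at least one cell is zero, to avoid division by zero.
    """
    cells = [t.tp, t.fp, t.fn, t.tn]
    k = correction if 0 in cells else 0.0
    tp, fp, fn, tn = (c + k for c in cells)

    sens = tp / (tp + fn)          # proportion of malignant lesions detected
    spec = tn / (tn + fp)          # proportion of benign lesions correctly cleared
    plr = sens / (1 - spec)        # positive likelihood ratio
    nlr = (1 - sens) / spec        # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)    # diagnostic odds ratio, equal to PLR / NLR
    return {"sensitivity": sens, "specificity": spec,
            "plr": plr, "nlr": nlr, "dor": dor}


# Hypothetical study with 40 malignant and 20 benign nodules.
example = accuracy_measures(TwoByTwo(tp=34, fp=2, fn=6, tn=18))
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Within a single study the DOR equals PLR divided by NLR, which is why all three can be recovered from the same four counts.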

Data extraction and quality assessment

To reduce potential bias, two radiologists independently extracted the main characteristics of all the included articles, as shown in Tables 1 and 2, and blindly evaluated the risk of bias using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [13]. All disagreements were resolved through consensus.

Table 1 Main characteristics of the 10 studies included in the meta-analysis
Table 2 Pooled analyses of DWI and PET/CT for the evaluation of pulmonary lesions in the included studies

Statistical analysis

A bivariate random-effects model was used to calculate the pooled sensitivities and specificities with 95% confidence intervals (CIs) and to estimate the pooled positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) with 95% CIs [14]. Forest plots were used to assess variability by visual inspection of the study results. An I² > 50% was taken to indicate high heterogeneity, and a two-tailed P-value < 0.05 was considered statistically significant. To identify the sources of heterogeneity, subgroup analyses of DWI and PET/CT were carried out separately (Tables 3 and 4). Summary receiver operating characteristic (SROC) curves were plotted based on sensitivity (Y-axis) and specificity (X-axis) to indicate the diagnostic accuracy of PET/CT and DWI. To assess the threshold effect, the Spearman correlation coefficient was calculated between the logit of sensitivity and the logit of (1-specificity); a correlation coefficient (ρ) > 0.6 was considered to indicate a significant correlation. Finally, publication bias was assessed with Deeks’ funnel plot asymmetry test [15]. All statistical analyses were conducted with STATA 16.0 (StataCorp LLC, College Station, TX, USA) and Meta-DiSc (http://www.hrc.es/investigacion/metadisc_en.htm) software.

Table 3 Subgroup analysis for the diagnostic performance of DWI in detecting pulmonary lesions
Table 4 Subgroup analysis for the diagnostic performance of PET/CT in detecting pulmonary lesions
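The threshold-effect check described in the statistical analysis can be sketched in a few lines of Python. This is only an illustration of the Spearman correlation between logit(sensitivity) and logit(1-specificity), not the routine used by STATA or Meta-DiSc; the function names and the per-study values are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_rho(sens, spec):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sens]
    y = [logit(1 - s) for s in spec]
    return pearson(ranks(x), ranks(y))


# Hypothetical per-study estimates (not the values from the included studies).
sens = [0.70, 0.80, 0.85, 0.90, 0.95]
spec = [0.95, 0.85, 0.90, 0.80, 0.70]
rho = threshold_rho(sens, spec)
print(f"rho = {rho:.2f}")
```

A strongly positive ρ means that studies gain sensitivity mainly by giving up specificity, which is the pattern expected when studies differ chiefly in their positivity threshold.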

Analysis of diagnostic performance

The diagnostic performance of DWI and PET/CT was compared using the included studies. The parameters used for each technique differed among the included studies: ADC and the lesion-to-spinal cord contrast ratio of signal intensity (SI-CR) were used in DWI, while the maximum standardized uptake value (SUVmax) and the lesion-to-contralateral lung contrast ratio of SUV (SUV-CR) were used in PET/CT.

Results

Selection of eligible studies and quality assessment

A comprehensive search of the online databases retrieved 150 articles, most of which were excluded after two reviewers independently screened the titles and abstracts. Finally, 10 articles enrolling 871 patients with 948 pulmonary nodules who underwent both PET/CT and DWI in the same period were included (Fig. 1).

Fig. 1 Flow chart of study selection in the meta-analysis

The methodological quality of all 10 studies was assessed with the QUADAS-2 tool (Fig. 2). Most of the studies were rated as high risk in the index test domain and in the reference standard domain because the threshold was not prespecified or because different reference standards were applied to different patients. For the reference standard, lesions confirmed by histological diagnosis or by follow-up of more than 2 years were accepted [16, 17]. Therefore, only 2 studies were rated as unclear risk, because they did not report the duration of radiological follow-up. In addition, most of the studies were rated as unclear risk in the patient selection domain because the method of patient enrollment (consecutive or random) was not reported.

Fig. 2 QUADAS-2 quality assessment of included studies

Characteristics of the eligible studies

The characteristics of the included studies, which were published from 2008 to 2020, are summarized in Table 1 [8, 18,19,20,21,22,23,24,25,26]. Of the 10 included studies, 5 were prospective and 5 were retrospective. For DWI, 3 studies used 3.0-T MRI machines and 6 studies used 1.5-T MRI machines. The remaining study [24] performed DWI at both 1.5 T and 3.0 T and was therefore counted twice, because we extracted both sets of results and analyzed them separately.

Furthermore, most of the studies used ADC as the DWI parameter, while only one study used SI-CR. For PET/CT, SUVmax was used as the index of FDG uptake in 6 studies, and SUV-CR was used in the other studies. In addition, 3 studies used histopathological findings alone as the reference standard, while 7 studies combined follow-up data with histopathological findings to determine the nature of the pulmonary lesions.

Analysis of diagnostic accuracy and heterogeneity

The results of the pooled analyses of the studies are presented in Table 2. For DWI, the pooled sensitivity and specificity were 0.85 (95% CI 0.77–0.90) and 0.91 (95% CI 0.82–0.96), while the pooled sensitivity and specificity of PET/CT were 0.82 (95% CI 0.70–0.90) and 0.81 (95% CI 0.72–0.87), respectively (Fig. 3). The area under the ROC curve (AUC) of DWI was 0.94 (95% CI 0.91–0.96), numerically higher than the 0.87 (95% CI 0.84–0.90) of PET/CT, although the difference was not statistically significant (Z = 1.58, P > 0.05) (Fig. 4). The DORs of DWI and PET/CT were 54.46 (95% CI 17.98–164.99) and 15.77 (95% CI 8.19–30.37), respectively (Fig. 5). The PLRs of DWI and PET/CT were 9.58 (95% CI 4.56–20.13) and 4.22 (95% CI 2.87–6.22), while the NLRs were 0.17 (95% CI 0.11–0.26) and 0.22 (95% CI 0.13–0.38), respectively (Fig. 6).

Fig. 3 Forest plots of pooled sensitivity and specificity for (A) DWI and (B) PET/CT in the 10 included studies. One DWI study [24] was counted twice because it used both 1.5-T and 3.0-T MRI on all patients, giving a total of 11 studies in the plot. DWI, diffusion-weighted imaging; PET/CT, positron emission tomography/computed tomography; CI, confidence interval

Fig. 4 Summary receiver operating characteristic curves for (A) DWI and (B) PET/CT in the 10 eligible studies. One DWI study [24] was counted twice because it used both 1.5-T and 3.0-T MRI on all patients, giving a total of 11 studies in the plot. AUC, area under the curve; SENS, sensitivity; SPEC, specificity; SROC, summary receiver operating characteristic

Fig. 5 Forest plots of the estimated pooled diagnostic odds ratio for (A) DWI and (B) PET/CT in the 10 included studies. One DWI study [24] was counted twice because it used both 1.5-T and 3.0-T MRI on all patients, giving a total of 11 studies in the plot. DWI, diffusion-weighted imaging; PET/CT, positron emission tomography/computed tomography; CI, confidence interval

Fig. 6 Forest plots of pooled diagnostic likelihood ratios for (A) DWI and (B) PET/CT in the 10 included studies. One DWI study [24] was counted twice because it used both 1.5-T and 3.0-T MRI on all patients, giving a total of 11 studies in the plot. DWI, diffusion-weighted imaging; PET/CT, positron emission tomography/computed tomography; CI, confidence interval

Regarding diagnostic accuracy, there was significant heterogeneity in both sensitivity (I² = 83.69% for DWI, I² = 85.58% for PET/CT) and specificity (I² = 81.39% for DWI, I² = 60.75% for PET/CT). The Spearman correlation coefficient test indicated no threshold effect for either DWI (ρ = −0.068, P = 0.842) or PET/CT (ρ = 0.224, P = 0.533). Subgroup analyses were therefore carried out separately for each modality to assess the potential sources of heterogeneity (Tables 3 and 4). The sensitivities of both DWI and PET/CT were higher in studies whose lesions included both nodules and masses and in studies that used histological results alone as the reference standard. In addition, PET/CT studies using SUV-CR had higher specificity.
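For readers unfamiliar with the I² statistic quoted above, the sketch below shows the conventional Higgins calculation from Cochran's Q for proportions pooled on the logit scale. It is a generic illustration under stated assumptions (inverse-variance weights, a 0.5 continuity correction, hypothetical counts) and may differ from the exact computation performed by the software used in this study.

```python
import math


def i_squared(events, totals, correction=0.5):
    """Higgins' I^2 (in %) for study proportions on the logit scale.

    events[i] / totals[i] is the proportion in study i, for example
    TP / (TP + FN) when assessing heterogeneity in sensitivity.
    """
    thetas, weights = [], []
    for x, n in zip(events, totals):
        # Sketch assumption: correct studies with 0% or 100% so the logit exists.
        if x == 0 or x == n:
            x, n = x + correction, n + 2 * correction
        thetas.append(math.log(x / (n - x)))
        weights.append(1.0 / (1.0 / x + 1.0 / (n - x)))  # inverse variance

    pooled = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, thetas))  # Cochran's Q
    df = len(thetas) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical true-positive counts and numbers of malignant lesions.
tp = [45, 30, 48, 20]
malignant = [50, 50, 50, 50]
print(f"I^2 for sensitivity: {i_squared(tp, malignant):.1f}%")
```

I² expresses the share of the observed variation that exceeds what sampling error alone would produce, so values above the 50% cut-off used here point to differences between studies rather than chance.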

Publication bias

In the present meta-analysis, Deeks’ funnel plot asymmetry test showed no significant publication bias for DWI or PET/CT (P = 0.750 and 0.150, respectively) (Fig. 7).

Fig. 7 Deeks’ funnel plot asymmetry test for (A) DWI and (B) PET/CT in the 10 eligible studies. One DWI study [24] was counted twice because it used both 1.5-T and 3.0-T MRI on all patients, giving a total of 11 studies in the plot. ESS, effective sample size
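The regression behind Deeks' funnel plot can be outlined as follows: the log DOR of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and a slope that differs significantly from zero suggests asymmetry. The Python sketch below computes only the slope and its t statistic; the function name, the 0.5 continuity correction, and the 2x2 tables are hypothetical, and the P-value step (a t distribution with n − 2 degrees of freedom) is left out.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    tables: list of (tp, fp, fn, tn) tuples, one per study.
    Returns (slope, t_statistic).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        # Sketch assumption: add 0.5 to each cell so the DOR is always defined.
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)

    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Weighted residual sum of squares and standard error of the slope.
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, slope / se


# Hypothetical 2x2 tables in which smaller studies report larger DORs.
studies = [(18, 1, 2, 9), (40, 5, 10, 25), (80, 15, 20, 45), (150, 40, 50, 110)]
slope, t_stat = deeks_regression(studies)
print(f"slope = {slope:.2f}, t = {t_stat:.2f}")
```

In this made-up data set the slope is positive because the smallest studies carry the largest odds ratios, which is the asymmetry pattern the test is designed to detect.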

Discussion

Accurate differentiation of malignant from benign tumors is crucial for patient management. Although PET/CT is a recognized technique for evaluating suspicious nodules, it has drawbacks such as high cost and radiation exposure. MRI is a promising tool that has been recommended for clinical imaging of pulmonary tumors [27] and has been shown to play an important role in thoracic oncology. In the present meta-analysis, we compared the diagnostic performance of DWI and PET/CT for differentiating malignant from benign pulmonary nodules and masses.

The Spearman correlation coefficient test performed in our study showed no significant threshold effect between the logit of sensitivity and the logit of (1-specificity). The current study demonstrated that DWI yielded higher pooled sensitivity (0.85 vs. 0.82), specificity (0.91 vs. 0.81), and AUC (0.94 vs. 0.87, P > 0.05) than PET/CT. According to a previous review, DWI with different b-values can achieve high sensitivity (70–89%) and specificity (61–97%) [28]. By comparison, reported PET/CT sensitivities range from 49 to 100% and specificities from 17 to 85% for this differentiation [18, 21, 29, 30]. Additionally, the DOR is a single measure that combines sensitivity and specificity to assess diagnostic performance [31]. In our study, the DOR of DWI was higher than that of PET/CT (54.46 vs. 15.77), suggesting that DWI could have a slightly better performance in the differential diagnosis of pulmonary nodules.

The likelihood ratio is regarded as a more meaningful estimate in clinical practice: a highly robust diagnostic test should have a PLR > 10 and an NLR < 0.1, while a moderately advantageous one should have a PLR of 5–10 and an NLR of 0.1–0.2 [32, 33]. In our study, the PLR values of DWI and PET/CT were 9.58 and 4.22, while the NLR values of DWI and PET/CT were 0.17 and 0.22, respectively. The PLR of DWI was higher and its NLR lower than those of PET/CT, which suggests that DWI could be a moderately advantageous test in clinical practice.
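The clinical meaning of these likelihood ratios is easier to see when they are converted into post-test probabilities. The sketch below applies the standard odds form of Bayes' theorem to the pooled PLR and NLR values reported here; the 50% pre-test probability of malignancy is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled likelihood ratios reported in this meta-analysis.
PLR = {"DWI": 9.58, "PET/CT": 4.22}
NLR = {"DWI": 0.17, "PET/CT": 0.22}

# Hypothetical pre-test probability of malignancy (assumption for illustration).
PRE_TEST = 0.50

for modality in ("DWI", "PET/CT"):
    positive = post_test_probability(PRE_TEST, PLR[modality])
    negative = post_test_probability(PRE_TEST, NLR[modality])
    print(f"{modality}: after a positive test {positive:.2f}, "
          f"after a negative test {negative:.2f}")
```

Under this assumed pre-test probability, a positive DWI result would raise the probability of malignancy to about 0.91 and a negative result would lower it to about 0.15, compared with about 0.81 and 0.18 for PET/CT.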

Although both imaging modalities have great potential for pulmonary nodule assessment, FP and FN results are inevitable. Some inflammatory diseases, such as fungal infections, have produced FP results on DWI because of the infiltration of inflammatory cells [34]. Meanwhile, Deppen et al. [35] and Croft et al. [36] reported that granulomatous disease could lead to FP scans on PET/CT because of high glucose metabolism. In addition, some adenocarcinomas, especially the well-differentiated type, may yield FN findings on both DWI and PET/CT because they have lower tumor cellularity and more abundant mucus than other types of carcinoma [34, 37,38,39,40]. Usuda et al. [41] reported that, after pulmonary resection for lung cancer, DWI is more useful than PET/CT for discriminating suture granuloma from suture recurrence. Thus, studies of discordant results between PET/CT and DWI are needed.

In the present meta-analysis, we included eligible studies published in Chinese to decrease the probability of publication bias. Moreover, we performed subgroup analyses to examine sources of heterogeneity. In the subgroup analyses, studies of lesions that included both masses and nodules [22, 24] had higher sensitivities for both DWI (0.87 vs. 0.74) and PET/CT (0.97 vs. 0.77) than studies of lesions that included only nodules [8, 18, 19, 25]. This may indicate that the smaller the lesion, the lower the diagnostic accuracy of DWI and PET/CT. Khalaf et al. [30] also found that the diagnostic performance of PET/CT depends on lesion diameter: the larger the lesion, the higher the sensitivity (91–100%) and the lower the specificity (17–47%). Furthermore, compared with studies using histological and follow-up examinations [8, 18, 19, 21, 22, 24, 25] as the reference standard, studies using histological examination alone [20, 23, 26] had higher sensitivities for both techniques [DWI (0.82 vs. 0.90); PET/CT (0.75 vs. 0.92)]. A possible explanation is that patients with malignant nodules are more likely to undergo surgery than those with benign nodules; therefore, more malignancies and fewer benign lesions might be included in studies using histology as the reference standard, which could lead to a higher sensitivity in these studies. Moreover, our results showed that SUV-CR [8, 19, 21, 24, 25] derived from PET/CT had a higher specificity than SUVmax (0.84 vs. 0.78) [18, 20,21,22,23, 26]. In addition, SUV-CR has previously shown a higher sensitivity than SUVmax [8, 42]. SUV-CR, as a semi-quantitative parameter, is unlikely to be affected by factors such as lesion size and body size, whereas SUVmax can be affected [43]; thus, SUV-CR could be more clinically valuable in discriminating benign from malignant lung nodules.

Our study has some limitations. First, the risk of bias in all the included studies was high; therefore, high-quality studies should be conducted in the future. Second, most of the studies included follow-up examinations in the reference standard, which might lead to misclassification and potential bias in the results [16, 17]. Third, our meta-analysis did not include unpublished studies, so some negative or nonsignificant results may have been missed.

In conclusion, DWI may perform similarly to PET/CT in differentiating malignant from benign pulmonary nodules or masses. DWI has advantages over PET/CT in terms of lower cost and no exposure to ionizing radiation. Therefore, DWI could be considered a potential alternative for differentiating pulmonary lesions. However, prospective studies of higher quality and with larger sample sizes should be carried out to validate the clinical value of DWI.