Introduction

Prostate cancer (PCa) is the most prevalent malignant tumour among men and the second leading cause of cancer-related deaths following lung or bronchus cancer. The growing elderly population has led to the highest increase in the number of estimated new PCa cases [1].

In men with an elevated level of serum prostate-specific antigen (PSA), the diagnosis of PCa before prostatectomy is confirmed histologically by performing a transrectal ultrasound (TRUS)-guided biopsy. However, the high false-negative rate of TRUS-guided biopsy is thought to be unacceptable [2], and poor patient tolerance of this invasive procedure is another challenge [3]. Therefore, a non-invasive method to diagnose prostate cancer with high accuracy is required.

Various magnetic resonance methods have been investigated for the detection of PCa. In addition to conventional anatomic T2-weighted imaging (T2WI), functional MR techniques such as diffusion-weighted imaging (DWI), dynamic contrast-enhanced imaging (DCE-MRI) and magnetic resonance spectroscopy (MRS) have shown promise in the improvement of non-invasive detection of PCa [4–6]. In particular, DWI is an MR-based technique that probes the function of tissues. It is sensitive to thermally driven molecular water motion, which in vivo is impeded by cellular packing, intracellular elements, membranes and macromolecules. Reduced diffusion of water has been attributed to the increased cellularity of malignant lesions, with reduction of the extracellular space and restriction of the motion of extracellular water [7, 8]. This approach was initially applied to neurologic disorders [9]. Recently, numerous studies have been implemented to characterize abdominal and pelvic lesions [10–12]. Among them, one of the most promising applications is the detection of PCa with DWI.

Numerous studies have explored the diagnostic performance of DWI in detecting PCa, with widely varied sensitivity and specificity (29–94 % and 39–100 %, respectively) [13–33]. Recently, several meta-analyses [34–38] have addressed this topic, with slight differences in the pooled results. Therefore, this study aims to evaluate the diagnostic performance of DWI in detecting PCa through a synthesis of a larger number of published studies, and to deduce its clinical utility.

Materials and methods

Literature search and screening

A systematic literature search was performed independently by two investigators in MEDLINE, Web of Science, EMBASE, SpringerLink and ScienceDirect to identify relevant articles published before September 2013, using the keywords “Diffusion magnetic resonance imaging or diffusion-weighted imaging or DWI or magnetic resonance imaging or apparent diffusion coefficient” and “prostate cancer or prostatic neoplasms or prostate”. The species was defined as “Humans”. We did not limit our search to publications from certain nations, but only articles published in English were included.

Inclusion criteria were (a) DWI was performed to identify prostate lesions; (b) sufficient data were available to calculate true-positive (TP), false-positive (FP), false-negative (FN) and true-negative (TN) values; (c) all patients had histopathologic results (biopsy or surgery) as reference standard; (d) the study population comprised at least 10 patients. Review articles, abstracts, letters, comments, guidelines and case reports were excluded, as were republished studies. Investigators were not blinded to the information about the authors, the authors’ affiliation or the journal name.

Data extraction and quality assessment

As decided upon beforehand, we extracted the following information: patient baseline (study population, age, level of PSA, Gleason score of cancer lesion, tumour volume, etc.), study design (prospectively or retrospectively), blinding procedure, reference standard, time interval between index test and reference standard, image protocols adopted to perform DWI (magnetic field strength, b values, type of coil and diagnostic threshold) and the diagnostic results (TP, FP, FN and TN). The calculation of TP, FP, FN and TN was on a per-lesion or per-segment basis.

The quality of included studies was assessed according to QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) [39]. Data extraction and quality assessment were carried out independently by the same two investigators, and disagreements were resolved by consensus.

Statistical analysis

The Q statistic of the chi-square test and the inconsistency index (I²) were used to estimate the heterogeneity between enrolled studies, and P < 0.1 or I² > 50 % indicated the presence of heterogeneity [40]. If notable heterogeneity was observed, the diagnostic performance was summarized by using a random-effects coefficient binary regression model [41]. The summary receiver operating characteristic (SROC) curve was constructed and the area under the SROC curve (AUC) served as the measure of the diagnostic performance of DWI for the detection of PCa [42].
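
As a rough illustration of the heterogeneity statistics described above, the following Python sketch computes Cochran's Q and I² from per-study proportions such as sensitivities. It assumes a logit transformation with inverse-variance weights and a 0.5 continuity correction. These choices and the function names are illustrative assumptions and may differ from the procedure implemented in Meta-DiSc.

```python
import math


def logit_effects(events, totals):
    """Logit-transformed proportions and their approximate variances."""
    effects, variances = [], []
    for e, n in zip(events, totals):
        # Assumption of this sketch: add 0.5 to each cell to guard against zeros.
        e_c = e + 0.5
        f_c = (n - e) + 0.5
        effects.append(math.log(e_c / f_c))
        variances.append(1.0 / e_c + 1.0 / f_c)
    return effects, variances


def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I2 in percent) with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

In use, the Q value would be compared against a chi-square distribution with k − 1 degrees of freedom (k being the number of data subsets) to obtain the P value.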

Threshold effect can be recognized visually by noticing a typical pattern of “shoulder-arm” shape in the ROC plane. Meanwhile, the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 − specificity) was computed to confirm the existence of threshold effect. A strong positive correlation with P < 0.05 would suggest threshold effect [43].
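
The threshold-effect check described above can be sketched as follows. This is a minimal, dependency-free illustration with hypothetical function names. It returns only the Spearman coefficient and does not compute the accompanying P value, and it assumes that no study has a sensitivity or specificity of exactly 0 or 1.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_spearman(sensitivities, specificities):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1.0 - s) for s in specificities]
    return pearson(average_ranks(x), average_ranks(y))
```

A coefficient close to +1 would mean that studies with higher sensitivity also tend to have higher false-positive rates, which is the pattern expected when studies differ mainly in their diagnostic threshold.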

Heterogeneity could also arise from other related factors. Therefore, meta-regression analysis and subgroup analysis were used to determine factors that contributed to the heterogeneity and to explore how those factors influenced the diagnostic results [44]. In addition, a sensitivity analysis was performed to ensure the reliability of included studies. The heterogeneity test, assessment of threshold effect, diagnostic performance as well as the meta-regression analysis and subgroup analysis were carried out with Meta-DiSc (version 1.4) [45].

Publication bias was assessed through an asymmetry test and the Deeks’ funnel plot using Stata (version 12.0). An inverted symmetrical funnel plot with P > 0.05 was considered to indicate the absence of publication bias [46].
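
For orientation, the regression underlying the Deeks' funnel plot can be sketched as below: the log diagnostic odds ratio of each study is regressed on 1/√ESS, weighted by the effective sample size (ESS). The function names, the tuple layout and the 0.5 continuity correction are assumptions of this sketch. It returns only the fitted intercept and slope and leaves out the significance test of the slope that a statistical package such as Stata would report.

```python
import math


def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def deeks_regression(tables):
    """tables: iterable of (TP, FP, FN, TN) tuples, one per study."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        # Assumption of this sketch: 0.5 continuity correction on every cell.
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(math.log((a * d) / (b * c)))
        w.append(ess)
    return weighted_linear_fit(x, y, w)
```

A slope that differs significantly from zero would indicate funnel plot asymmetry, that is, a systematic relation between study size and the reported diagnostic odds ratio.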

Results

The comprehensive literature search identified 537 articles, of which 21 [13–33] were eligible and finally included in this study.

Study characteristics and quality assessment

Of the included 21 studies, ten were conducted prospectively, the remaining 11 retrospectively. MRI reviewer blinding to other test results and clinical data was reported in 14 studies and non-blinding in four studies, with another three unclear. Images were acquired with and without the use of an endorectal coil in nine and 12 studies, respectively. All patients had biopsy or surgery results as reference standard. Eleven studies took biopsy as reference standard, seven studies took radical prostatectomy as reference standard and the other three studies took either biopsy or surgery results as reference standard. The quality of included studies was good. Quality assessment for all included studies is presented in Table 1. Figure 1 shows a graphical display for QUADAS-2 results regarding the proportion of studies with low, high or unclear risk of bias.

Table 1 Quality assessment of the 21 included diagnostic studies
Fig. 1 Graphical display for QUADAS-2 results regarding the proportion of studies with low, high or unclear risk of bias. The results showed that a high risk of bias existed in patient selection

A total of 1,204 patients, 820 of whom were positive for cancer, were enrolled in the 21 studies, and their ages ranged from 40 to 87 years. The PSA level (mean/range, or median if an extreme value was observed) of each study was recorded, ranging widely from 0.48 to 1,000 ng/mL. The Gleason score (median/range), TNM stage and diameter of lesion were also recorded if available. Principal study and patient characteristics are summarized in Table 2. Methodological and imaging protocol characteristics related to the diagnostic test are listed in Table 3.

Table 2 Study and patient characteristics of included studies
Table 3 Methodological and imaging protocol characteristics regarding the diagnostic test

Cancer was evaluated on a per-lesion or per-segment basis. A total of 8,448 prostate lesions (2,864 malignant, 5,584 benign) were analysed within the 21 studies. Multiple subsets of data in the same study were counted for the following reasons: (a) different b values were used to perform DWI; (b) prostate lesions were assessed in different regions (peripheral, transition or central zone). Thus, we had 27 subsets of data available for analysis. Diagnostic results of each subset are presented in Table 4.

Table 4 Diagnostic results of DWI on a per-lesion or per-segment basis

Diagnostic performance

The pooled sensitivity and specificity with corresponding 95 % confidence intervals were 0.62 (95 % CI 0.61–0.64) and 0.90 (95 % CI 0.89–0.90), respectively. Sensitivity of individual studies ranged widely from 29 % to 94 %, while specificity of individual studies ranged from 39 % to 100 %. According to the SROC curve, the AUC was 0.8991, indicating a good diagnostic accuracy. Pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR) with corresponding 95 % confidence intervals were 5.83 (95 % CI 4.61–7.37) and 0.30 (95 % CI 0.23–0.39). Forest plots of sensitivity, specificity, PLR and NLR are shown in Fig. 2. The SROC curve for all 27 subsets of data is shown in Fig. 3.
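
For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, likelihood ratios and the diagnostic odds ratio follow from a single 2 × 2 table. The function name and dictionary keys are hypothetical, and all four marginal and off-diagonal counts are assumed to be non-zero. Note that the pooled values reported above come from a random-effects model, so inserting the pooled sensitivity and specificity into these formulas will not reproduce the pooled likelihood ratios exactly.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Diagnostic accuracy measures from one 2 x 2 table (counts must be non-zero)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (tp + fn)
    # PLR: how much more likely a positive result is in diseased than in non-diseased.
    plr = sensitivity / false_positive_rate
    # NLR: how much more likely a negative result is in diseased than in non-diseased.
    nlr = false_negative_rate / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,
    }
```

For example, a hypothetical table with TP = 80, FP = 10, FN = 20 and TN = 90 gives a sensitivity of 0.80, a specificity of 0.90, a PLR of 8 and an NLR of about 0.22.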

Fig. 2 Forest plots of SEN (a), SPE (b), PLR (c) and NLR (d) of DWI in detecting PCa. The Q statistics and I² indexes of sensitivity and specificity suggested the presence of notable heterogeneity, and the diagnostic performance was summarized by using a random-effects coefficient binary regression model

Fig. 3 Summary receiver operating characteristic (SROC) curve for DWI in detecting PCa. The AUC was 0.8991, indicating good, but not excellent, diagnostic accuracy

Heterogeneity assessment and meta-regression analysis

The heterogeneity test of sensitivities and specificities showed Q = 777.24 (p < 0.001), I² = 96.7 % and Q = 320.85 (p < 0.001), I² = 91.9 %, respectively. Thus, highly significant heterogeneity was detected.

A threshold effect was ruled out by inspection of the ROC plane, which showed no “shoulder-arm” shape. Further analysis showed that the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 − specificity) was 0.219 (p = 0.273), confirming that factors other than a threshold effect must account for the notable heterogeneity. A single-factor meta-regression analysis showed that patient condition, magnetic field strength and MRI reviewer blinding to other test results and clinical data contributed significantly to the heterogeneity.

Subgroup analysis

Subgroup analysis was performed between different study characteristics. Non-blinding (or unclear) studies and studies about cancer detection in the peripheral zone yielded the highest sensitivity of 79 % (0.79 [95 % CI 0.74–0.83] and 0.79 [95 % CI 0.75–0.83], respectively). Non-blinding (or unclear) studies yielded the highest specificity of 93 % (95 % CI 0.91–0.94). The results of the subgroup analysis are presented in Table 5.

Table 5 Results of subgroup analysis

Sensitivity analysis

In 19 of the 21 included studies, the mean level of serum PSA was concentrated in the range 5.95–26.3 ng/mL, whereas in the other two studies the mean level of PSA was extremely high (70.6 ng/mL) [14] or unknown [25]; these two studies yielded the lowest sensitivity and specificity. Therefore, we conducted a sensitivity analysis restricted to the remaining 19 studies.

There was no notable threshold effect in the evaluated 19 studies. The pooled weighted sensitivity, specificity, positive LR and negative LR with corresponding 95 % confidence intervals were determined to be SEN, 0.63 (95 % CI 0.62–0.65); SPE, 0.90 (95 % CI 0.89–0.91); PLR, 6.52 (95 % CI 5.23–8.12); NLR, 0.28 (95 % CI 0.21–0.37). The AUC was 0.9120.

Publication bias

The funnel plot shows that studies were distributed symmetrically on a scatter plot of diagnostic odds ratio (DOR) against 1/√ESS, where ESS is the effective sample size. The result of the Deeks’ funnel plot asymmetry test (P = 0.67) showed no evidence of notable publication bias.

Discussion

PCa is more likely to be diagnosed in patients with advanced age, especially over the age of 60 [1]. Accurate cancer detection and evaluation is essential to focal treatment planning [47]. Diagnosis of PCa with quantitative DWI involves the apparent diffusion coefficient (ADC), which is lower in PCa than normal prostate tissue [48]. Previous studies have demonstrated that DWI was a feasible method to detect PCa. Meanwhile, DWI was also considered to play an important role in monitoring therapy response, evaluating cancer aggressiveness and metastasis, guiding targeted biopsy and patient follow-up [49]. Nevertheless, all applications referred to above were based on an accurate diagnosis of PCa.

In this study, we explored the ability of DWI to detect PCa. Results showed that for prostate cancer detection, DWI had high specificity (90 %) and relatively low sensitivity (62 %). Both sensitivity and specificity showed large variability. Next, we focused on the SROC curve, which gave an AUC of 0.8991, indicating a good, but not excellent, diagnostic performance. This result was in accordance with previous studies [34, 35, 37, 38]. Moreover, owing to the larger number of original studies and extensive statistical analysis, the results of this study address some limitations that previous studies acknowledged and provide objective and practical suggestions.

There was significant heterogeneity between the included studies. To explore the source of heterogeneity, we first ruled out a threshold effect through the ROC plane. Meta-regression analysis showed that study population, patient age, study design, reference standard, diagnostic threshold, time interval and type of coil did not contribute significantly to the heterogeneity. Patient condition, magnetic field strength and MRI reviewer blinding to other test results and clinical information were thought to be the most important sources of heterogeneity. The results of the sensitivity analysis for 19 studies were similar to the original results, indicating that the results of this study were reliable.

Detectability of PCa depends on tumour characteristics including tumour Gleason score, histological volume, architecture and location [50]. There was greater sensitivity for tumours of higher grade or larger size [51]. Numerous studies have suggested strong correlation between Gleason score and tumour volume and between PSA level and tumour volume [52, 53]. The level of serum PSA is related to patient condition, such as tumour volume and progression, and is easily affected by multiple factors [54]. In this meta-analysis, tumour volume was not described in much detail, and the level of PSA varied widely from 0.48 to 1,000 ng/mL. We performed a subgroup analysis between studies with a mean PSA < 20 ng/mL and ≥ 20 ng/mL. Table 5 shows that DWI had higher sensitivity and relatively low specificity in patients with a high PSA level. Meanwhile, the Q statistics and I² decreased significantly within the two subgroups, especially the high PSA group. No significant difference was found in the mean Gleason score between the two groups. Further investigation of tumour characteristics was limited because data on a per-patient basis were required. Therefore, we suggest that large-scale, quality-controlled studies specifically addressing those factors should be conducted in the future.

In the subgroup analysis, we compared the effect of two magnetic field strengths, 3.0T and 1.5T. High field strength (3.0T) demonstrated high sensitivity and specificity for the detection of PCa with DWI (Table 5). Prostate imaging at 3.0T benefits from higher signal-to-noise ratio (SNR), and enables either an increased spatial resolution or an increase in SNR of the ADC maps [55]. For this reason, improvements in the localization and detection of PCa were expected [26, 56, 57]. However, some studies [58, 59] reported that DWI performed at 3.0T generally had similar ADC values, but worse image quality compared with 1.5T, suggesting that there was no significant advantage for the diagnosis of PCa by 3-T MRI over 1.5-T MRI. Therefore, to take full advantage of the benefits of high field strength, improved acquisition techniques are required.

There are as yet no standardized DW-MRI techniques, and imaging parameters for DWI vary widely in the number and size of b values, diagnostic threshold and coils. Performing DWI requires at least two b values, which allow for the calculation of the ADC. A high b value permits high diffusion weighting, and tumour tissue often has higher signal intensity, or lower ADC values on ADC maps, compared with native tissue [60]. The typical b value for prostate imaging varies in the range 0–1,500 s/mm². Some studies [61, 62] suggested that the use of b = 2,000 s/mm² is diagnostically superior to that of b = 1,000 s/mm². However, other studies [15, 17] reported that for predicting PCa, the optimal b value for 3.0-T DWI was 1,000 s/mm². A recent study also suggested the use of the true diffusion coefficient, which can be obtained using a minimum of three b values and is less influenced than the ADC by b value selection [63]. In this meta-analysis, there was profound discrepancy in the choice of b values between individual studies, ranging from 0 to 2,000 s/mm². We were unable to analyse the potential influence of different b values because three or more b values (median, 3 values/study; range, 2–6) were used to acquire different diffusion weighting in the same study. Moreover, the considerable overlap of ADC between cancerous and noncancerous tissue made it difficult to determine a diagnostic threshold [64, 65]. In addition, the level of suspicion (LOS) was estimated in six studies [15, 25, 27, 29, 31, 33] for qualitative interpretation of DWI results, which made uniform image interpretation even harder. In brief, all those challenges call for further optimization of image acquisition and interpretation.
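
To make the role of the two b values concrete, the sketch below estimates the ADC from two signal intensities under the standard mono-exponential model S(b) = S0 · exp(−b · ADC). The function name and argument names are hypothetical, and the included studies may have used other fitting procedures, for example a fit across more than two b values.

```python
import math


def adc_two_point(s_low, s_high, b_low, b_high):
    """ADC in mm^2/s from signals measured at two b values given in s/mm^2.

    Assumes mono-exponential decay: S(b) = S0 * exp(-b * ADC).
    """
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    # Taking the log of the signal ratio cancels the unknown S0.
    return math.log(s_low / s_high) / (b_high - b_low)
```

Under this model, tissue with restricted diffusion loses less signal between the low and the high b value, so the computed ADC is lower, which matches the lower ADC values described for tumour tissue.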

An endorectal coil provides a superior SNR compared with a pelvic phased array coil but causes the displacement of the prostate gland, reduced patient compliance and increased susceptibility artefacts [66, 67]. The subgroup analysis results showed that for the detection of PCa, sensitivity of DWI with an endorectal coil used was significantly higher (0.77 [95 % CI 0.73–0.80]) than without an endorectal coil (0.60 [95 % CI 0.58–0.61]). Therefore, although the overall diagnostic accuracy was not improved, the use of an endorectal coil was recommended for increased sensitivity.

The subgroup analysis also found that studies which took radical prostatectomy as reference standard had a slight improvement in specificity, while sensitivity dropped dramatically from 73 % to 59 % compared with studies that took prostate biopsy as reference standard. We speculated that this might be caused by the high false-negative rate of prostate biopsy [2]. Over the last few years, considerable effort has been devoted to the optimization of initial prostate biopsy in clinical practice, and inherent within those optimizations is variation of the core number, location, labelling and processing for pathological evaluation [68–72]. To date, there is no consensus in this regard. New imaging methods that allow targeted biopsy (such as MRI-guided biopsy) were reported to be feasible and to improve the assessment of true tumour aggressiveness [73]. With the development of new imaging methods, we hope that the performance of prostate biopsy in the diagnosis of PCa will approach perfection.

Furthermore, given that about 70–75 % of cancers arise in the peripheral zone (PZ), we hypothesized that a separate imaging protocol specific to PZ tumours might lead to more accurate diagnosis, because tumours arising in the PZ tend to be more aggressive [74, 75]. Thus, we analysed the diagnostic performance of DWI in detecting peripheral zone PCa alone within eight studies [13, 16, 20, 21, 24–26, 28]. The pooled sensitivity and specificity were 0.79 (95 % CI 0.75–0.83) and 0.85 (95 % CI 0.82–0.86), respectively. Sensitivity was significantly higher in detecting peripheral zone PCa than with all regions evaluated together (sensitivity 62 %). However, the overall diagnostic accuracy was not improved as expected compared with the original results (AUC 0.8991). It is worth noting that there was still significant heterogeneity between these eight studies. Therefore, this conclusion remains to be confirmed by further investigation and should be considered with caution.

There are still many challenges in the diagnosis of PCa. The current pathway for men suspected of having PCa results in overdiagnosis and overtreatment, as well as systematically missed significant tumours in the anterior and apical parts of prostate gland [76]. Additionally, tumours located in the transition zone are more challenging to detect [77]. Although many MR imaging methods (T2WI, DWI, DCE-MRI and MRS) have been explored in the detection of PCa, they all have substantial limitations [78]. Therefore, the combined use of DWI with T2WI, DCE-MRI or MRS was recommended [79].

We should acknowledge some limitations of this meta-analysis. First, although a comprehensive literature search was performed in several authoritative databases, the omission of grey literature and non-English-language articles might have introduced potential publication bias. Second, the image interpretation of DWI was performed for the most part qualitatively, and in many studies blinding was either unclear or absent. In the subgroup analysis, studies in which MRI reviewer blinding to other information was absent or unclear yielded higher results for both sensitivity and specificity compared with blinded studies. Therefore, the objectivity of image interpretation is open to question. Third, although QUADAS-2 was adopted to ensure high quality of included articles, there were still many retrospective studies, and many participants in the included studies were diagnosed with or suspected of having prostate cancer on the basis of ultrasound, CT or other clinical information, which might have caused patient selection bias (Fig. 1) and a greater sensitivity, as was confirmed by the subgroup analysis results.

In conclusion, our meta-analysis showed that DWI was an informative MRI modality and had moderately high diagnostic accuracy for the detection of PCa. Further application of DWI in detecting PCa requires the optimization of image acquisition techniques and interpretation.