Biparametric prostate MRI: impact of a deep learning-based software and of quantitative ADC values on the inter-reader agreement of experienced and inexperienced readers

Objective To investigate the impact of an artificial intelligence (AI) software and quantitative ADC (qADC) on the inter-reader agreement, diagnostic performance, and reporting times of prostate biparametric MRI (bpMRI) for experienced and inexperienced readers. Materials and methods A total of 170 multiparametric MRI (mpMRI) examinations of patients with suspicion of prostate cancer (PCa) were retrospectively reviewed by one experienced and one inexperienced reader three times, with a wash-out period between readings. First, only the bpMRI sequences, including T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI) sequences, and apparent diffusion coefficient (ADC) maps, were used. Then, bpMRI and quantitative ADC values were used. Lastly, bpMRI and the AI software (Quantib Prostate) were used. Inter-reader agreement between the two readers and between each reader and the original mpMRI reports was calculated. Detection rates and reporting times were calculated for each group. Results Inter-reader agreement with respect to mpMRI was moderate for bpMRI, Quantib, and qADC for both the inexperienced (weighted k of 0.42, 0.45, and 0.41, respectively) and the experienced radiologist (weighted k of 0.44, 0.46, and 0.42, respectively). Detection rate of PCa was similar between the inexperienced (0.24, 0.26, and 0.23) and the experienced reader (0.26, 0.27, and 0.27) for bpMRI, Quantib, and qADC, respectively. Reporting times were lower for Quantib (8.23, 7.11, and 9.87 min for the inexperienced reader and 5.62, 5.07, and 6.21 min for the experienced reader, for bpMRI, Quantib, and qADC, respectively). Conclusions AI and qADC did not have a significant impact on the diagnostic performance of either reader. The use of Quantib was associated with lower reporting times.


Introduction
Prostate multiparametric MRI (mpMRI) is the most accurate imaging study for prostate cancer (PCa) diagnosis, and it is increasingly used worldwide for early detection, staging, follow-up, and active surveillance [1][2][3]. The PI-RADS recommendations, describing a standardized protocol and reporting system [4], have contributed to the very high diagnostic performance in PCa detection reported by several level 1 evidence trials and a Cochrane systematic review [5][6][7][8][9]. An alternative to mpMRI is an abbreviated protocol known as biparametric prostate MRI (bpMRI), which combines T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI)/apparent diffusion coefficient (ADC) without using contrast media [8,[10][11][12]. bpMRI has the advantages of avoiding the costs and potential side effects related to the use of contrast media, and of shortening the acquisition times. It has been shown that bpMRI has a high accuracy when interpreted by experienced radiologists, whereas the available evidence suggests lower performance for less experienced readers [13][14][15][16]. Furthermore, the use of a shorter bpMRI protocol only partly addresses the problems related to the heavy clinical workload for the genitourinary radiologist. With the growing demand for prostate MRI, it is imperative to maximize the diagnostic potential of bpMRI and at the same time to optimize reporting times for each patient [17][18][19].
Several possible approaches may be investigated to improve the diagnostic potential of bpMRI for less experienced radiologists. An example could involve the use of quantitative data extracted from the ADC map as a decision support system for the interpretation of equivocal lesions. The PI-RADS guidelines state that diffusion-weighted imaging is the dominant sequence for scoring lesions in the peripheral zone of the prostate, which is the most common site where PCa lesions arise. The calculation of quantitative ADC (qADC) values for a particular lesion could improve the diagnostic confidence of the radiologist in scoring suspicious lesions [20][21][22][23][24][25]. This is especially the case in bpMRI where the absence of dynamic contrast enhancement (DCE) could make equivocal cases much more difficult to interpret [26].
Another potential approach to enhance the diagnostic performance of bpMRI could be the use of artificial intelligence (AI) as a computer-aided diagnosis (CAD) system. A CAD system can serve as a complete automated diagnosis tool, or as a support tool for the radiologist with the goal of improving diagnostic accuracy and/or productivity [27,28]. In recent years, numerous machine learning (ML) and deep learning (DL) algorithms have been applied to medical imaging and prostate MRI, mostly in preliminary research settings [29][30][31]. AI algorithms can be adapted to a variety of different tasks in prostate imaging, including quality control, segmentation, detection, and characterization [27].
The aim of this study was to evaluate the use of qADC measurements and an AI-based CE-approved software (Quantib Prostate) in the interpretation of prostate bpMRI, with focus on inter-reader agreement, performance in detecting PCa, and reporting time.

Patient population and MRI protocol
The study retrospectively included mpMRI studies performed at our institution for suspicion of PCa from May 2021 through November 2021, with waiver of informed consent approved by the institutional review board. Inclusion criteria were the availability of all three mpMRI sequences (T2WI, DWI/ADC, DCE) and of the official PI-RADS v2.1-compliant mpMRI report provided by a senior genitourinary radiologist (VP) with 15 years of experience at a high-volume referral center in prostate diagnostics (> 1000 prostate mpMRI studies read per year). The exclusion criterion was inadequate image quality of any of the bpMRI sequences (T2WI, DWI, or ADC), assessed according to PI-QUAL parameters [32].
MRI examinations were performed on a 3.0 Tesla scanner (GE Discovery 750, GE Healthcare, Milwaukee, USA), using a 32-channel surface phased-array body coil (TOR-SOPA), with a PI-RADS v2.1-compliant protocol that included high-resolution T2WI in the axial and coronal planes. For DWI, b values were set at 50, 800, and 1500 s/mm²; the ADC map was computed from the b values of 50 and 800 s/mm². Perfusion imaging (DCE) was performed following intravenous administration of gadobutrol (0.1 mmol/kg). Table 1 contains a detailed list of MRI protocol parameters. The patients were instructed to perform a rectal enema 2-4 h before the examination.
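As an illustration of how an ADC value can be derived from the two b values mentioned above, the following minimal sketch assumes the standard mono-exponential diffusion model, S(b) = S0 · exp(−b · ADC). The text does not state which fitting method the scanner software used, so the model, the function name `adc_two_point`, and its parameters are assumptions made for illustration only.

```python
import math


def adc_two_point(s_low: float, s_high: float,
                  b_low: float = 50.0, b_high: float = 800.0) -> float:
    """Return the ADC (mm^2/s) of one voxel from two DWI signal intensities.

    Assumes a mono-exponential decay, S(b) = S0 * exp(-b * ADC), so that
    ADC = ln(S_low / S_high) / (b_high - b_low). The default b values
    (s/mm^2) are the ones reported for the ADC map in the text.
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)
```

For example, a voxel whose signal falls from 1000 at b = 50 to about 472 at b = 800 would yield an ADC of roughly 1.0 × 10⁻³ mm²/s under this model.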

Image interpretation and analysis
MRI studies were independently interpreted by one inexperienced and one experienced radiologist (AF, SC). The inexperienced radiologist had 3 months of experience in prostate imaging and had received fellowship-level training comprising theoretical lectures and practical training, including supervised reading of approximately 100 cases. Lesion number, location, and PI-RADS score were recorded. Each radiologist interpreted the MRIs three separate times, with a wash-out period of at least three weeks between readings to minimize recall. The first reading was based on interpretation of the bpMRI sequences (T2WI, DWI, ADC). The second reading was performed using the bpMRI sequences, as well as the normalized ADC value for the suspicious foci to provide additional insight into the suspicion level of each focus. For the calculation of qADC, a circular region of interest (ROI) was placed independently by each radiologist interpreting the study on each suspicious focus in the ADC map at the area corresponding to the highest signal intensity at high b values in DWI. An equally sized ROI was placed on the normal-appearing peripheral zone or transition zone (according to the location of the suspicious focus) on the same slice as the lesion to normalize the qADC value (Fig. 1). The average pixel value of the lesion ROI was divided by the average pixel value of the normal prostate ROI to yield an ADC ratio. The resulting ratio served as an additional parameter for scoring suspicious lesions: it was used to decide whether to upgrade PI-RADS 3 lesions to PI-RADS 4, with a threshold of 0.59 [26]. The third reading was performed using the Quantib Prostate v1.2.0 software, using the bpMRI sequences, as detailed below. Reporting times were recorded for both radiologists.
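The ADC ratio described above can be sketched in a few lines. This is an illustrative outline rather than the tool used in the study: the function names are hypothetical, the ROIs are represented simply as lists of pixel values, and the direction of the rule (a ratio at or below 0.59 triggers the upgrade, since lower ADC reflects more restricted diffusion) is an assumption, because the text reports only the threshold value.

```python
from statistics import mean

ADC_RATIO_THRESHOLD = 0.59  # threshold reported in the text [26]


def adc_ratio(lesion_pixels, reference_pixels):
    """Mean ADC of the lesion ROI divided by mean ADC of the normal-tissue ROI."""
    return mean(lesion_pixels) / mean(reference_pixels)


def adjusted_pirads(score, ratio, threshold=ADC_RATIO_THRESHOLD):
    """Upgrade a PI-RADS 3 lesion to PI-RADS 4 when the ADC ratio is low.

    Assumption: "low" means ratio <= threshold; other scores are unchanged.
    """
    if score == 3 and ratio <= threshold:
        return 4
    return score


# Hypothetical pixel values (arbitrary ADC units), for illustration only.
ratio = adc_ratio([600, 700, 650], [1300, 1250, 1350])
print(ratio, adjusted_pirads(3, ratio))
```

With these made-up values the lesion mean is half the reference mean, so an equivocal PI-RADS 3 lesion would be upgraded under the assumed rule.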

Quantib prostate work-up
Quantib® Prostate (Quantib BV, Rotterdam, The Netherlands) is an FDA- and CE-approved MRI viewing and reporting platform based on deep learning (DL). During the Quantib Prostate work-up, the first step was the semiautomated generation of a segmentation contour of the prostate gland based on the axial T2WI sequence (Fig. 2), which was reviewed by the radiologists and manually edited, if necessary. The second step was image interpretation on the DICOM viewing interface of the software, which showed both the bpMRI sequences and an automatically generated colorimetric map based on convolutional neural networks (CNNs). The map was overlaid on the T2WI images and showed in different colors the voxels more likely to contain clinically significant PCa (csPCa, ISUP Grade ≥ 2) (Fig. 3). In this step, the radiologists were able to identify and score suspicious lesions by directly clicking on them on the MRI images. In the last step, the final report was automatically generated, manually edited if required, and exported by the software.

Statistical analysis
The inter-reader agreement on lesion score was determined between the two readers for all three interpretation methods, as well as between each reader and the mpMRI findings, using the weighted Cohen's kappa. In addition, Cohen's kappa was calculated at both patient and lesion levels for the presence or absence of suspicious foci (PI-RADS ≤ 2 vs. PI-RADS ≥ 3). There were no statistically significant differences among the study groups in the proportion of overall PCa and clinically significant PCa (p = 0.91, Table 2).
The mean qADC value of identified lesions was 0.56 (± 0.16 SD) for the inexperienced reader and 0.58 (± 0.18 SD) for the experienced reader.
The clinical, radiologic, and pathologic characteristics of the three different patient groups are summarized in Table 2.

Inter-reader agreement between experienced and inexperienced readers
The inter-reader agreement for lesion scoring between the experienced and inexperienced readers was fair (k = 0.38, p < 0.00001) for bpMRI, moderate (k = 0.41, p < 0.00001) for Quantib and moderate (k = 0.41, p < 0.00001) for qADC.
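For readers who wish to reproduce this kind of analysis, the sketch below computes a weighted Cohen's kappa for two readers and maps the result to a verbal label. Several points are assumptions: the text does not say whether linear or quadratic weights were used (linear is the default here), the verbal bands follow the widely used Landis and Koch scale, which is consistent with the "fair" and "moderate" labels reported above, and all function names are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters over ordered categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weights: 0 on the diagonal, growing with category distance.
    power = 1 if weighting == "linear" else 2
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]   # observed weighted disagreement
            den += w * row[i] * col[j]  # disagreement expected by chance
    if den == 0:
        raise ValueError("kappa is undefined when each rater uses one category")
    return 1.0 - num / den


def agreement_label(kappa):
    """Verbal label on the Landis and Koch scale (an assumed convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

Under this convention, the reported values of 0.38 and 0.41 map to "fair" and "moderate", respectively, which shows how small the numerical gap between the two labels is.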

Discussion
The results of this study showed that the use of the software Quantib Prostate allowed the radiologist to achieve slightly higher inter-reader agreement with mpMRI, compared to interpreting the bpMRI sequences alone, for both experienced and inexperienced readers. The agreement between the experienced and inexperienced readers was comparable, varying slightly between fair and moderate for bpMRI, Quantib, and qADC, with a minor trend toward higher agreement with the use of Quantib. Neither Quantib Prostate nor quantitative ADC measurements increased the detection rate of PCa with reference to mpMRI, for either experienced or inexperienced readers interpreting bpMRI studies. The agreement with mpMRI was moderate for bpMRI, Quantib, and qADC alike, a finding in line with the available literature. The fact that the detection rate is comparable to that of mpMRI suggests that cases of disagreement did not impact detection of PCa foci. The use of Quantib Prostate was associated with a shorter reporting time, which is potentially valuable in the clinical workflow given the increasing demand for prostate MRI examinations, particularly for the inexperienced reader, who needs longer to report MRI studies.
Although this study showed comparable diagnostic accuracy, several artificial intelligence and deep learning algorithms have been developed to improve or automate the interpretation of prostate MRI [33][34][35]. Owing to the lack of extensive real-world validation, however, there are to date no definitive data mandating their use in clinical practice [36]. Despite interesting results from the earliest reports of AI implementations focusing on automated detection and characterization of PCa, there is now increasing interest in using AI technology to improve quality control and workflow efficiency in radiology [37].
The decrease in reporting times with the use of Quantib Prostate can be attributed to several factors. First, the automated segmentation allows for automated, accurate calculation of the PSA density, a task that would otherwise require the radiologist to take three orthogonal measurements of the prostate gland, calculate the prostate volume, and then derive the PSA density. Second, the colorimetric map could allow for easier and quicker identification of suspicious foci, which is probably more relevant for the inexperienced reader. Third, once the lesions are identified, the user can use a dedicated tool and click on the lesion to start an automated segmentation of the lesion. After a location and a PI-RADS score have been assigned to the lesion(s), a structured report is automatically generated and exported. Furthermore, the colorimetric map calculated by the software could reveal suspicious foci that were not initially evident to the radiologist on the bpMRI sequences, as occasionally happened in this study.
The AI software, however, was found to be sensitive to the overall image quality and the presence of artifacts in our dataset. Optimal image quality and typical prostate shape are needed for successful prostate segmentation algorithms. Well-defined margins are also necessary for accurate results. A user can manually correct a segmented contour if the segmentation is incorrect, but this adds to the reporting time. Further, the image analysis algorithm also requires adequate image quality to produce a colorimetric map that is accurate and has sufficient contrast to show suspicious foci.
This study has several limitations. First, it was a retrospective, single-center study. Second, the PI-RADS scoring system was used to classify lesions on bpMRI, even though it was originally designed to make use of the entire mpMRI protocol; full agreement between bpMRI and mpMRI may therefore not be achievable in all cases. Third, the diagnostic performance was assessed in terms of detection rate for those lesions that underwent targeted biopsy. Consequently, it is not possible to estimate the true diagnostic accuracy of the readers, since lesions missed at mpMRI would never have undergone targeted biopsy. A true assessment of the diagnostic accuracy would require all patients to have a histology report and/or a relatively long period of clinical and radiological follow-up. In addition, the performance of the inexperienced reader likely improved during the data collection stage; therefore, the calculated inter-reader agreement and detection rate represent average values along the radiologist's learning curve.
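The manual PSA density calculation mentioned above can be illustrated with a short sketch. It assumes the commonly used prolate ellipsoid approximation for prostate volume (π/6 × length × width × height), which the text does not name explicitly; the function names and the example values are hypothetical.

```python
import math


def ellipsoid_volume_ml(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Prostate volume (mL) from three orthogonal diameters (cm).

    Assumes the ellipsoid approximation V = pi/6 * L * W * H; 1 cm^3 = 1 mL.
    """
    if min(length_cm, width_cm, height_cm) <= 0:
        raise ValueError("diameters must be positive")
    return math.pi / 6.0 * length_cm * width_cm * height_cm


def psa_density(psa_ng_ml: float, volume_ml: float) -> float:
    """PSA density (ng/mL per mL of prostate tissue): serum PSA divided by volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return psa_ng_ml / volume_ml


# Hypothetical example: a 4.5 x 4.0 x 3.8 cm gland and a serum PSA of 6.0 ng/mL.
volume = ellipsoid_volume_ml(4.5, 4.0, 3.8)
print(round(volume, 1), round(psa_density(6.0, volume), 2))
```

An automated segmentation replaces the three caliper measurements with a voxel-based volume, which is one plausible reason for the shorter reporting times discussed in this section.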
Lastly, all prostate MRI scans used in the study were obtained from a high-volume referral center with extensive experience in prostate MRI and an optimized acquisition protocol. Considering that AI-based algorithms are quite sensitive to artifacts and degradation of image quality, the findings of this study might not generalize to different clinical settings.
Future studies should determine whether Quantib Prostate can facilitate a faster learning curve for radiologists with limited experience in genitourinary radiology. In addition, the impact of the use of Quantib Prostate could be investigated for mpMRI in upcoming studies.

Conclusions
In conclusion, in both experienced and inexperienced readers, the deep learning-based software Quantib Prostate was associated with slightly higher inter-reader agreement with mpMRI. Both the use of Quantib and quantitative ADC achieved similar diagnostic performance in terms of detection rate compared to using only bpMRI sequences. When using Quantib Prostate, both experienced and inexperienced readers could report bpMRI scans in a shorter amount of time.
Funding Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUI-CARE Agreement. The authors declare that no funds, grants, or other support was received during the preparation of this manuscript.

Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose. The authors declare that they have no conflict of interest.
Informed consent This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.