1 Introduction

Prostate cancer (PCa) is the second most common cancer in men, with almost 1.4 million new cases diagnosed per year worldwide [22]. With the acceleration of the industrialization process and the impact of environmental pollution, the incidence of PCa (attributed to enriched foods, smoking, and excessive alcohol use) continues to increase at a rate of 6.63% per year. Early detection of PCa can improve patients’ prognoses considerably. Recent advances in MRI technology that allow both anatomical and functional imaging to be performed simultaneously, known as multiparametric MRI (mpMRI), have improved our ability to detect and characterize prostate tumors [16]. According to patient management guidelines, noninvasive diagnostic tools such as mpMRI play an important role in the referral of patients to active surveillance, watchful waiting, and active treatment [15, 24]. The Prostate Imaging Reporting and Data System (PI-RADS) was introduced by the European Society of Urogenital Radiology (ESUR) in 2012 to standardize prostate mpMRI examination protocols and the reporting of suspicious lesions (providing standardized terminology and sector map-based locations). The PI-RADS system categorizes prostate lesions based on the likelihood of cancer according to a five-point scale. PI-RADS was developed by a consensus-based process that uses a combination of published data and expert observations and opinions [20]. The clinical utility of PI-RADS scoring is growing, and several studies have confirmed that the system improves the diagnostic accuracy of mpMRI [12].

The definitive diagnosis of PCa depends on the recognition of cancer cells in a tissue biopsy. Based on histological tumor architecture, a Gleason classification system was proposed. The Gleason score of biopsy-detected PCa is the sum of the Gleason grade of the most extensive (primary) pattern and that of the second most common (secondary) pattern; each grade ranges from one to five [11]. The “clinically significant” (csPCa) descriptor is used widely to differentiate PCa types that cause morbidity or death from those that do not. Defining what is clinically significant PCa and what is insignificant PCa (iPCa) is challenging. According to the literature, iPCas do not typically cause harm but are at high risk of being overtreated, with the treatment itself risking harmful side effects to patients [4].

In recent years, a large number of review articles that concern the application of artificial intelligence (AI) in prostate cancer diagnosis have been published [7, 8, 9, 17, 21]. They discuss various aspects of AI application in PCa, which concern not only mpMRI image analysis, but also ultrasound image analysis, histopathology image analysis, MRI-ultrasound fusion, MRI-histopathology registration, and clinical outcome predictions. In 2022 alone, several review articles were published. Li et al. [13] presented an extensive review, covering a long period, of the applications of machine learning (ML) and deep learning (DL) in prostate MRI segmentation, registration, lesion detection and scoring, and treatment decision support. Sushentsev et al. [19] analyzed two classes of AI methods, DL and traditional machine learning (TML), demonstrating their comparable performance in the differentiation of csPCa/iPCa, as well as discovering common methodological limitations. According to the authors, consensus on datasets, segmentation, ground truth assessment, and model evaluation remains to be established. The narrative review of Belue and Turkbey [5] introduced emerging medical imaging AI paper quality metrics, such as the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) and Field-Weighted Citation Impact (FWCI), and applied those analyses to the top AI models for segmentation, detection, and classification of PCa, including potential areas of impact in radiologists’ workflow. Although those methods are commonly reported in the literature with promising results, the authors concluded that prospective multicenter studies are necessary to determine the impact of AI on improving radiologists’ performance. Sunoqrot et al. [18] provided an interesting review that focuses on open datasets, commercially/publicly available AI, and grand challenges. The authors concluded that well curated public datasets are available, but are relatively small and vary in quality. Computational AI challenges are needed to deliver independent validation and to build trust in AI for prostate MRI.

This article presents a subjective critical review of AI in prostate MRI analysis. It considers the most recent advances, challenges, and opportunities presented in the context of the ongoing project, AI-augmented radiology - detection, reporting and clinical decision making in prostate cancer diagnosis (INFOSTRATEG) being conducted at the Laboratory of Applied Artificial Intelligence of the National Information Processing Institute in Poland.

2 The Potential of Artificial Intelligence in mpMRI Analysis

Current clinical practice and guidelines utilize mpMRI prior to biopsies to identify potentially suspicious lesions. Radiological interpretation, together with relevant clinical information, supports clinicians in proper patient management, which remains crucial in light of the high prevalence of PCa and its low mortality rate. Many patients with no cancer or with indolent cancer can benefit from long-term active surveillance and lower numbers of unnecessary biopsies; this, in turn, minimizes the occurrence of unnecessary side-effects, such as pain, bleeding, and infection. Despite continuous improvement in MRI technique, interpretation of prostate MRI remains challenging and is generally recognized to present a steep learning curve [13]. Low specificity and high interobserver variability remain problematic disadvantages of MRI, particularly for nondedicated or less experienced radiologists who have received only short-term training in prostate MRI [23]. At present, the processing and interpretation of prostate mpMRI data in clinical routine is performed chiefly by human experts; it remains highly subjective and strongly dependent on experience and training. Improving the PCa diagnostic pathway (and potentially reducing overdiagnosis of iPCa and underdiagnosis of csPCa) is a key challenge. AI techniques may support the radiological workflow of PCa diagnosis and reduce interobserver variability among radiologists. This would enable more consistent diagnoses by clinicians, which can result in improved patient outcomes. AI models utilize the quantitative nature of imaging data to construct a more robust feature space based on mpMRI representation. AI models can help in cancer diagnosis by facilitating ancillary tasks in cancer detection that are labor- and experience-intensive, such as prostate gland segmentation, PCa detection on mpMRI images, and characterization of a cancer’s advancement and aggressiveness [6].

2.1 Prostate Segmentation

Prostate gland segmentation aims to outline the whole gland boundary, as well as its zonal division. This is critical for the calculation of the entire prostate volume and of the serum prostate-specific antigen (PSA) density, which are important PCa biomarkers. Manual segmentation of the prostate and its zones is a time-consuming and tedious task. It is also highly subjective and dependent on the experience of the radiologist. In clinical practice, prostate volume is therefore commonly estimated using the ellipsoid formula, as it is easy to apply, highly time-efficient, and characterized by high interobserver agreement; however, it offers only an approximation of a reality that, in many cases, is much more complex. AI/DL methods have high potential to reduce the time and variability associated with prostate gland segmentation on MRI.
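
The two volume estimates discussed above, and the PSA density derived from them, can be illustrated with a minimal sketch. It assumes the standard ellipsoid approximation V = (π/6) · L · W · H and a segmentation-based volume obtained by counting voxels; the function names, parameter names, and units are illustrative assumptions and are not taken from any of the cited works.

```python
import math


def ellipsoid_volume_ml(length_cm, width_cm, height_cm):
    """Ellipsoid approximation: V = (pi / 6) * L * W * H (cm inputs give mL)."""
    return math.pi / 6.0 * length_cm * width_cm * height_cm


def mask_volume_ml(voxel_count, voxel_spacing_mm):
    """Segmentation-based volume: number of gland voxels times one voxel's volume.

    voxel_spacing_mm is an assumed (dx, dy, dz) tuple in millimetres.
    """
    dx, dy, dz = voxel_spacing_mm
    return voxel_count * dx * dy * dz / 1000.0  # mm^3 -> mL


def psa_density(psa_ng_per_ml, prostate_volume_ml):
    """PSA density: serum PSA divided by prostate volume."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_per_ml / prostate_volume_ml
```

For example, `psa_density(6.0, ellipsoid_volume_ml(4.0, 4.0, 4.0))` and `psa_density(6.0, mask_volume_ml(80000, (0.5, 0.5, 3.0)))` give the density under each volume estimate; any difference between the two values reflects only the difference in estimated volume.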

In [3], the authors propose a segmentation pipeline that comprises three convolutional neural networks. The first localizes the prostate by creating a bounding box. The second completes prostate gland segmentation by classifying each pixel as belonging either to the prostate or to the background. The third differentiates between the transition zone and the peripheral zone by classifying every voxel in the image as one of these two classes. Each of the convolutional neural networks was implemented using a customized hybrid three-dimensional/two-dimensional (3D/2D) U-Net architecture. The model achieved mean Dice scores for segmentation of 0.940 for the whole prostate, 0.914 for the transition zone, and 0.776 for the peripheral zone. Recently, a comparison of three standard DL architectures for prostate segmentation was presented in [10]. UNet, an efficient neural network (ENet), and an efficient residual factorized ConvNet (ERFNet) were trained and tuned on the PROSTATEx public dataset to segment the whole gland and the transition zone separately (the peripheral zone masks were obtained by subtraction). The top result was achieved by ENet: 91% for the whole gland, 87% for the transition zone, and 71% for the peripheral zone.
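
For reference, the Dice score used to report segmentation quality, and the derivation of a peripheral zone mask by subtraction, can be sketched as follows. This is a minimal illustration on binary NumPy masks, not the implementation used in [3] or [10]; the function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np


def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denominator = pred.sum() + true.sum()
    if denominator == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, true).sum() / denominator)


def peripheral_zone_mask(whole_gland_mask, transition_zone_mask):
    """Peripheral zone approximated as whole gland minus transition zone."""
    gland = np.asarray(whole_gland_mask, dtype=bool)
    transition = np.asarray(transition_zone_mask, dtype=bool)
    return np.logical_and(gland, np.logical_not(transition))
```

Because the Dice score is normalized by the sizes of both masks, small or thin structures are penalized more heavily for the same absolute boundary error, which may partly explain why peripheral zone scores are lower than whole-gland scores in the results above.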

2.2 Prostate Lesion Detection and Characterization

Identifying and characterizing csPCa is a crucial component of proper treatment planning. The probability of csPCa can be assessed radiologically based on PI-RADS, although even when using the current version, v2.1, considerable inter-/intrareader disagreement is observed frequently. AI/DL methods have the potential to become common tools for differentiation between csPCa and iPCa, and for assessment of the locations and extents of aggressive cancers.

Typically, AI models are subdivided into two groups of methods, with regard to the nature of the input data and the expected result of the analysis. The first group focuses on lesion detection, uses whole MRI images for analysis, and provides pixel-level output, extracting regions with probable csPCa and iPCa. The pixel-level analysis provides pixel-level probability maps of cancer distribution. This produces patient-level predictions of suspicious areas automatically. Although AI-based detection models are typically of the two-class variety (csPCa versus non-csPCa), multiclass lesion detection models are also viable; in these, detection results relate to different grading systems, such as the histopathological International Society of Urological Pathology (ISUP) score, which expresses the aggressiveness of csPCa, or the radiological PI-RADS score. The second group of models, dedicated purely to lesion classification, assumes radiologist-outlined lesions (regions of interest) as inputs, which are then sorted into different categories. Some two-class or multiclass models aim to automate stratification at lesion level.
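
The step from a pixel-level probability map to lesion candidates and a patient-level prediction can be sketched as below. This is a simplified illustration on a single 2D slice, assuming a fixed probability threshold, 4-connected grouping of pixels, and the maximum lesion probability as the patient-level score; these choices and all names are assumptions made for illustration and do not describe any specific published model.

```python
from collections import deque

import numpy as np


def extract_lesion_candidates(prob_map, threshold=0.5):
    """Group above-threshold pixels of a 2D map into 4-connected regions.

    Returns a list of (pixel_coordinates, max_probability) tuples.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    mask = prob_map >= threshold
    visited = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    lesions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first search over the neighbours of the seed pixel.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            score = max(float(prob_map[y, x]) for y, x in pixels)
            lesions.append((pixels, score))
    return lesions


def patient_level_score(lesions):
    """Assumed aggregation: the highest lesion score, or 0.0 if none found."""
    return max((score for _, score in lesions), default=0.0)
```

In practice, the threshold controls the trade-off between missed lesions and false positive candidates, so it would normally be chosen on a validation set rather than fixed in advance.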

Studies reported in the literature demonstrate detection models that achieve between 75% and 85% accuracy; however, the methodology and study population are highly diverse. Nevertheless, it can be observed that the results reported fall within the range of reported radiologist performance [8]. A key challenge for fully automated detection algorithms is the number of false positive lesions. Mehralivand et al. [14] presented a new, fully automated, DL-based PCa detection system for prostate MRI using a large scale, diverse, expert annotated training dataset. Although the approach achieved reasonable performance metrics, an average of 0.8–1.9 false positive lesions per patient was reported. Multiclass lesion classification results are more varied due to the diverse cohort sizes and methodology. When classifying according to ISUP grade, mean AUC per classification category can vary considerably, even for the same category. In a review study conducted by Twilt et al. [21], ISUP 3 category mean AUC ranged between 0.379 and 0.96.
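
The two metrics mentioned in this paragraph can be made concrete with a small sketch. It assumes binary labels (1 for the positive class, 0 otherwise), computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half), and averages false positive counts over patients. The function names and input layout are illustrative assumptions, not a description of how the cited studies computed their results.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positive and negative case scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


def false_positives_per_patient(false_positive_counts):
    """Mean number of false positive lesions over all patients."""
    if not false_positive_counts:
        raise ValueError("at least one patient is required")
    return sum(false_positive_counts) / len(false_positive_counts)
```

An AUC of 0.5 corresponds to chance-level ranking, so a reported value below 0.5 for a category indicates that the model ranked cases in that category worse than random ordering would.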

3 Urgent Challenges

3.1 Datasets

The AI community continues to wait for extensive and well annotated datasets. In PCa research, most datasets are small (often deriving from a single institution) and homogeneous in acquisition protocols and scanner manufacturers. The development of robust and unbiased AI models requires large, heterogeneous, and reliably annotated datasets, which reflect the variability of cancer’s appearance and the diversity of equipment manufacturers. Labels are typically created by a single expert; however, interobserver variability exists even between experienced radiologists in the assessment of cancer extent on different mpMRI modalities and in the selection of individual lesion features. Such differences continue to be observed, despite the introduction of the PI-RADS standard. Dataset labeling should be performed during multireader studies, including interdisciplinary discussions and panels, to reduce bias in labeling.
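
One way to quantify the reader disagreement described above is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two readers who assign a categorical score (for example, a PI-RADS category) to the same set of lesions. The choice of kappa, the function name, and the input layout are assumptions made for illustration; the source does not prescribe a specific agreement measure.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    # Agreement expected by chance, from each reader's category frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)
```

A kappa close to 1 indicates near-perfect agreement, whereas values near 0 indicate agreement no better than chance; cases on which readers disagree are natural candidates for review in a consensus panel.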

3.2 Defining Ground Truth

“Ground truth” refers to the labels that are assigned to expert annotations. Radiological delineations without histopathology confirmations are severely limited due to the high risk of missing cancer foci that were not identified by a radiologist. In PCa, the most reliable validation is based on retrospective identification of cancerous regions on whole-mount radical prostatectomy specimens, which can be projected onto an mpMRI scan. Although the histopathology information is definitive, this method requires advanced MRI-histopathology registration. Moreover, manual pathologists’ annotations are even more time consuming than radiological delineations, which limits the possibility of creating large datasets. Prostatectomy is typically performed on cancer at advanced stages, which limits the possibility of including cases of low risk or indolent cancer. The alternative approach assumes pathological information from biopsies. The use of systematic biopsy is limited due to the random sampling procedure and the range being limited mainly to the peripheral zone, which might cause some PCa foci to be missed or undersampled and lead to underdiagnosis. The best solution assumes pathological confirmation from targeted biopsies, performed under MRI-transrectal ultrasound (TRUS) fusion guidance. PCa candidates are typically selected by a prebiopsy MRI in which a radiologist highlights lesions of PI-RADS 3 or above. Much attention should be paid to accurate mpMRI analysis, which, in the case of database creation, might involve multireader verification of visible lesions and verification with biopsy history.

3.3 Different Evaluation Criteria

Comparison of different AI models is quite problematic, not only due to the diversity of the databases used and the various cohort sizes, but also in the context of the lack of standardized evaluation criteria. In the case of PCa, which progresses gradually, using clinical endpoints in the form of patient outcomes like death or recurrence is mostly unviable. This underlines the importance of defining suitable benchmarks and verification criteria for model evaluation. Organization of grand challenges and open databases might improve the validation and benchmarking of models.

3.4 Limited Multireader Studies and Prospective Evaluation

As well as comparisons between AI models, attention should be directed toward multireader studies that compare the efficiency of AI models with that of radiologists or clinicians. Such comparisons may better indicate the potential of AI models—particularly in the support or training of less experienced specialists. Finally, a prospective evaluation in a controlled medical environment should be performed to open the door for clinical deployment, preceded by clinical trials.

4 Future Directions

We observe the development of AI models in PCa diagnosis and their increasing effectiveness every day. We are moving, incrementally, toward personalized medicine, and exploring the potential of radiomics and domain knowledge, while attempting to overcome the existing limitations and challenges.

A research group at the Laboratory of Applied Artificial Intelligence of the National Information Processing Institute is conducting the AI4AR PCa[1] project, which is dedicated to the analysis of mpMRI images for PCa diagnosis. During the initial phase, reliable, well annotated databases of mpMRI examinations are being compiled. The researchers aim to collect between 400 and 600 cases with full clinical descriptions. All cases are annotated by three experienced radiologists who have over five years of experience in describing mpMRI examinations and who possess practical knowledge of the PI-RADS standard. Radiologists analyze mpMRI independently, without access to historical information. All cases are later analyzed by an interdisciplinary team of researchers, radiologists, and clinicians to reduce bias and to confirm lesions’ locations and extent. Ground truth is based on MRI-ultrasound fusion guided targeted prostate biopsy. Only cases confirmed histopathologically, with verified lesion locations that correlate with historical biopsy data, are treated as reliable and are included in the database.

Simultaneously a novel structured reporting system is under development as a flexible environment for the structured and standardized reporting of radiological image data. The system possesses a modular architecture and is integrated with the XNAT[2] imaging informatics platform, which facilitates common management, storage and quality assurance tasks for imaging and associated data. A dedicated module for prostate mpMRI structured reporting was proposed, which is standardized according to the PI-RADS radiological lexicon. The structured report scheme is also used in the data labeling process; all cases in the database possess not only visual labels, but also structured ones, which contributes to the uniqueness of the proposed dataset. In the second phase, the database will be extended by inclusion of cases from other medical centers to increase its heterogeneity regarding different equipment manufacturers and acquisition protocols.

The final implementation of the system is planned in the form of a radiological educational platform that is dedicated to learning the structured reporting of prostate mpMRI examinations, and considers various forms of support by AI algorithms. Verification of the platform in the third phase of the project involves conducting multireader studies to assess the effectiveness and impact of AI models on the quality and accuracy of the reporting process. A study of the behavior of platform users will allow us to assess the potential of AI to support less experienced radiologists and to standardize or improve human reader performance.

5 Conclusions

AI in prostate MRI analysis shows great promise and impressive performance that is comparable to that of human experts. Overcoming the technology’s limitations and demonstrating its clinical effectiveness will unlock opportunities for clinical deployment in the form of educational systems, reporting and patient management support systems, and “second reader” or patient triage systems.