Magnetic resonance imaging of the prostate: where do we stand?

Prostate cancer (PCa) is a heterogeneous disease, and tumor grading is a major predictor of prognosis [1]. Multiparametric magnetic resonance imaging (mpMRI) is a valuable tool for the non-invasive detection of PCa and has even proven superior to systematic biopsy [2]. To improve and standardize this technique and its application, guidelines for the acquisition, interpretation, and reporting of mpMRI of the prostate have been established [3]. With a current version of these guidelines, the Prostate Imaging–Reporting and Data System (PI-RADS) version 2, and its decision rules, high sensitivity and specificity for the detection of clinically significant PCa can be achieved [4]. A major limitation, however, is interreader agreement that is only moderate to good [5]. To address this issue, an updated version, PI-RADS version 2.1, was published recently [6].

What is machine learning and what are the challenges?

In recent years, the promise of machine learning (ML) in radiology has created considerable hype, with a growing number of published papers and an increasing number of companies and researchers from different fields becoming involved. ML describes a broad class of analysis algorithms that are not programmed with explicit decision rules. Instead, they build models for semi-autonomous or autonomous prediction, inferred from training datasets that should ideally be large and of high quality [7]. ML can incorporate vast numbers of imaging-derived and clinical parameters and combine them, in ways not easily comprehensible to humans, to calculate the probability of a diagnosis. In supervised ML, sufficiently large training datasets have to be labeled based on a reliable reference standard; otherwise, the algorithms' decisions will be flawed. Several specific types of ML algorithms exist, and describing them and discussing their associated challenges would go far beyond the scope of this comment.

Because it can incorporate large numbers of parameters, ML might help to identify associations between features and selected diseases or disease states that are as yet unknown or too subtle for human recognition, and thereby improve diagnostics. In addition, ML could help to overcome limitations of human image interpretation such as suboptimal interreader agreement. However, its application can be unsatisfying for radiologists. We have been trained in an approach to image interpretation in which findings on different MRI sequences are assumed to reflect changes in tissue composition that are more or less specific to a given disease, whereas ML may combine parameters in ways that cannot be traced back to such associations. These associations have often been evaluated extensively in studies before being used in clinical routine or even integrated into decision rules for assigning a diagnosis.

Setting aside the hype around ML, its technical challenges, and the potential dissatisfaction one might experience, another aspect is very important when discussing potential applications: experienced clinicians and radiologists must define tasks and applications in which ML can have a positive impact on patient care. Wang et al. have given an example of such an application in patients undergoing mpMRI of the prostate. They showed that ML algorithms incorporating PI-RADS version 2 assessment categories and mpMRI radiomics achieve higher predictive values for PCa than PI-RADS alone and might therefore help to further improve the diagnostic performance of mpMRI [8].

What can we learn from the current study and where do we go from here?

In this issue of European Radiology, Antonelli et al. present the results of a study of selected supervised ML algorithms for the detection of Gleason 4 pattern PCa. The algorithms used parameters derived from mpMRI acquired with a standardized imaging protocol, together with selected clinical parameters. The authors used the results of template and MRI-targeted biopsies as a high-quality reference standard and a relatively large number of lesions as the training dataset. They showed that selected supervised ML algorithms can predict the presence of Gleason 4 pattern in equivocal or suspicious lesions of the peripheral and transition zones that had previously been contoured by an experienced radiologist. Furthermore, selected ML algorithms outperformed experienced radiologists. As noted above, PCa is a heterogeneous disease, and tumor grading is a major predictor of prognosis, especially the presence and percentage of Gleason 4 pattern [1]. The task selected for ML in this study is therefore of high clinical relevance. Such a tool could aid the detection of clinically significant PCa in patients before biopsy. It could also help patients with biopsy-proven low-grade PCa who are considering or already undergoing active surveillance [9]. It would be interesting to evaluate the effect of lesion contouring by different readers, and whether the tested ML algorithms can help to overcome the suboptimal interreader agreement of human mpMRI interpretation. Studies evaluating ML as a tool for clinical decision-making or patient stratification in these scenarios, with patient outcome as the endpoint, are eagerly awaited.

Antonelli et al. also evaluated which combination of parameters worked best as input for the selected ML algorithms. Remarkably, optimal prediction of Gleason 4 pattern was achieved with input data comprising relatively few, well-established parameters such as prostate-specific antigen density, apparent diffusion coefficient values, and maximum enhancement. This is reassuring with regard to the way radiologists interpret imaging. It also suggests that, in the future, ML might do more than potentially replace humans. It could also help to identify useful parameters that are readily available in clinical practice, and thereby improve the way we interpret imaging and define rules for doing so.

As Antonelli et al. state in their paper, external validation of the selected algorithms and further training with much larger datasets are needed. ML will only be applicable in a broader context if it performs well with input images from different scanners, acquired with imaging protocols that may differ, even slightly. If ML passes this test and a benefit for patients can be proven, it will prevail and become a welcome addition to our current practice of radiology.