Reliability of whole mount radical prostatectomy histopathology as the ground truth for artificial intelligence assisted prostate imaging

Jager, Auke; Postema, Arnoud W.; van der Linden, Hans; Nooijen, Peet T.G.A.; Bekers, Elise; Kweldam, Charlotte F.; Daures, Gautier; Zwart, Wim; Mischi, M.; Beerlage, Harrie P.; Oddens, Jorg R.

doi:10.1007/s00428-023-03589-4

Original Article
Open access
Published: 06 July 2023

Volume 483, pages 197–206, (2023)

Virchows Archiv

Auke Jager ORCID: orcid.org/0000-0003-3937-7262¹,
Arnoud W. Postema¹,²,
Hans van der Linden³,
Peet T.G.A. Nooijen³,
Elise Bekers⁴,
Charlotte F. Kweldam⁵,
Gautier Daures⁶,
Wim Zwart⁶,
M. Mischi⁷,
Harrie P. Beerlage¹ &
Jorg R. Oddens¹,⁷

Abstract

The development of artificial intelligence–based imaging techniques for prostate cancer (PCa) detection and diagnosis requires a reliable ground truth, which is generally based on histopathology from radical prostatectomy specimens. This study proposes a comprehensive protocol for the annotation of prostatectomy pathology slides. To evaluate the reliability of the protocol, interobserver variability was assessed between five pathologists, who annotated ten radical prostatectomy specimens consisting of 74 whole mount pathology slides. Interobserver variability was assessed for both the localization and grading of PCa. The results indicate excellent overall agreement on the localization of PCa (Gleason pattern ≥ 3) and clinically significant PCa (Gleason pattern ≥ 4), with Dice similarity coefficients (DSC) of 0.91 and 0.88, respectively. On a per-slide level, agreement for primary and secondary Gleason pattern was almost perfect and substantial, with Fleiss Kappa of .819 (95% CI .659–.980) and .726 (95% CI .573–.878), respectively. Agreement on International Society of Urological Pathology Grade Group was evaluated for the index lesions and showed agreement in 70% of cases, with a mean DSC of 0.92 for all index lesions. These findings show that a standardized protocol for prostatectomy pathology annotation provides reliable data on PCa localization and grading, with relatively high levels of interobserver agreement. More complicated tissue characterization, such as the presence of cribriform growth and intraductal carcinoma, remains a source of interobserver variability and should be treated with care when used in ground truth datasets.

Introduction

Artificial intelligence (AI) is gaining attention in the field of prostate cancer (PCa) imaging [1,2,3,4]. AI offers the potential to improve diagnostic accuracy and reduce operator dependency in magnetic resonance imaging (MRI), the current standard of care in prostate imaging [4]. Additionally, AI may play a role in advancing the clinical implementation of other imaging modalities such as multiparametric ultrasound [5].

For an AI-based imaging modality for PCa diagnosis to be effective, it must accurately localize PCa lesions and classify them into clinically relevant risk groups (e.g., low, intermediate, and high-risk) [6]. The success of any AI algorithm relies heavily on the quality of the ground truth data used in its development and training [7]. To achieve accurate data labeling, the regions assessed by the imaging modality must be categorized. For PCa diagnosis, the labeling is based on prostate histopathology, which can be obtained through prostate biopsy or radical prostatectomy specimen (RPS) [8]. While prostate biopsies are prone to underrepresenting the presence and extent of PCa, RPS provides a comprehensive view of the prostate and is considered a more suitable reference in developing AI-based imaging modalities [4].

Pathology annotation is ideally performed by an expert pathologist according to the International Society of Urological Pathology (ISUP) Grade Groups (GG) [9]. Studies have shown only fair to moderate agreement in PCa grading between pathologists, primarily involving biopsy cores instead of RPS and based on slide level agreement rather than localization of PCa [10,11,12]. Interobserver variation can occur both in ISUP grading and in the annotation of PCa lesion borders. Currently, there is no standardized protocol for RPS labeling as a ground truth, and the reliability of such detailed pathology annotation is not well understood.

In a multicenter trial aimed at developing an AI-based image analysis algorithm for PCa diagnosis on three-dimensional multiparametric transrectal prostate ultrasound, a comprehensive model was developed to provide a ground truth based on RPS [13]. In the current paper, we describe the development of a standardized protocol for RPS annotation (part I) and the results of a study evaluating the feasibility and reliability of this protocol (part II).

Methods

Part I: creating the annotation protocol

The whole-mount RPS pathology protocol was developed by an expert panel consisting of urologists and urology residents (AJ, AP, JO, HB), uropathologists (HL, PN, AH, CW, KK), and engineers (MM, WZ). Its purpose is to provide a reliable ground truth for correlation of pathology (location and grading) with prostate imaging.

The expert panel conducted three consensus meetings to refine and finalize the protocol. The first version was tested on two RPS, each annotated by five uropathologists (PN, AH, CW, KK, EB) and evaluated in a second consensus meeting. After this meeting, the second version of the protocol was developed and applied on four RPS, each annotated by two uropathologists. In the third consensus meeting, the definitive version of the annotation protocol was determined.

The full version of the standard operating procedure of the annotation protocol is provided as supplementary materials 1. Identification of Gleason patterns (GPs) and secondary tumor characteristics in the current protocol was performed according to the growth patterns defined in the ISUP guidelines 2019 [9].

Stepwise summarized annotation protocol

Annotation of prostate cancer

Gleason patterns

Clinical evaluation according to the ISUP Grade Groups often combines areas that contain different GPs. However, when training an AI algorithm for PCa diagnosis on imaging, it is crucial that the algorithm recognizes specific image characteristics that are distinct for different GPs and are a result of tissue morphology. To achieve this, the expert panel decided to avoid annotating areas that contain a mixture of GPs (such as Gleason Score 3+4=7). Instead, cancerous tissue areas that contain solely GP 3, 4, or 5 were annotated when possible. The pilot study showed that some areas containing both GP 3 and 4, or GP 4 and 5, were too heterogeneous to annotate separately. For these areas, the option to annotate areas as Gleason Scores was incorporated in version 2 of the protocol.

Secondary tumor characteristics

Cribriform growth (CG) and/or intraductal carcinoma (IDC) are clinically important prognostic factors in PCa and are therefore separately annotated [9, 14, 15].

Annotation of benign abnormalities

Prostatitis and high-grade prostatic intraepithelial neoplasia are benign conditions that can sometimes lead to false positive results in prostate imaging. In order to understand the impact of these conditions, it is important to determine their presence in a given prostate.

Level of precision

Due to the limited resolution of medical imaging and inaccuracies when correlating imaging to pathology, annotation precision below 0.5 mm was deemed unnecessary for the purpose of the current protocol. Evaluation of the pilot study showed a wide variation in the level of precision between pathologists, causing a variation in time expenditure due to unnecessarily precise annotations. To ensure uniformity in the level of precision between pathologists, the scale at which cancerous tissue is annotated was set at 1 or 2 mm. Additionally, the polygon line thickness in the annotation tool was standardized at 0.2 mm, which prevents overly detailed annotations.

Part II: feasibility and reliability of the final annotation protocol

Part II of this study was performed using the definitive version of the annotation protocol on full-mount RP slides originating from ten patients prospectively included in August and September 2021 at the Amsterdam University Medical Centres and the Netherlands Cancer Institute. These patients participated in a multicenter trial currently being carried out in the Netherlands (NCT04605276) [13]. Creating the annotation protocol was part of this trial and the protocol does not interfere with regular clinical evaluation. The study was approved by the institutional review board, reference number 2020_268#B202178. The ten prostates were randomly assigned to five uropathologists. Each pathologist annotated four prostates; each prostate was annotated by two different pathologists. The participating pathologists had at least 7 years of experience with prostate pathology and were trained at different centers in the Netherlands. Pathologists were blinded to each other's annotations and to clinical patient characteristics, including MRI and biopsy results.
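The allocation described above (ten prostates, five pathologists, four prostates per pathologist, two different pathologists per prostate) can be illustrated with a small sketch. The source does not state how the randomization was carried out, so the cyclic pairing used here, the function name `assign_prostates`, and the integer identifiers are assumptions chosen only to produce a balanced random allocation.

```python
import random
from collections import Counter


def assign_prostates(n_pathologists=5, seed=None):
    """Hypothetical balanced allocation: 2 * n_pathologists prostates,
    each read by two different pathologists, four prostates per pathologist."""
    if n_pathologists < 3:
        raise ValueError("the cyclic pairing below needs at least 3 pathologists")
    rng = random.Random(seed)
    pathologists = list(range(1, n_pathologists + 1))
    rng.shuffle(pathologists)
    pairs = []
    # Offsets 1 and 2 around a shuffled circle: every pathologist appears
    # twice per offset, so four times in total, never paired with themselves.
    for offset in (1, 2):
        for k in range(n_pathologists):
            partner = pathologists[(k + offset) % n_pathologists]
            pairs.append((pathologists[k], partner))
    prostates = list(range(1, len(pairs) + 1))
    rng.shuffle(prostates)  # random mapping of prostates onto the pairs
    return dict(zip(prostates, pairs))


allocation = assign_prostates(seed=0)
workload = Counter(p for pair in allocation.values() for p in pair)
```

Here `allocation` maps each prostate to its two readers and `workload` counts the prostates per pathologist, which makes the balance constraints easy to check.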

Whole-mount histopathology slide preparation

The prostate specimen was fixed in formalin for at least 24 h. After fixation, specimens were sectioned from apex to base in 4-mm slices using a TruSlice specimen cut-up system (Cellpath Ltd, Newtown, UK). The prostate slices were fitted in cassettes, embedded in paraffin, and cut into whole-mount pathology slides (4 μm thick).

Whole-mount pathology slides were scanned at high resolution (40× enlargement, 20× objective, 2.1 camera lens) using a Pannoramic 1000 Digital Slide Scanner (3DHISTECH, H-1141 Budapest, Öv u. 3., Hungary) and uploaded to a web-based pathology annotation tool (Slidescore, Amsterdam, the Netherlands). The parasagittally cut apical and basal pathology slides were not included for annotation in the current study.

Study outcomes

The primary outcome for this study was to evaluate the accuracy of PCa tissue localization and grading. This was evaluated by analyzing the surface-based interobserver agreement per RPS between pathologists, expressed as the weighted Dice similarity coefficient (DSC). DSC is defined as \(\frac{2\left|X\cap Y\right|}{\left|X\right|+\left|Y\right|}\). Weighted DSC is defined as \(\frac{\left(\left|X\right|+\left|Y\right|\right)\,\mathrm{DSC}}{2Z}\). Here X and Y are the areas annotated by pathologists 1 and 2 on a single pathology slide, |X| and |Y| are their surface areas, and X ∩ Y is the area where X and Y overlap. Z is the mean of the total surface areas annotated by pathologists 1 and 2 on all pathology slides belonging to one RPS. The DSC is a value that ranges from 0 to 1, where 0 indicates no overlap between two annotated surface areas and 1 indicates perfect overlap (Fig. 1) [16]. The weighted DSC per RPS is the sum of the weighted DSC from each slide. Weighted DSC was calculated for PCa, defined as any GP 3 or higher, for clinically significant PCa (csPCa), defined as any GP 4 or higher, and for CG and/or IDC.
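As an illustration of the weighted DSC defined above, the following minimal sketch computes it from per-slide surface areas. It assumes that the annotated areas and their overlap have already been measured (for example in cm²) and that Z is the mean of the two pathologists' total annotated areas over the specimen; the function names and the tuple layout are hypothetical and not taken from the study's software.

```python
def dsc(area_x, area_y, area_overlap):
    """Dice similarity coefficient from two annotated areas and their overlap."""
    total = area_x + area_y
    if total == 0:
        return None  # undefined when neither pathologist annotated anything
    return 2.0 * area_overlap / total


def weighted_dsc_per_rps(slides):
    """slides: list of (area_x, area_y, area_overlap), one tuple per slide.

    Each slide's DSC is weighted by (area_x + area_y) / (2 * Z), where Z is
    the mean of the two pathologists' total annotated areas (an assumption
    about how Z is read), and the weighted values are summed.
    """
    z = (sum(s[0] for s in slides) + sum(s[1] for s in slides)) / 2.0
    if z == 0:
        return None  # nothing annotated in the whole specimen
    result = 0.0
    for area_x, area_y, area_overlap in slides:
        d = dsc(area_x, area_y, area_overlap)
        if d is None:
            continue  # empty slide carries zero weight
        result += (area_x + area_y) * d / (2.0 * z)
    return result


# Two slides: a large lesion with perfect overlap and a small one with none.
example = [(2.0, 2.0, 2.0), (1.0, 1.0, 0.0)]
weighted = weighted_dsc_per_rps(example)               # 2/3
unweighted = sum(dsc(*s) for s in example) / len(example)  # 0.5
```

The example shows why the weighting matters: the unweighted mean treats both slides equally, whereas the weighted value lets the slide with more annotated tissue dominate.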

The secondary outcomes were the level of agreement in tissue characterization on a per-slide level and agreement on localization and grading of the index lesion. The agreement in tissue characterization on a per-slide level was expressed as Fleiss kappa (interobserver variability). Kappa values were interpreted as follows: poor agreement for kappa <0.00, slight agreement for 0.00 to 0.20, fair agreement for 0.21 to 0.40, moderate agreement for 0.41 to 0.60, substantial agreement for 0.61 to 0.80, and almost perfect agreement for 0.81 to 1.00. Agreement on a slide level was evaluated for (1) any PCa, (2) csPCa, (3) primary and secondary GP, (4) presence of CG/IDC, and (5) presence of a minor pattern 5. Any PCa was defined as any GP 3 or higher, csPCa as any GP 4 or higher. Primary, secondary, and minor GPs were defined according to the 2019 ISUP consensus meeting [9]. Primary GP was defined as the pattern with the largest surface area. Secondary GP was defined as the pattern with the second largest surface area, or, if there was a higher GP present, as the highest GP (provided that the surface area accounts for ≥5% of the total tumor area). A minor pattern 5 was defined as a GP 5 that accounts for <5% of the total tumor area in a slide.
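For readers who want to reproduce the slide-level statistic, the sketch below implements the standard Fleiss kappa formula and the interpretation bands listed above. It is not the study's analysis code; the count-table layout, the function names, and the handling of values that fall between two bands (for example 0.205) are assumptions.

```python
import math


def fleiss_kappa(counts):
    """counts[i][j] = number of raters who placed slide i in category j.

    Every slide must be rated by the same number of raters (two in a
    pathologist 1 versus pathologist 2 comparison).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each slide needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Overall proportion of ratings that fall in each category
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # Observed agreement per slide
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        return math.nan  # all ratings in one category: kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa):
    """Map a kappa value onto the agreement bands used in the text."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Four slides, two raters, categories (no PCa, PCa): full agreement.
table = [[2, 0], [0, 2], [2, 0], [0, 2]]
kappa = fleiss_kappa(table)  # 1.0
```

With this helper, the reported values of .819 and .726 for primary and secondary GP fall in the "almost perfect" and "substantial" bands, respectively.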

The index lesion was defined as the lesion with the highest ISUP GG with a surface area of ≥0.5 cm². If multiple lesions with the same ISUP GG were annotated within one RPS, the lesion with the highest volume was considered to be the index lesion. To properly compare grading of the index lesions, the Gleason patterns were translated to ISUP GG according to the definitions provided by the ISUP guidelines [9]. The agreement on grading and localization was expressed as a percentage and DSC, respectively.
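The pattern definitions and the index lesion rule above can be sketched in a few lines. The Gleason score to ISUP GG mapping follows the standard Grade Group definitions; everything else is illustrative. In particular, the `Lesion` class and its field names are hypothetical, the source does not say how lesion volume was obtained, and applying the 5% threshold only to patterns higher than the primary pattern is an assumption about how the rule is read.

```python
from dataclasses import dataclass


def primary_secondary(areas):
    """areas: {gleason_pattern: surface area}. Returns (primary, secondary)."""
    present = {gp: a for gp, a in areas.items() if a > 0}
    if not present:
        return None
    total = sum(present.values())
    primary = max(present, key=lambda gp: present[gp])  # largest surface area
    others = [gp for gp in present if gp != primary]
    # A higher pattern covering >= 5% of the tumor area becomes secondary.
    higher = [gp for gp in others
              if gp > primary and present[gp] / total >= 0.05]
    if higher:
        return primary, max(higher)
    # Otherwise use the largest lower pattern (assumption: no 5% cut-off here).
    lower = [gp for gp in others if gp < primary]
    if lower:
        return primary, max(lower, key=lambda gp: present[gp])
    return primary, primary  # a single pattern counts twice, e.g. 4+4


def isup_grade_group(primary, secondary):
    """Standard mapping from Gleason score to ISUP Grade Group."""
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5


@dataclass
class Lesion:  # hypothetical container for one annotated lesion
    name: str
    grade_group: int
    area_cm2: float
    volume_cm3: float


def index_lesion(lesions, min_area_cm2=0.5):
    """Highest ISUP GG among lesions of >= 0.5 cm2; ties broken by volume."""
    eligible = [l for l in lesions if l.area_cm2 >= min_area_cm2]
    if not eligible:
        return None
    return max(eligible, key=lambda l: (l.grade_group, l.volume_cm3))


gg = isup_grade_group(*primary_secondary({3: 6.0, 4: 3.8, 5: 0.2}))  # GG 2
```

In the usage line, pattern 5 covers only 2% of the tumor area, so it is treated as a minor pattern and the slide is graded 3+4.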

For both the primary and secondary outcomes, the results of the five participating pathologists were bundled to allow for comparison between two observers (pathologists 1 and 2).

Additional evaluation included time expenditure of executing the protocol per annotated prostate. Time expenditure was reported by the pathologist performing the annotations.

Results

A total of 10 RPS consisting of 74 whole mount pathology slides were used to evaluate the reliability of the definitive version of the protocol.

Lesion level

The average total surface area annotated by the pathologists in all ten RPS was 34.55 cm² for PCa and 31.80 cm² for csPCa. Overall agreement on localization, expressed as weighted DSC, was 0.91 for any PCa and 0.90 for csPCa. Agreement varied between prostates, with a tendency towards a lower DSC with smaller areas of PCa (Table 1). CG/IDC was annotated in four out of ten prostates and showed an overall weighted DSC of 0.64 (Table 2). Figures 2 and 3 show the worst and best performing pathology slides.

Table 1 Total surface area annotated for PCa and csPCa by the pathologists for each prostate in square centimeters. Agreement on localization is given as weighted DSC. PCa prostate cancer, csPCa clinically significant prostate cancer, DSC Dice similarity coefficient, P1 pathologist 1, P2 pathologist 2

Table 2 Total surface area annotated for CG/IDC by the pathologists for each prostate in square centimeters. Agreement on localization is given as weighted DSC. The poor agreement for prostate 6 is also visualized in Fig. 4. CG cribriform growth, IDC intraductal carcinoma, DSC Dice similarity coefficient, P1 pathologist 1, P2 pathologist 2

Slide level

A total of 74 whole mount pathology slides, originating from ten RPS, were annotated using the definitive version of the pathology protocol. Agreement on the presence of any PCa was perfect. Pathologists were in 100% agreement that 48 out of 74 pathology slides contained PCa (GP ≥ 3). For csPCa, agreement was almost perfect with a Fleiss kappa of .915 (95% confidence interval (CI) .689–1.0). Agreement on primary and secondary GP was almost perfect and substantial, with Kappa of .819 (95% CI .659–.980) and .726 (95% CI .573–.878), respectively. Table 3 gives an overview of interobserver variability for each annotation category.

Table 3 Interobserver variability for different annotation categories on a per pathology slide level

Index lesion

The weighted overall DSC for the index lesions of all ten RPS was 0.92. The mean DSC was 0.89. Agreement on ISUP GG was seen in 70% of the index lesions (Table 4).

Table 4 Index lesion characteristics and agreement between P1 and P2. P1 pathologist 1, P2 pathologist 2, ISUP GG International Society of Urological Pathology Grade Group, DSC Dice similarity coefficient

Time intensity

The average annotation time per prostate for the first version of the protocol was 3 h (range 1–5). For the definitive version of the protocol, average annotation time decreased to 2 h (range 1–4).

Discussion

There is a need for more efficient and reliable imaging for PCa diagnosis. Although MRI has been shown to significantly improve patient selection prior to biopsy, its limited availability, high costs, and substantial interobserver variability remain issues [17]. With the 2022 European Union recommendations to include PCa in population-based screening programs, the demand for accurate and reliable imaging will only intensify. AI-assisted automated detection methods for PCa have the potential to address these issues [13]. However, to effectively train and validate these diagnostic methods, it is crucial to assess the reliability of the ground truth. In this particular case, the ground truth is represented by RPS histopathology, which serves as the reference standard for PCa diagnosis [18, 19]. Studies that utilize prostate histopathology as the reference standard often fail to evaluate the reliability of their reference standard [20, 21]. In cases where evaluations are conducted, they typically focus on the accuracy of the correlation between pathology and imaging, overlooking the assessment of pathology annotation itself. This often involves relying on a single pathologist to annotate pathology slides, despite the well-known interobserver variability in PCa grading [11, 12, 22]. Furthermore, existing studies mainly report on grading agreement at the slide level, leaving a gap in our understanding of the agreement on the localization of PCa lesions among pathologists.

The current study aimed to address this gap in knowledge and demonstrated outstanding agreement in the localization of PCa, csPCa, and the index lesion, with weighted DSCs of 0.91, 0.90, and 0.92, respectively. Moreover, agreement on presence of PCa and csPCa on a per-slide level was near-perfect. These results demonstrate that the proposed protocol provides a reliable reference or ground truth for PCa localization and characterization.

The characterization of secondary tumor characteristics (CG/IDC) proved more challenging. On a per-slide level, agreement was substantial; however, there was less agreement on localization with an overall weighted DSC of 0.64. This can be partly attributed to a difference in annotation precision; some pathologists annotate many small areas of CG/IDC where others annotate fewer but larger areas. However, it also reflects a discrepancy in the interpretation of what should be classified as CG/IDC. The limited agreement on CG/IDC is a known issue [23]. Van der Slot et al. showed only moderate agreement between five pathologists on a per prostate level in 80 RPS [10]. While the current study demonstrated a modest improvement in agreement on a per-slide level, it also shows that the characterization of various types of GP 4 remains a complex task [22]. Figure 4 illustrates a case that exemplifies the difficulties in characterizing CG/IDC. In this case, the pathologists involved did not come to a consensus on the presence of CG/IDC in a larger area, even after revisiting the case. A third pathologist found that the pattern was not entirely consistent with the typical CG/IDC pattern. Instead, it was considered to be a borderline case, described in previous literature as “complex fused” [22].

On a per-slide level, agreement of primary and secondary GP was substantial to almost perfect. For clinical grading of PCa according to ISUP, an often-voiced concern is the interobserver variability between pathologists, reaching fair to moderate agreement for Gleason grading [11, 24]. A possible explanation of the relatively high agreement in the current study is that the detailed annotation protocol resulted in more careful evaluation of the pathology slides.

As the index lesion holds the most clinical relevance, a focused analysis was conducted to evaluate the localization and grading of the lesion [25]. The process of translating adjacent areas that were previously annotated as separate GPs into ISUP GGs, as depicted in Fig. 5, yielded a 70% concordance rate in terms of the ISUP GG assigned to the index lesion. This conversion approach also facilitates comparisons between different annotation protocols and clinical practice. The three cases of discordance between pathologists show the benefit of the protocol used in the current study. A discordance in ISUP GG can imply a substantially different interpretation of tissue morphology; however, examining the original annotations according to the study protocol shows that discrepancies in tissue characterization are often minor (Fig. 5).

This study has several shortcomings. Although the surface-based agreement provides many data points, the number of occurrences for some tissue types was limited (e.g., GP 5) and no reliable analysis of agreement on these tissue types could be performed. Furthermore, due to the design of the protocol, no surface-based analysis on the grading of separate GPs could be performed. To obtain a more comprehensive understanding of the agreement among pathologists for different grade classifications, further extensive analysis involving a larger sample size and annotations by additional pathologists will be necessary. However, grading of the index lesion showed an excellent surface-based agreement as well as agreement on ISUP GG in seven out of ten RPS. The pathologists who participated in this study possessed extensive experience in prostate pathology. They underwent training and worked in different centers within the same country. While their expertise and diverse backgrounds contribute to the robustness of the study, it is important to acknowledge that the results may have limited generalizability to an international setting or to a setting with less experienced pathologists. Lastly, the time intensity of the study protocol was substantial. Adjustments made in the final protocol did decrease annotation time, but it remained time-intensive at an average of 2 h per prostate.

Conclusion

The results of this study indicate that RPS pathology can be utilized for training and developing AI-based imaging modalities. Through standardization and evaluation of annotation methods, the current study achieved a relatively high level of agreement between experienced pathologists, with substantial to almost perfect agreement for PCa localization and grading. Agreement on the presence of more complex tissue morphology (e.g., CG/IDC) remains limited, and their inclusion in a ground truth dataset should be approached with caution.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Schieda N, Lim CS, Zabihollahy F, Abreu-Gomez J, Krishna S, Woo S et al (2021) Quantitative prostate MRI. J Magn Reson Imaging 53(6):1632–1645
2. Mannaerts CK, Engelbrecht MRW, Postema AW, van Kollenburg RAA, Hoeks CMA, Savci-Heijink CD et al (2020) Detection of clinically significant prostate cancer in biopsy-naive men: direct comparison of systematic biopsy, multiparametric MRI- and contrast-ultrasound-dispersion imaging-targeted biopsy. BJU Int 126(4):481–493
3. Klotz L, Lughezzani G, Maffei D, Sanchez A, Pereira JG, Staerman F et al (2021) Comparison of micro-ultrasound and multiparametric magnetic resonance imaging for prostate cancer: a multicenter, prospective analysis. Can Urol Assoc J 15(1):E11–E16
4. Suarez-Ibarrola R, Sigle A, Eklund M, Eberli D, Miernik A, Benndorf M et al (2022) Artificial intelligence in magnetic resonance imaging-based prostate cancer diagnosis: where do we stand in 2021? Eur Urol Focus 8(2):409–417
5. Wildeboer RR, Mannaerts CK, van Sloun RJG, Budaus L, Tilki D, Wijkstra H et al (2020) Automated multiparametric localization of prostate cancer based on B-mode, shear-wave elastography, and contrast-enhanced ultrasound radiomics. Eur Radiol 30(2):806–815
6. Mottet N, van den Bergh RCN, Briers E, Van den Broeck T, Cumberbatch MG, De Santis M et al (2021) EAU-EANM-ESTRO-ESUR-SIOG Guidelines on Prostate Cancer-2020 Update. Part 1: screening, diagnosis, and local treatment with curative intent. Eur Urol 79(2):243–262
7. Park SH, Han K (2018) Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286(3):800–809
8. Willemink MJ, Koszek WA, Hardell C, Wu J, Fleischmann D, Harvey H et al (2020) Preparing medical imaging data for machine learning. Radiology 295(1):4–15
9. van Leenders G, van der Kwast TH, Grignon DJ, Evans AJ, Kristiansen G, Kweldam CF et al (2020) The 2019 International Society of Urological Pathology (ISUP) Consensus Conference on Grading of Prostatic Carcinoma. Am J Surg Pathol 44(8):e87–e99
10. van der Slot MA, Hollemans E, den Bakker MA, Hoedemaeker R, Kliffen M, Budel LM et al (2021) Inter-observer variability of cribriform architecture and percent Gleason pattern 4 in prostate cancer: relation to clinical outcome. Virchows Arch 478(2):249–256
11. Ozkan TA, Eruyar AT, Cebeci OO, Memik O, Ozcan L, Kuskonmaz I (2016) Interobserver variability in Gleason histological grading of prostate cancer. Scand J Urol 50(6):420–424
12. Netto GJ, Eisenberger M, Epstein JI, Investigators TAXT (2011) Interobserver variability in histologic evaluation of radical prostatectomy between central and local pathologists: findings of TAX 3501 multinational clinical trial. Urology 77(5):1155–1160
13. Jager A, Postema AW, Mischi M, Wijkstra H, Beerlage HP, Oddens JR (2023) Clinical trial protocol: developing an image classification algorithm for prostate cancer diagnosis on three-dimensional multiparametric transrectal ultrasound. Eur Urol Open Sci 49:32–43
14. Hollemans E, Verhoef EI, Bangma CH, Rietbergen J, Helleman J, Roobol MJ et al (2019) Large cribriform growth pattern identifies ISUP grade 2 prostate cancer at high risk for recurrence and metastasis. Mod Pathol 32(1):139–146
15. Kweldam CF, Wildhagen MF, Steyerberg EW, Bangma CH, van der Kwast TH, van Leenders GJ (2015) Cribriform growth is highly predictive for postoperative metastasis and disease-specific death in Gleason score 7 prostate cancer. Mod Pathol 28(3):457–464
16. Zou KH, Warfield SK, Bharatha A, Tempany CM, Kaus MR, Haker SJ et al (2004) Statistical validation of image segmentation quality based on a spatial overlap index. Acad Radiol 11(2):178–189
17. de Rooij M, van Poppel H, Barentsz JO (2021) Risk stratification and artificial intelligence in early magnetic resonance imaging-based detection of prostate cancer. Eur Urol Focus 8(5):1187–1191. https://doi.org/10.1016/j.euf.2021.11.005
18. Pantanowitz L, Quiroga-Garza GM, Bien L, Heled R, Laifenfeld D, Linhart C et al (2020) An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. Lancet Digit Health 2(8):e407–e416
19. Khosravi P, Lysandrou M, Eljalby M, Li Q, Kazemi E, Zisimopoulos P et al (2021) A deep learning approach to diagnostic classification of prostate cancer using pathology-radiology fusion. J Magn Reson Imaging 54(2):462–471
20. Turkbey B, Mani H, Shah V, Rastinehad AR, Bernardo M, Pohida T et al (2011) Multiparametric 3T prostate magnetic resonance imaging to detect cancer: histopathological correlation using prostatectomy specimens processed in customized magnetic resonance imaging based molds. J Urol 186(5):1818–1824
21. Radtke JP, Schwab C, Wolf MB, Freitag MT, Alt CD, Kesch C et al (2016) Multiparametric magnetic resonance imaging (MRI) and MRI-Transrectal ultrasound fusion biopsy for index tumor detection: correlation with radical prostatectomy specimen. Eur Urol 70(5):846–853
22. Kweldam CF, Nieboer D, Algaba F, Amin MB, Berney DM, Billis A et al (2016) Gleason grade 4 prostate adenocarcinoma patterns: an interobserver agreement study among genitourinary pathologists. Histopathology 69(3):441–449
23. van der Kwast TH, van Leenders GJ, Berney DM, Delahunt B, Evans AJ, Iczkowski KA et al (2021) ISUP Consensus Definition of Cribriform Pattern Prostate Cancer. Am J Surg Pathol 45(8):1118–1126
24. Melia J, Moseley R, Ball RY, Griffiths DF, Grigor K, Harnden P et al (2006) A UK-based investigation of inter- and intra-observer reproducibility of Gleason grading of prostatic biopsies. Histopathology 48(6):644–654
25. Wise AM, Stamey TA, McNeal JE, Clayton JL (2002) Morphologic and clinical significance of multifocal prostate cancers in radical prostatectomy specimens. Urology 60(2):264–269

Acknowledgements

The authors thank Altuna Halilovic MD; Karel C. Kuijpers MD, PhD; Charlotte Wetzels MD; Anna Garrido Utrilla MSc, PhD; and Hans A. Bogaard PhD.

Funding

The funding for this study was provided by the European Union and Angiogenesis Analytics, JADS Venture Campus, Sint Janssingel 92, 5211DA, ’s-Hertogenbosch, The Netherlands (AA).

Author information

Authors and Affiliations

Amsterdam UMC, University of Amsterdam, Department of Urology, Meibergdreef 9, Amsterdam, The Netherlands
Auke Jager, Arnoud W. Postema, Harrie P. Beerlage & Jorg R. Oddens
Department of Urology, Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
Arnoud W. Postema
Pathology DNA, Jeroen Bosch Hospital, Henri Dunantstraat 1, 5223, GZ, ’s-Hertogenbosch, The Netherlands
Hans van der Linden & Peet T.G.A. Nooijen
Department of Pathology, Netherlands Cancer Institute-Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
Elise Bekers
Department of pathology, Maasstad Ziekenhuis, Rotterdam, The Netherlands
Charlotte F. Kweldam
Angiogenesis Analytics, JADS Venture Campus, ’s-Hertogenbosch, AA, The Netherlands
Gautier Daures & Wim Zwart
Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
M. Mischi & Jorg R. Oddens

Authors

Auke Jager
View author publications
You can also search for this author in PubMed Google Scholar
Arnoud W. Postema
View author publications
You can also search for this author in PubMed Google Scholar
Hans van der Linden
View author publications
You can also search for this author in PubMed Google Scholar
Peet T.G.A. Nooijen
View author publications
You can also search for this author in PubMed Google Scholar
Elise Bekers
View author publications
You can also search for this author in PubMed Google Scholar
Charlotte F. Kweldam
View author publications
You can also search for this author in PubMed Google Scholar
Gautier Daures
Wim Zwart
M. Mischi
Harrie P. Beerlage
Jorg R. Oddens

Contributions

AJ: Conceptualization, Methodology, Acquisition, Writing – Original Draft, Visualization, Project administration. AP: Conceptualization, Methodology, Acquisition, Writing – Review & Editing. HL, PN: Methodology, Acquisition, Writing – Review & Editing. EB: Resources, Acquisition. CK: Writing – Review & Editing. GD, WZ: Methodology, Interpretation of data, Statistical analysis. MM: Writing – Review & Editing, Supervision, Funding acquisition. HB: Writing – Review & Editing, Supervision, Funding acquisition. JO: Conceptualization, Methodology, Writing – Review & Editing, Supervision, Project administration. All authors read and approved the final paper.

Corresponding author

Correspondence to Auke Jager.

Ethics declarations

Ethics approval and consent to participate

This study was approved by an accredited medical research ethics committee (MEC AMC) under reference number 2020_268#B202178. All study participants signed an informed consent form that includes the consent for use of their data for publication. The study was performed in accordance with the Declaration of Helsinki.

Competing interests

HB is chair of the clinical board for AA.

AP, MM, and HW are scientific advisors for AA for which they receive compensation.

PN, HL, EB, and CK performed pathology annotations for which they were financially compensated by AA.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(PDF 980 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jager, A., Postema, A.W., van der Linden, H. et al. Reliability of whole mount radical prostatectomy histopathology as the ground truth for artificial intelligence assisted prostate imaging. Virchows Arch 483, 197–206 (2023). https://doi.org/10.1007/s00428-023-03589-4

Received: 20 April 2023
Revised: 05 June 2023
Accepted: 26 June 2023
Published: 06 July 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00428-023-03589-4
