Abstract
Background
Interpretation of pre-transplant procurement biopsies is challenging, partly because of the small number of renal pathology experts. Artificial intelligence (AI) can help pathologists assess kidney donor biopsies. Herein we present the “Galileo” AI tool, designed specifically to assist the on-call pathologist in interpreting pre-implantation kidney biopsies.
Methods
A multicenter cohort of whole slide images acquired from core-needle and wedge biopsies of the kidney was collected. A deep learning algorithm was trained to detect the main findings evaluated in the pre-implantation setting (normal glomeruli, globally sclerosed glomeruli, ischemic glomeruli, arterioles and arteries). The model, developed on the Aiforia Create platform, was validated on an external dataset annotated by three independent pathologists.
Results
Galileo achieved a precision, sensitivity, F1 score and total area error of 81.96%, 94.39%, 87.74% and 2.81%, respectively, in the training set, and of 74.05%, 71.03%, 72.5% and 2%, respectively, in the validation set. Galileo was significantly faster than pathologists, requiring 2 min overall in the validation phase (vs 25, 22 and 31 min for three separate human readers, p < 0.001). Galileo-assisted detection of renal structures and the resulting quantitative information were integrated directly into the final report.
Conclusions
The Galileo AI-assisted tool shows promise in speeding up pre-implantation kidney biopsy interpretation, as well as in reducing inter-observer variability. This tool may represent a starting point for further improvements based on hard endpoints such as graft survival.
Introduction
In the evolving landscape of kidney transplantation, precise and consistent graft evaluation remains a pressing need, especially in the era of extended criteria donors, which has increased the availability of organs with marginal/suboptimal structural characteristics [1]. In this context, the pathological evaluation of donor kidneys is crucial in determining the long-term success of transplant procedures [2]. Challenges in transplant pathology include the need for a rapid response during the assessment process and the scarcity of expert pathologists, especially in peripheral centers [3]. Additionally, standardization is lacking [5] in sampling (needle biopsy vs wedge biopsy), in technical procedures (frozen vs rapid processing) and in reporting protocols, which currently hinge on a multifaceted classification system designed to ensure that only adequate donor kidneys are selected [4]. The Karpinski score provides a systematic and quantifiable assessment of a kidney graft, encompassing five parameters focused on the glomeruli, the tubulo-interstitial compartment and the arteries [6]. However, its application in clinical settings is hampered by poor reproducibility and consistency [7].
Building a hub-and-spoke network with expert renal transplant pathologists can improve the diagnostic reliability of the Karpinski score [8], an approach recently facilitated by telepathology [9, 10] that could further benefit from artificial intelligence (AI) tools [11]. Moreover, digitally connecting highly specialized professionals in transplant pathology [12] partly addresses the low interobserver reproducibility that affects this score [13]. Deep learning tools applied to speed up pre-implantation biopsy evaluation [14] have shown promising results [15], facilitating the detection, segmentation and classification of different renal compartments and marking a significant step forward in precision diagnostics. Herein, we describe the utility and performance of an automated AI-based tool named “Galileo”, designed specifically to assist the on-call pathologist in interpreting pre-implantation kidney biopsies.
Methods
Case selection
In this multicenter study, consecutive pre-implantation renal histological specimens (n = 84) from standard and marginal donors were retrieved from the archives of the Nephropathology Centers of the Department of Translational Medicine, University of Ferrara, Ferrara, Italy (center #1), Fondazione IRCCS San Gerardo dei Tintori, University of Milano-Bicocca, Monza, Italy (center #2) and ASST Spedali Civili, Brescia, Italy (center #3). Of these cases, 62 were core-needle biopsies processed using a rapid formalin-fixation, paraffin-embedding protocol (center #1), whereas 12 and 10 were wedge biopsies processed using conventional methods (centers #2 and #3, respectively) [9, 11]. The original Karpinski score assigned to each case [6] and baseline donor clinical information were retrieved from the hospital electronic health records (age, sex, concurrent diabetes, hypertension, serum creatinine in mg/dl and estimated glomerular filtration rate (eGFR), calculated with the CKD-EPI formula [16]). When available, basic clinical and laboratory data on graft outcome (serum creatinine, eGFR and graft survival at 1 year) were recorded. A single Periodic Acid-Schiff (PAS)-stained slide per case was scanned using the NanoZoomer S60 scanner (Hamamatsu, Shizuoka, Japan) at 40× magnification (0.22079 μm pixel size), fully anonymized and uploaded to the Aiforia Create online platform (v 5.7, Aiforia Technologies, Helsinki, Finland). This study complies with the Declaration of Helsinki and was performed according to ethics committee approval (PNRR-MR1-2022–12375735, 03/16/23).
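The eGFR calculation used for the baseline donor data can be sketched as follows. This is a minimal Python illustration that assumes the 2009 CKD-EPI creatinine equation, because reference [16] is Levey et al. 2009; the function name `ckd_epi_2009` is hypothetical and does not come from the authors' pipeline, and the optional race coefficient is included only because it is part of the 2009 equation.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool,
                 black: bool = False) -> float:
    """eGFR (ml/min/1.73 m²) from the 2009 CKD-EPI creatinine equation.

    Hypothetical helper for illustration; assumes the 2009 version cited as [16].
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    # Sex-specific constants of the 2009 equation.
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha     # applies when creatinine is below kappa
            * max(ratio, 1.0) ** -1.209    # applies when creatinine is above kappa
            * 0.993 ** age_years)          # age decline factor
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

print(round(ckd_epi_2009(1.0, 60, female=False), 1))
```

For a 60-year-old male donor with a creatinine of 1.0 mg/dl this gives roughly 81 ml/min/1.73 m². Because the equation is non-linear, applying it to a cohort's mean creatinine does not reproduce the cohort's mean eGFR.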
Algorithm development and training
For algorithm development, available cases were divided into training and validation datasets, as reported in Fig. 1.
Core-needle biopsy and wedge biopsy cohorts from the first two centers were used as the training set (n = 74). Two pathologists (SG and FDC) carefully selected regions of interest from the renal parenchyma containing the five structures considered in this study (normal, ischemic and globally sclerosed glomeruli, as well as arteries and arterioles) from whole slide images collected at the first two centers (Supplementary Fig. 1). Within the selected regions of interest, the annotators outlined all instances of these structures following definitions already provided in the literature [17], using the viewer and annotation tools of the Aiforia platform on a BARCO MDPC-8127 monitor (BARCO, Courtrai, Belgium). All annotations were subsequently reviewed and refined by two expert nephropathologists (AB and AE). Tiles were then extracted from the annotated images, and the AI model was trained with a field of view of 320 µm for 6032 iterations, reaching a training loss of 0.05. Performance of the final Galileo model was assessed using total area error, precision, sensitivity and F1 score (%).
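The relationship between the reported field of view and the scan resolution can be made concrete with a short sketch. At the reported pixel size of 0.22079 µm, a 320 µm field of view corresponds to about 1450 pixels at full resolution. The code below assumes square tiles on a regular grid with a stride equal to the tile size, with the last row and column shifted inward so that every tile lies inside the region of interest (these edge tiles may overlap their neighbours). The helper names are hypothetical, and Aiforia's internal tiling (including any downsampling) is not described in the text and may differ.

```python
import math
from typing import Iterator, List, Tuple

PIXEL_SIZE_UM = 0.22079   # scanner pixel size at 40x, as reported in the Methods
FIELD_OF_VIEW_UM = 320.0  # field of view reported for the Galileo model

def fov_to_pixels(fov_um: float, pixel_size_um: float) -> int:
    """Side, in full-resolution pixels, of a square field of view."""
    return math.ceil(fov_um / pixel_size_um)

def _starts(length: int, tile: int, stride: int) -> List[int]:
    # Regular grid of start positions, plus one flush with the far edge if needed.
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts

def tile_origins(roi_w: int, roi_h: int, tile: int,
                 stride: int) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of tiles lying fully inside a ROI."""
    if roi_w < tile or roi_h < tile:
        return  # ROI smaller than one tile: nothing to extract
    for y in _starts(roi_h, tile, stride):
        for x in _starts(roi_w, tile, stride):
            yield x, y

tile_px = fov_to_pixels(FIELD_OF_VIEW_UM, PIXEL_SIZE_UM)
print(tile_px)  # 1450
print(list(tile_origins(3000, 2000, tile_px, tile_px)))
```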
Algorithm validation
To validate the Galileo model, whole slide images of the wedge biopsies from center #3 (n = 10) were used, and five regions of interest containing the renal structures under investigation were randomly selected by one of the training annotators (FDC; Supplementary Fig. 2). Within these regions of interest, three additional pathologists (GC, VL and FP; external validators #1, #2 and #3, respectively) independently annotated the different structures, providing the ground truth for validating the algorithm. The total area error, precision, sensitivity and F1 score (%) of the Galileo detections on the validation set were then calculated to evaluate the performance and generalizability of the deep learning pipeline. Moreover, the time the external reviewers required for annotation was recorded with a stopwatch and compared with the time Galileo needed to detect the same renal structures.
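The four performance metrics can be illustrated with a pixel-wise sketch for a single structure class. Precision, sensitivity and F1 score follow their standard definitions; the definition of total area error used here (false-positive plus false-negative area, divided by the total analysed area) is an assumption, since the text does not define it and the Aiforia implementation may differ. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred_mask, truth_mask) -> dict:
    """Pixel-wise precision, sensitivity, F1 score and total area error (in %)
    for one structure class, comparing a predicted mask with a ground-truth mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(truth_mask, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    tp = np.count_nonzero(pred & truth)    # predicted and annotated
    fp = np.count_nonzero(pred & ~truth)   # predicted but not annotated
    fn = np.count_nonzero(~pred & truth)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # ASSUMPTION: total area error = misclassified area / total analysed area.
    total_area_error = (fp + fn) / pred.size
    return {
        "precision": 100 * precision,
        "sensitivity": 100 * sensitivity,
        "f1": 100 * f1,
        "total_area_error": 100 * total_area_error,
    }

# Toy 4x4 region: a 2x2 annotated structure and a 2x3 predicted structure.
truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True
print(segmentation_metrics(pred, truth))
# precision ≈ 66.7, sensitivity 100, F1 ≈ 80, total area error 12.5
```

In the toy region, the two extra predicted pixels lower precision but not sensitivity, and they are the only misclassified area.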
Statistical analysis
Continuous variables were summarized as mean ± standard deviation (SD) and qualitative variables as counts and frequencies. Depending on the nature of the data, t-tests or Mann–Whitney U tests were used to compare continuous variables, and chi-square tests to compare qualitative variables. Agreement between human and AI evaluation of the single parameters of the Karpinski score was assessed with Cohen’s kappa (k). Significance was set at p < 0.05. Statistical analysis was performed using the Pandas and Scikit-learn Python libraries.
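Cohen’s kappa corrects the observed agreement p_o for the agreement expected by chance p_e, as k = (p_o - p_e)/(1 - p_e). In Scikit-learn this is available as `sklearn.metrics.cohen_kappa_score`; the dependency-free sketch below shows the same unweighted computation. It assumes each rater's scores for one Karpinski parameter are given as equal-length lists, and the example scores are hypothetical. The text does not state whether a weighted kappa was used for these ordinal scores (Scikit-learn's `weights="linear"` or `weights="quadratic"` option would provide one).

```python
from collections import Counter
from typing import Hashable, Sequence

def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must score the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one and the same category throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical glomerular scores (0-3) for five biopsies, pathologist vs AI.
pathologist = [0, 1, 2, 1, 0]
ai_tool = [0, 1, 1, 1, 0]
print(round(cohen_kappa(pathologist, ai_tool), 3))  # 0.667
```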
Results
Clinico-histological characteristics of the donors
The study included 84 cases with an available pre-transplant biopsy, with a mean donor age of 68 years (± 8.8 years) and a predominance of male donors (67, 80%); 28 (33%) and 34 (40%) donors had a history of diabetes and hypertension, respectively. The average serum creatinine of the cohort was 1.4 (± 0.4) mg/dl and the average eGFR was 54.3 (± 20.1) ml/min/1.73 m². Based on the original histological characterization, cases were divided into two groups, with 39 (47%) and 45 (53%) having a Karpinski score of ≤ 3 and > 4, respectively. The latter group (Karpinski > 4) showed a significantly higher prevalence of hypertension (28, 63% vs 6, 14%, p < 0.001), higher serum creatinine (1.7 ± 0.4 mg/dl vs 1.1 ± 0.2 mg/dl, p = 0.001) and lower baseline eGFR (39.3 ± 10.4 vs 71.5 ± 13 ml/min/1.73 m², p < 0.001), with no significant differences in age, sex or diabetes prevalence (Supplementary Table 1).
Galileo performance
A total of 2880, 79, 506, 587 and 2024 regions of interest representing normal glomeruli, ischemic glomeruli, globally sclerosed glomeruli, arteries and arterioles, respectively, were extracted. During the training phase (813 training regions), the algorithm showed excellent performance, with a precision, sensitivity, F1 score and total area error of 81.96%, 94.39%, 87.74% and 2.81%, respectively (Supplementary Fig. 3). On the separate external validation dataset, performance remained good (Table 1), with an average precision, sensitivity, F1 score and total area error of 74.05%, 71.03%, 72.5% and 2%, respectively. Moreover, Galileo was significantly faster than the three external validators, requiring 2 min overall (vs 25, 22 and 31 min, p < 0.001; Table 2), which allowed its results to be integrated directly into the final histological report (Fig. 2).
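As a quick consistency check, the F1 score is the harmonic mean of precision (P) and sensitivity (S), F1 = 2PS/(P + S), so it can be recomputed from the reported values. The sketch below does this for both phases. Because the validation figures are averages over three validators, the F1 of the averaged precision and sensitivity need not match the average F1 exactly, although here the two agree to within rounding. The helper name `f1_score_pct` is hypothetical.

```python
def f1_score_pct(precision_pct: float, sensitivity_pct: float) -> float:
    """F1 score (%) as the harmonic mean of precision (%) and sensitivity (%)."""
    if precision_pct + sensitivity_pct == 0:
        return 0.0
    return 2 * precision_pct * sensitivity_pct / (precision_pct + sensitivity_pct)

# Precision, sensitivity and F1 (%) as reported for Galileo.
reported = {
    "training": (81.96, 94.39, 87.74),
    "validation": (74.05, 71.03, 72.5),
}
for phase, (p, s, f1_reported) in reported.items():
    print(f"{phase}: recomputed F1 = {f1_score_pct(p, s):.2f}% (reported {f1_reported}%)")
# training: recomputed F1 = 87.74% (reported 87.74%)
# validation: recomputed F1 = 72.51% (reported 72.5%)
```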
Graft survival and human/AI reproducibility
One year after transplant, recipients with functioning grafts had an average serum creatinine of 1.7 mg/dl and an average eGFR of 68 ml/min/1.73 m². Twenty-two (27%) cases experienced graft loss. Comparing the two groups defined by the original Karpinski score (≤ 3 vs > 4), no significant difference was noted in serum creatinine (1.1 ± 0.7 mg/dl vs 2.2 ± 1.3 mg/dl, p = 0.09), but the latter group had a lower eGFR (91.8 ± 38.9 vs 47.2 ± 21.2 ml/min/1.73 m², p = 0.01) and a higher graft loss rate at 1 year (12, 14% vs 32, 38%, p < 0.001). Comparing the scores assigned by the pathologist with those derived from Galileo for the single parameters of the Karpinski score showed good agreement for the glomerular, arteriolar and arterial scores (k = 0.69, 0.65 and 0.67, respectively).
Discussion
Pre-implantation biopsy plays a central role in kidney graft evaluation and in decisions on whether kidneys can be used for transplantation. However, the assessment of pre-implantation kidney biopsies is not standardized, either in the technical procedures adopted or in pathologists’ evaluations, and harmonization of this process is needed [4]. Currently, tissue samples may be obtained by core-needle biopsy or wedge biopsy, and the most appropriate processing technique for these specimens (e.g. snap frozen vs rapidly processed) is debated [18]. Different policies can have a significant impact on the final report, with possible under- or over-estimation of chronic damage in different renal compartments [19]. This can, in turn, influence graft outcome [20], with the best correlation described when pre-implantation biopsies are interpreted by experienced renal pathologists [21].
However, the most common scenario involves relying on on-call general pathologists, who may have limited expertise in nephropathology [12]. Reliance on general pathologists also increases inter-observer variability: they typically assign higher scores for glomerulosclerosis and arterial wall thickening, which are the most important parameters for evaluating chronic renal damage [8]. To address this challenge, remote teleconsultation by renal experts can be requested once the biopsy slides are digitized [22]. Once the slides have been scanned, computational tools can also be applied [12]. An AI-based tool that assists pathologists, improving accuracy and expediting their review, could therefore be highly beneficial.
The detection of glomerulosclerosis in pre-implantation biopsies is significantly associated with graft survival: studies have demonstrated the predictive role of glomerulosclerosis > 10% [23], with no incremental effect for values above that threshold [24]. This highlights the importance of subtle changes around this cutoff, which can be affected by inter-observer variability. Hence, AI-assisted detection of glomeruli that reliably distinguishes normal, ischemic and globally sclerosed glomeruli could improve diagnostic assessment using whole slide images [25]. Despite reported challenges in segmenting and classifying certain renal structures (e.g. variable shapes, dimensions and internal architecture, their interspersed distribution within the renal parenchyma, and heterogeneous pre-analytical variables), previous applications of AI in renal pathology demonstrated high reliability of glomerular detection and classification (e.g. precision ranging from 0.834 to 0.935 for healthy glomeruli and from 0.806 to 0.976 for sclerosed glomeruli) [26]. In addition, fibrosis and lumen narrowing of vascular structures (arteries and arterioles) are significantly associated with long-term graft survival, especially for mild-moderate (> 25%) arteriosclerosis [24]. AI-assisted segmentation of whole slide images has demonstrated good reliability in discriminating blood vessels from tubules, with an accuracy of 0.93 and a precision of 0.88 [27], as confirmed by subsequent studies (accuracy 0.89) that also showed the algorithm required significantly less time than pathologists (2 min vs 20 min) [15].
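Glomerular counts from a detector such as Galileo can be turned into the percentage of globally sclerosed glomeruli and a Karpinski-style glomerular score. The sketch below is illustrative only: it assumes the cut-offs commonly described for the Remuzzi/Karpinski scheme (0 = no globally sclerosed glomeruli, 1 = < 20%, 2 = 20-50%, 3 = > 50%), which should be checked against [6], and it counts ischemic glomeruli in the denominator, an assumption the text does not settle. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GlomerularCounts:
    """Per-biopsy glomerular counts (hypothetical container)."""
    normal: int
    ischemic: int
    globally_sclerosed: int

    @property
    def total(self) -> int:
        # ASSUMPTION: ischemic glomeruli are included in the denominator.
        return self.normal + self.ischemic + self.globally_sclerosed

def percent_global_sclerosis(counts: GlomerularCounts) -> float:
    if counts.total == 0:
        raise ValueError("no glomeruli detected; sample not assessable")
    return 100.0 * counts.globally_sclerosed / counts.total

def glomerular_score(pct: float) -> int:
    # ASSUMED cut-offs: 0 = none, 1 = <20%, 2 = 20-50%, 3 = >50%.
    if pct == 0:
        return 0
    if pct < 20:
        return 1
    if pct <= 50:
        return 2
    return 3

counts = GlomerularCounts(normal=17, ischemic=1, globally_sclerosed=2)
pct = percent_global_sclerosis(counts)
print(f"{pct:.1f}% globally sclerosed, score {glomerular_score(pct)}, above 10%: {pct > 10}")
# 10.0% globally sclerosed, score 1, above 10%: False
```

The example sits exactly on the 10% threshold discussed above, which shows why a single misclassified glomerulus can change how a biopsy is categorized.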
In this study, the Galileo system was trained on a heterogeneous, multi-institutional cohort of renal core-needle and wedge biopsies covering a broad range of pre-analytical variables. The aim was to obtain a robust AI-assisted tool that could generalize across settings and accommodate the heterogeneity of cases encountered in routine clinical practice. Galileo showed excellent precision and sensitivity during the training phase (81.96% and 94.39%), with a total area error of only 2.81%. In the validation phase, on an external dataset annotated by a different panel of pathologists, the tool achieved good precision and sensitivity (74.05% and 71.03%), with a further reduction of the total area error (2%). Even at these levels of performance, AI models can be computationally demanding, which may limit their use by on-call pathologists because of the potential need for dedicated high-performance workstations and long computation times [28]. Cloud-based AI suites, like the one used in the present study, can substantially shorten processing times (2 min vs 22–31 min for pathologists), which is highly important in the transplantation setting. Another possible barrier to AI adoption is pathologists' reluctance to trust black-box solutions [28], which might be mitigated by explainability methods. In this respect, the ability to visually display the renal structures detected by Galileo in an explainable manner greatly improved end-user acceptance and facilitated the creation of a final pathology report integrating qualitative and quantitative findings. Although promising, the current version of the Galileo algorithm covers only five of the histological classes required for interpreting pre-implantation renal biopsies.
The evaluation of interstitial fibrosis and tubular atrophy (IFTA), not covered by Galileo in its current form, is highly subjective and shows low interobserver reproducibility (Cohen’s kappa of 0.5 among 4 pathologists [29]), which makes it difficult to obtain the reliable ground-truth annotations needed to train an AI algorithm. Some authors have proposed overcoming this subjectivity by quantifying IFTA with image analysis methods that do not require human interaction or training, for example using color space transformations and structural feature extraction [30]. However, this approach has some limitations, including loss of information during the color space transformation, sensitivity to high stain variability (which prevents correct classification of all renal structures), and errors in the segmentation of these structures, with possible inaccurate quantification of interstitial fibrosis (since the method identifies and then removes non-fibrotic regions from the tissue). In this setting, the adaptive stain separation method seems promising [15], and similar approaches will be implemented in future versions of the Galileo algorithm. Ancillary histological changes at the glomerular (e.g. mesangial nodular expansion), vascular (e.g. hyalinosis and thrombotic microangiopathy) or tubular (e.g. acute damage/necrosis) level may be of interest to further refine risk stratification in the transplant setting, especially in deceased donors. Further training on larger case series including rare histological instances will be carried out to allow Galileo to recognize these ancillary but useful changes in the renal parenchyma. Moreover, applying the Galileo algorithm to additional external datasets will help corroborate its reliability and generalizability in routine clinical settings, as well as its impact on outcomes (e.g. graft survival).
Data availability statement
The authors agree to make the data and materials supporting the results or analyses presented in this paper available upon reasonable request.
References
Marletta S, Di Bella C, Catalano G et al (2023) Pre-implantation kidney biopsies in extended criteria donors: from on call to expert pathologist, from conventional microscope to digital pathology. Crit Rev Oncog 28:7–20
Cima L, Nacchia F, Ghimenton C et al (2019) Histopathology and long-term outcome of kidneys transplanted from donors with severe acute kidney injury. Prog Transplant 29:36–42
Eccher A, Girolami I, Becker JU (2023) Horizon of the pre-implantation kidney biopsy for allocation: multidisciplinarity, methodology and innovation. J Nephrol 36:947–949
Zaza G, Cucchiari D, Becker JU et al (2023) European Society for Organ Transplantation (ESOT)-TLJ 3.0 consensus on histopathological analysis of pre-implantation donor kidney biopsy: redefining the role in the process of graft assessment. Transpl Int 36:11410
Caliò A, Barreca A, Marletta S et al (2023) Histology for nephrology, from pre-implantation to post-transplant kidney biopsy. Lesson learned from ReBIrth (Renal BIopsy for Kidney Transplantation Therapy). Pathologica 115:199–204
Karpinski J, Lajoie G, Cattran D et al (1999) Outcome of kidney transplantation from high-risk donors is determined by both structure and function. Transplantation 67:1162–1167
Bukowy JD, Dayton A, Cloutier D et al (2018) Region-based convolutional neural nets for localization of glomeruli in trichrome-stained whole kidney sections. J Am Soc Nephrol 29:2081–2088
Girolami I, Gambaro G, Ghimenton C et al (2020) Pre-implantation kidney biopsy: value of the expertise in determining histological score and comparison with the whole organ on a series of discarded kidneys. J Nephrol 33:167–176
L’Imperio V, Brambilla V, Cazzaniga G et al (2021) Digital pathology for the routine diagnosis of renal diseases: a standard model. J Nephrol 34:681–688
L’Imperio V, Casati G, Cazzaniga G et al (2023) Improvements in digital pathology equipment for renal biopsies: updating the standard model. J Nephrol. https://doi.org/10.1007/s40620-023-01568-1
Cazzaniga G, Rossi M, Eccher A et al (2023) Time for a full digital approach in nephropathology: a systematic review of current artificial intelligence applications and future directions. J Nephrol. https://doi.org/10.1007/s40620-023-01775-w
Neri F, Eccher A, Rigotti P et al (2021) Advantages of using a web-based digital platform for kidney preimplantation biopsies. J Pathol Inform 12:41
Eccher A, Neil D, Ciangherotti A et al (2016) Digital reporting of whole-slide images is safe and suitable for assessing organ quality in preimplantation renal biopsies. Hum Pathol 47:115–120
Girolami I, Pantanowitz L, Marletta S et al (2022) Artificial intelligence applications for pre-implantation kidney biopsy pathology practice: a systematic review. J Nephrol 35:1801–1808
Salvi M, Mogetta A, Meiburger KM et al (2020) Karpinski score under digital investigation: a fully automated segmentation algorithm to identify vascular and stromal injury of donors’ kidneys. Electronics 9:1644
Levey AS, Stevens LA, Schmid CH et al (2009) A new equation to estimate glomerular filtration rate. Ann Intern Med 150:604–612
Bouteldja N, Klinkhammer BM, Bülow RD et al (2021) Deep learning-based segmentation and quantification in experimental kidney histopathology. J Am Soc Nephrol 32:52–68
Teixeira AC, Freire de Carvalho CC, Mororó GP et al (2017) Evaluation of frozen and paraffin sections using the Maryland Aggregate Pathology Index Score in donor kidney biopsy specimens of a Brazilian Cohort. Transplant Proc 49:2247–2250
Yong ZZ, Kipgen D, Aitken EL et al (2015) Wedge versus core biopsy at time zero: which provides better predictive value for delayed graft function with the Remuzzi histological scoring system? Transplant Proc 47:1605–1609
Kasiske BL, Stewart DE, Bista BR et al (2014) The role of procurement biopsies in acceptance decisions for kidneys retrieved for transplant. Clin J Am Soc Nephrol 9:562–571
Azancot MA, Moreso F, Salcedo M et al (2014) The reproducibility and predictive value on outcome of renal biopsies from expanded criteria donors. Kidney Int 85:1161–1168
Girolami I, Parwani A, Barresi V et al (2019) The landscape of digital pathology in transplantation: from the beginning to the virtual E-slide. J Pathol Inform 10:21
Stewart DE, Foutz J, Kamal L et al (2022) The independent effects of procurement biopsy findings on 10-year outcomes of extended criteria donor kidney transplants. Kidney Int Rep 7:1850–1865
Anglicheau D, Loupy A, Lefaucheur C et al (2008) A simple clinico-histopathological composite scoring system is highly predictive of graft outcomes in marginal donors. Am J Transplant 8:2325–2334
Gadermayr M, Dombrowski A-K, Klinkhammer BM et al (2019) CNN cascades for segmenting sparse objects in gigapixel whole slide images. Comput Med Imaging Graph 71:40–48
Altini N, Cascarano GD, Brunetti A et al (2020) Semantic segmentation framework for glomeruli detection and classification in kidney histological sections. Electronics 9:503
Bevilacqua V, Pietroleonardo N, Triggiani V et al (2017) An innovative neural network framework to classify blood vessels and tubules based on Haralick features evaluated in histological images of kidney biopsy. Neurocomputing 228:143–153
Asif A, Rajpoot K, Graham S et al (2023) Unleashing the potential of AI for pathology: challenges and recommendations. J Pathol 260:564–577
Nachiappa Ganesh R, Graviss EA, Nguyen D et al (2024) Reproducibility and prognostic ability of chronicity parameters in kidney biopsy—comprehensive evaluation comparing microscopy and artificial intelligence in digital pathology. Hum Pathol 146:75–85
Tey WK, Kuang YC, Ooi MP-L, Khoo JJ (2018) Automated quantification of renal interstitial fibrosis for computer-aided diagnosis: a comprehensive tissue structure segmentation method. Comput Methods Programs Biomed 155:109–120
Acknowledgements
The authors would like to thank Darshan Kumar and Andreea Luchian from Aiforia for their continuous technical support during the development of the algorithm.
Funding
Open access funding provided by Università degli Studi di Milano - Bicocca within the CRUI-CARE Agreement. The work has been funded by the European Union—Next Generation EU—NRRP M6C2—Investment 2.1 Enhancement and strengthening of biomedical research in the NHS (DIPLOMAT—PNRR-MR1-2022–12375735) and by the Italian Ministry of Health under the grant “Ricerca Finalizzata” GR-2021–12374235 (PROFOUND).
Author information
Contributions
AE, VL, APDT and FP defined the study design; SG, SM, FA, GC and FDC scanned the glass slides from the different centers; GC, FDC, VL, FP and SG performed the annotation to train and validate the algorithm; AB, JUB and GG contributed an external nephrology and nephropathology perspective; VDM and APDT contributed their specialist point of view in digital pathology; LP critically revised the phases of the study and the final manuscript; AB, GG, FA, VL and FP were responsible for funding acquisition and administrative support. All authors were involved in writing the paper and approved the submitted and published versions.
Ethics declarations
Competing interests
The authors declare no conflicts of interest.
Ethical approval
This study complies with the Declaration of Helsinki and was performed according to ethics committee approval (PNRR-MR1-2022–12375735, 03/16/23).
Informed consent to participate
Informed consent was waived because of the retrospective nature of the study and the use of anonymous clinical data.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Eccher, A., L’Imperio, V., Pantanowitz, L. et al. Galileo—an Artificial Intelligence tool for evaluating pre-implantation kidney biopsies. J Nephrol (2024). https://doi.org/10.1007/s40620-024-02094-4
DOI: https://doi.org/10.1007/s40620-024-02094-4