Introduction

Artificial Intelligence (AI), described as the science and engineering of making intelligent machines able to mimic human intelligence and to learn, was officially introduced in 1956 with the invention of robots. Recently, AI has evolved considerably and has become a basic tool in many sectors, such as banking [1], agriculture [2], and medicine [3]. It also contributes significantly to reducing human involvement in critically dangerous activities [4, 5]. With the explosion of digital data availability and the ability of AI algorithms to integrate and learn from large datasets, AI has been widely applied in clinical decision-making, biomedical research, and medical education [6]. The US Food and Drug Administration (FDA) and other regulatory agencies have allowed clinicians to use AI-based tools in several medical fields [6, 7]. Currently, AI can be used for routine detection of diabetic retinopathy without the need for ophthalmologist confirmation. AI applications also extend into the physical realm, with robotic prostheses, physical task support systems, and mobile manipulators assisting in the delivery of telemedicine. Many endoscopy manufacturers have launched their AI devices on the market with regulatory approval in Europe and Asia [8].

Nephrology has all the assets to lend itself to AI experimentation and advances, and the kidney transplantation (KT) field is taking the lead in the use of AI in nephrology. A large number of studies in the literature have addressed the application of AI to the different aspects of KT. Some authors used machine learning (ML) to predict the bioavailability of tacrolimus during the immediate post-transplant period and to estimate the risk of post-transplant diabetes mellitus; these outcomes were predicted based on ABCB1 and CYP3A5 genetic phenotypes, age, gender, and body mass index [9]. Predicting KT outcomes using data-driven approaches has drawn the interest of many researchers. Senanayake [10] and Sekercioglu [11] recently reviewed the ML models used in the field of KT; ML is in fact a subtype of AI commonly used for prediction tasks. The first review was published in 2019 and covered eighteen studies that developed ML-based models to predict short- and long-term KT outcomes in adult patients. These studies were performed in the US, Iran, Italy, the UK, Australia, Korea, Belgium, Germany, and Egypt. The second review was published in 2021, and the authors reviewed ML studies predicting long-term kidney allograft survival. They identified eleven studies, most of which were case studies and pilot projects; very few resulted in approved tools officially introduced into daily practice.

This paper covers the AI basics, core concepts, and challenges, after which we focus on the KT field. We review the related studies and summarize the predictive factors to help nephrologists quickly concentrate on the most relevant work.

Core concepts

The ability to supervise the development of AI tools and their use will become a must-have skill for nephrologists in the near future [12]. The first step to understanding AI methods is to become familiar with the basic concepts and the terms in use. In this section, we provide a precise and simplified explanation of the core AI concepts useful to healthcare practitioners, which will help them adequately understand how predictive models are created so that they can: (1) evaluate the models critically; (2) participate actively in minimizing current limitations; and (3) collaborate with computer scientists and data scientists and take action to meet the current needs in their field.

A summary of the basic terms used in AI-related medical publications is provided in Table 1.

Table 1 Glossary of commonly used terms in artificial intelligence applied to healthcare

Big data

Big data refers to data of large volume and high complexity. The concept also encompasses the ensemble of techniques used to collect, store, analyze, and manage immense volumes of both structured and unstructured data that are beyond the ability of traditional data management tools [13].

There are many types and structures of data that can be used in AI. Algorithms can learn from structured data, which is data that adheres to a pre-defined model and is therefore ready to analyze. Structured data conform to a tabular format with a relationship between the different rows and columns. Excel files are a common example of structured data, with rows and columns that can be sorted. Training data can also be unstructured, meaning the information either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information may include images (e.g., computed tomography images, X-ray images, pathology images), videos, audio files, or text (e.g., medical records, datasheets). Machines cannot read text and images directly: the input data need to be transformed or encoded into numbers. These numbers are represented as vectors and matrices so that they can be used to train and deploy models. For example, in ML an image is treated as an ensemble of pixels.
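As a concrete illustration, the sketch below (Python/NumPy, with invented pixel values) encodes a tiny grayscale image as a matrix of pixel intensities and flattens it into the kind of numeric feature vector an ML algorithm can learn from.

```python
import numpy as np

# Hypothetical 4 x 4 grayscale image: each entry is a pixel intensity (0-255).
image = np.array([
    [ 12,  40,  40,  12],
    [ 40, 200, 200,  40],
    [ 40, 200, 200,  40],
    [ 12,  40,  40,  12],
])

# Flatten the 2-D pixel matrix into a 1-D feature vector of length 16
# and scale intensities to [0, 1] before feeding it to a learning algorithm.
feature_vector = image.flatten() / 255.0
print(feature_vector.shape)  # (16,)
```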

Artificial intelligence

AI is a branch of computer science that involves the use of computers to model intelligent behaviors with minimal human intervention. AI started with the invention of robots [14]; however, it has evolved to cover a multitude of other branches (see Fig. 1).

Fig. 1 Branches of artificial intelligence

Machine learning

ML is a branch of AI. It focuses on developing computer programs that can access data and learn from it without being explicitly programmed for a specific task. This property makes ML fundamentally different from classic statistics [15]. ML uses a set of algorithms to analyze, interpret, and learn from a given set of data and, based on what it learns, make the best possible decisions.

An algorithm is a set of rules that precisely defines a sequence of operations. ML algorithms learn from data without human intervention: the algorithm is fed data from which it learns and adapts without following explicit instructions, analyzing the dataset and drawing inferences from the patterns it contains.

For example, if we want to predict 10-year kidney graft survival (the output), we provide the algorithm with a database containing many variables, such as recipient age, gender, history of rejection, and infections, in which each KT (instance) is labeled as survived or failed at 10 years. The algorithm uses the provided data to detect the function that maps the input variables to the output values. The trained algorithm then yields a model capable of predicting the output for new input values different from the training data.

The inputs to an ML algorithm are called predictors/features, and the output is referred to as the target/label.
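The sketch below shows this workflow end to end with scikit-learn; the dataset, file name, and column names are hypothetical and purely illustrative (categorical features are assumed to be already numerically encoded), not taken from any of the studies cited here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical structured dataset: one row per kidney transplant (instance),
# with numerically encoded features and a binary label.
df = pd.read_csv("kidney_transplants.csv")
X = df[["recipient_age", "recipient_sex", "history_of_rejection", "infections"]]  # features
y = df["graft_survived_10y"]  # label: 1 = graft survived 10 years, 0 = failed

# Hold out part of the data to evaluate the trained model on unseen instances.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)                    # the algorithm learns from the labeled data
pred = model.predict_proba(X_test)[:, 1]       # predicted survival probability for new inputs
print("AUC:", roc_auc_score(y_test, pred))
```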

Supervised/unsupervised learning

In ML, there are two main types of tasks: supervised learning and unsupervised learning.

Supervised learning requires prior knowledge of the output values; the goal is therefore to determine a function that best approximates the relationship between input and output, given a sample of data and the desired outputs (labels). Since kidney allograft biopsy contextualization will be based on ML in the upcoming Banff classifications [16], we will explain the concepts of supervised and unsupervised learning using related examples. For example, to train the machine to classify a given kidney allograft biopsy image, we input multiple specimens with known labels. The label of each image is one of the six categories of the Banff classification [17]. The trained model then predicts the category of a new input image. To do this, the machine generates the output as a vector of scores, one score for each category. The goal is for the desired category to be assigned the highest score after training. An objective function measuring the error (or distance) between the output scores and the desired pattern of scores is computed, and the machine modifies its internal parameters to reduce this error. Farris et al. used supervised learning to develop a model for kidney allograft image analysis and fibrosis quantification with satisfactory accuracy [18].
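To make the scoring and the objective function concrete, the sketch below (NumPy, with invented numbers) turns one hypothetical score vector into category probabilities and computes a cross-entropy error against the desired category; training would adjust the model's internal parameters to make this error smaller.

```python
import numpy as np

# Hypothetical raw scores produced by the model for one biopsy image,
# one score per Banff category (six categories; values invented for illustration).
scores = np.array([1.2, 0.3, -0.5, 2.4, 0.1, -1.0])

# Convert scores to probabilities (softmax); the desired category here is index 3.
probs = np.exp(scores) / np.exp(scores).sum()
desired = np.array([0, 0, 0, 1, 0, 0])

# Cross-entropy objective: error between the output scores and the desired pattern.
error = -np.sum(desired * np.log(probs))
print(probs.round(3), error)  # training updates the parameters to reduce this error
```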

Supervised learning is applicable in the context of classification when we want to map the input to output classes, such as predicting whether the graft will survive or not [19] or classifying images into different categories [18]. Supervised learning can also be applied in the context of regression when we want to map the input to a continuous output, such as predicting the estimated glomerular filtration rate [20].
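As a counterpart to the classification sketch above, the following minimal regression example (scikit-learn, with invented values) maps inputs to a continuous output, in the spirit of predicting the estimated glomerular filtration rate; the features and numbers are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is a patient described by
# [age (years), serum creatinine (mg/dL)]; the target is GFR (mL/min/1.73 m2).
X_train = np.array([[30, 0.9], [45, 1.4], [60, 2.1], [70, 3.0]])
y_train = np.array([95.0, 60.0, 35.0, 22.0])

reg = LinearRegression().fit(X_train, y_train)  # learn a mapping to a continuous output
print(reg.predict([[50, 1.8]]))                 # estimated GFR for a new patient
```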

Unsupervised learning, on the other hand, does not have labeled outputs, so its goal is to infer the natural structure present within a set of data points. The most common task within unsupervised learning is clustering, where we wish to learn the inherent structure of our data without using explicitly provided labels. Taking the same kidney allograft biopsy example with unsupervised learning, we provide the algorithm with a set of unlabeled images; the machine then infers the patterns in the images and automatically divides them into groups with similar features (categories).
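A minimal clustering sketch (scikit-learn k-means) is given below; it assumes the biopsy images have already been encoded as numeric feature vectors and uses random numbers in place of real image features. The number of clusters is set to six only to echo the Banff categories mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per biopsy image, already encoded as numbers
# (e.g., flattened and scaled pixel values); here 100 images x 256 random features.
rng = np.random.default_rng(0)
features = rng.random((100, 256))

# No labels are provided: k-means groups the images into six clusters
# based only on the similarity of their feature vectors.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)
print(cluster_ids[:10])  # cluster assignment of the first 10 images
```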

Deep learning

Conventional ML algorithms are limited in their ability to process data in their raw form. For many years, constructing an ML model required considerable domain expertise and meticulous engineering to implement a feature extractor that transformed the raw data (e.g., the pixel values of an image) into a suitable internal representation or feature vector from which the learning algorithm, often a classifier, could detect or classify patterns in the input.

Deep learning is a learning method in which a machine can be fed raw data and automatically discover the representations (features) needed for detection or classification. Suppose we want a model to predict whether an image contains a malignant tumor. The algorithm will learn from data (mammogram images) and try to find the patterns (features) present in the images labeled as containing a malignant tumor.

Deep learning structures the algorithm into multiple layers to create an artificial neural network (ANN). ANNs are algorithms whose structure mimics that of the human brain. An ANN has one input layer, optional hidden layers, and one output layer. Layers are rows of so-called “neurons”. The number of neurons in each layer, the number of layers, and the type of connections between the layers (fully connected or not) are modifiable design choices for each ANN. In Fig. 2 we present an example of an ANN aiming to predict kidney graft survival. Deep learning is called “deep” because of the additional layers added to learn from the provided data. In an ANN, the input layer takes the input signals and passes them to the next layer. Weights are applied within the nodes of the hidden layers; weights define the importance of a feature in predicting the target value. For example, a single node may take the input data, multiply it by an assigned weight value, and then add a bias before passing the result to the next layer (input × weight + bias = output). The final layer of the neural network, the output layer, uses the inputs from the hidden layers to produce the desired output. When a deep learning model is learning, it is simply updating its weights so as to minimize an objective (loss) function through an optimization algorithm. Through these successive transformations, the machine learns complex functions. For classification tasks, the layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations, which helps the system handle complex perception tasks with high accuracy. Deep learning requires much more data than a traditional ML algorithm to function properly.

Fig. 2 Architecture of an artificial neural network (ANN) for predicting kidney graft survival: a deep neural network with one input layer, two hidden layers, and one output layer. In this network, each neuron (N) of a layer is connected to the neurons of the next one, yielding a fully connected network
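The sketch below (NumPy) illustrates the forward pass of such a fully connected network with one input layer, two hidden layers, and one output layer; the layer sizes, weights, and input values are illustrative and untrained (random), so the output is not a meaningful prediction. During training, an optimizer would adjust each W and b to reduce the error between this output and the true labels.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical fully connected network mirroring Fig. 2:
# 4 input features -> hidden layer of 5 neurons -> hidden layer of 3 neurons -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)   # weights and biases, hidden layer 1
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)   # weights and biases, hidden layer 2
W3, b3 = rng.normal(size=(3, 1)), np.zeros(1)   # weights and bias, output layer

# One transplant encoded as 4 numeric features (illustrative values).
x = np.array([0.6, 1.0, 0.0, 0.3])

# Each layer applies: output = activation(input x weight + bias).
h1 = relu(x @ W1 + b1)
h2 = relu(h1 @ W2 + b2)
graft_survival_probability = sigmoid(h2 @ W3 + b3)
print(graft_survival_probability)  # random weights: the network has not learned anything yet
```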

In a recent study, Kers et al. used deep learning to classify the histology of kidney allograft biopsies into three categories (normal/rejection/other diseases) using 5844 digital slide images. Their model’s area under the curve reached 87% [21].

Barriers to the integration of artificial intelligence

AI applications have been validated as standard solutions for different tasks in many medical fields [6, 7]. Nephrology has all the characteristics needed to benefit from AI advances, since patients are followed for several decades and there are enough universal recommendations and consensus documents to make practice homogeneous, whether for dialysis, KT, or clinical nephrology. In many countries, nephrology has been digitized for more than 20 years [22], which has resulted in well-organized databases and easily exploitable data. Table 2 presents a number of nephrology registries worldwide.

Table 2 Examples of existing nephrology registries

The development and implementation of AI tools in healthcare are fundamentally different from the use of ML or big data in other fields. The limitations holding AI back from being fully integrated into healthcare systems are mainly linked to data structure, ethical challenges, and legal concerns [23]. These limitations can be categorized as follows:

  • Incompatible data formats

  • Unstructured datasets

  • High data sparsity

  • Lack of precision

  • Difficult data storage or transfer

  • Legal concerns

  • Heterogeneous data types

  • Large volumes of data

  • Data standardization (terminology, language …)

  • Data timelines, time-series, real-time analyses, etc.

  • Lack of skills

  • Privacy protection

In healthcare, no two patient experiences are alike. Even at a standardized routine exam, two different doctors would likely record different data for the same patient. This problem is partially solved by the development of international classifications and guidelines, such as the definition and classification of chronic kidney disease; hence the importance of a homogeneous practice of nephrology worldwide [24].

Moreover, outcomes in healthcare such as kidney function or kidney graft survival are affected by complex parameters [25], most of which cannot be collected during a doctor’s visit. Other data that affect the outcome of interest, if present in the record at all, are usually based on the patient’s imperfect recall and subjective description. In addition, these clinical features may vary on diverse time scales, and this variability plays a vital role in indicating health status. For example, intra-individual variability in kidney function biomarkers is associated with negative outcomes in terms of patient survival and renal survival [26].

Recent research may help overcome these issues. One example is AdaCare, a representation learning model that captures the short- and long-term variability of biomarkers as clinical features to predict health status at different time points [27]. It adaptively selects the clinical features that most strongly indicate the health status of patients in diverse conditions and provides personalized feature selection.

Data size is another concern in the healthcare field. ML shines when the model is trained on large databases [28]. In other fields, data are easily collectible, sometimes with a simple click; consider the example of the Google ads model, one of the most robust AI tools in the world. It is an AI model that determines when and where ads are shown for specific audiences and on specific pages. Data are collected whenever a client searches for something on Google and clicks on a result: every step is captured as data [29]. Clinical datasets are inevitably far smaller, which means less training data for algorithms to learn from. A randomized clinical trial aiming to collect high-quality data might involve fewer than 100 patients. In a systematic review of 40,970 clinical trials including 1054 nephrology trials, the authors found that, compared with other specialties, nephrology trials were more likely to be small, with 64.5% of them enrolling fewer than 100 patients [30].

Bigger medical datasets do exist, with millions of patients, produced from imaging, electronic health records, telemedicine, genomics, and other data sources [23]. However, poor quality remains the main issue with these datasets. They require rigorous data cleaning, which is very challenging and, moreover, considerably reduces the dataset size. Data cleaning is the process that ensures that datasets are correct, accurate, relevant, and consistent. Messy data can derail a big AI project, especially when disparate data sources are brought together [31]. While many data cleaning processes are still performed manually, some vendors offer increasingly sophisticated data cleaning tools that use intelligent rules to correct large datasets, reducing the time and expense required to obtain high levels of integrity and accuracy in medical databases [32]. Recent research has also proposed models to handle irregular medical records and extract feature interrelationships for individualized healthcare prediction [33].
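The sketch below (pandas) illustrates a few typical cleaning steps on a hypothetical registry extract; the file and column names are invented for illustration, and real cleaning pipelines are considerably more involved.

```python
import pandas as pd

# Hypothetical messy registry extract; file and column names are illustrative only.
df = pd.read_csv("registry_extract.csv")

# 1. Remove exact duplicate records.
df = df.drop_duplicates()

# 2. Harmonize inconsistent coding of the same category.
df["sex"] = df["sex"].replace({"M": "male", "F": "female", "Male": "male", "Female": "female"})

# 3. Flag implausible values as missing instead of silently keeping them.
df["recipient_age"] = df["recipient_age"].where(df["recipient_age"].between(0, 110))

# 4. Drop rows missing the outcome of interest (this is where the dataset shrinks).
df = df.dropna(subset=["graft_survived_10y"])
print(len(df), "usable records after cleaning")
```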

Besides these data processing techniques, transfer learning can help overcome the lack of data available for analysis. It is an ML method in which knowledge developed from previous training is reused to help perform a new task.

Ma et al. [34] proposed a transfer learning framework that leverages massive, publicly available online medical records and learns to embed the medical features relevant to a specific task; the transferred parameters are then used for further training. The authors applied the proposed framework to COVID-19 prognosis assessment and end-stage renal disease (ESRD) mortality prediction.
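Independently of that specific framework, the generic idea of transfer learning can be sketched as follows (PyTorch, with hypothetical layer sizes and random data): a network assumed to have been pretrained on a large source dataset is reused, its pretrained layers are frozen, and only a new final layer is trained on a small target dataset such as an ESRD cohort.

```python
import torch
import torch.nn as nn

# Hypothetical feature extractor pretrained on a large source dataset
# (e.g., public medical records); layer sizes are illustrative.
encoder = nn.Sequential(
    nn.Linear(50, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)
# encoder.load_state_dict(torch.load("source_task_weights.pt"))  # assumed to exist

# Freeze the pretrained layers: their learned representations are transferred as-is.
for param in encoder.parameters():
    param.requires_grad = False

# New prediction head, trained from scratch on the small target dataset.
head = nn.Linear(32, 1)
model = nn.Sequential(encoder, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x_target = torch.randn(16, 50)                    # 16 target-task patients, 50 features (random)
y_target = torch.randint(0, 2, (16, 1)).float()   # binary outcome, e.g., ESRD mortality

for _ in range(100):                              # brief fine-tuning of the head only
    optimizer.zero_grad()
    loss = loss_fn(model(x_target), y_target)
    loss.backward()
    optimizer.step()
```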

Another concern in the medical field is that the timelines are far longer than those in other sectors. In nephrology, we mainly deal with chronic diseases, where the biggest concern is often chronic kidney disease progressing to ESRD. Similarly, in the KT field, interest has shifted to forecasting long-term outcomes [35]. An AI tool built to predict a long-term outcome will take years to begin collecting any feedback.

Not only is the nature of healthcare data more complex and variable, but ethical challenges exist as well, including the cost of errors, interpretability issues, and patient privacy protection concerns.

Errors made by models in industry sectors generally result in lost revenue; in healthcare, however, mistakes are far costlier, as the problem may be a question of life or death [36].

The lack of interpretability is another major ethical problem with ML algorithms.

ML aims to make predictions that are as accurate as possible, often at the expense of clear interpretability. Broadly, interpretability refers to the ability to explain the decisions made by a model.

Most of the powerful ML algorithms operate as black boxes, which raises reliability issues for both doctors and patients [37]. This issue may prevent the wide adoption of these methods by practitioners: it is easier for humans to trust a system that explains its decisions. On the other hand, it is hard to ignore the benefits of black box algorithms such as deep learning. Hence, we recommend using what works best after careful testing and large external validations [25].

Developing useful AI tools for healthcare is therefore challenging, but the promise is enormous [38]. It is the way to make healthcare practice benefit from the vast amounts of data and experience generated daily. An AI model is able to analyze and learn from the experience of millions of patients and the knowledge of thousands of clinicians, thus dramatically improving diagnosis and treatment. AI tools in healthcare will never replace physicians; instead, they will help them do more than they could before.

Kidney transplantation outcomes: unmet needs and potential role of artificial intelligence

Since the early 1980s, short-term outcomes of KT have markedly improved due to the advancement of surgical techniques and immunosuppressive drugs; however, when it comes to long-term outcomes, no significant improvement has been achieved since the 2000s. Interest has now shifted to forecasting long-term patient and graft survival after KT [39,40,41].

Many factors such as delayed graft function (DGF) due to ischemia reperfusion injury, acute rejection (AR) and more particularly antibody-mediated rejection (AMR), chronic allograft nephropathy, and morbidities related to immunosuppressive treatment are blamed for the lack of long-term improvements in terms of patient and graft survival.

Table 3 provides an overview of the published studies aiming to predict these complications with AI: the predicted outcome, year, sample size, and findings of the studies in terms of predictors and performance measures of the predictive models.

Table 3 Published studies using artificial intelligence for predicting the main kidney transplant-related complications

Delayed graft function

There are many definitions of DGF in the literature but the most commonly adopted one is that of the United Network for Organ Sharing (UNOS), which is “the need for dialysis at least once within the first seven days after transplantation, indicated outside the context of hyperacute rejection, vascular or urinary tract complications, or hyperkalemia” [42].

DGF contributes to poor long-term outcomes [43], and its impact on KT outcomes is expected to grow as the use of marginal kidneys increases due to organ shortages.

DGF is associated with a significant reduction in graft half-life. In an American cohort of more than 65,000 KTs, the half-life was 11.5 years in the absence of DGF versus 7.2 years when DGF occurred [44].

To date, no treatment or therapeutic strategy has become standard of care in the prevention or treatment of DGF. An accurate prediction of DGF can help establish an effective preventive strategy based on the predictors selected by the ML algorithm. Such a model may be beneficial not only for better graft allocation but also for identifying the factors that predict DGF, enabling interventions targeting the modifiable ones.

Many authors have used AI (ML) for predicting DGF (Table 3). However, most of the published studies did not generate an approved predictive model that can be introduced into daily practice. Such an achievement requires large, high-quality datasets including the relevant variables needed to predict DGF. The rate of DGF may also be reduced with more robust kidney graft allocation systems, which is likewise achievable with AI.

Antibody-mediated rejection

Several studies have evaluated the impact of AR on long-term graft survival, and it has been demonstrated that an AR episode is a major risk factor for chronic graft dysfunction and graft failure [45,46,47]. The FDA held an open public workshop in June 2010 to discuss the challenges in the treatment of AMR and highlighted the need for clinical trial designs aimed at improving long-term outcomes [48]. In April 2017, another workshop was held to discuss new advances in AMR and the challenges of clinical trial design for its prevention and treatment [49]. Such trials can now be performed thanks to the advances of AI [50].

Shaikhina et al. used a very small dataset (80 KTs) for predicting acute AMR at 30 days post KT using ML algorithms, and their model had an accuracy of 85% [51].

Despite the decrease in its incidence, AR is still a major issue because of the high rate of subclinical rejection, which is detectable only by protocol biopsies [52]. AI, and more particularly deep learning, can help extract the features of this subtype of rejection. AI was introduced into the Banff classification in 2019, and its contribution was discussed with regard to image recognition and rejection type recognition [16].

Graft survival

There is no unanimous definition of long-term kidney graft survival in the literature [53]. Several thresholds have been used in published studies (3 years, 5 years, 10 years, etc.) [25, 53]. All practitioners desire a graft that remains functional for life, hence the interest in extending the period required to judge prolonged survival.

In some published studies on long-term graft survival, the main endpoint over time was defined as the time of graft failure, marked by either a return to dialysis or retransplantation [54]. Other studies developed models for a combined outcome of graft failure and death (graft and patient survival) [55]. The challenges in predicting and improving long-term graft survival are: (1) the multiplicity and complexity of the factors involved; and (2) the lack of study designs that can address this need. These limitations can be overcome thanks to the ability of ML to integrate and learn from large, complex datasets and its powerful predictive capacity. The resulting predictive models can be used as surrogate endpoints in clinical trials on long-term outcomes (see Sect. 4.5).

Patient survival

Patient survival after KT also remains far below that of the general population. The leading causes of death in the KT population have changed in the past few years: although cardiovascular disease and infections are still the main causes of death, higher rates of death from malignancy are now observed, even overtaking cardiovascular disease in some series. Infections and malignancies in KT are primarily due to immunosuppressive therapy, and cardiovascular complications can also be partly linked to immunosuppressive drugs. Hence, there is a vital need for new therapies in KT.

Development of new treatments and new study designs

The development of innovative therapies that are safer and better able to prevent DGF and reduce AMR is critical to improving long-term graft and patient outcomes.

The main barrier to the development of new drugs is the lack of acceptable new study designs that can address the current needs in KT. The main endpoints accepted by the FDA for the past 20 years have been the incidence of biopsy-proven rejection and 1-year patient and graft survival. These endpoints, however, are insufficient to assess the long-term impact of the drugs. The traditional endpoints have now forced non-inferiority trial designs, given that short-term outcomes are relatively good. Long-term efficacy assessment requires clinical trials with a follow-up period of 5–10 years. Such extended follow-up periods result in an inefficient return on investment for pharmaceutical companies; therefore, the regulatory agencies do not enforce them. Moreover, long-term studies impose delays in offering potentially beneficial treatments to transplant recipients. This led two of the main regulatory agencies worldwide, the FDA and the European Medicines Agency (EMA), to emphasize the need for an early and powerful alternative tool in KT that reliably predicts long-term outcomes [56]. AI has been used to meet this requirement. The iBox, a validated AI-based predictive model, has been approved as an alternative endpoint for long-term kidney graft survival. It was applied in the large randomized controlled trial TRANSFORM in KT to project the long-term risk of kidney graft failure up to 11 years post-randomization using validated 1-year post-randomization data [25, 50].

Conclusions

The areas of application of AI worldwide are expanding exponentially. Nephrologists will have to interact with AI in their daily practice in the near future; the nephrology community therefore needs to be well informed about this technology. AI has the potential to help meet the unmet needs in the field by enabling accurate predictions and data analyses beyond the reach of conventional statistics, especially in this era of data abundance, by capturing complex relationships within large datasets containing many variables. With the existing KT databases and registries, AI technologies seem to be the best solution to fill current gaps, especially regarding long-term outcomes. To generalize the use of AI in nephrology, nephrologists worldwide need to understand the core concepts of AI and its subtypes, and how the models are created, so that they can evaluate them critically and participate actively in minimizing current challenges.