Key Points

Bayesian network as a new approach for therapeutic targeting.

Individual prediction of first cyclosporine therapeutic range achievement after pediatric HSCT.

Inclusion of many clinical and biological variables, including inflammation level through the CRP value, in a cyclosporine concentration predictive model after pediatric HSCT.

1 Introduction

Hematopoietic stem cell transplantation (HSCT) allows the treatment of numerous diseases in both children and adults, such as hematologic malignancies, non-malignant hemopathies, and severe immunodeficiencies [1, 2]. HSCT became possible in the 1960s, after identification and typing of the human leukocyte antigen (HLA) [3].

In 2015 in France, 5099 HSCTs were performed in 78 centers, and this number has been constantly growing in the last ten years (from 4201 HSCTs in 2006) [4]. Immunosuppressive drugs including cyclosporine are given to prevent graft-versus-host disease (GVHD) [5].

Cyclosporine is a drug with narrow therapeutic index: underdosing may lead to severe GVHD, and overdosing to toxic events or poor graft-versus-leukemia (GVL) effect [5,6,7]. Thus, therapeutic drug monitoring (TDM) is required and dosage regimens are adapted to reach a target cyclosporine minimal blood concentration value, chosen according to patient’s characteristics and disease [6, 8].

Many studies have been published on inter-individual pharmacokinetic variability of cyclosporine and Bayesian methods used to optimize dosing regimens. In all cases, population pharmacokinetic models were built, not individual predictive models [5, 6, 8]. To date, no studies have been conducted to predict the achievement of the therapeutic target of the initial cyclosporine blood concentration after HSCT.

In medicine, many methods have been used to predict the value of a target variable, for example, logistic regression in dementia diagnosis [9], Bayesian network models to understand visual field deterioration [10], support vector machines (SVM) for cancer detection [11], and random forest analysis to predict FOLFOX (folinic acid, fluorouracil, and oxaliplatin) responders in colorectal cancer [12]. These new methods seem promising in many fields, but they have scarcely been explored in dosing regimen forecasting [13].

In this study, our objective was to predict whether the initial cyclosporine residual blood concentration after pediatric HSCT will be in the therapeutic range using a mathematical individual predictive model.

2 Patients and Methods

The National Committee of Informatics and Liberty approved the current study.

Clinical and biological data were collected from the day of graft infusion to 2 months after transplantation in 155 pediatric patients undergoing HSCT both for malignant and non-malignant diseases between 2008 and 2016. Cyclosporine was given via a 2-h infusion twice daily, initially 3 mg/kg for malignant diseases and 5 mg/kg for non-malignant diseases, starting the day before HSCT. The first measure of cyclosporine trough concentration was performed on whole blood around the third day after the graft infusion (D + 3) (varying between day 0 [D0] and the seventh day after graft infusion [D + 7]; median D + 3; 83.2% of measurements taken on D + 3).

Six patients were excluded from the study as they were given cyclosporine for several weeks before transplantation or took it orally. Fourteen relevant variables that could have possibly influenced cyclosporine concentrations after the first dose post-transplantation were recorded: death, sex, weight, post-transplant day, age, mean dose before measurement, last dose before measurement, HSCT indication, documented infection or not, drugs interacting with cyclosporine, cholestasis, C-reactive protein (CRP) above 15 mg/L, mean hemoglobin, and mean creatinine (see Fig. 1).

Fig. 1

Bayesian network model to predict the ability of the initial residual blood concentration of cyclosporine after hematopoietic stem cell transplantation to be in the therapeutic range. CsA cyclosporine, CYP cytochrome P450, F female, HSCT hematopoietic stem cell transplantation, M male

All continuous variables were discretized, either on the basis of the literature (initial concentration after transplant, age, hemoglobin, creatinine, and CRP) or by machine learning (other variables such as standard deviations and weight; Netica software version 5.23 [Norsys Software Corp., Vancouver, BC, USA]). The initial cyclosporine residual concentration after transplant was discretized as in or out of the therapeutic range, according to the graft indication (malignant or not). The interval retained was 120 ± 20 µg/L for malignant disease and 150 ± 20 µg/L for non-malignant disease [2]. Age was separated into four classes (0 to 2, 2 to 8, 8 to 14, and > 14 years) in accordance with cytochrome P450 (CYP) 3A4 maturation [14,15,16,17,18,19,20]. Hemoglobin was discretized into three classes (< 80, 80 to 110, and > 110 g/L) [21]. Serum creatinine was separated into three classes (high, normal, low) according to patient age and sex. Values considered normal were 18–26 µmol/L for patients less than 3 years old, 26–44 µmol/L for patients between 4 and 7 years old, 53–71 µmol/L for patients between 8 and 10 years old, 71–80 µmol/L for females 11 years or older, and 80–106 µmol/L for males aged 13 years and older [22]. Plasma CRP values above 15 mg/L signaled inflammation [23, 24]. Cholestasis was defined by a bilirubin plasma concentration above 50 µmol/L [25]. The interacting drugs considered were those most used in HSCT: all proton pump inhibitors (PPIs), macrolide antibiotics (including pristinamycin and quinupristin, but excluding spiramycin), all azole antifungals, and calcium channel blockers (nicardipine and amlodipine, but not lacidipine) [26, 27].
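
The discretization rules described above can be illustrated with a short Python sketch. It is not the Netica configuration used in the study: the function names and class labels are hypothetical, and the treatment of values falling exactly on a class boundary (upper bounds inclusive for age, bounds inclusive for the therapeutic range) is an assumption, since the text does not specify it. The concentration is assumed to be expressed in the same unit as the target.

```python
def age_class(age_years: float) -> str:
    """Age class following the four CYP3A4-maturation bands (boundaries assumed inclusive above)."""
    if age_years <= 2:
        return "0-2"
    if age_years <= 8:
        return "2-8"
    if age_years <= 14:
        return "8-14"
    return ">14"


def hemoglobin_class(hb_g_per_l: float) -> str:
    """Hemoglobin class: < 80, 80 to 110, or > 110 g/L."""
    if hb_g_per_l < 80:
        return "<80"
    if hb_g_per_l <= 110:
        return "80-110"
    return ">110"


def in_therapeutic_range(concentration: float, malignant: bool) -> bool:
    """True if the trough concentration lies within target +/- 20 (target 120 or 150)."""
    target = 120 if malignant else 150
    return abs(concentration - target) <= 20


def has_inflammation(crp_mg_per_l: float) -> bool:
    """CRP above 15 mg/L signals inflammation."""
    return crp_mg_per_l > 15


def has_cholestasis(bilirubin_umol_per_l: float) -> bool:
    """Bilirubin above 50 umol/L defines cholestasis."""
    return bilirubin_umol_per_l > 50
```

For example, a trough concentration of 145 would be classed as out of range for a malignant indication (target 120) but in range for a non-malignant one (target 150).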

To generate synthetic data, we used multiple imputation models and, more precisely, the data augmentation methods developed by Schafer [28] and Little and Rubin [29]. We generated a dataset of 1000 virtual subjects from the available patients’ data (n = 105) with Bayesian multiple imputation [30, 31]. We also performed Chi-square tests for categorical variables and t tests for continuous variables to assess whether the synthetic data were representative of our database. To assess the performance and robustness of the tree-augmented Naïve (TAN) Bayes algorithm, we compared it with four other methods: logistic regression, Naïve Bayes, SVM, and random forest. The criteria retained for comparison were the area under the receiver operating characteristic curve (AUC-ROC), the percentage of incorrectly classified patients, the positive predictive value, and the negative predictive value, both on training data and after a ten-fold cross-validation.
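
The four comparison criteria can be computed from observed classes and model outputs as sketched below. This is a generic illustration in plain Python, not the computation performed by Orange or RapidMiner; the function names, the 0.5 decision threshold, and the toy data are assumptions. The AUC-ROC is obtained here through its rank interpretation (the probability that a randomly chosen in-range patient receives a higher score than a randomly chosen out-of-range patient, ties counting one half).

```python
def classification_metrics(y_true, y_pred):
    """Percentage of misclassified cases, positive and negative predictive values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    return {
        "error_pct": 100.0 * (fp + fn) / n,
        # PPV and NPV are undefined when no case is predicted in that class
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 1 = in therapeutic range, 0 = out of range
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

In this toy example, two of six cases are misclassified, both predictive values equal 2/3, and eight of the nine positive-negative pairs are ranked correctly.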

To test the sensitivity of these methods to missing data, we created different ranges of randomized missing data in the synthetic population: 1, 3, 5, 7, 10, 15, and 20%.
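
A minimal sketch of how such randomized missing data could be introduced is given below. The study does not state which software or masking scheme was used, so the function name, the choice to mask individual predictor cells uniformly at random, and the record layout (one dictionary per virtual subject) are all assumptions made for illustration.

```python
import random

# Proportions of missing data tested in the study
MISSING_RATES = [0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.20]


def inject_missing(rows, rate, columns, seed=0):
    """Return a copy of `rows` in which a fraction `rate` of the cells in `columns` is set to None.

    Cells are drawn uniformly at random without replacement, so the number of
    masked cells is exactly round(rate * number_of_cells).
    """
    rng = random.Random(seed)
    cells = [(i, col) for i in range(len(rows)) for col in columns]
    masked = set(rng.sample(cells, round(rate * len(cells))))
    return [
        {col: (None if (i, col) in masked else value) for col, value in row.items()}
        for i, row in enumerate(rows)
    ]


# Hypothetical synthetic population: 1000 subjects, three predictors and the class
population = [
    {"weight": 20 + i % 40, "crp_high": i % 2, "age_class": i % 4, "in_range": i % 3 == 0}
    for i in range(1000)
]
degraded = inject_missing(population, 0.10, ["weight", "crp_high", "age_class"])
```

The class variable is deliberately left out of `columns` here so that only predictors are degraded; whether the original analysis did the same is not stated.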

The software used for logistic regression, Naïve Bayes network, SVM, and random forest was Orange (version 3.8.1; https://orange.biolab.si/), whereas RapidMiner (version 7.4; RapidMiner, Inc., Boston, MA, USA) was used for TAN Bayes network.

The final TAN Bayesian network model to predict whether the initial CsA residual trough blood concentration after HSCT will be in the therapeutic range was built using Netica software (version 5.23).

To explore how the online model could be used, we performed a sensitivity-to-findings analysis to assess the ability of the network, through shared information, to compensate for missing variables such as cyclosporine residual concentration, mean dose, mean hemoglobin, mean creatinine, CRP, interacting drugs, cholestasis, documented infection, and weight (Netica software version 5.23).

3 Results

Of the 155 patients in total, six were excluded as they were given cyclosporine for several weeks before HSCT or they took cyclosporine orally. Forty-four patients were also excluded from the generation of synthetic data due to one or several missing data at the time of blood sampling for the initial determination of the cyclosporine residual blood concentration following HSCT. A final total of 105 patients without missing data at this time were retained (see Fig. 2). The characteristics of the 105 patients are summarized in Table 1.

Fig. 2

Flow chart. CsA cyclosporine, HSCT hematopoietic stem cell transplantation

Table 1 Population characteristics at initial time blood sampling for cyclosporine residual concentration after hematopoietic stem cell transplantation

Of these 105 patients, 37 had initial cyclosporine concentration values within the therapeutic interval, and 41 and 27 had sub-therapeutic and supra-therapeutic cyclosporine residual concentration values, respectively. The HSCT indication was acute lymphoid leukemia (34%), inborn error (18%), immune deficiency (16%), severe aplastic anemia (9%), acute myeloid leukemia (6%), lymphoma (6%), myelodysplastic syndrome (6%), severe hemoglobinopathies (3%), and chronic myeloid leukemia (2%). Only one patient had cholestasis (total bilirubin over 50 µmol/L) at this time. Thirty-eight patients took interacting medications: two took azole antifungals, two macrolides, 32 PPIs, one a calcium channel blocker, and one both a PPI and a calcium channel blocker. The youngest patient was 5 months old, the oldest was 21 years old, and the median age was 7 years.

3.1 Modelling Procedure

We verified that the synthetic dataset was representative of our database by a visual check of the probability density distributions, and by comparing the variables of interest (Table 1).

Sensitivity to missing data in the synthetic dataset was evaluated for the five methods and illustrated by the evolution of the AUC-ROC after a ten-fold cross-validation. With no missing data, the TAN Bayesian network was the best method (AUC-ROC 0.893), and SVM was the worst (AUC-ROC 0.673) (Table 2). The TAN Bayesian network was the least robust (AUC-ROC from 0.893 with no missing data to 0.7 with 20% of missing data, a loss of 0.193 points), but it still presented the best characteristics with 10% of missing data in the synthetic dataset (AUC-ROC 0.785, 25.8% incorrectly classified patients, and a positive predictive value of 0.74).

Table 2 Characteristics of the five methods with no missing data after ten-fold cross-validation

3.2 Validation of the Predictive Model

The final predictive model for the ability of the initial cyclosporine trough blood concentration after HSCT to be in the therapeutic range, after a ten-fold cross-validation, presented good validation characteristics with an AUC-ROC of 0.89 ± 0.02 (see Fig. 3), 18.9 ± 2.6% of incorrectly classified patients, a positive predictive value of 85.9 ± 3.4%, and a negative predictive value of 74.2 ± 5.1%.

Fig. 3

Receiver operating characteristic curve of the final Bayesian network

3.3 Sensitivity to Missing Data of the Final Tree-Augmented Naïve (TAN) Bayesian Network

The results of the analysis of sensitivity to findings that was performed on the synthetic population are shown in Fig. 4. The most relevant variables that could influence whether the initial residual blood cyclosporine concentration was in the therapeutic range were the last dose before measurement, the mean dose before measurement, and the day of blood sampling after HSCT. Some variables were required for the prediction and could not be derived from other ones. This was especially the case for interacting drugs, mean hemoglobin, and mean creatinine, whereas a missing weight could adequately be replaced by information on the last administered dose, mean dose before measurement, age versus CYP3A4 maturation, mean hemoglobin, and mean creatinine versus age category.

Fig. 4

Relative percentage of missing information compensated by the final network. CRP C-reactive protein, CsA cyclosporine, HSCT hematopoietic stem cell transplantation

The final model can be found at https://www.hed.cc/?s=HSCTCsApredictor&t=HSCT CsA predictor.

4 Discussion

In our pediatric population undergoing HSCT who were receiving cyclosporine to prevent GVHD, TAN Bayesian network was the best method to predict whether the initial cyclosporine concentration after HSCT will be in the therapeutic range. The therapeutic range of the cyclosporine blood concentration rather than cyclosporine blood concentration itself was chosen as the endpoint because it seemed the most clinically relevant and simplest to handle for clinicians while using an online Bayesian network. This tool will allow them to find the best cyclosporine dosing regimen to reach the therapeutic interval right after transplantation, using a simple method, provided that further external validation is satisfactory.

Many classification and prediction methods are used in medicine, and we chose to limit our study to five that are widely used: one parametric (logistic regression) and four non-parametric (Naïve Bayesian network, TAN Bayesian network, SVM, and random forest).

Belief networks are directed acyclic graphs that represent a probabilistic structure. There is a direct relationship between the graphical model and a particular form of probability distribution. They consist of vertices (or nodes) and directed edges (arrows). Each edge points from one node (called the parent node) to another node (the child node). In a belief network, each node represents a random variable, and each directed edge represents an immediate dependence or direct influence. We used a learning algorithm to create a TAN Bayes graph structure in which a single class variable has no parents and all other variables have the class as a parent and at most one other attribute as a parent. The probability tables are filled out using Expectation Maximization, as previously described in Sansot et al. [13].
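
The structural constraint of a TAN graph (class without parents, each attribute with the class plus at most one other attribute as parents) can be illustrated with the sketch below. It follows the classical construction of a maximum spanning tree weighted by conditional mutual information between attributes given the class; whether Netica or RapidMiner use exactly this procedure is an assumption, and the function names, the node label "class", and the toy data are hypothetical. Probability tables and Expectation Maximization are not covered.

```python
from collections import Counter
from math import log


def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C) for discrete lists."""
    n = len(c)
    n_c = Counter(c)
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_ijc = Counter(zip(xi, xj, c))
    return sum(
        (n_abk / n) * log(n_abk * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
        for (a, b, k), n_abk in n_ijc.items()
    )


def learn_tan_structure(columns, class_values):
    """Return {attribute: [parents]} for a TAN graph.

    `columns` maps attribute names to lists of discrete values. The first
    attribute is used as the tree root (an arbitrary choice); a maximum
    spanning tree is then grown greedily (Prim's algorithm).
    """
    names = list(columns)
    tree = [names[0]]
    parents = {names[0]: ["class"]}
    while len(tree) < len(names):
        candidates = (
            (cond_mutual_info(columns[u], columns[v], class_values), u, v)
            for u in tree
            for v in names
            if v not in parents
        )
        _, u, v = max(candidates, key=lambda item: item[0])
        parents[v] = ["class", u]  # the class plus exactly one attribute parent
        tree.append(v)
    return parents


# Toy data (invented): x2 duplicates x1, x3 is unrelated to both given the class
c = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "x1": [0, 0, 1, 1, 0, 0, 1, 1],
    "x2": [0, 0, 1, 1, 0, 0, 1, 1],
    "x3": [0, 1, 0, 1, 0, 1, 0, 1],
}
structure = learn_tan_structure(data, c)
```

In this toy example, `x2` receives `x1` as its attribute parent because the two are strongly dependent given the class, while the root `x1` keeps the class as its only parent.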

Logistic regression, a widely used method in biology and medicine, allows the association between several variables to be quantified by estimating a probability. This technique takes confounding factors into account but requires the measures to be of good quality and may be disturbed by interactions between variables. It also cannot deal with missing data relating to continuous variables [32, 33].

SVM are learning algorithms with the ability to separate several variables into classes, with an optimal hyper-plane maximizing the margin. SVM are well-suited to work involving high dimensional data, and have robust performances when dealing with sparse and noisy data [34].

Random forest is a method based on the generation of multiple trees by bootstrapping. For each node of each tree, the best predictor is chosen. The final result given by random forest is the mean of all the individual trees [35].
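
The bootstrap-and-average principle can be sketched as follows. This is a deliberately simplified illustration and not the Orange implementation used in the study: each "tree" is reduced to a single split (a stump) on binary predictors, no random subset of predictors is drawn at each node, and all names and data are hypothetical.

```python
import random


def fit_stump(rows, labels):
    """Choose the binary predictor whose split misclassifies the fewest cases.

    Returns (feature_index, {0: p0, 1: p1}), where p0 and p1 are the observed
    proportions of positive labels in each branch (0.5 if a branch is empty).
    """
    best = None
    for f in range(len(rows[0])):
        groups = {0: [], 1: []}
        for row, y in zip(rows, labels):
            groups[row[f]].append(y)
        errors = sum(min(sum(g), len(g) - sum(g)) for g in groups.values())
        probs = {v: (sum(g) / len(g) if g else 0.5) for v, g in groups.items()}
        if best is None or errors < best[0]:
            best = (errors, f, probs)
    return best[1], best[2]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_proba(forest, row):
    """Final prediction: mean of the individual tree predictions."""
    return sum(probs[row[f]] for f, probs in forest) / len(forest)


# Toy data (invented): the label equals the first predictor, the second is noise
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [r[0] for r in rows]
forest = fit_forest(rows, labels)
```

Because every bootstrap sample is drawn from the same toy data, each stump selects the first predictor, and the averaged prediction is above 0.5 for rows where that predictor equals 1 and below 0.5 otherwise.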

To compare these methods, we had to ensure optimal conditions that avoided variance homogeneity issues, lack of statistical power, and sensitivity towards missing data, which can influence parametric methods. That is why an approach based on synthetic data was preferred. After cross-validation, TAN Bayesian network was the best method according to the comparison of the AUC-ROC curves and its very good positive predictive value.

Moreover, the synthetic data approach allowed us to test the influence of missing data while taking statistical power issues into account. The TAN Bayesian network remained the best method even with 10% missing data.

Testing the sensitivity of our method to missing data seems very important, as data in clinical practice are often incomplete because of real-life conditions.

The TAN Bayesian network, in addition to its other positive characteristics, is also easier for clinicians to use and understand than other methods such as random forest. Indeed, its representation as a directed acyclic graph allows a quick and simple answer to a question, which is consistent with what is required in clinical practice.

The final TAN Bayesian network shows, as it did in the simulation approach, good validation characteristics.

This model is intended to be online, and thus it has to respond to two major concerns: sensitivity for missing data and external validation. The sensitivity analysis performed informs us on the importance of the information provided by some variables to the network.

Information on variables which cannot be compensated for by other variables has to be filled in the online network; these indispensable variables will be listed in the user manual of the online model. This model seems very interesting for clinicians as it would allow them to predict whether patients will reach the therapeutic range of concentrations immediately after HSCT or not, and consequently a priori change the cyclosporine dose regimen. However, an external multicentric validation is warranted for this model before its use in clinical practice.

Over the past 50 years, HSCT has represented major progress in medicine, and TDM of immunosuppressants has consequently become a major concern. As cyclosporine is the main immunosuppressant used after HSCT, many researchers have worked on the link between the concentration of cyclosporine and the onset of GVHD [36,37,38,39]. Malard et al. [39] have shown that trough blood cyclosporine concentrations during the first week after HSCT were correlated with the onset of grade III–IV acute GVHD, but no increase in overall survival was found. However, in that study, the cyclosporine target concentrations were high, inhibiting a possible GVL effect, and exposing patients to cyclosporine adverse effects [39]. Martin et al. [2] also found this trend in a pediatric population. In this way, our model could be useful, allowing the therapeutic interval to be reached early after HSCT, and it could be evaluated for its use in improving the onset of moderate acute GVHD in malignant diseases (optimizing the GVL effect) or decreasing the rate of severe GVHD in non-malignant diseases.

The first studies only took into account the serum creatinine and the trough blood cyclosporine concentration when adapting the cyclosporine dosing regimen [37, 38], but complex models taking into account many covariates such as hemoglobin, hematocrit, hepatic functions (through bilirubin, alanine aminotransferase [ALAT], aspartate aminotransferase [ASAT], albumin), and creatinine have been developed in more recent years [40]. However, to date, our model is the first to include so many variables at the same time to predict the achievement of the cyclosporine therapeutic range soon after HSCT. Moreover, the effect of inflammation on cytochromes, notably CYP3A4, and the subsequent inhibition of the metabolism of drugs such as cyclosporine is well-known [41]. Inflammatory episodes are frequently associated with HSCT, for example post-transplant infections. Consequently, the inclusion of inflammation parameters in the model is relevant. Nevertheless, our study is the first to take some of these parameters into account.

4.1 Limitations

It should be noted that some other variables may influence cyclosporine trough blood concentrations, but these are neither available nor considered in clinical practice, and thus this model does not include them.

Cyclosporine trough blood concentrations present a high inter- and intra-individual variability and their evolution over the treatment period is a complex phenomenon. The possibility to consider many variables at the same time means that there is real hope that Bayesian networks will aid establishment of useful models for clinicians.

This model is interesting but it concerns only the initial cyclosporine concentration after pediatric HSCT. External validation on another cohort of patients will be necessary before it can be used in clinical practice.

Another limitation is the therapeutic range of cyclosporine considered in our study. Indeed, therapeutic cyclosporine blood concentration target ranges have been set only according to the graft indication, since the concentration of interest was the first one after transplantation. If a longer period after HSCT is considered, occurrence of GVHD and other target cyclosporine concentration values will have to be taken into account. Moreover, Jacobson et al. [42] have shown that the number of days post-transplant influences cyclosporine pharmacokinetics, which is why it would probably be interesting to develop a model covering a larger duration after HSCT in order to take into account the intra-individual variability of cyclosporine pharmacokinetics.

5 Conclusion

In this study we have developed and validated an online Bayesian network to assess the ability of the initial cyclosporine blood concentration to be in the therapeutic interval after pediatric HSCT. This model allows the simulation of different dosing regimens, finding the best to enable the right cyclosporine residual concentration to be reached immediately after transplantation, minimizing the risk of adverse effects and GVHD occurrence. The characteristics of this model are satisfactory but external multicentric validation is needed before its use in clinical practice.