The role of the comprehensive complication index for the prediction of survival after liver transplantation

In recent years, several scoring systems based on pre- and post-transplant parameters have been developed to predict early graft function after liver transplantation (LT). However, some of them have shown poor diagnostic ability. This study aims to evaluate the comprehensive complication index (CCI) as a scoring system for accurately predicting 90-day and 1-year graft loss after LT. A training set (n = 1262) and a validation set (n = 520) were obtained. The study was registered at https://www.ClinicalTrials.gov (ID: NCT03723317). CCI exhibited the best diagnostic performance for 90-day graft loss in the training (AUC = 0.94; p < 0.001) and validation sets (AUC = 0.77; p < 0.001) when compared to the BAR, D-MELD, MELD, and EAD scores. The cut-off value of 47.3 (third quartile) showed a diagnostic odds ratio of 48.3 and 7.0 in the two sets, respectively. As for 1-year graft loss, CCI showed good performance in the training (AUC = 0.88; p < 0.001) and validation sets (AUC = 0.75; p < 0.001). The threshold of 47.3 showed a diagnostic odds ratio of 21.0 and 5.4 in the two sets, respectively. All the other tested scores showed AUCs < 0.70 in both sets. CCI showed good stratification ability in terms of graft loss rates in both sets (log-rank p < 0.001). In the patients exceeding the CCI ninth decile, 1-year graft survival rates were only 0.7% and 23.1% in the training and validation sets, respectively. CCI shows very good diagnostic power for 90-day and 1-year graft loss in different sets of patients, indicating better accuracy than other pre- and post-LT scores. Clinical Trial Notification: NCT03723317.


Introduction
In recent years, several scoring systems have been developed with the intent to predict the early clinical course after liver transplantation (LT). The model for end-stage liver disease (MELD) is recognized as the most accurate liver allograft allocation model, and it prioritizes patients according to the severity of their disease [1,2]. However, several studies have shown that MELD alone fails to predict early post-transplant survival rates [3,4]. Consequently, other scoring systems based on variables available before or after transplantation have been developed to identify cases with a high risk for transplant failure. Among them, the "pre-transplant" scores D-MELD and balance of risk (BAR) [5,6], and the "post-transplant" score early allograft dysfunction (EAD) [7] proved to predict post-transplant survival satisfactorily.
Recently, the comprehensive complication index (CCI) has been developed to assess the actual complication rate after surgery [8]. Some reports have shown excellent prognostic power of this score in different fields [9][10][11]. No study to date has investigated the role of CCI in 90-day and 1-year prognostication of graft loss after LT.
This study aims to compare the abilities of CCI vs. other commonly adopted pre- and post-transplant scoring systems in diagnosing 90-day and 1-year post-transplant liver graft loss. The diagnostic capabilities were investigated in a training set and validated in a validation set.

Materials and methods
The training set was generated by retrospectively analyzing 1262 patients undergoing a first LT during the period January 1, 2005-December 31, 2016. Four European collaborative LT centres were involved in creating the training set, namely the Polytechnic University of Marche, Ancona (Italy), Université Catholique de Louvain, Brussels (Belgium), Sapienza University, Rome (Italy), and University of Padua (Italy). Exclusion criteria for patient selection were: (a) living donation, (b) combined transplant, (c) domino transplant, and (d) pediatric (< 18 years) transplant.
The validation set was created by retrospectively analyzing the data of 520 patients transplanted during the same timeframe at the Karolinska Institute of Stockholm (Sweden). The same exclusion criteria were adopted. The study was registered at https://www.ClinicalTrials.gov (ID: NCT03723317).

Definitions
Organ procurement was defined as "local" when done in the same region in which the LT was performed. All complications were graded according to the Clavien-Dindo Classification [12]. A web calculator was used for estimating BAR and CCI (available at https://www.assessurgery.com/).
The CCI is based on the complication grading by the Clavien-Dindo Classification and incorporates every complication occurring after an intervention, each weighted by its severity (weighted complication, wC). As a single complication, Clavien-Dindo grade I corresponds to a CCI of 8.7, grade II to 20.9, grade IIIa to 26.2, grade IIIb to 33.7, grade IVa to 42.4, grade IVb to 46.2, and grade V to 100. All the complications collected were summed, even if the same patient repeatedly received the same medical (e.g., blood transfusion) or interventional (e.g., various radiological or surgical approaches) treatment. The overall morbidity is reflected on a scale from 0 (no complication) to 100 (death).
Retransplantation during the first hospitalization was calculated as IVa (liver failure) plus IIIb (reoperation) complication. Multiorgan failure (MOF) was defined as the presence of at least two organ failures and ranked as grade IVb complication. Primary non-function (PNF) was identified as a liver failure observed for non-technical reasons within seven days after surgery and ranked as IVa complication.
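The calculation described above can be sketched as follows. This is a minimal illustration, not the web calculator used in the study: it assumes the formula published with the original CCI (half the square root of the summed weights, capped at 100), and the per-grade weights in `WEIGHTS` are taken from that publication rather than from this paper. They reproduce the single-complication values listed above (8.7, 20.9, 26.2, 33.7, 42.4, 46.2). The function name `cci` is hypothetical.

```python
import math

# Assumed weights (wC) per Clavien-Dindo grade, from the original CCI
# publication; this paper reports only the resulting single-complication values.
WEIGHTS = {
    "I": 300,
    "II": 1750,
    "IIIa": 2750,
    "IIIb": 4550,
    "IVa": 7200,
    "IVb": 8550,
}


def cci(grades):
    """Return the CCI (0-100) for a list of Clavien-Dindo grades."""
    if "V" in grades:
        # Grade V (death) always corresponds to the maximum score.
        return 100.0
    total_weight = sum(WEIGHTS[grade] for grade in grades)
    # Assumed formula: half the square root of the summed weights, capped at 100.
    return min(100.0, math.sqrt(total_weight) / 2)


# Retransplantation during the first hospitalization, scored as IVa plus IIIb.
print(round(cci(["IVa", "IIIb"]), 1))
```

Under these assumptions, a retransplantation alone (IVa plus IIIb) already places a patient above the third-quartile cut-off of 47.3 investigated in this study.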
EAD was defined according to the Olthoff criteria [7], and classified as a grade II complication. Mild renal dysfunction was defined as a serum creatinine increase exceeding the threshold of 1.5 mg/dL but not requiring renal replacement therapy (RRT), and corresponded to a grade I complication. In the case of RRT, a grade IVa complication was assigned. Myelotoxicity was defined as the presence of at least one of the following conditions: anemia (hemoglobin < 8 g/dL) in the absence of bleeding, leukopenia (< 3500/μL), or severe thrombocytopenia (< 30,000/μL), and was classified as a grade I complication.

Statistical analysis
Continuous variables were reported as medians and interquartile ranges (IQR). Categorical variables were reported as numbers and percentages. Missing data always involved < 10% per variable and were handled using the maximum likelihood estimation method. Mann-Whitney U test was used for comparisons between groups in case of continuous variables, and Fisher's exact test was adopted in case of categorical variables.

A univariate Cox regression analysis was performed in the training set for the identification of the risk factors for graft loss. All the variables with a p value < 0.20 were introduced into a multivariable model. A multivariable Cox regression model was constructed adopting the backward conditional method [13]. Beta-coefficients, standard errors, the hazard ratio (HR), and 95% confidence intervals (95% CI) were reported.
The c-statistic was used to compare the diagnostic ability of the different scores for 90-day and 1-year graft loss in both the training and validation sets. Specifically, CCI was compared with MELD, D-MELD, BAR, and EAD. Areas under the curve (AUC), standard errors, and 95% CI were reported. The following CCI cut-off values were investigated in the training set: first quartile, median, third quartile, and ninth decile. Sensitivity, specificity, and diagnostic odds ratio (DOR) were reported for each cut-off value. The higher the DOR value, the greater the discriminative power of the cut-off. The same CCI threshold values obtained in the training set were validated in the validation set. Graft survival rates were estimated with the Kaplan-Meier method; the log-rank test was used for evaluating survival differences. A p value < 0.05 was considered statistically significant in all analyses. Statistical analyses and plots were produced using the SPSS statistical package version 24.0 (SPSS Inc., Chicago, IL, USA).
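As an illustration of how the DOR relates to sensitivity and specificity, the sketch below uses the standard definition (ratio of the positive to the negative likelihood ratio). The function name `diagnostic_odds_ratio` is hypothetical, and the study itself computed these values in SPSS. The inputs in the usage line are the sensitivity and specificity reported in this paper for the third-quartile cut-off (47.3) for 1-year graft loss in the training set.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Return the DOR from sensitivity and specificity given as percentages.

    Undefined (division by zero) when sensitivity or specificity equals 100.
    """
    sens = sensitivity / 100
    spec = specificity / 100
    positive_lr = sens / (1 - spec)   # true-positive rate / false-positive rate
    negative_lr = (1 - sens) / spec   # false-negative rate / true-negative rate
    return positive_lr / negative_lr


# Third-quartile cut-off, 1-year graft loss, training set.
print(round(diagnostic_odds_ratio(75.0, 87.5), 1))
```

Because the DOR grows very quickly as specificity approaches 100%, very high values at extreme cut-offs (such as the ninth decile) should be read together with the corresponding sensitivity.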

Results
The training and validation sets were composed of 1262 and 520 LT recipients. All grafts were procured from donors after brain death.
In the training set, the median follow-up was 3.7 years (IQR 1.

Baseline characteristics
The characteristics of the sets are displayed in Table 1.
In the training set, median lab-MELD was 15 points, with 161 (12.8%) patients showing a MELD ≥ 30. Median waiting time and age at LT were 4 months and 56 years, respectively. HCC was the main indication for LT in 527 (41.8%) patients. The main cause of the liver disease was HCV-related cirrhosis (35.7%). Median donor age was 57 years, with 328 (26.0%) and 77 (6.1%) donors exceeding 70 and 80 years. The leading brain-death cause was cerebrovascular accident (n = 785; 62.2%). In approximately half of the cases, the procurement was performed in a local hospital. Median cold and warm ischemia times were 7.2 h and 45 min. Median BAR score was 5; 77 (6.1%) and six (0.5%) recipients had a score exceeding 15 and 20, respectively.
In the validation set, median lab-MELD was 25 points, with 118 (22.7%) recipients presenting a MELD ≥ 30. Median waiting time and age at LT were 2 months and 54 years, respectively. HCC was the leading indication for LT in 131 (25.2%) patients. HCV-related cirrhosis was reported in 148 (28.5%) cases. Conversely, pathologies uncommonly reported in the training set were commonly reported in the validation set: biliary pathologies like primary biliary cholangitis and primary sclerosing cholangitis were reported in 124/520 (23.8%) cases, followed by 37 (7.1%) cases of familial amyloid polyneuropathy, and 25 (4.8%) cases of autoimmune hepatitis.
Median donor age was 57 years, with 91 (17.5%) and eight (1.5%) donors exceeding 70 and 80 years. The leading cause of brain death was a cerebrovascular accident (n = 346; 66.5%). In ~ 75% of cases, the procurement was performed in a local hospital. Median cold and warm ischemia times were 8.3 h and 40 min. Median BAR score was 11; 30 (5.8%) recipients had a score exceeding 15, while no case exceeded 20.

Risk factors for overall risk of graft loss
Eighteen different covariates identifiable before or during the post-LT hospital stay were tested in the training set. First, a univariate Cox regression analysis was performed with the intent to identify the risk factors for graft loss. After selecting only the statistically significant variables, and removing the possible causes of collinearity, a multivariable Cox regression model was built. Three independent risk factors for graft loss were identified: donor age (HR = 1.01; p value = 0.002), BAR score (HR = 1.03; p value = 0.01) and CCI (HR = 1.05; p value < 0.001) (Table 3).
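Because the hazard ratios above are expressed per one-unit increase of each covariate, their clinical magnitude is easier to read over a larger increment. The sketch below is illustrative only: it assumes the usual log-linear Cox relationship (HR = exp(beta × increase)) and starts from the rounded HRs reported above, so the results are approximate. The function name `hazard_ratio_for_increase` is hypothetical.

```python
import math


def hazard_ratio_for_increase(hr_per_unit, delta):
    """Hazard ratio implied by a `delta`-unit increase in a covariate.

    Assumes a log-linear effect, as in a standard Cox model.
    """
    beta = math.log(hr_per_unit)  # Cox coefficient per one-unit increase
    return math.exp(beta * delta)


# A 10-point increase in CCI, starting from the reported HR of 1.05 per point.
print(round(hazard_ratio_for_increase(1.05, 10), 2))
```

Under these assumptions, a 10-point higher CCI corresponds to a roughly 60% higher hazard of graft loss, whereas a 10-year older donor (HR = 1.01 per year) corresponds to about 10%.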

90-Day graft loss diagnostic ability
The diagnostic ability of five different scoring systems was evaluated in both sets, with the intent to identify the best diagnostic test for 90-day graft loss (Table 4). CCI exhibited the best diagnostic performance in both the training (AUC = 0.94, 95% CI = 0.92-0.96; p < 0.001) and validation sets (AUC = 0.77, 95% CI = 0.62-0.93; p < 0.001) when compared to the BAR, D-MELD, MELD, and EAD scores. All the other scores showed inferior AUCs, ranging from 0.58 to 0.60 in the training set and from 0.47 to 0.57 in the validation set.
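For readers less familiar with the AUC, the sketch below shows its rank-based interpretation: the probability that a randomly chosen graft that was lost has a higher score than a randomly chosen graft that survived, with ties counted as one half. It is a generic illustration and not the SPSS procedure used in the study; the function name `auc` and the toy data are hypothetical.

```python
def auc(scores, events):
    """Rank-based AUC: P(score of an event case > score of a non-event case).

    `events` holds 1 for graft loss and 0 otherwise; ties count as 0.5.
    """
    positives = [s for s, e in zip(scores, events) if e]
    negatives = [s for s, e in zip(scores, events) if not e]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data: four recipients, two of whom lost the graft.
print(auc([10, 20, 30, 40], [0, 1, 0, 1]))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, which is close to what was observed for the comparator scores in the validation set.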
Table 3 Univariable and multivariable Cox regression analyses for the overall risk of graft loss after LT in the training set
In the training set, the CCI cut-off value corresponding to the first quartile (12.2 points) yielded a sensitivity of 98.4% and a specificity of 27.7% (DOR = 23.6). The threshold value [...] (Table 4). The same cut-offs validated in the validation set showed similarly excellent diagnostic ability, although they were inferior in terms of discriminative power.
The CCI cut-off value at 12. [...] (Table 4).

1-year graft loss diagnostic ability
The diagnostic ability of five different scoring systems was evaluated in both the sets, with the intent to identify the best diagnostic test for 1-year graft loss (Table 5).

In the training set, the CCI cut-off value corresponding to the first quartile (12.2 points) yielded a sensitivity of 94.9% and a specificity of 28.6% (DOR = 7.5). The threshold value corresponding to the median (29.6) had a sensitivity of 89.1% and a specificity of 57.8% (DOR = 11.2). The value of 47.3, corresponding to the third quartile, exhibited a sensitivity of 75.0% and a specificity of 87.5%, giving a high DOR value of 21.0. Lastly, the threshold value set at 84.9 (ninth decile) showed a sensitivity of 53.5% and a specificity of 99.9% (DOR = 1149.4) (Table 5).
The same cut-offs validated in the validation set showed similarly excellent diagnostic ability, although they were inferior in terms of discriminative power.
The CCI cut-off value at 12.2 (first quartile) yielded a sensitivity of 86.5% and a specificity of 33.1% (DOR = 3.2). The cut-off set at 29.6 (median) had a sensitivity of 78.4% and a specificity of 53.4% (DOR = 4.2). The cut-off set at 47.3 (third quartile) exhibited a sensitivity of 56.8% and a specificity of 80.5%, giving a DOR value of 5.4. Lastly, the threshold value set at 84.9 (ninth decile) presented a sensitivity of 27.0% and a specificity of 99.2% (DOR = 45.9) (Table 5).

Sub-analysis on the graft loss diagnostic ability using aged grafts
The diagnostic ability of the five different scoring systems was also evaluated in a sub-analysis in which only transplants performed using organs from aged (≥ 70 years) donors were considered (Table 6). As for the 90-day risk of graft loss, CCI confirmed the best diagnostic performance in both the training (AUC = 0.93, 95% CI = 0.88-0.97; p < 0.001) and validation sets (AUC = 0.92, 95% CI = 0.81-1.00; p = 0.001). Similarly, CCI was also the best diagnostic tool for predicting 1-year graft loss, with the best diagnostic performance in both the training (AUC = 0.88, 95% CI = 0.82-0.93; p < 0.001) and validation sets (AUC = 0.79, 95% CI = 0.59-1.00; p = 0.002).

Graft survival rates
In the training set, we obtained an excellent stratification of graft survival rates using the investigated CCI thresholds.
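A stratification of this kind can be sketched by assigning each recipient to a risk stratum according to the four CCI cut-offs investigated in the study (12.2, 29.6, 47.3, and 84.9). The sketch below is illustrative: the stratum labels, the function name `cci_stratum`, and the choice to place a value equal to a cut-off in the higher stratum are assumptions, not details taken from the paper.

```python
from bisect import bisect_right

# CCI cut-offs from the training set: first quartile, median,
# third quartile, and ninth decile.
CCI_CUTOFFS = [12.2, 29.6, 47.3, 84.9]

# Hypothetical labels for the five resulting strata.
STRATA = [
    "below first quartile",
    "first quartile to median",
    "median to third quartile",
    "third quartile to ninth decile",
    "ninth decile or above",
]


def cci_stratum(cci_value):
    """Return the risk stratum for a CCI value.

    Assumption: a value equal to a cut-off falls in the higher stratum.
    """
    return STRATA[bisect_right(CCI_CUTOFFS, cci_value)]


print(cci_stratum(54.2))
```

Each stratum can then be passed as a grouping variable to a Kaplan-Meier estimate and compared with the log-rank test, as described in the statistical analysis.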

Discussion
A valid scoring system should exhibit good performance metrics, such as discrimination and calibration, maintain these qualities over time, and be simple and easy to calculate.
In the specific setting of LT, the MELD score meets most of these requirements when used for the prediction of death during the waiting time. This ability was the reason for the introduction of the MELD score into the US liver allocation process in 2002. The goal was to prioritize the sickest patients for transplantation [14]. However, the MELD score rapidly proved to be a poor predictor of short- and, worse, long-term post-transplant survival [15][16][17]. A recent systematic review, which included 37 studies covering 53,691 patients transplanted in 15 different countries, identified an overall c-statistic below 0.7 and consequently suggested a globally poor predictive value of the score [4].
With the intent to improve its predictive ability, the MELD score has been integrated into different models built to enhance the prediction of post-transplant survival. Unfortunately, the complexity of many of these models limits their usability. The Survival Outcomes Following Liver Transplantation (SOFT) score represents a paradigmatic example: despite its good predictive ability, the score is based on 18 different pre-transplant variables, making it difficult to calculate [18]. The same holds for the MELD-sarcopenia score, in which the single complex-to-estimate parameter "sarcopenia" limits its broad applicability [19]. Conversely, the D-MELD, based on the simple multiplication of donor age and recipient MELD, represents an easy-to-calculate model [5,20]. The BAR score, based on six donor- and recipient-related pre-operative variables, further improves prognostication without excessively increasing the complexity [6]. Moreover, a web calculator is available for its estimation. The BAR score has proven to offer great potential in different geographical areas [6,21,22]. A Chinese study including 249 LDLT patients showed that the BAR score was the best predictor of 1-year patient survival [21]. A Brazilian study including 402 patients reported similar results when looking at three-month patient survival [22].
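The simplicity of the D-MELD can be seen in a one-line calculation. The sketch below follows the definition given above (donor age multiplied by recipient MELD); the function name `d_meld` is hypothetical, and the inputs are the median donor age and median lab-MELD reported for the training set, used purely for illustration (the product of two medians is not the median D-MELD of the cohort).

```python
def d_meld(donor_age_years, recipient_meld):
    """Return the D-MELD: donor age (years) multiplied by recipient MELD."""
    return donor_age_years * recipient_meld


# Illustrative inputs: median donor age (57 years) and median lab-MELD (15)
# of the training set.
print(d_meld(57, 15))
```

By contrast, scores such as SOFT require many more inputs, which is the usability limitation discussed above.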
However, all these scores based on pre-transplant data typically yielded inferior results compared to scoring systems based on variables available in the immediate post-transplant period. Among the post-transplant scores, the Olthoff-EAD is the most commonly adopted [7,23].
The great and largely unsolved challenge in LT remains how to correctly allocate a limited resource such as organs from deceased donors, which can be addressed only with preoperative variables. Therefore, a score composed of post-transplant parameters cannot be used with the intent to optimize the allocation process. However, such a score may still be useful as a diagnostic tool for predicting the early (i.e., 3-month, 1-year) clinical course.
In the present series, we observed that the CCI model presented high relevance for LT survival prognostication in both sets we investigated. Moreover, CCI outperformed both the pre- and post-transplant scores in diagnostic ability.
CCI was initially created to report complication rates more accurately. The CCI aimed to inform precisely about the severity of cumulative postoperative complications [8]. Recently, its potential role as a prognostic tool has been explored in different fields of surgery. As an example, two international studies used CCI cut-off values of 33 and 42 as benchmarks for evaluating the quality of a successfully performed liver resection or transplantation, respectively [10,24].
Fig. 1 a Training set: 1-year graft survival rates according to the CCI risk strata. b Validation set: 1-year graft survival rates according to the CCI risk strata
Several studies investigated the prognostic impact of CCI in the setting of different types of cancer. A US study showed that CCI was a strong survival predictor in patients undergoing hepatic resection for colorectal metastases independently from the RAS mutational status. Patients with high CCI (≥ 26.2) had worse recurrence-free and cancer-specific survivals with respect to low-CCI patients [9].
A study from Japan correlated postoperative complications with worse survivals in gastric cancer patients. Patients with a CCI ≥ 32.15 had significantly lower 5-year overall and disease-specific survivals than those observed in the CCI low group. Moreover, a multivariate analysis identified the CCI as an independent prognostic indicator [25]. Another study from China similarly investigated the role of CCI in the setting of gastric cancer. Patients with high CCI (≥ 26.2) presented 5-year cancer-specific survival rates markedly inferior (46.3% vs. 54.9%) [26].
CCI was also correlated with several parameters of poor outcome after surgery, further explaining its potential role as a predictor of poor early outcomes. A study from Spain correlated CCI with the frailty status in elderly patients treated with surgery, suggesting a correlation among frailness, post-surgical complications, and poor outcomes [27]. Another US study showed a correlation between CCI and time to normal activity in patients undergoing gastrointestinal and hepato-bilio-pancreatic surgery [28].
Up to now, only one Dutch study has revealed a prognostic role of CCI in the specific setting of liver transplantation. Specifically, when transplants performed with organs from donors after circulatory death (DCD) or donors after brain death (DBD) were compared, the 6-month postoperative median CCI was significantly higher in the case of DCD grafts (53.4 vs 47.2). Moreover, more DCD recipients underwent re-transplantation for ischemic-type biliary lesions in this period (4% vs 1%), therefore suggesting a correlation between CCI and the development of biliary complications [11].
In the present experience, the CCI showed the best diagnostic ability with respect to all the other tested scores in terms of graft loss risk, with AUCs of 0.94 and 0.77 in the training and validation sets for the diagnosis of 90-day graft loss. As for 1-year graft loss, CCI showed similarly good performance in the training (AUC = 0.88) and validation sets (AUC = 0.75).
The strength of the CCI was particularly evident in light of the poor performance of the other tested pre- and post-transplant scores. Interestingly, none of them ever showed an AUC > 0.70 in either the 90-day or the 1-year graft loss risk estimation.
We also tested several CCI cut-offs. Interestingly, the value corresponding to the third quartile (47.3) was substantially similar to the threshold identified by Muller et al. on 7492 patients transplanted in 17 different centers [24].
We clearly understand that the diagnostic utility of CCI may appear marginal, mainly in consideration of the potentially long time required for its calculation. Typically, scores based on post-transplant data are calculated within seven to ten days from LT [7,23], whereas in our study the CCI was calculated at the time of patient discharge. However, to clarify the timeframe required for CCI estimation, 1067/1262 (84.5%) and 464/520 (89.2%) patients in the training and validation sets, respectively, were discharged within one month from LT, thus allowing the CCI to be calculated in an acceptable time, mainly in light of its usefulness for the prediction of 1-year graft loss.
Another aspect to consider is the fact that, once an LT patient has developed a complication, the ability to improve the patient's outcome is markedly limited compared with the possibility of preventing this specific complication pre-operatively. We understand this shortcoming of the model, which obviously limits the impact of the CCI on important decisions like an early re-transplantation. However, we think the role of the CCI merits consideration, mainly in light of the possibility to identify patients that are more "fragile" at the time of discharge.
As an example, the sub-analysis focused on the transplants performed using aged grafts showed that the CCI even improved its diagnostic ability to predict early graft loss, therefore underlining the potential utility of this score in identifying, at the time of discharge, transplanted patients requiring particular attention during the follow-up.
As previously reported, the CCI may play a role in the prognosis of cancer patients [10,25,26]. Studies on colorectal metastases and gastric cancer have been reported [10,25,26], while no studies have been published so far with the intent to correlate CCI and post-transplant HCC recurrence risk. The present study was not designed to investigate the correlation between HCC, LT and CCI. However, we can postulate that, also in this case, a correlation between high CCI and worse oncological outcome may exist. Therefore, in tumor patients with high CCI at the time of discharge, it may be justified to use a tailored immunosuppression (e.g., everolimus, rapid steroid withdrawal), a more cautious use of steroid boluses in the management of acute rejections, or a personalized scheme of outpatient follow-up (e.g., more frequent measurements of alpha-fetoprotein or a more stringent imaging protocol). Further studies specifically focused on the correlation between CCI and HCC are required.
Another important element is the potential correlation between CCI and biliary complications. Only one study has specifically reported this connection [9], so more detailed studies are required to clarify the potential correlation between a poor initial clinical course and biliary complications. However, also in this case, we can postulate that the early identification of patients with a greater risk for biliary complications may offer the opportunity to design tailored therapies (e.g., ursodeoxycholic acid) and personalized schemes of outpatient follow-up including early magnetic resonance imaging, with the intent to minimize possible complications.
We think such an opportunity is not of marginal relevance. As an example, 176/1262 (13.9%) and 102/520 (19.6%) patients in the training and validation sets, respectively, exceeded the identified threshold of 47.3 and were alive at the time of discharge. In the training set, 67/176 (38.1%) of these patients had biliary complications, and 40/176 (22.7%) required a retransplantation during their follow-up. We are confident that these patients could benefit from a modification of the post-discharge management policy, due to the peculiar condition derived from a complex post-transplant course.
One can argue that a potential bias of the study is represented by the arbitrary decision to calculate the CCI only at the time of the first post-LT hospitalization, excluding the possible complications experienced by the patient after discharge. As an example, in the study by Muller et al., the CCI value was calculated at 12 months after transplantation [24]. Conversely, we deliberately decided to measure the CCI at the time of discharge. In fact, we think that such a measurement gives the opportunity to identify a sub-class of high-risk cases at discharge in which the previously reported management changes could be adopted with the intent to mitigate their predictably poorer clinical course.
The intent of the study was not to compare the training and the validation sets. However, in light of the observed results, we noted that the validation set showed better CCI and 90-day results despite a higher median MELD value. We can offer some hypotheses to explain this paradoxical result. First, MELD alone is not necessarily able to capture the overall recipient technical difficulties, mainly in case of "exception" pathologies like HCC and biliary cholangiopathies. We observed exactly these types of pathologies in approximately half of the cases in each of the training and validation sets. Second, the high number of HCC cases in the training set may explain the higher rate of vascular thromboses/stenoses as a consequence of several intra-arterial treatments performed as part of bridging/downstaging strategies [29,30]. Third, a possible effect of institutional case volume may explain this result. The validation set comes, in fact, from a high-volume center, representing a centralized referral and management center for all the hepatopathies of its country. Several studies have already reported better results in high- vs. medium-volume centers [31,32]. Another aspect to consider is the higher percentage of PNF observed in the training set (2.9 vs. 0.6%, p = 0.001), potentially explainable by worse histological features of the grafts used. Unfortunately, due to the retrospective nature of the study, we were not able to explore this aspect. Last, we cannot exclude that comorbidities like refractory ascites and portal hypertension in the recipients contributed to the observed results. Also in this case, we were not able to retrospectively evaluate these aspects in detail.
We are aware that the study may have some limitations. First, the study is retrospective and multicentric. The main concern connected with the retrospective nature of the study is the risk of missed post-LT complications, mainly for the grade I-II cases. A systematic retrospective collection of all the pharmacological needs of LT patients is challenging. However, although the CCI was potentially underestimated, its diagnostic effect was particularly relevant in our series. Consequently, we can reasonably expect the role of the CCI to be even stronger in a prospectively collected database.
Another limitation of the study is connected with the fact that some complications have been classified for CCI scoring on a non-empirical basis, due to a lack of literature investigating this aspect. Obviously, such a condition is of relevance because the heterogeneity in the CCI grading of specific complications may influence the performance characteristics of the score.
As for the multicentricity of the training set, we should emphasize that the large sample size of this population should minimize possible biases related to data analysis. Moreover, although the validation set was based on a monocentric experience only, it confirmed an overall broad prognostic ability of CCI also in this context.
In conclusion, the CCI shows a very good diagnostic ability for 90-day and 1-year graft loss in both a multicentric training and a monocentric validation set. Its diagnostic power is superior to other commonly adopted pre-and post-LT scores. Further analyses are required to prove its validity even in the long term.