Introduction

The Child–Turcotte–Pugh (CTP) score is a widely used and validated predictor of long-term survival in cirrhosis [1]. First described by Child and Turcotte [2] and later modified by Pugh et al. [3], the CTP has undergone several additional modifications, most recently a change in the serum albumin cutpoint based on laboratory normal ranges [4]. Although the CTP is highly predictive of surgical risk [5–9], hospital mortality [10, 11], post-embolization mortality [12, 13], transplantation waitlist mortality [14], and long-term survival [15] in cirrhosis, its major limitations are its dependence on subjective variables and its convenience-based cutpoints, which have never been formally validated [4].

The Veterans Outcomes and Costs Associated with Liver Disease Study Group (VOCAL) recently developed and validated a method for calculating CTP scores from administrative datasets, designated the electronic CTP (eCTP) [16]. The VOCAL cohort includes over 59,000 Veterans with prevalent cirrhosis treated between 2008 and 2010, for whom baseline demographics, disease etiology, and overall and transplant-free survival have been determined. By creating operational definitions for the subjective CTP variables (ascites and encephalopathy), we demonstrated that the eCTP score was superior to the Model for End-Stage Liver Disease (MELD) score, the Charlson–Deyo index, the VACS index [17], and the CirCom score [18] for predicting 5-year overall and transplant-free survival in this cohort.

Given its size and comprehensive follow-up, this cohort appeared ideal for testing the hypothesis that the current CTP laboratory cutpoints are not optimized for predicting survival in cirrhosis. We therefore performed serial modifications of the eCTP, varying the upper and lower cutpoints for INR, total bilirubin, and serum albumin to determine their impact on concordance with transplant-free survival in Cox proportional hazards models. We found that the current cutpoints indeed do not optimally predict transplant-free survival, and we define cutpoints that provide better concordance in survival models. Compared with MELD, the scoring system currently used for liver transplant allocation, both the original and the modified CTP (mCTP) were far superior predictors of intermediate- and long-term survival in this large cohort of patients with cirrhosis.

Methods

Data Sources

The derivation of the VOCAL cohort has been described previously [16]. With local IRB approval at each site, we used the Veterans Health Administration (VHA) Corporate Data Warehouse (CDW) to obtain data for all patients with cirrhosis identified by a validated algorithm (two outpatient or one inpatient ICD9-CM code for cirrhosis [571.2, 571.5]) [19] between 1/1/2008 and 12/31/2010. For these individuals, we obtained all inpatient and outpatient ICD9-CM codes, CPT codes, pharmacy data, and laboratory values from 1/1/2002 to 12/31/2013. Death was ascertained using the Vital Status File, with censoring as of December 31, 2014. Liver transplantation status was obtained by cross-referencing United Network for Organ Sharing/Organ Procurement and Transplantation Network STAR file data [20]. For the non-laboratory CTP subscores, we used our previously validated algorithm incorporating pharmacy data, CPT codes, and ICD9 codes to determine the presence and severity of ascites and encephalopathy [16]. Laboratory values close to the end date of the quarter of analysis (e.g., 3/31/2008 for the first quarter of 2008) were used, and survival duration was calculated from the quarter end date as previously described [16].
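The quarter-anchored design can be illustrated with a minimal sketch. It is not the VOCAL extraction code: the helper names are hypothetical, and taking the most recent laboratory value on or before the quarter end date is our assumption standing in for "close to the end date". Survival time runs from the quarter end date to the earlier of death or transplantation, censored at December 31, 2014, as stated above.

```python
from datetime import date, timedelta

# Censoring date for vital status ascertainment, as stated in the text.
CENSOR_DATE = date(2014, 12, 31)


def quarter_end(year: int, quarter: int) -> date:
    """Last calendar day of a quarter, e.g. 2008 Q1 -> 2008-03-31."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    if quarter == 4:
        return date(year, 12, 31)
    # First day of the following quarter, minus one day.
    return date(year, 3 * quarter + 1, 1) - timedelta(days=1)


def lab_for_quarter(measurements, anchor: date):
    """Value of the most recent (date, value) lab result on or before `anchor`.

    Assumption (hypothetical rule): results drawn after the quarter end are
    ignored. Returns None when no eligible result exists.
    """
    eligible = [(d, v) for d, v in measurements if d <= anchor]
    if not eligible:
        return None
    return max(eligible, key=lambda dv: dv[0])[1]


def transplant_free_survival(anchor: date, death=None, transplant=None,
                             censor: date = CENSOR_DATE):
    """(days, event) from the anchor date; event = 1 for death or transplant."""
    events = [d for d in (death, transplant) if d is not None and d <= censor]
    if events:
        return (min(events) - anchor).days, 1
    return (censor - anchor).days, 0


if __name__ == "__main__":
    anchor = quarter_end(2008, 1)
    inr = lab_for_quarter([(date(2008, 3, 20), 1.4), (date(2008, 4, 5), 1.9)], anchor)
    print(anchor, inr, transplant_free_survival(anchor, death=date(2008, 4, 30)))
    # 2008-03-31 1.4 (30, 1)
```

Whichever event comes first ends follow-up, so a patient transplanted before dying contributes the transplant date as the event.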

Modified Child–Turcotte–Pugh (mCTP) Score Iterations

Using the R survival package [21, 22] and data from the first quarter of 2008, mCTP scores were derived for serial modifications of the lower cutpoint of INR (from 1.0 to 4.0), the upper cutpoint of INR (from 2.0 to 5.0), the lower cutpoint of total bilirubin (from 1.0 to 4.0 mg/dl), the upper cutpoint of total bilirubin (from 2.0 to 5.0 mg/dl), the lower cutpoint of serum albumin (from 1.0 to 4.0 g/dl), and the upper cutpoint of serum albumin (from 2.5 to 4.5 g/dl). Ascites and encephalopathy eCTP subscores were held constant based on the previous operational definitions [16]. The discriminative capability of each CTP score in Cox proportional hazards regression models adjusted for age and gender was evaluated using the concordance statistic of Harrell et al. [23]. Optimal cutpoints were identified and then validated in datasets from 2008 quarters 2–4, as well as evaluated for concordance in various patient subsets. The mCTP models were also compared with the original CTP (oCTP) score, a proposed modification of the CTP developed in Taiwan (Huo CTP) [24], and the MELD score. Results were also confirmed using competing risk models in R (mstate package [21, 25]). Because the INR and encephalopathy subscores had relatively low LogWorth scores in multivariable models, we also evaluated a 5–13-point CTP model (mCTP13) in which these two subscores were dichotomized (1 = normal or below cutpoint, 2 = abnormal or above cutpoint).
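Harrell's C, used here to compare cutpoint sets, is the fraction of usable patient pairs whose predicted risks are ordered consistently with their observed outcomes. The sketch below computes it directly for an integer risk score such as the CTP. It is an illustration only: the study derived concordance from age- and gender-adjusted Cox models in the R survival package, whereas this version is unadjusted, excludes pairs tied on follow-up time, and uses an O(n²) pair loop that would be slow for a cohort of tens of thousands of patients.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when subject i had an event (death or
    transplant) strictly before subject j's observed time. The pair is
    concordant when subject i also has the higher risk score; tied risk
    scores count as one half.
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Three patients: higher CTP score, earlier event -> perfect concordance.
print(harrell_c([400, 900, 1800], [1, 1, 0], [11, 8, 6]))  # 1.0
```

Because CTP scores take only 11 integer values, tied risk scores are common, which is why the half-credit rule for ties matters for this kind of score.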

Exploratory Analyses

Cox proportional hazards models were fit using age, gender, a three-point ascites subscore, a three-point albumin subscore (3.6 and 3.3 g/dl cutpoints), and a three-point total bilirubin subscore (2.9 and 3.3 mg/dl cutpoints), with several variations: three-point versus two-point INR stratification (cutpoints 1.8/4.5 vs. 3.3), three-point versus two-point encephalopathy stratification (standard vs. absent/present), and inclusion of a two-point serum creatinine (sCr) subscore using a cutpoint of 1.8 mg/dl derived from serial iteration of the models. A creatinine-modified CTP model that omits INR (mCTP-Cr) was then iterated in serial Cox models to identify the cutpoints for all three of its laboratory variables that yielded the highest Harrell’s C-statistic. To assess each model’s concordance across the range of clinical severity, oCTP, mCTP, mCTP-Cr, and MELD were evaluated in serial overlapping subgroups of the cohort defined by three-point oCTP ranges (5–7, 6–8, 7–9, …, 13–15) or five-point MELD ranges (6–10, 7–11, 8–12, …, 36–40).
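The overlapping severity subgroups described above can be generated mechanically. This minimal sketch (function names are hypothetical) builds the three-point oCTP and five-point MELD windows and filters patient records by score; each subset would then be scored for concordance under each model, a step not repeated here.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_windows(start: int, stop: int, width: int) -> List[Tuple[int, int]]:
    """Overlapping inclusive score ranges of `width` points, stepping by 1."""
    if width < 1 or stop - start + 1 < width:
        raise ValueError("window does not fit in [start, stop]")
    return [(lo, lo + width - 1) for lo in range(start, stop - width + 2)]


def subset(records: Sequence[Dict], key: str, window: Tuple[int, int]) -> List[Dict]:
    """Records whose `key` score lies inside the inclusive window."""
    lo, hi = window
    return [r for r in records if lo <= r[key] <= hi]


octp_windows = sliding_windows(5, 15, 3)   # (5, 7), (6, 8), ..., (13, 15)
meld_windows = sliding_windows(6, 40, 5)   # (6, 10), (7, 11), ..., (36, 40)
```

Because consecutive windows share all but one score value, a patient appears in up to three oCTP subsets (or five MELD subsets), which smooths the concordance curves at the cost of making adjacent points non-independent.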

Results

Demographics of the VOCAL cohort with prevalent cirrhosis during the first quarter of 2008 have been described previously [16]. The cohort consists predominantly of well-compensated, white (72.3 %), male (97.2 %) patients with cirrhosis. The predominant underlying liver diseases (not mutually exclusive) were hepatitis C infection (48.8 %) and alcohol abuse/dependence (62.6 %); 18.7 % had cirrhosis not attributable to alcohol or viral hepatitis. The median MELD score was 10. The median eCTP class/score was A6, and 43 % were eCTP B7 or higher. At least 5 years of follow-up for death or transplantation were available for this cohort.

To screen for the direction in which individual laboratory cutpoints should move, we first fit Cox proportional hazards models to estimate the concordance of mCTP scores in which the upper or lower cutpoint of one of the three laboratory variables (INR, serum albumin, and total bilirubin) was varied while the cutpoints of the other two variables were fixed at their original values. As shown in Fig. 1 (triangles), varying the lower INR cutpoint over a wide range identified optimal prediction at 1.8, very similar to the existing cutpoint of 1.7. By contrast, the optimal upper INR cutpoint was significantly greater than the existing 2.3, with concordance plateauing at approximately 4.5. For serum albumin, an upper cutpoint of 3.6 g/dl, similar to the existing 3.5 g/dl, yielded optimal concordance. However, a lower cutpoint of 3.3 g/dl, significantly higher than the existing 2.8 g/dl, appeared optimal. Varying the lower cutpoint of total bilirubin over a wide range showed an initial concordance peak at 1.3 mg/dl and a second, higher peak at 2.9 mg/dl, both significantly better than the current 2.0 mg/dl lower cutpoint. Raising the upper bilirubin cutpoint from the existing 3.0 mg/dl to 3.3 mg/dl marginally improved performance; further increases yielded no improvement.
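The one-at-a-time screen described above can be expressed as a generic loop: copy the current cutpoint set, replace one cutpoint with a candidate value, and score the result. In the study the scoring step was an age- and gender-adjusted Cox model evaluated by Harrell's C; here `evaluate` is a hypothetical callable, and the toy objective below merely stands in for it. The `is_valid` check, which skips candidates that would invert a variable's lower and upper cutpoints, is our assumption, since the text does not say how such combinations were handled. Re-running the loop with `base` set to newly found optima mirrors the refinement step shown by the circles in Fig. 1.

```python
from typing import Callable, Dict, List, Tuple

Cutpoints = Dict[str, float]


def scan_cutpoint(base: Cutpoints, name: str, candidates: List[float],
                  evaluate: Callable[[Cutpoints], float],
                  is_valid: Callable[[Cutpoints], bool] = lambda c: True,
                  ) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Vary one cutpoint, hold the others at `base`, and return
    (best value, best score, curve of (value, score) pairs)."""
    curve = []
    for value in candidates:
        trial = dict(base)          # copy so `base` is never mutated
        trial[name] = value
        if is_valid(trial):
            curve.append((value, evaluate(trial)))
    if not curve:
        raise ValueError("no valid candidate cutpoints")
    best_value, best_score = max(curve, key=lambda pair: pair[1])
    return best_value, best_score, curve


def grid(lo: float, hi: float, step: float = 0.1) -> List[float]:
    """Candidates from lo to hi inclusive, rounded to avoid float drift."""
    n = round((hi - lo) / step)
    return [round(lo + k * step, 2) for k in range(n + 1)]


def toy_objective(c: Cutpoints) -> float:
    """Hypothetical stand-in for a Cox-model C-statistic; peaks at INR 1.8."""
    return -(c["inr_low"] - 1.8) ** 2


# Original CTP cutpoints for INR and total bilirubin (mg/dl).
octp = {"inr_low": 1.7, "inr_high": 2.3, "bili_low": 2.0, "bili_high": 3.0}
best, score, curve = scan_cutpoint(
    octp, "inr_low", grid(1.0, 4.0), toy_objective,
    is_valid=lambda c: c["inr_low"] < c["inr_high"])
```

Keeping the full curve, not just the best value, is what makes features such as the plateau in the upper INR cutpoint or the two bilirubin peaks visible.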

Fig. 1

Impact of varying upper and lower cutpoints of CTP laboratory variables on concordance (Harrell’s C) for prediction of 5-year transplant-free survival. a–c For each simulation, one cutpoint was varied while the other five cutpoints were fixed at original values (triangles), modified optimized values (open circles), or intermediate values (diamonds). Arrowheads show the optimal cutpoint defined with the other five cutpoints at their original CTP values. The filled triangle represents the original CTP, and the filled circle represents the modified CTP. d Effect of iterating serum creatinine cutpoints in the creatinine-modified CTP model. The arrowhead indicates the point of optimal concordance.

The optimal cutpoints were further refined by varying each variable singly while fixing the other two variables at the newly identified optimal cutpoints. As shown in Fig. 1 (circles), the estimated optimal cutpoints for the laboratory components of the CTP score did not change significantly, but the overall predictive capacity increased from 0.701 ± 0.002 for the oCTP to approximately 0.709 ± 0.002 for the mCTP. The cutpoints for the final proposed mCTP and the oCTP are summarized in Table 1.
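To make the two cutpoint sets concrete, the sketch below scores a patient under the original and modified systems using only the laboratory cutpoints stated in the text (oCTP: INR 1.7/2.3, bilirubin 2.0/3.0 mg/dl, albumin 3.5/2.8 g/dl; mCTP: INR 1.8/4.5, bilirubin 2.9/3.3 mg/dl, albumin 3.6/3.3 g/dl). Ascites and encephalopathy subscores (1–3) are passed in already computed, because the eCTP operational definitions [16] are not reproduced here. Whether a value exactly equal to a cutpoint falls in the higher or lower category is our assumption.

```python
from typing import Dict, Tuple

# (lower, upper) cutpoints; for albumin, (normal cutoff, severe cutoff) in g/dl.
OCTP: Dict[str, Tuple[float, float]] = {
    "inr": (1.7, 2.3), "bilirubin": (2.0, 3.0), "albumin": (3.5, 2.8)}
MCTP: Dict[str, Tuple[float, float]] = {
    "inr": (1.8, 4.5), "bilirubin": (2.9, 3.3), "albumin": (3.6, 3.3)}


def rising_points(value: float, low: float, high: float) -> int:
    """1/2/3 points for labs where higher is worse (INR, bilirubin)."""
    if value < low:
        return 1
    return 2 if value <= high else 3


def albumin_points(value: float, normal: float, severe: float) -> int:
    """1/2/3 points for albumin, where lower is worse."""
    if value > normal:
        return 1
    return 2 if value >= severe else 3


def ctp_score(inr: float, bilirubin: float, albumin: float,
              ascites: int, encephalopathy: int,
              cuts: Dict[str, Tuple[float, float]]) -> int:
    """Total CTP score (5-15); ascites and encephalopathy are 1-3 subscores."""
    return (rising_points(inr, *cuts["inr"])
            + rising_points(bilirubin, *cuts["bilirubin"])
            + albumin_points(albumin, *cuts["albumin"])
            + ascites + encephalopathy)


# INR 2.0, bilirubin 1.5 mg/dl, albumin 3.0 g/dl, ascites subscore 2, no HE (1).
print(ctp_score(2.0, 1.5, 3.0, 2, 1, OCTP), ctp_score(2.0, 1.5, 3.0, 2, 1, MCTP))  # 8 9
```

The example patient gains a point under the mCTP solely because an albumin of 3.0 g/dl falls below the raised 3.3 g/dl cutpoint.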

Table 1 Laboratory cutpoints for original CTP (oCTP), modified 5–15 CTP (mCTP), second modified CTP system dichotomizing INR and encephalopathy (mCTP13) and creatinine-modified CTP (mCTP-Cr)

Using the optimized cutpoints in a multivariable Cox proportional hazards model including age and the five CTP subscores, with or without gender [Table 2(A)], the serum albumin, total bilirubin, and ascites subscores made the greatest contribution (LogWorth) to the risk prediction of the CTP model. For these three parameters, risk increased in a relatively linear fashion from a subscore of 1 (none) to 2 (mild or medically controlled) to 3 (severe or medically refractory), with relative risks (RR) of 1.5–1.6 for subscores of 2 and 2.1–3.2 for subscores of 3. Encephalopathy showed a modest increase in RR for a subscore of 2 [RR 1.2 (95 % CI 1.1–1.3)] and a further modest increase for a subscore of 3 [RR 1.8 (95 % CI 1.7–1.9)]. INR showed only a very modest increase in the RR of death or transplant for a subscore of 2 [RR 1.3 (95 % CI 1.2–1.4)] and no significant effect for a subscore of 3. Given the low RRs for subscores of 3 and the small overall contribution of these two variables to the CTP model, we designed a second mCTP model dichotomizing encephalopathy (1 = absent, 2 = present) and INR. To identify the appropriate single cutpoint for INR in a two-level model, we calculated Harrell’s C-statistics while varying INR from 1 to 6.5 and the albumin and total bilirubin cutpoints over narrower ranges (Fig. 4 in “Appendix”); INR cutpoints from 3.1 to 4.0 optimized discriminative capacity. Our simplified mCTP (mCTP13), which uses an INR cutpoint of 3.3 for consistency with the albumin and total bilirubin cutpoints, is shown in Table 1. The Harrell’s C-statistic for the age- and gender-adjusted mCTP13 in the first-quarter 2008 cohort was identical to that of the mCTP (0.709 ± 0.002), with a significantly higher hazard ratio [1.60 (95 % CI 1.58–1.61)] than the mCTP [1.53 (95 % CI 1.52–1.54)].
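The simplified mCTP13 changes two components relative to the mCTP: INR contributes 1 point below 3.3 and 2 points at or above it (as defined in the Discussion, 1 = INR <3.3, 2 = INR ≥3.3), and encephalopathy contributes 1 point if absent and 2 if present. A minimal sketch follows, assuming albumin and bilirubin keep the mCTP cutpoints (3.6/3.3 g/dl and 2.9/3.3 mg/dl), so totals range from 5 to 13. As before, ascites is an externally computed 1–3 subscore, and handling of values exactly at the albumin and bilirubin cutpoints is assumed.

```python
def mctp13_score(inr: float, bilirubin: float, albumin: float,
                 ascites: int, encephalopathy_present: bool) -> int:
    """Simplified 5-13 point mCTP13 score (illustrative sketch)."""
    if ascites not in (1, 2, 3):
        raise ValueError("ascites subscore must be 1, 2 or 3")
    inr_pts = 1 if inr < 3.3 else 2                  # dichotomized INR
    hep_pts = 2 if encephalopathy_present else 1     # absent / present
    if bilirubin < 2.9:                              # mg/dl, higher is worse
        bili_pts = 1
    else:
        bili_pts = 2 if bilirubin <= 3.3 else 3
    if albumin > 3.6:                                # g/dl, lower is worse
        alb_pts = 1
    else:
        alb_pts = 2 if albumin >= 3.3 else 3
    return inr_pts + hep_pts + bili_pts + alb_pts + ascites


best_case = mctp13_score(1.0, 0.5, 4.5, 1, False)    # 5
worst_case = mctp13_score(6.0, 12.0, 2.0, 3, True)   # 13
```

Narrowing the range from 11 to 9 possible scores means each point spans more risk, which is consistent with the larger per-point hazard ratio reported for mCTP13.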

Table 2 Multivariate Cox proportional hazard analysis of individual subscores in predicting 5-year transplant-free survival

We next investigated the use of sCr in mCTP models. Serial Cox proportional hazards models adding lower and upper sCr cutpoints were iterated over sCr values from 1.0 to 7.0 mg/dl (Fig. 1d). The optimal C-statistic was achieved with a lower sCr cutpoint of 1.8 mg/dl and an upper cutpoint of 2.3 mg/dl. After inclusion of sCr, the INR subscore was no longer significantly associated with 5-year transplant-free survival (TFS) (data not shown), so a simplified model was created by substituting sCr for INR. To evaluate the impact of this change on the other two laboratory cutpoints, we iterated the upper and lower cutpoints for total bilirubin and serum albumin over a wide range. The optimal albumin cutpoints (3.6 and 3.3 g/dl) and the lower total bilirubin cutpoint (2.9 mg/dl) did not change, but the optimal upper total bilirubin cutpoint increased to 3.7 mg/dl. The final model (mCTP-Cr) is shown in Table 1. In multivariable Cox analysis [Table 2(B)], the HRs for creatinine subscores of 2 and 3 were 1.6 (95 % CI 1.5–1.7) and 2.1 (95 % CI 1.9–2.2), respectively. The HR for a bilirubin subscore of 3 increased slightly with the new upper cutpoint. With these modifications, the C-statistic for the mCTP-Cr model improved to 0.712 ± 0.002 (Table 3).
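The mCTP-Cr replaces INR with a three-level serum creatinine subscore. A minimal sketch using the cutpoints reported in this section (creatinine 1.8/2.3 mg/dl, albumin 3.6/3.3 g/dl, total bilirubin 2.9/3.7 mg/dl) follows; ascites and encephalopathy remain three-level subscores supplied by the caller, and handling of values exactly at a cutpoint is our assumption.

```python
from typing import Tuple

# Cutpoints reported for mCTP-Cr (mg/dl, except albumin in g/dl).
CREATININE: Tuple[float, float] = (1.8, 2.3)
BILIRUBIN: Tuple[float, float] = (2.9, 3.7)
ALBUMIN: Tuple[float, float] = (3.6, 3.3)      # (normal cutoff, severe cutoff)


def points_higher_worse(value: float, cuts: Tuple[float, float]) -> int:
    """1/2/3 points when higher values carry more risk."""
    low, high = cuts
    if value < low:
        return 1
    return 2 if value <= high else 3


def points_lower_worse(value: float, cuts: Tuple[float, float]) -> int:
    """1/2/3 points when lower values carry more risk (albumin)."""
    normal, severe = cuts
    if value > normal:
        return 1
    return 2 if value >= severe else 3


def mctp_cr_score(creatinine: float, bilirubin: float, albumin: float,
                  ascites: int, encephalopathy: int) -> int:
    """5-15 point creatinine-modified CTP (sketch); there is no INR term."""
    for sub in (ascites, encephalopathy):
        if sub not in (1, 2, 3):
            raise ValueError("clinical subscores must be 1, 2 or 3")
    return (points_higher_worse(creatinine, CREATININE)
            + points_higher_worse(bilirubin, BILIRUBIN)
            + points_lower_worse(albumin, ALBUMIN)
            + ascites + encephalopathy)


# Bilirubin 3.5 mg/dl scores 2 here; under the mCTP upper cutpoint of 3.3 it would score 3.
example = mctp_cr_score(2.0, 3.5, 3.4, 2, 1)   # 2 + 2 + 2 + 2 + 1 = 9
```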

Table 3 Performance of original CTP (oCTP), modified CTP (mCTP), simplified modified CTP (mCTP13), and creatinine-modified CTP (mCTP-Cr) in predicting 5-year transplant-free survival in cohorts drawn from 2008 quarters 1–4

The performance of the oCTP, mCTP, and mCTP-Cr in predicting 5-year transplant-free survival was validated in cohorts drawn from the subsequent three quarters of 2008 (Table 3) and compared with MELD, VACS, and Huo CTP (Table 5 in “Appendix”). All CTP-based models predicted 5-year transplant-free survival better than MELD or VACS. The Huo CTP modification differed negligibly from the oCTP. In each quarterly cohort, the overall concordance of mCTP, mCTP13, and mCTP-Cr exceeded that of the oCTP, and the three modified models performed very similarly. Hazard ratios for mCTP13 were significantly greater than those for mCTP because of its narrower score range (5–13 compared with 5–15). Kaplan–Meier survival curves for oCTP, mCTP, and mCTP-Cr (Fig. 4 in “Appendix”) show that the mCTP eliminates the crossover of survival curves seen for oCTP scores 12–13.
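Survival curves of the kind used to check for crossover between adjacent score strata come from the Kaplan–Meier product-limit estimator. The pure-Python sketch below is an illustration rather than the study's R code; it returns the survival estimate at each distinct event time and groups patients by score so that, for example, the curves for oCTP 12 and 13 can be compared.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject (event or censored) observed at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # subjects observed at t leave the risk set after t
    return curve


def kaplan_meier_by_score(times: Sequence[float], events: Sequence[int],
                          scores: Sequence[int]) -> Dict[int, List[Tuple[float, float]]]:
    """Separate Kaplan-Meier curve for each score stratum, ordered by score."""
    groups = defaultdict(lambda: ([], []))
    for t, e, s in zip(times, events, scores):
        groups[s][0].append(t)
        groups[s][1].append(e)
    return {s: kaplan_meier(ts, es) for s, (ts, es) in sorted(groups.items())}
```

Two strata cross when the higher score's curve lies above the lower score's curve at some common time point; a well-ordered score keeps each stratum's curve below that of the next lower score.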

Across disease etiologies, mCTP and mCTP-Cr had significantly greater predictive capacity than oCTP, particularly for individuals with alcoholic and non-viral cirrhosis (Table 4). mCTP and mCTP-Cr also had significantly greater predictive capacity than oCTP and MELD in non-black men (Table 6 in “Appendix”), but were not significantly better in the relatively small African-American subset. Owing to limited numbers, no differences in performance were detected among women between mCTP or mCTP-Cr and oCTP, but all three models were superior to MELD. CTP-based systems were also markedly better at predicting 5-year transplant-free survival in individuals with MELD scores below 15, a group generally not considered for transplant referral because of relatively low short-term mortality. Finally, mCTP and mCTP-Cr outperformed oCTP and MELD in predicting shorter-term (1-, 2-, 3-, and 4-year) transplant-free survival (Table 7 in “Appendix”).

Table 4 Performance of original CTP (oCTP) and modified CTP (mCTP) in predicting 5-year transplant-free survival in specific disease etiology subsets

To assess the predictive performance of mCTP and mCTP-Cr relative to oCTP and MELD across disease severity, we analyzed serial subsets of the cohort over increasing MELD or oCTP score ranges (Fig. 2). Over MELD ranges from 12–16 to 27–31, mCTP and mCTP-Cr exhibited higher concordance than oCTP. At MELD 28–32, mCTP and oCTP converge, but mCTP-Cr remains statistically superior until MELD 30–34, beyond which small subgroup sizes limit the ability to differentiate model performance. Even at high MELD scores, CTP-based models better predict 5-year TFS (and 1–4-year TFS, data not shown). When the cohort was binned serially by oCTP ranges (Fig. 2b), mCTP and mCTP-Cr maintained superior concordance for ranges from oCTP 6–8 to 9–11. For patients with oCTP 9–11, mCTP, oCTP, and MELD converge. Above oCTP 11–13, mCTP-Cr has greater predictive capacity than mCTP or oCTP. For patients with oCTP 11–13, 12–14, or 13–15, MELD best predicts death or transplant. Fewer than 2 % of living patients with cirrhosis fell into these oCTP ranges during the index quarter. Thus, among the models evaluated, mCTP-Cr maintains the most consistent predictive performance across the spectrum of liver disease severity.

Fig. 2

Impact of varying clinical severity on the concordance of oCTP, mCTP, mCTP-Cr, and MELD for 5-year transplant-free survival. a The 2008 quarter 1 cohort was serially subset by five-point MELD ranges, and the C-statistics for oCTP (solid gray), mCTP (solid black), mCTP-Cr (dashed gray, solid marker), and MELD (dashed black) were plotted. The percentage of patients in each subset is plotted on the secondary axis (gray dotted line, round marker). b The 2008 quarter 1 cohort was serially subset by three-point oCTP ranges, and the C-statistics for oCTP (solid gray), mCTP (solid black), mCTP-Cr (dashed gray, solid marker), and MELD (dashed black) were plotted. The percentage of patients in each subset is plotted on the secondary axis (gray dotted line, round marker).

Discussion

Although the CTP score was developed to predict outcomes of portosystemic shunt surgery in patients with cirrhosis [2, 3], it has historically remained the most widely used staging system for predicting long-term survival in cirrhosis [1]. Despite its widespread acceptance, several flaws of the CTP as a prognostic staging system have been identified [26]. It relies on two subjective assessments, which foster inter-observer variation and hamper application to large datasets. The cutpoints for the objective laboratory variables have never been validated. Additionally, the limited dynamic range of the CTP, with only 11 possible scores (5–15) and three classes, theoretically reduces its utility for prioritization in organ allocation. We have previously shown that, by creating operational definitions for ascites and encephalopathy based on ICD9-CM diagnosis codes, CPT procedure codes, and laboratory and pharmacy refill data, the subjective variables of the CTP score can be estimated (eCTP) with a high degree of precision [16]. In this study, which used the largest dataset to evaluate the CTP to date, the electronic oCTP was significantly more accurate than MELD or VACS scores for predicting 1–5-year transplant-free survival. Addition of comorbidity indices such as Charlson–Deyo and CirCom [18] had minimal impact on prediction.

Our cohort of 30,897 subjects with prevalent cirrhosis in the first quarter of 2008, all with at least 5 years of follow-up, allows robust modeling to determine evidence-based, optimal CTP cutpoints. One striking finding on multivariable Cox proportional hazards modeling was that the cutpoint estimates for INR, albumin, bilirubin, and creatinine were highly independent, suggesting that these four measures truly reflect different aspects of the pathophysiology of end-stage liver disease associated with outcome.

The lower bound of INR associated with increased mortality risk was adequately captured by the oCTP score. The similarity between the optimal lower cutpoint we identified (1.8) and the original value (1.7), which approximates a prothrombin time >4 s longer than control, is notable. However, the original upper cutpoint of 2.3 (based on a prothrombin time 6 s longer than control) does not appear to have intrinsic validity; increased risk of death was more closely associated with substantially greater degrees of coagulopathy. As a three-level factor, INR contributes only modestly to the CTP in multivariable Cox proportional hazards models. Indeed, a single INR cutpoint between 3.1 and 3.5 better discriminated the degree of coagulopathy associated with increased risk of death or transplantation. Excluding patients receiving warfarin did not change these findings (data not shown). Dichotomizing the INR subscore (1 = INR <3.3, 2 = INR ≥3.3) also did not reduce predictive capacity. Notably, this cutpoint is nearly identical to that identified by Asrani and Kim [27] as the point beyond which increasing INR is no longer associated with increasing mortality. Our data strongly support the clinical gestalt that, by the point in the natural history of cirrhosis at which marked INR increases are observed, the prognosis is so poor that further increases have little impact on observed survival.

Many clinicians consider declining albumin to be the most sensitive marker of progressive liver dysfunction, both as a reflection of impaired synthetic reserve and as a precursor of the neurohumoral response that results in ascites and hepatorenal syndrome. Indeed, relatively modest reductions in albumin from normal (to 3.7–3.8 g/dl) markedly increased the risk of mortality. Nonetheless, the optimal upper cutpoint of 3.6 g/dl in our cohort approximated the originally chosen value of 3.5 g/dl. Strikingly, however, the optimal threshold for the highest-risk albumin category (3.3 g/dl) was well above the original 2.8 g/dl. Seventeen percent of the cohort had serum albumin values of 2.8–3.2 g/dl and were given an additional point by the mCTP. Median survival was significantly shorter for individuals with albumin 2.8–3.2 g/dl [907 days (95 % CI 872–943)] than for those with albumin 3.3–3.5 g/dl [1629 days (95 % CI 1569–1700)]. Given this marked disparity in survival between the lower and upper halves of the oCTP albumin subscore 2 range, raising the lower albumin cutpoint to 3.3 g/dl appears to be the alteration of the oCTP that yields the greatest improvement in model performance.

The concordance curve for the lower bilirubin cutpoint was bimodal: a cutpoint of 1.3 mg/dl was superior to the current 2.0 mg/dl cutpoint, but a higher cutpoint of 2.9 mg/dl yielded even stronger concordance. In contrast to MELD, the transition to a high-risk bilirubin value occurred at a relatively low level, 3.3 mg/dl [27]. We could not replicate the improvement in prediction from adding points to the CTP for very high bilirubin levels (>8 mg/dl) proposed by Huo et al. [24], either in the complete cohort or in those with decompensated cirrhosis (Table 7 in “Appendix”). These data are consistent with the relatively low weighting of total bilirubin in the MELD score, but differ from the linear association with mortality observed over a large dynamic range of total bilirubin in MELD models [27]. In mCTP-Cr, the optimal bilirubin cutpoints were ultimately 2.9 and 3.7 mg/dl.

The inclusion of “subjective” assessments of hepatic encephalopathy (HE) and ascites has been a consistent critique of the CTP scoring system [26]. However, both ascites and HE have been shown to strongly affect survival [28–30] and quality of life in end-stage liver disease [31], yet both correlate poorly with MELD score [30–32]. Encephalopathy is very difficult to quantify administratively [33], and the limited additional impact of a higher encephalopathy subscore could reflect either misclassification or a lack of internal validity of the three-level stratification. Dichotomizing the CTP encephalopathy subscore (1 = absent, 2 = present) based on hospitalizations and/or treatment for HE might simplify this assessment and reduce its subjectivity. We found that doing so had no significant impact on the predictive capacity of the CTP models (C-statistic 0.709 ± 0.002 for both three- and two-level models). Similarly, the Mayo group found no statistically significant difference in Kaplan–Meier survival between HE grades 2 and 3 [30]. However, in mCTP-Cr a three-level encephalopathy score proved superior (data not shown). The presence of encephalopathy ICD9 codes or medications remains critical to the predictive capacity of the CTP and partially explains its improved performance over MELD in predicting survival in this and other cohorts [16, 30]. A similar simplification of ascites scoring worsened CTP predictive capacity; however, the strict operational definition we used for severe or refractory ascites, which required >1 large-volume paracentesis in the preceding 90 days or a history of TIPSS placement, could substantially reduce the subjectivity of this assessment in practice.

Strengths of our study include the large cohort size, completeness of follow-up for outcomes, and variety of disease etiologies. Limitations, as in the original derivation cohort, include the predominance of male patients and the complexity of the eCTP algorithm, which requires access to claims, laboratory, and pharmacy data; replicating this study therefore requires comprehensive administrative datasets. External validation of the mCTP and mCTP-Cr scores in non-VA administrative datasets would be important before recommending a wholesale change to the 50-year-old CTP scoring system.

Conclusion

The mCTP and creatinine-modified mCTP-Cr models outperformed the oCTP and MELD in predicting 1-, 2-, 3-, 4-, and 5-year transplant-free survival in a large cohort of individuals with cirrhosis ranging from well to poorly compensated. Augmenting the time-tested clinical gestalt of highly skilled and experienced liver surgeons with evidence-based estimates from survival models can improve the capacity of the CTP to predict short- and long-term survival across multiple cirrhosis etiologies and demographic subsets. We recommend modifying the CTP score used in clinical practice to incorporate the current evidence-based cutpoints.