Introduction

With the increasing worldwide availability of liver transplantation, a standardized assessment of severity of illness is needed to evaluate patient outcome objectively over time and between institutions. Cirrhosis-specific scoring systems, such as the Child–Pugh classification and Shaw's risk score, have been shown to be good predictors of outcome in cirrhotic patients [1]. However, when they are used to predict outcome after liver transplantation, the results are inconsistent [2,3,4]. This is partly because the preoperative condition is only one factor in a series of complex interactions that also include intra-operative and postoperative factors. Systems that predict severity of illness and mortality, such as the Acute Physiology and Chronic Health Evaluation (APACHE) II system, are attractive options for this group because they rely on data collected soon after admission to the intensive care unit (ICU), which are likely to reflect preoperative, intra-operative and postoperative contributions.

The APACHE II system was described by Knaus et al. in 1985 to predict hospital mortality in ICU patients [5]. The multiple logistic regression equations were based on data collected on 5050 medical and surgical patients admitted to the ICU in 13 tertiary medical centers in the USA. This outcome prediction system has been used to evaluate and compare the performance of ICUs in different hospitals and countries. In addition to general ICU patients, APACHE II has also been studied in specific groups of patients such as those with trauma [6], sepsis [7], and cirrhosis [8].

The APACHE II prediction equation incorporates three variables: the APACHE II score, the patient's diagnostic category, and whether surgery was emergency or elective. The APACHE II score is the sum of the Acute Physiology Score, which is calculated from 12 physiologic variables, each scored from 0 to 4 according to its degree of deviation from normal, and points for age and for chronic illness. There are 50 diagnostic categories, each with its own weight used in calculating predicted mortality. There is no specific diagnostic category weight for liver transplantation, because the developmental database for the system contained no liver transplantation patients. When the system is used for postoperative liver transplantation patients, the diagnostic category weight for 'postoperative other gastrointestinal surgery' is therefore used, an approach that has been shown to overestimate mortality significantly [9]. Angus et al. recently derived a new diagnostic category weight from their own population of liver transplantation patients [9]. The purpose of this study was to validate this newly derived postoperative orthotopic liver transplantation (OLTX)-specific diagnostic category weight for APACHE II in independent databases.

Methods

King Fahad National Guard Hospital (KFNGH) is a 550-bed tertiary care center with a 12-bed medical–surgical ICU that has 600 admissions per year; its liver transplantation program is the main program in the Kingdom of Saudi Arabia. The University of Wisconsin (UW) liver transplantation program is a major program in the USA; at UW, liver transplantation patients are admitted to the Trauma and Life Support Center, a multidisciplinary ICU that admits 2000 patients per year. We reviewed the medical records of liver transplantation patients admitted postoperatively to the adult ICU from April 1996 to January 2000 at KFNGH and from April 1997 to January 2000 at UW. Re-transplantations and combined kidney–liver and living-related transplantations were excluded. Age, sex, and underlying liver disease were recorded. APACHE II scores were calculated according to the original methodology, using the worst physiologic values during the first ICU day, with one exception: the Glasgow Coma Score (GCS). Most patients were still under the influence of postoperative sedation during their first 24 hours in the ICU, so the worst GCS would have reflected the effect of sedation more than the true underlying mental status. We therefore used the best GCS, which we considered a better reflection of the patient's mental status. All patients were assigned chronic health points. Vital status at hospital discharge was recorded.
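The modified scoring described above can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes the caller has already summed the acute physiology points for the 11 non-neurological variables, and it applies the standard APACHE II rules that the neurological component equals 15 minus the GCS and that chronic health points are 2 for elective and 5 for emergency postoperative patients. Whether the transplant recipients were classed as elective or emergency is not stated here, so the `emergency_postop` flag is a hypothetical parameter.

```python
def age_points(age_years: int) -> int:
    """APACHE II points for age (original Knaus et al. bands)."""
    if age_years <= 44:
        return 0
    if age_years <= 54:
        return 2
    if age_years <= 64:
        return 3
    if age_years <= 74:
        return 5
    return 6


def gcs_points(gcs: int) -> int:
    """The APACHE II neurological component is 15 minus the GCS."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    return 15 - gcs


def apache_ii_score(other_aps_points: int, best_gcs: int, age_years: int,
                    emergency_postop: bool = False) -> int:
    """Acute physiology points + age points + chronic health points.

    Uses the BEST first-day GCS (the study's modification) instead of the
    worst. Every patient receives chronic health points: 2 if elective
    postoperative, 5 if emergency postoperative (hypothetical flag here).
    """
    chronic_health = 5 if emergency_postop else 2
    return (other_aps_points + gcs_points(best_gcs)
            + age_points(age_years) + chronic_health)


# Hypothetical patient: 8 other physiology points, best GCS 14, aged 52, elective.
print(apache_ii_score(other_aps_points=8, best_gcs=14, age_years=52))  # 13
```

Using the best rather than the worst GCS can only lower (or leave unchanged) the neurological points, so this modification tends to reduce the total score in sedated patients.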

Predicted mortality was calculated with the logistic regression formula described in the original article [5]. We used two approaches: the original APACHE II diagnostic category weight of postoperative gastrointestinal surgery (-0.613), and the OLTX-specific diagnostic category weight calculated by Angus et al. (-1.076) [9]. The formulae for calculating predicted mortality (risk of death [ROD]) are as follows:

for the original approach, $\ln[\mathrm{ROD}/(1 - \mathrm{ROD})] = -3.517 + (\text{APACHE II score} \times 0.146) - 0.613$;

for the new approach, $\ln[\mathrm{ROD}/(1 - \mathrm{ROD})] = -3.517 + (\text{APACHE II score} \times 0.146) - 1.076$.
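The two formulas can be evaluated with a short Python sketch that inverts the logit. The coefficients are those given above; the function names are illustrative, and treating a cohort's predicted mortality as the mean of the individual risks of death is our assumption.

```python
import math

INTERCEPT = -3.517
SCORE_COEFFICIENT = 0.146
WEIGHT_ORIGINAL = -0.613  # 'postoperative gastrointestinal surgery'
WEIGHT_OLTX = -1.076      # OLTX-specific weight derived by Angus et al.


def risk_of_death(apache_ii_score: float, category_weight: float) -> float:
    """Solve ln[ROD / (1 - ROD)] = intercept + 0.146 * score + weight for ROD."""
    logit = INTERCEPT + SCORE_COEFFICIENT * apache_ii_score + category_weight
    return 1.0 / (1.0 + math.exp(-logit))


def cohort_predicted_mortality(scores, category_weight: float) -> float:
    """Assumed cohort estimate: the mean of the individual risks of death."""
    return sum(risk_of_death(s, category_weight) for s in scores) / len(scores)


# A single patient with an APACHE II score of 14, under both weights.
print(f"{100 * risk_of_death(14, WEIGHT_ORIGINAL):.1f}%")  # 11.0%
print(f"{100 * risk_of_death(14, WEIGHT_OLTX):.1f}%")      # 7.2%
```

Because the OLTX-specific weight is more negative, it lowers the predicted risk at every APACHE II score.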

The standardized mortality ratio (SMR) was calculated by dividing the observed mortality by the predicted mortality. The 95% confidence interval (CI) for each SMR was obtained by treating the observed number of deaths as a Poisson variable and dividing its 95% CI by the predicted number of deaths [10]. The two approaches were compared with regard to calibration (the ability to provide risk estimates that correspond to the observed mortality) and discrimination (the ability to distinguish survivors from non-survivors). Calibration was evaluated with the Hosmer–Lemeshow goodness-of-fit C-statistic [11], calculated after dividing the study population into six equal-sized groups of increasing predicted mortality to ensure an adequate number of patients in each group. Discrimination was tested with 2 × 2 classification matrices at decision criteria of 10%, 30%, and 50% predicted mortality, and with receiver operating characteristic (ROC) curves constructed at 10% increments of predicted mortality. The two curves were compared by computing the areas under the ROC curves [12,13].
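The SMR and its Poisson-based CI can be sketched in pure Python. The exact (Garwood) Poisson limits used here are one common way to obtain the interval; reference [10] is cited without the computation being spelled out, so this choice and the bisection solver are our assumptions. Limits obtained this way may differ in the second decimal from published values, depending on the method and rounding.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def bisect_root(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Root of a continuous function that changes sign on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(k: int, alpha: float = 0.05):
    """Exact (Garwood) two-sided limits for the mean of a Poisson count k."""
    search_max = 10.0 * (k + 10)
    # Lower limit solves P(X >= k) = alpha/2, i.e. P(X <= k - 1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda mu: poisson_cdf(k - 1, mu) - (1 - alpha / 2), 0.0, search_max)
    # Upper limit solves P(X <= k) = alpha/2.
    upper = bisect_root(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, search_max)
    return lower, upper


def smr_with_ci(observed_deaths: int, expected_deaths: float):
    """SMR with 95% CI: Poisson limits of the observed count over expected."""
    lower, upper = exact_poisson_ci(observed_deaths)
    return (observed_deaths / expected_deaths,
            lower / expected_deaths,
            upper / expected_deaths)


smr, ci_low, ci_high = smr_with_ci(10, 0.1296 * 174)
print(f"SMR {smr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

For the whole cohort, an observed mortality of 5.75% in 174 patients corresponds to 10 deaths, and a predicted mortality of 12.96% to about 22.6 expected deaths, so `smr_with_ci(10, 0.1296 * 174)` returns an SMR of about 0.44.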

Patient characteristics and outcome data from the two participating institutions were compared to assess whether the system performed similarly in both hospitals. Continuous variables are expressed as means ± SD, and categorical variables as absolute and relative frequencies. Categorical variables were compared with the χ² test, and continuous variables that were not normally distributed were compared with the Kruskal–Wallis test. P values of 0.05 or less were considered significant. Minitab for Windows (Release 12.1; Minitab Inc.) was used for statistical analysis.

Results

Patient characteristics

During the study period, 174 postoperative liver transplantation patients were admitted to the ICUs of the two institutions. Their characteristics, underlying liver diseases, APACHE II scores, and predicted and observed outcomes are shown in Table 1.

Actual and predicted hospital mortality rates

The mean APACHE II score was 13.96 ± 5.76 (SD), and the observed hospital mortality was 5.75%. With the original diagnostic category weight, APACHE II significantly overestimated mortality (predicted mortality 12.96%; SMR 0.44, 95% CI 0.22–0.80). With the new diagnostic category weight, the predicted mortality was closer to the observed mortality (8.89%; SMR 0.65, 95% CI 0.31–1.16). Fig. 1 shows the actual and predicted mortality for both approaches in the whole cohort, stratified by APACHE II score.

Figure 1

Actual mortality (triangles), mortality predicted with the original model (diamonds) and mortality predicted with the orthotopic liver transplantation-specific diagnostic category weight (circles) in the whole cohort stratified by APACHE II scores. The bars represent the numbers of patients in each subgroup.

Calibration

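The results of the Hosmer–Lemeshow goodness-of-fit analysis are shown in Table 2. The new model was better calibrated (original model, χ² = 11.06, P = 0.03; new model, χ² = 5.92, P = 0.20); a non-significant P value indicates no significant disagreement between predicted and observed mortality across the risk groups.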
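With six equal-sized groups, the Hosmer–Lemeshow C-statistic has g − 2 = 4 degrees of freedom, and for an even number of degrees of freedom the upper-tail χ² probability has a closed form, so the reported P values can be checked without a statistics library. The sketch below is illustrative only: the grouping rule (sorting by predicted risk and splitting into near-equal slices) is our assumption, and ties in predicted risk are not handled specially.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_c(predicted, died, groups: int = 6):
    """Hosmer-Lemeshow C-statistic over near-equal groups of sorted risk.

    Returns (statistic, degrees of freedom = groups - 2). Each group must
    have a mean predicted risk strictly between 0 and 1.
    """
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        mean_risk = expected / n_g
        statistic += (observed - expected) ** 2 / (n_g * mean_risk * (1 - mean_risk))
    return statistic, groups - 2


# Check the P values reported for the two C-statistics (six groups -> 4 df).
for chi2 in (11.06, 5.92):
    print(chi2, round(chi2_sf_even_df(chi2, 4), 3))
```

With 4 degrees of freedom this gives about 0.026 and 0.205, consistent with the reported P = 0.03 and P = 0.20 once rounding of the χ² values is allowed for.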

Discrimination

Discrimination assessed by 2 × 2 classification matrices improved with the new diagnostic category weight, as reflected by a higher overall correct classification rate at each of the three decision criteria (Table 3). Discrimination was also assessed by ROC curves (Fig. 2): the areas under the curves for the original and new models were almost identical (0.740 and 0.744, respectively).
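Because the two models differ only by a constant in the logit (−0.613 versus −1.076), predicted mortality is a strictly increasing function of the APACHE II score in both, so they rank patients identically and the exact area under the ROC curve must be the same for both; the small difference between 0.740 and 0.744 presumably reflects the curves being constructed at 10% steps. An intercept shift can still change the 2 × 2 classification matrices, because the decision criteria are fixed. The sketch below illustrates both points on hypothetical data; the Mann–Whitney form of the AUC and the rule that a patient is predicted to die when the risk is at or above the criterion are our assumptions.

```python
import math


def risk_of_death(score: float, category_weight: float) -> float:
    """APACHE II logistic equation with a given diagnostic category weight."""
    return 1.0 / (1.0 + math.exp(-(-3.517 + 0.146 * score + category_weight)))


def classification_matrix(predicted, died, cutoff: float):
    """2 x 2 counts, predicting death when risk >= cutoff (assumed rule)."""
    tp = sum(1 for p, d in zip(predicted, died) if p >= cutoff and d)
    fp = sum(1 for p, d in zip(predicted, died) if p >= cutoff and not d)
    fn = sum(1 for p, d in zip(predicted, died) if p < cutoff and d)
    tn = sum(1 for p, d in zip(predicted, died) if p < cutoff and not d)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "correct": (tp + tn) / len(predicted)}


def auc(predicted, died) -> float:
    """ROC area as P(risk of a non-survivor > risk of a survivor), ties = 1/2."""
    pos = [p for p, d in zip(predicted, died) if d]
    neg = [p for p, d in zip(predicted, died) if not d]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical APACHE II scores and outcomes (1 = died), for illustration only.
scores = [6, 9, 11, 13, 14, 16, 18, 21, 24, 28]
died = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
original = [risk_of_death(s, -0.613) for s in scores]
new = [risk_of_death(s, -1.076) for s in scores]

print(auc(original, died) == auc(new, died))  # True: same ranking of patients
for cutoff in (0.10, 0.30, 0.50):
    print(cutoff,
          classification_matrix(original, died, cutoff)["correct"],
          classification_matrix(new, died, cutoff)["correct"])
```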

Figure 2

Receiver operating characteristic curves for the original model (dashed line) and the new model (continuous line).

Comparison between the two institutions

Table 4 compares patient characteristics between the two institutions. Patients at KFNGH were slightly, but significantly, younger than patients at UW. As an underlying disease, hepatitis C virus infection was more common, and alcohol-related liver disease less common, in KFNGH patients than in UW patients. APACHE II scores, and consequently predicted mortality, were higher in KFNGH patients. Despite these differences, the original and new models performed similarly in the two hospitals, as reflected by the SMRs, and in each hospital the new model gave a more accurate estimate of hospital mortality than the original model.

Discussion

The findings of our study can be summarized as follows: (1) APACHE II with its original diagnostic category weight overestimated hospital mortality in postoperative liver transplantation patients; (2) when the newly derived OLTX-specific diagnostic category weight was applied, the mortality prediction, discrimination, and calibration of APACHE II improved; and (3) despite differences in the patient populations, the performance of the original and new models, as reflected by SMRs, was similar in the two institutions.

The literature evaluating APACHE II in postoperative liver transplantation patients is limited. Bein et al. [14] reviewed the use of scoring systems in 123 liver transplantation patients. They reported APACHE II scores but did not calculate predicted mortality; the scores showed good discrimination, as reflected by the areas under the ROC curves. In a second study, Sawyer et al. [15] found that mortality correlated with the APACHE II score, but again predicted mortality was not calculated.

Angus et al. [9] recently calculated the predicted mortality for postoperative liver transplantation patients and found that the APACHE II system overestimated mortality when the original equation was used (SMR 0.73, 95% CI 0.58–0.99), which is consistent with our findings. The inaccuracy of APACHE II with its original equation probably arises from several factors. The developmental database of APACHE II contained no liver transplantation patients, so applying the original equation to these patients implicitly assumes that the diagnostic category weight for liver transplantation is the same as that for postoperative gastrointestinal surgery. Our results confirm the finding of Angus et al. [9] that this assumption is inaccurate, because it leads to a significant overestimation of mortality.

We believe that the reason is related to the unique pathophysiology of the period after liver transplantation. Marked changes occur during the procedure, especially at the time of reperfusion [16,17]. These include a significant decrease in blood pressure, a decrease in systemic vascular resistance, an increase in cardiac output, a decrease in pH, increases in lactate and potassium, and a prolongation of prothrombin time [16,17]. Although some of these abnormalities begin to normalize during the final stages of surgery, some persist into the immediate postoperative period [16] and are therefore reflected in any severity-of-illness score, such as APACHE II. These changes then normalize rapidly as the graft begins to function. The number of abnormalities and the speed with which they are corrected make this group of patients unique, and probably explain the inaccuracy of APACHE II when the diagnostic category weight for 'postoperative gastrointestinal surgery' is used.

On the basis of the above, it is not surprising that a model developed on a population of liver transplant patients would provide more accurate and reproducible estimates. Similar disease-specific customizations of mortality prediction systems have been performed, such as for sepsis [18].

There are several obvious advantages to using APACHE II to assess severity of illness in liver transplant patients, including familiarity with the system and its widespread use in ICUs. ICUs that already use APACHE II as their routine severity-of-illness scoring system will find it easier to apply it to this subgroup than to implement a separate disease-specific system exclusively for OLTX patients. More generally, a system for scoring severity of illness is essential for monitoring the performance of transplant programs over time and between institutions. Such a system can also be useful for grouping patients in clinical studies.

In conclusion, APACHE II provided an accurate estimate of mortality in liver transplant patients when the OLTX-specific diagnostic category weight was used.

Table 1 Characteristics of patients
Table 2 Hosmer–Lemeshow goodness-of-fit C-statistic for APACHE II in its original and new models
Table 3 Classification matrix and sensitivity analysis for APACHE II in its original and new models
Table 4 Comparison between the two participating transplant centers

Key messages

· APACHE II with its original diagnostic category weight overestimated hospital mortality in postoperative liver transplantation patients.

· When the newly derived OLTX-specific diagnostic category weight was applied, mortality prediction, discrimination and calibration of APACHE II improved.

· Despite differences in the patient population, the performance of the old and new models was similar in the two institutions as reflected by SMRs.