Introduction

In this issue of Critical Care, Arabi and co-workers present the results of the validation of a modified model of the Acute Physiology and Chronic Health Evaluation (APACHE) II for patients receiving orthotopic liver transplants [1]. They retrospectively used data from 174 patients admitted to two hospitals (King Fahad National Guard Hospital in Riyadh, Saudi Arabia, and the University of Wisconsin, Madison, WI, USA) to validate the modification of the APACHE II prognostic model described by Derek Angus and colleagues [2]. Is the approach of Arabi and co-workers correct? Can the results and the approach be generalized to other settings?

The APACHE prognostic systems

Described in 1985 [3], the APACHE II prognostic system is one of the most widely used general outcome models. Developed for use with unselected groups of critically ill adults, the system uses three types of data to provide the user with a probability of death at hospital discharge. These data are the Acute Physiology Score (APS), based on the most deranged physiological and laboratory values during the first 24 hours in the intensive care unit (ICU); the premorbid status, based on a list of chronic diseases and conditions apparent at admission to hospital; and the diagnostic category, based on a list of 29 medical and 24 surgical diagnoses.
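As a rough illustration of how such a system turns a score into a probability, the sketch below uses the logistic equation as it is usually quoted from the original APACHE II publication [3]. The function name is ours, the diagnostic category weight is left as an input rather than looked up from the published table, and the coefficients should be checked against [3] before any real use.

```python
import math

def apache2_risk(apache2_score, diagnostic_weight, emergency_surgery=False):
    """Predicted probability of hospital death from an APACHE II score.

    Coefficients are those commonly quoted from the 1985 publication
    (an assumption to verify against the source). diagnostic_weight is
    the coefficient of the patient's diagnostic category and is supplied
    by the caller.
    """
    logit = -3.517 + 0.146 * apache2_score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603  # extra term quoted for postoperative emergency surgery
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical patient: score of 18, illustrative category weight of -0.2
risk = apache2_risk(18, diagnostic_weight=-0.2)
```

Written this way, it is apparent that replacing the weight of a single diagnostic category shifts the predicted risk of every patient in that category by the same amount on the logit scale, while the contribution of the score itself is left untouched.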

Because the system was developed in the early 1980s, several diseases and conditions were not well represented in the original database. This fact, together with substantial changes in the outcome of major diseases and the need to incorporate other variables, led the authors to undertake a major update, the APACHE III prognostic system, published in 1991 [4]. This updated system, being commercial, has not had the impact of its free predecessor. It was found to be quite well calibrated for the USA [5], an improvement that probably reflects the updated database more than major changes in the statistical construct of the model; the exceptions were diagnostic groups for which the therapeutic approach has changed substantially, such as acute myocardial infarction. In other settings, such as Spain, calibration problems remained, prompting a major recalibration or customization of the APACHE III system [6].

The customization of an outcome prediction model

Customization – that is, modification of the equations that transform a score (or the directly measured variables) to a probability of mortality – has been suggested as a possible approach when there is evidence that a given model is not fully appropriate and an unbiased estimation of mortality is needed. Preliminary work [7,8] showed that slight modifications of the logistic regression equations would suffice. Later, Zhu et al., working with computer simulations [9], and groups using independent databases [10,11] showed that customization was feasible and would improve the calibration of the model but that some problems would remain, so that there would still be a need for independent validation of the customized model.
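One simple form of customization can be sketched as follows: the original logit is kept as the only predictor, and a new intercept and slope are estimated on local data by maximum likelihood. This is a minimal sketch under that assumption, not the procedure of any of the cited studies; the function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def customize_logit(original_logits, died, iterations=25):
    """Fit new_logit = a + b * original_logit by Newton-Raphson.

    original_logits: logits produced by the uncustomized model
    died: 1 for hospital death, 0 for survival
    Starts from a=0, b=1, i.e. from the original model unchanged.
    """
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(original_logits, died):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:       # degenerate data: stop updating
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy data, purely illustrative
logits = [-3, -2, -2, -1, -1, 0, 0, 1, 1, 2]
deaths = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1]
a, b = customize_logit(logits, deaths)
```

By construction, the refitted model predicts a total number of deaths equal to the number observed in the data used for fitting. Agreement in the development sample is therefore guaranteed and says little about performance elsewhere, which is why independent validation remains necessary.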

This need for validation applies to the work by Angus and colleagues [2] on the development of new coefficients for the APACHE II system to adapt it to patients after liver transplantation. Those authors' approach, which was to develop a new diagnostic weighting for this category of patients, is attractive, because it is simple. However, it assumes that the APACHE II model incorporates the most important prognostic variables in the setting of liver transplantation, and this assumption needs to be justified.

Does the paper by Arabi and colleagues answer our questions?

It does not. The work done by Arabi and his co-workers was based on a highly heterogeneous database, and patients were treated in two very different institutions. Differences in the prevalence of chronic conditions and the degree of physiologic disorder as well as differences in the procedures followed during the liver transplantation (liver nutrition solutions, cold ischemia time, etc.) could have influenced the outcome for these patients. Moreover, the small number of patients in the sample analyzed makes the Hosmer–Lemeshow goodness-of-fit test underpowered to reveal potential differences between the predicted and the actual mortality. The better calibration of the customized model is promising, but it should be empirically tested in a larger database, constructed to reflect the case mix of liver transplantation patients.
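The statistic behind this test can be sketched as follows, assuming the usual construction in which patients are sorted by predicted risk and split into groups of nearly equal size (ten by convention; the grouping actually used by Arabi and co-workers is not stated here). With 174 patients, ten groups would hold only 17 or 18 patients each, so a sizeable gap between observed and expected deaths within a group can easily go undetected. The function name is hypothetical, and the P value, which requires a chi-square distribution, is deliberately left out.

```python
def hosmer_lemeshow_statistic(predicted, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    predicted: predicted probabilities of death
    died: 1 for death, 0 for survival
    """
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected_deaths = sum(p for p, _ in chunk)
        observed_deaths = sum(y for _, y in chunk)
        expected_survivors = len(chunk) - expected_deaths
        observed_survivors = len(chunk) - observed_deaths
        # Each group contributes one term for deaths and one for survivors
        if expected_deaths > 0:
            statistic += (observed_deaths - expected_deaths) ** 2 / expected_deaths
        if expected_survivors > 0:
            statistic += (observed_survivors - expected_survivors) ** 2 / expected_survivors
    return statistic
```

A small value of the statistic is usually read as adequate calibration, but in a small sample it may simply reflect the lack of power noted above.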

For the moment, therefore, it remains to be shown whether the approach used – to derive a new coefficient for the APACHE II system to be applied to a specific group of patients – is potentially useful and will perform better than its predecessor.