Le Gall et al. [1] have described an updated simplified acute physiology score (SAPS) II mortality model that was customized and expanded using 1998 to 1999 patient data from France. The original SAPS II model [2] has been used to predict hospital mortality in Europe and other parts of the world. SAPS II shares many elements with other methodologies such as Acute Physiology, Age, and Chronic Health Evaluation (APACHE) III [3] and mortality probability model (MPM)0-II [4], which have been more commonly used for US populations. Studies applying these models, which were developed in the early 1990s, to more contemporary patient databases from the US [5] and the UK [6] show that the accuracy of their mortality predictions has deteriorated. The deterioration has been less in discrimination (the ability to distinguish survivors from non-survivors) than in calibration (the correspondence between observed and predicted mortality). In particular, mortality tends to be overpredicted when older models are applied to more contemporary data, which in turn leads to 'grade inflation' when benchmarking intensive care unit (ICU) performance [7]. It is thus not surprising that Le Gall et al. [1] found similar results when applying the original SAPS II model (based on data from 1991 to 1992) to a 'newer' data set (1998 to 1999). A mortality model for US Veterans Administration patients [8] and a new generation of mortality models (APACHE IV, MPM0-III, and SAPS III) have been developed to address this well-documented phenomenon of 'model fade'.
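
The difference between discrimination and calibration can be illustrated with a minimal sketch on purely synthetic data. Everything in it is an assumption made for illustration: the coefficients, the sample size, and the helper names (`logistic`, `auc`, `smr`) are hypothetical and are not taken from SAPS II or any other cited model. An 'old' model and the contemporary true risk rank patients identically but differ in baseline risk, so the area under the receiver operating characteristic curve is essentially unchanged while the ratio of observed to predicted deaths falls below 1.

```python
import math
import random


def logistic(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def auc(died, predicted):
    """Discrimination: chance that a random non-survivor has a higher
    predicted risk than a random survivor (ties count as one half)."""
    pos = [p for d, p in zip(died, predicted) if d == 1]
    neg = [p for d, p in zip(died, predicted) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smr(died, predicted):
    """Calibration summary: observed deaths divided by predicted deaths.
    A value below 1 means the model overpredicts mortality."""
    return sum(died) / sum(predicted)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical severity score for 2000 synthetic patients.
severity = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# 'Old' model: hypothetical coefficients fitted on an earlier era.
old_pred = [logistic(-1.0 + 1.2 * s) for s in severity]

# Contemporary true risk: same dependence on severity, lower baseline.
new_risk = [logistic(-1.6 + 1.2 * s) for s in severity]

# Simulated contemporary outcomes (1 = died, 0 = survived).
died = [1 if rng.random() < r else 0 for r in new_risk]

print("AUC, old model:          ", round(auc(died, old_pred), 3))
print("AUC, recalibrated model: ", round(auc(died, new_risk), 3))
print("SMR, old model:          ", round(smr(died, old_pred), 3))
print("SMR, recalibrated model: ", round(smr(died, new_risk), 3))
```

In this toy setting an ICU benchmarked against the old model would appear to outperform expectation simply because the expected number of deaths is inflated, which is the 'grade inflation' described above.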

It is thus puzzling that the authors claim that their model is "a tool suitable for benchmarking" [1]. Instead, it seems likely that the updated and expanded model presented by Le Gall et al. is already out of calibration for patient data collected in 2005 and beyond. The authors concede as much when they apologize for the age of their data and state that, "Nevertheless, for historical comparisons (emphasis mine), the expanded SAPS II can be easily obtained from existing databases". The authors also acknowledge that a different SAPS model, SAPS III, "the more recent and sophisticated model", is currently under evaluation. Although the patient sample used to develop SAPS III is not large [9], it is based on more contemporary data.

There are serious concerns about the patient mix in this study. First, Le Gall et al. state that some ICUs were in fact "intermediate units with only monitored patients" [1]. Mortality at these units is likely to differ from that in ICUs, resulting in models with coefficients optimized for this diluted population [10]. This would compound the effects caused by the age of the data and make benchmarking to contemporary ICUs even more problematic. Second, there is the potential for bias from inadequate collection of cohort data: "Among the 106 ICUs, 22 (21%) failed to provide the SAPS II score for over 20% of admissions" [1]. What are the characteristics of these ICUs and how do they compare with the 84 ICUs that provided more complete data? Were certain patient groups more likely to have a missing SAPS II score and, if so, would this bias the results? These questions were not addressed in the paper. Third, the frequency of drug overdose patients is very high (11%) and mortality was greatly overestimated in this group. Because of these findings, the authors make an exception to their rule of not including diagnostic variables and add a binary variable for the drug overdose patients. In effect, they are acknowledging that diagnostic information is useful in mortality models. They are correct in this assumption, as demonstrated by the accuracy among diagnostic subgroups shown in the APACHE models, and they should seriously consider adding more such variables to their model. The authors go on to state, however, that the inclusion of diagnostic group variables will result in poor calibration across patient groups. This contradicts their inclusion of a variable for drug overdose patients.

In summary, unlike fine wine, models for predicting ICU mortality do not age well. The article by Le Gall et al. provides an interesting footnote in the history of critical care mortality models. Beyond that, it is unclear whether their 'updated' model provides any tangible benefit.