To the Editor

In the article “Re-irradiation with cetuximab or cisplatin-based chemotherapy for recurrent squamous cell carcinoma of the head and neck”, Dornoff et al. proposed a new prognostic score for the survival of patients with relapsed head and neck cancer after salvage treatment consisting of re-irradiation with concurrent immuno- or chemotherapy. Ranging from 0 to 4 points, the score classified patients into five groups with excellent (4 points), good (3 points), moderate (2 points), poor (1 point), and very poor (0 points) outcome. The authors concluded that their tool would be useful for identifying patients suitable for re-irradiation [1].

Certainly, the study takes up an important issue. Simple and reliable prognostic models may be helpful for patient counselling and for the rational choice of treatment. However, we have some concerns regarding the methodological approach.

First, although it perfectly meets the criterion of simplicity, the suggested score has not been successfully validated in an independent patient cohort, that is, externally validated. Independent means that the dataset used to validate a prognostic model was not used to construct it. Without successful external validation, a prognostic model cannot be considered fit for purpose [2, 3].

Second, when evaluating the performance of a prognostic model, two fundamental aspects should be distinguished: discrimination and calibration. Discrimination is the extent to which a model's risk estimates separate patients with different prognoses. By contrast, calibration reflects prediction accuracy, that is, the agreement between predicted and observed outcomes. It is particularly important to establish measures that reliably identify poor discrimination, because poor calibration can be improved by model recalibration, whereas inadequate discrimination cannot be corrected [3].
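To make the notion of discrimination concrete, the following minimal sketch computes Harrell's concordance index, one commonly used discrimination measure for censored survival data. It is offered as an illustration only and is not the method used by Dornoff et al.; the function name `concordance_index` and the toy data are hypothetical, and tied survival times are simply skipped for brevity. The score is assumed to be oriented as in the article, with higher values indicating a better prognosis.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    scores : prognostic score, higher = better predicted prognosis
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: order of deaths unknown
        usable += 1
        if scores[a] < scores[b]:
            concordant += 1.0  # earlier death had the worse score
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: five patients, one censored observation.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 1]
print(concordance_index(times, events, [0, 1, 2, 3, 4]))
print(concordance_index(times, events, [2, 2, 2, 2, 2]))
```

A value of 1 indicates that the score orders all usable patient pairs correctly, whereas 0.5 corresponds to a score that discriminates no better than chance.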

In the present study, the authors relied exclusively on a log-rank test comparing the different prognostic groups and presumed successful validation because “statistical significance (p < 0.05)” was achieved.

This approach cannot be recommended. A p value does not quantify discrimination; it only measures the evidence against the null hypothesis that the survival of the risk groups is identical. Instead, hazard ratios between the risk groups may be used as a sensible check of discrimination and a useful accompaniment to the visual comparison of Kaplan–Meier curves [3].
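The difference between a p value and an effect size can be illustrated with a small sketch. It assumes a simplified two-group comparison with equal allocation and uses the standard large-sample approximation in which the log-rank test statistic is roughly log(HR) multiplied by the square root of the number of events, divided by 2. The function name `approx_logrank_p`, the hazard ratio of 0.8, and the event counts are hypothetical and are not taken from the study under discussion.

```python
import math


def approx_logrank_p(hazard_ratio, n_events):
    """Approximate two-sided log-rank p value for two equally sized groups.

    Assumption: z is approximately log(HR) * sqrt(n_events) / 2,
    a large-sample approximation, not an exact calculation.
    """
    z = abs(math.log(hazard_ratio)) * math.sqrt(n_events) / 2.0
    # two-sided tail probability of the standard normal distribution
    return math.erfc(z / math.sqrt(2.0))


# Same hypothetical hazard ratio, increasing numbers of observed events.
for n_events in (20, 80, 320):
    print(n_events, approx_logrank_p(0.8, n_events))
```

Under these assumptions the hazard ratio, and hence the separation between the groups, is identical in all three scenarios, yet the p value falls below 0.05 only when the number of events is large. Statistical significance therefore says little about how well a score discriminates.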

In short, we strongly recommend validating the aforementioned score in an independent dataset using state-of-the-art statistical methods. A detailed discussion of the appropriate methodology is provided in an article by Royston and Altman [3].