Introduction

Traumatic brain injury (TBI) is one of the most common causes of mortality among young persons [3, 4]. In recent years, it has been identified as an increasing risk of mortality and morbidity among elderly as well [22]. Glasgow Coma Scale (GCS) has been traditionally used as a measure of TBI severity upon admission to hospital. GCS is easy and fast to assess; however, it does not give information on structural information on potential intracranial lesions.

As computed tomography (CT) has become widely available, several classifications and scoring systems have been developed for additional information on TBI prognosis. These include, e.g., Marshall CT classification [8], Rotterdam CT-based score [7], Helsinki CT score [17], and Stockholm CT score [12].

The practical use of CT scores is to give clinicians more quantitative and comparable tools to assess the severity of TBI and estimate need for operative treatment and prognosis. For research purposes, CT scores are used for injury severity standardization and comparison. Recently, a new CT score, NeuroImaging Radiological Interpretation System (NIRIS), was introduced [25]. The NIRIS consists of five categories ranging from 0 to 5, with an increasing intracranial injury load with an increasing number. NIRIS was developed to consolidate imaging findings into different categories of ordinal severity to inform specific patient management actions [25]. The NIRIS has been earlier validated against Marshall CT classification and Rotterdam CT score [2, 25, 27]. To our knowledge, NIRIS has not been validated nor compared to more granular Helsinki CT score.

In this study, we aimed to perform an external validation study of the NIRIS and to compare it with the Helsinki CT score and a clinical model for predicting 6-month mortality. We hypothesized that both CT scores would add predictive performance when compared just with clinical data and that the Helsinki CT score would outperform the NIRIS, as it is more granular. We also report performance statistics of the widely used Marshall CT classification system.

Methods and materials

The ethics committee of Helsinki University Hospital (194/13/03/14 §97), the Finnish National Institute for Health and Welfare (THL/713/5.05.01/2014 and THL/1298/5.05.00/2019), Statistics Finland (TK-53–1047-14), the Office of the Data Protection Ombudsman (Dnro 2713/402/2016 28.10.16), and all the participating university hospitals’ research committees approved this study. The study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

Study design and population

We performed a multicenter retrospective observational study using data that were prospectively collected from the Finnish Intensive Care Consortium (FICC) database. The FICC database is a nationwide prospectively data-collecting database including all ICU-treated patients from the majority of all ICUs in Finland [19]. In Finland, all specialized tertiary intensive care of TBI patients is centralized to five tertiary ICUs. Four of these ICUs participate in the FICC covering approximately two-thirds of the population in Finland. From these four tertiary ICUs, we included all adult TBI patients (age ≥ 18 years) admitted from January 1, 2003, to December 31, 2013 (readmissions excluded). Patients were excluded if no primary CT scan was available, and if Glasgow Coma Scale (GCS) or pre-admission functional status was missing.

Six-month case fatality was used as endpoint for analysis (available for all Finnish citizens through the Finnish population registry).

CT assessment

All patients in the study had non-contrast CT scan taken at the admission to hospital. Patients with only post-operative CT scans, CT angiography, or MRI scans were excluded. All available CT images were classified according to the Marshall CT classification system, the Helsinki CT score, and the updated version of NIRIS [27] by two authors (JV, RR). The CT classification systems are described in Table 1.

Table 1 Description of Marshall CT classification, Helsinki CT score, and NIRIS methods

Statistical analysis

The Marshall CT classification [8] and the NIRIS [25, 27] were treated as categorical variables, NIRIS being ordinal. The Helsinki CT score was originally constructed as an ordinal scale, but due to its many levels and numeric distribution, it can be treated as a numeric variable [17].

We performed first-level customization of the CT scores by fitting a new logit function to the respective CT score [10]. The Helsinki CT-based score (hereafter referred to as Helsinki CT score), Marshall CT-based classification (hereafter referred to as Marshall CT class), and NIRIS-based (hereafter referred to as NIRIS).

We created a clinical “base model” that included age, GCS score (worst measured GCS score during the first ICU-day or as the last reliable GCS for intubated and/or sedated patients), a modified Simplified Acute Physiology Score II (SAPS II, without the age, GCS score, and chronic comorbidity component), and the presence of a chronic comorbidity (according to the SAPS II and Acute Physiology and Chronic Evaluation [APACHE] II definitions). The SAPS II and APACHE scores were assessed during the first 24 h of ICU treatment. We separately added age, GCS score, and chronic comorbidities to give them more weight in our base model [18]. To the base model, the three different CT scores (NIRIS, Marshall CT classification, and Helsinki CT score) were separately added.

We assessed the individual CT scores and the combined base + CT scores by calculating the Nagelkerke’s R2 and the area under the receiver operating characteristics curve [11]. Nagelkerke’s R2 gives a value between 0 and 1 resembling explained variance, where the value 1 indicates a model that fully explains the outcome. The AUC values range from 0.5 to 1 with 0.5 indicating at the level of chance and 1 indicating a perfect model. We assessed the calibration by using the Hosmer–Lemeshow test for all models apart from NIRIS and Marshall, as these consist of less than 10 groups.

We compared AUCs between models using a DeLong test. We considered p-values under 0.05 statistically significant.

In addition to the AUC analysis, we calculated the integrated discrimination improvement (IDI) as the AUC might be more insensitive in model comparisons in which the baseline model has performed well [14]. The IDI is a category free measure of the discrimination ability between two logistic regression prediction models. The IDI can be defined as the difference in discrimination slopes between two models one with, and the other without, the added variable. It can be estimated with the following equation: \(\widehat{IDI}=\left({\overline{\widehat{p}}}_{new,events}-{\overline{\widehat{p}}}_{old,events}\right)-\left({\overline{\widehat{p}}}_{new,nonevents}-{\overline{\widehat{p}}}_{old,nonevents}\right)\), where \({\overline{\widehat{p}}}_{new,events}\) is the mean of the new model-based predicted probabilities of an event for those who develop events, \({\overline{\widehat{p}}}_{old,events}\) is the corresponding quantity based on the old model, \({\overline{\widehat{p}}}_{new,nonevents}\) is the mean of the new model-based predicted probabilities of an event for those who do not develop events, and \({\overline{\widehat{p}}}_{old,nonevents}\) is the corresponding quantity based on the old model [13]. Another representation of the IDI can be formulated with the following equation: \(IDI=\left({IS}_{new}-{IS}_{old}\right)-({IP}_{new}-{IP}_{old})\), where \(IS= \int sensitivity\) and \(IP= \int 1-specificity\) and the subscripts “new” and “old” correspond to the new model and the old model, respectively [13]. The IDI reflects the mean magnitude of the change in outcome probability with the addition of the new variable in the prediction model. It is the area between the curves (the old model and the new model) in a plot where on the y-axis are both sensitivity and 1-specificity, and on the x-axis is the calculated risk. The model performance is improved when the new model moves the reference curve of sensitivity toward the top-right corner and the reference curve of 1-specificity toward lower-left corner. Conversely, model performance is reduced when the new model moves the reference curve of sensitivity toward lower-left corner and the reference curve of 1-specificity toward top-right corner [16, 24]. The IDI ranges theoretically from − 2 to 2 representing the overall risk discrimination improvement [13, 15, 16, 21].

We used SPSS IBM Corp. Released 2020. IBM SPSS Statistics for Windows, Version 27.0. Armonk, NY: IBM Corp and STATA StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC for the statistical analysis.

Results

Altogether, 3031 patients met the inclusion criteria (Fig. 1). Patient characteristics are listed in Table 2. The median patient age was 55 years, 78% were male, 90% were functionally independent prior to admission, and 8.3% suffered from significant comorbidity. Almost half (47%) had a GCS between 3 and 8 in the first 24 h or before intubation. Approximately one-third (33%) of the patients required operative treatment, 24% were ICP monitored, and majority of the patients (66%) were intubated and mechanically ventilated. Seven percent of patients died during ICU treatment, 13% died in hospital, and 23% in 6 months after the TBI.

Fig. 1
figure 1

Study’s patient flow chart. Abbreviations: CT = computed tomography; MRI = magnetic resonance imaging; GCS = Glasgow Coma Scale; FICC = Finnish Intensive Care Consortium; TBI = traumatic brain injury. Note that some CT images had more than one exclusion criteria

Table 2 Patient characteristics

 The distribution of NIRIS categories, Marshall CT classes, and median Helsinki CT scores is shown in Table 3. NIRIS category 2 (43%) was the most frequent, followed by NIRIS category 4 (22%) and 3 (17%). The most frequent Marshall classes were EML/NEML (44%) and II (43%). The median Helsinki CT score was 2.0 (IQR 2.0–4.0). NIRIS categories 0–1 correlated well with Marshall classes I and II. Most patients with a NIRIS category of II had a Marshall class of II. The majority of patients with a NIRIS category of 3 and 4 had a Marshall class indicating mass lesion (EML/NEML).

Table 3 Number of patients in NIRIS categories and corresponding Marshall CT classifications and medians of Helsinki CT score

Non-survivors and patients with a low GCS score 3–8 in the first 24 h, or before intubation, more often belonged to a higher NIRIS category than survivors (Table 4). The median age, frequency of mechanical ventilation, operative treatment, and ICP monitoring increased with a rising NIRIS category (Table 4). When divided further into groups of NIRIS categories and respective GCS groups (3–8; 9–12; 13–15), the correlation between ICP monitoring and increasing NIRIS category was not as clear (Supplemental Table 1). The 6-month mortality in patients with NIRIS categories 0, 1, 2, 3, and 4 was 8.4%, 5.3%, 17.0%, 27.0%, and 46.6%, respectively.

Table 4 Association between NIRIS category, age, GCS, treatment-related factors, and mortality

Performance of the models

Of the CT models, Helsinki CT displayed best discrimination (AUC 0.73 vs. 0.70 for NIRIS vs. 0.68 for Marshall, Fig. 2) and explained variance (Nagelkerke’s R2 0.20 vs. 0.15 for NIRIS vs. 0.14 for Marshall) (Table 5).

Fig. 2
figure 2

On the left: ROC curves of NIRIS, Marshall CT classification, and Helsinki CT score vs. 6-month mortality. On the right: ROC curves of NIRIS, Marshall CT classification, and Helsinki CT score combined to a base model vs. 6-month mortality

Table 5 Performance measures of the CT scores, base model, and combined models for predicting 6-month mortality

The base model displayed an AUC of 0.86 (95% CI 0.84–0.87) with an explained variance of 0.43. The addition of all CT models increased the AUC by + 0.01 to 0.87 (p < 0.001, Table 5, Fig. 2). The explained variance increased to 0.44 when adding the NIRIS to the base model and to 0.46 when adding the Helsinki CT score to the base model. The IDI values were positive (ranging from 0.011 to 0.028) for all CT models when they were added to the base model.

Discussion

Key findings

In this large multicenter observational study, including 3031 patients from four academic centers in Finland, we compared three different CT scores to their ability to predict 6-month mortality in patients with TBI treated in the ICU. Of the three CT models in isolation, the Helsinki CT score displayed the best performance for 6-month mortality prediction. However, after adding the individual CT models to a clinical base model, only a moderate improvement in predictive performance predicting the 6-month mortality could be seen: AUC by + 0.01 to 0.87 (95% CI 0.85–0.88) and the IDI by 0.01–0.03, which is small, but statistically significant, and makes the AUC analysis more robust. Our results suggest that the choice of CT model when adjusting for case-mix in patients with TBI is of less importance than the adjustment of clinical variables such as age, GCS score, comorbidities, and acute physiological derangements (e.g., SAPS II).

Comparison to previous studies

The NIRIS has been validated in a more general TBI population and compared to the Marshall CT classification and the Rotterdam CT score [2, 27]. Their results showed that the NIRIS performed similarly to the Marshall CT classification and the Rotterdam CT score in terms of predicting mortality, but markedly better in terms of discriminating the needed interventions and intensity of patient care. This is in line with the original aim of the NIRIS to predict TBI patient care based on initial imaging.

The study populations in both original article introducing the NIRIS [25] and in the validation article by Zhou et al., [27] done in Stanford Hospital, USA, had a more general TBI population compared to our already ICU-admitted study population. Hence, our study population had more severe injuries and higher mortality: approx. 2% in earlier studies with the NIRIS compared to almost 13% during hospitalization in our study. This is also seen in the NIRIS categories as category 2 is the most common (43%) in our study population compared to category 0 in Zhou et al. [27] (77%). Furthermore, categories 3 and 4 include 39% of our study population that leaves less than 20% for categories 0 and 1. A validation study [2] of NIRIS conducted in India had a similar mortality rate in hospital (14%) than in our study; however, the patients were general TBI patients, not only ICU-admitted TBI patients.

In contrast to the NIRIS, the Helsinki CT score was developed for outcome prediction in patients with TBI treated in the ICU [17]. The Helsinki CT score has been validated in pediatric TBI patients (AUC 0.84) [9], penetrating TBI patients (AUC 0.90) [6], and adult TBI patients (AUC 0.70–0.81) [1, 5, 20, 23, 26] with good performance. We found similar performance measures in the present cohort (AUC 0.73, Nagelkerke’s R2 0.20). Thus, due to differences in design and granularity, it is not surprising that the Helsinki CT score outperformed the NIRIS in terms of outcome prediction. Furthermore, in line with previous results, clinical variables seem to be more important predictors that CT predictors [18].

The clinical importance of early CT imaging in patients with significant TBI is undisputed. However, current CT models seem to be of limited additional prognostic value compared to clinical variables in terms of mortality prediction. There is some obvious selection bias to this, as all patients were admitted to the ICU. Thus, the association between variables such as midline shift and mass lesions is diluted since these may be, at least partly, reversible due to surgical treatment [23]. It is possible that the predictive performance of the current CT models could be improved by including spatial and volumetric parameters.

The SAPS II and APACHE scores are measured during the first 24 h of ICU treatment and thus contain more information than the CT models based upon admission characteristics. This will be in favor of the clinical model in terms of prognostic accuracy.

Strengths and limitations

Some strengths should be highlighted. We used a large multicenter high-quality database collecting data prospectively. Thus, we were able to include more than three thousand patients in our study. In addition, there was a small number of missing data, and we had a complete 6-month follow-up. Our patient cohort also represents well the general ICU-treated TBI population in Finland as the referral population of the four neurointensive ICUs is approximately 3.5 million people, encompassing two-thirds of the Finnish population.

Some limitations should be acknowledged. The FICC is a general ICU database and lacks some TBI-specific parameters, like specific neurosurgical procedures, admission GCS score, pupillary light reactivity, and thus information on IMPACT or CRASH models. The FICC database used in this study did not include data on such devastating injuries that the more aggressive treatment was withheld. Information regarding admission due to organ donation has been added later. Still, for case-mix adjustment, our base model displayed good statistical performance. Second, we only had all-cause mortality as an outcome measure. Predicting functional outcome would be desirable as well. Noteworthy, the Helsinki CT score was designed to predict outcome while Marshall CT and NIRIS were not. Thus, this might skew the results in favor of the Helsinki CT score when predicting outcome. Third, we highlight that we only included patients treated in a university hospital ICU and did not include milder TBIs. In this study, we did not include in comparisons the two other significant CT scores, namely, the Stockholm CT score and the Rotterdam CT score. Earlier, both of these scores have been validated against the Helsinki CT score [23] and the Rotterdam CT score against the NIRIS [2, 27]. The Stockholm CT score has shown to have superior predictive power, to some extent, over the Helsinki CT score and the Rotterdam CT score [23]. Further work should be done to compare the Stockholm CT score to the NIRIS.

Conclusion

In patients with TBI treated in the ICU, the Helsinki CT score outperformed the NIRIS for 6-month mortality. However, clinical variables outweighed the current CT-based models in terms of predictive performance. Thus, accounting for clinical variables when adjusting for TBI injury severity is imperative.