External validation of the NeuroImaging Radiological Interpretation System and Helsinki computed tomography score for mortality prediction in patients with traumatic brain injury treated in the intensive care unit: a Finnish intensive care consortium study

Background Admission computed tomography (CT) scoring systems can be used to objectively quantify the severity of traumatic brain injury (TBI) and aid in outcome prediction. We aimed to externally validate the NeuroImaging Radiological Interpretation System (NIRIS) and the Helsinki CT score. In addition, we compared the prognostic performance of the NIRIS and the Helsinki CT score to the Marshall CT classification and to a clinical model. Methods We conducted a retrospective multicenter observational study using the Finnish Intensive Care Consortium database. We included adult TBI patients admitted in four university hospital ICUs during 2003–2013. We analyzed the CT scans using the NIRIS and the Helsinki CT score and compared the results to 6-month mortality as the primary outcome. In addition, we created a clinical model (age, Glasgow Coma Scale score, Simplified Acute Physiology Score II, presence of severe comorbidity) and combined clinical and CT models to see the added predictive impact of radiological data to conventional clinical information. We measured model performance using area under curve (AUC), Nagelkerke’s R2 statistics, and the integrated discrimination improvement (IDI). Results A total of 3031 patients were included in the analysis. The 6-month mortality was 710 patients (23.4%). Of the CT models, the Helsinki CT displayed best discrimination (AUC 0.73 vs. 0.70 for NIRIS) and explanatory variation (Nagelkerke’s R2 0.20 vs. 0.15). The clinical model displayed an AUC of 0.86 (95% CI 0.84–0.87). All CT models increased the AUC of the clinical model by + 0.01 to 0.87 (95% CI 0.85–0.88) and the IDI by 0.01–0.03. Conclusion In patients with TBI treated in the ICU, the Helsinki CT score outperformed the NIRIS for 6-month mortality prediction. In isolation, CT models offered only moderate accuracy for outcome prediction and clinical variables outweighing the CT-based predictors in terms of predictive performance. Supplementary Information The online version contains supplementary material available at 10.1007/s00701-022-05353-0.


Introduction
Traumatic brain injury (TBI) is one of the most common causes of mortality among young persons [3,4]. In recent years, it has been identified as an increasing risk of mortality and morbidity among elderly as well [22]. Glasgow Coma Scale (GCS) has been traditionally used as a measure of TBI severity upon admission to hospital. GCS is easy and fast to assess; however, it does not give information on structural information on potential intracranial lesions.
As computed tomography (CT) has become widely available, several classifications and scoring systems have been developed for additional information on TBI prognosis. These include, e.g., Marshall CT classification [8], Rotterdam CT-based score [7], Helsinki CT score [17], and Stockholm CT score [12].
The practical use of CT scores is to give clinicians more quantitative and comparable tools to assess the severity of TBI and estimate need for operative treatment and prognosis. For research purposes, CT scores are used for injury severity standardization and comparison. Recently, a new CT score, NeuroImaging Radiological Interpretation System (NIRIS), was introduced [25]. The NIRIS consists of five categories ranging from 0 to 5, with an increasing intracranial injury load with an increasing number. NIRIS was developed to consolidate imaging findings into different categories of ordinal severity to inform specific patient management actions [25]. The NIRIS has been earlier validated against Marshall CT classification and Rotterdam CT score [2,25,27]. To our knowledge, NIRIS has not been validated nor compared to more granular Helsinki CT score.
In this study, we aimed to perform an external validation study of the NIRIS and to compare it with the Helsinki CT score and a clinical model for predicting 6-month mortality. We hypothesized that both CT scores would add predictive performance when compared just with clinical data and that the Helsinki CT score would outperform the NIRIS, as it is more granular. We also report performance statistics of the widely used Marshall CT classification system.

Methods and materials
The ethics committee of Helsinki University Hospital (194

Study design and population
We performed a multicenter retrospective observational study using data that were prospectively collected from the Finnish Intensive Care Consortium (FICC) database. The FICC database is a nationwide prospectively data-collecting database including all ICU-treated patients from the majority of all ICUs in Finland [19]. In Finland, all specialized tertiary intensive care of TBI patients is centralized to five tertiary ICUs. Four of these ICUs participate in the FICC covering approximately two-thirds of the population in Finland. From these four tertiary ICUs, we included all adult TBI patients (age ≥ 18 years) admitted from January 1, 2003, to December 31, 2013 (readmissions excluded). Patients were excluded if no primary CT scan was available, and if Glasgow Coma Scale (GCS) or pre-admission functional status was missing.
Six-month case fatality was used as endpoint for analysis (available for all Finnish citizens through the Finnish population registry).

CT assessment
All patients in the study had non-contrast CT scan taken at the admission to hospital. Patients with only post-operative CT scans, CT angiography, or MRI scans were excluded. All available CT images were classified according to the Marshall CT classification system, the Helsinki CT score, and the updated version of NIRIS [27] by two authors (JV, RR). The CT classification systems are described in Table 1.

Statistical analysis
The Marshall CT classification [8] and the NIRIS [25,27] were treated as categorical variables, NIRIS being ordinal. The Helsinki CT score was originally constructed as an ordinal scale, but due to its many levels and numeric distribution, it can be treated as a numeric variable [17].
We performed first-level customization of the CT scores by fitting a new logit function to the respective CT score [10]. The Helsinki CT-based score (hereafter referred to as Helsinki CT score), Marshall CT-based classification (hereafter referred to as Marshall CT class), and NIRIS-based (hereafter referred to as NIRIS).
We created a clinical "base model" that included age, GCS score (worst measured GCS score during the first ICU-day or as the last reliable GCS for intubated and/or sedated patients), a modified Simplified Acute Physiology Score II (SAPS II, without the age, GCS score, and chronic comorbidity component), and the presence of a chronic comorbidity (according to the SAPS II and Acute Physiology and Chronic Evaluation [APACHE] II definitions). The SAPS II and APACHE scores were assessed during the first 24 h of ICU treatment. We separately added age, GCS score, and chronic comorbidities to give them more weight in our base model [18]. To the base model, the three different CT scores (NIRIS, Marshall CT classification, and Helsinki CT score) were separately added.
We assessed the individual CT scores and the combined base + CT scores by calculating the Nagelkerke's R 2 and the area under the receiver operating characteristics curve [11]. Nagelkerke's R 2 gives a value between 0 and 1 resembling explained variance, where the value 1 indicates a model that fully explains the outcome. The AUC values range from 0.5 to 1 with 0.5 indicating at the level of chance and 1 indicating a perfect model. We assessed the calibration by using the Hosmer-Lemeshow test for all models apart from NIRIS and Marshall, as these consist of less than 10 groups.
We compared AUCs between models using a DeLong test. We considered p-values under 0.05 statistically significant.
In addition to the AUC analysis, we calculated the integrated discrimination improvement (IDI) as the AUC might be more insensitive in model comparisons in which the baseline model has performed well [14]. The IDI is a category free measure of the discrimination ability between two logistic regression prediction models. The IDI can be defined as the difference in discrimination slopes between two models one with, and the other without, the added variable. It can be estimated with the following equation: Î DI = p new,events −p old,events − p new,nonevents −p old,nonevents , where p new,events is the mean of the new model-based predicted probabilities of an event for those who develop events, p old,events is the corresponding quantity based on the old model, p new,nonevents is the mean of the new model-based predicted probabilities of an event for those who do not develop events, and p old,nonevents is the corresponding quantity based on the old model [13]. Another representation of the IDI can be formulated with the following equation: where IS = ∫ sensitivity and IP = ∫ 1 − specificity and the subscripts "new" and "old" correspond to the new model and the old model, respectively [13]. The IDI reflects the mean magnitude of the change in outcome probability with the addition of the new variable in the prediction model. It is the area between the curves (the old model and the new model) in a plot where on the y-axis are both sensitivity and 1-specificity, and on the x-axis is the calculated risk. The model performance is improved when the new model moves the reference curve of sensitivity toward the top-right corner and the reference curve of 1-specificity toward lower-left corner. Conversely, model performance is reduced when the new model moves the reference curve of sensitivity toward lower-left corner and the reference curve of 1-specificity toward top-right corner [16,24]. The IDI ranges theoretically from − 2 to 2 representing the overall risk discrimination improvement [13,15,16,21].

Results
Altogether, 3031 patients met the inclusion criteria ( Fig. 1). Patient characteristics are listed in Table 2. The median patient age was 55 years, 78% were male, 90% were functionally independent prior to admission, and 8.3% suffered from significant comorbidity. Almost half (47%) had a GCS between 3 and 8 in the first 24 h or before intubation. Approximately one-third (33%) of the patients required operative treatment, 24% were ICP monitored, and majority of the patients (66%) were intubated and mechanically ventilated. Seven percent of patients died during ICU treatment, 13% died in hospital, and 23% in 6 months after the TBI.
The distribution of NIRIS categories, Marshall CT classes, and median Helsinki CT scores is shown in Table 3. NIRIS category 2 (43%) was the most frequent, followed by NIRIS category 4 (22%) and 3 (17%). The most frequent Marshall classes were EML/NEML (44%) and II (43%). The median Helsinki CT score was 2.0 (IQR 2.0-4.0). NIRIS categories 0-1 correlated well with Marshall classes I and II. Most patients with a NIRIS category of II had a Marshall class of II. The majority of patients with a NIRIS category of 3 and 4 had a Marshall class indicating mass lesion (EML/NEML).
Non-survivors and patients with a low GCS score 3-8 in the first 24 h, or before intubation, more often belonged to a higher NIRIS category than survivors ( Table 4). The median age, frequency of mechanical ventilation, operative treatment, and ICP monitoring increased with a rising NIRIS category (Table 4). When divided further into groups of NIRIS categories and respective GCS groups (3-8; 9-12; 13-15), the correlation between ICP monitoring and increasing NIRIS category was not as clear (Supplemental Table 1). The 6-month mortality in patients with NIRIS categories 0, 1, 2, 3, and 4 was 8.4%, 5.3%, 17.0%, 27.0%, and 46.6%, respectively.
The base model displayed an AUC of 0.86 (95% CI 0.84-0.87) with an explained variance of 0.43. The addition of all CT models increased the AUC by + 0.01 to 0.87 (p < 0.001, Table 5, Fig. 2  Note that some CT images had more than one exclusion criteria

Key findings
In this large multicenter observational study, including 3031 patients from four academic centers in Finland, we compared three different CT scores to their ability to predict 6-month mortality in patients with TBI treated in the ICU. Of the three CT models in isolation, the Helsinki CT score displayed the best performance for 6-month mortality prediction. However, after adding the individual CT models to a clinical base model, only a moderate improvement in predictive performance predicting the 6-month mortality could be seen: AUC by + 0.01 to 0.87 (95% CI 0.85-0.88) and the IDI by 0.01-0.03, which is small, but statistically significant, and makes the AUC analysis more robust. Our results suggest that the choice of CT model when adjusting for case-mix in patients with TBI is of less importance than the adjustment of clinical variables such as age, GCS score, comorbidities, and acute physiological derangements (e.g., SAPS II).

Comparison to previous studies
The NIRIS has been validated in a more general TBI population and compared to the Marshall CT classification and the Rotterdam CT score [2,27]. Their results showed that the NIRIS performed similarly to the Marshall CT classification and the Rotterdam CT score in terms of predicting mortality, but markedly better in terms of discriminating the needed interventions and intensity of patient care. This is in line with the original aim of the NIRIS to predict TBI patient care based on initial imaging.
The study populations in both original article introducing the NIRIS [25] and in the validation article by Zhou et al., [27] done in Stanford Hospital, USA, had a more general TBI population compared to our already ICU-admitted study population. Hence, our study population had more severe injuries and higher mortality: approx. 2% in earlier studies with the NIRIS compared to almost 13% during hospitalization in our study. This is also seen in the NIRIS categories as category 2 is the most common (43%) in our study population compared to category 0 in Zhou et al. [27] (77%). Furthermore, categories 3 and 4 include 39% of our study population that leaves less than 20% for categories 0 and 1. A validation study [2] of NIRIS conducted in India had a similar mortality rate in hospital (14%) than in our study; however, the patients were general TBI patients, not only ICUadmitted TBI patients.
In contrast to the NIRIS, the Helsinki CT score was developed for outcome prediction in patients with TBI treated in the ICU [17]. The Helsinki CT score has been validated in pediatric TBI patients (AUC 0.84) [9], penetrating TBI patients (AUC 0.90) [6], and adult TBI patients (AUC 0.70-0.81) [1,5,20,23,26] with good performance. We found similar performance measures in the present cohort (AUC 0.73, Nagelkerke's R 2 0.20). Thus, due to differences in design and granularity, it is not surprising that the Helsinki CT score outperformed the NIRIS in terms of outcome prediction. Furthermore, in line with previous results, clinical variables seem to be more important predictors that CT predictors [18].
The clinical importance of early CT imaging in patients with significant TBI is undisputed. However, current CT models seem to be of limited additional prognostic value compared to clinical variables in terms of mortality prediction. There is some obvious selection bias to this, as all patients were admitted to the ICU. Thus, the association between variables such as midline shift and mass lesions is diluted since these may be, at least partly, reversible due to surgical treatment [23]. It is possible that the predictive performance of the current CT models could be improved by including spatial and volumetric parameters.
The SAPS II and APACHE scores are measured during the first 24 h of ICU treatment and thus contain more information than the CT models based upon admission characteristics. This will be in favor of the clinical model in terms of prognostic accuracy.

Strengths and limitations
Some strengths should be highlighted. We used a large multicenter high-quality database collecting data prospectively. Thus, we were able to include more than three thousand patients in our study. In addition, there was a small number of missing data, and we had a complete 6-month follow-up. Our patient cohort also represents well the general ICUtreated TBI population in Finland as the referral population of the four neurointensive ICUs is approximately 3.5 million people, encompassing two-thirds of the Finnish population. Some limitations should be acknowledged. The FICC is a general ICU database and lacks some TBI-specific parameters, like specific neurosurgical procedures, admission GCS score, pupillary light reactivity, and thus information on IMPACT or CRASH models. The FICC database used in this study did not include data on such devastating injuries that the more aggressive treatment was withheld. Information regarding admission due to organ donation has been added later. Still, for case-mix adjustment, our base model displayed good statistical performance. Second, we only had all-cause mortality as an outcome measure. Predicting functional outcome would be desirable as well. Noteworthy, the Helsinki CT score was designed to predict outcome while Marshall CT and NIRIS were not. Thus, this might skew the results in favor of the Helsinki CT score when predicting outcome. Third, we highlight that we only included patients treated in a university hospital ICU and did not include milder TBIs. In this study, we did not include in comparisons the two other significant CT scores, namely, the Stockholm CT score and the Rotterdam CT score. Earlier, both of these scores have been validated against the Helsinki CT score [23] and the Rotterdam CT score against the NIRIS [2,27]. The Stockholm CT score has shown to have superior predictive power, to some extent, over the Helsinki CT score and the Rotterdam CT score [23]. Further work should be done to compare the Stockholm CT score to the NIRIS.

Conclusion
In patients with TBI treated in the ICU, the Helsinki CT score outperformed the NIRIS for 6-month mortality. However, clinical variables outweighed the current CT-based models in terms of predictive performance. Thus, accounting for clinical variables when adjusting for TBI injury severity is imperative.
Author contribution Juho Vehviläinen: study design, manuscript preparation, data analysis, data interpretation. Markus Skrifvars: data acquisition, review and editing, data interpretation. Matti Reinikainen: data acquisition, review and editing, data interpretation. Stepani Bendel: data acquisition, review and editing, data interpretation. Tero Ala-Kokko: data acquisition, review and editing, data interpretation. Sanna Hoppu: data acquisition, review and editing, data interpretation. Ruut Laitio: data acquisition, review and editing, data interpretation. Jari Siironen: data acquisition, review and editing, data interpretation. Rahul Raj: study design, manuscript preparation, data analysis, review and editing, data interpretation.
Funding Open Access funding provided by University of Helsinki including Helsinki University Central Hospital. Independent funding support has been received from Helsinki University Hospital (State funding, Finland VTR TYH2018227); Finska Läkaresällskapet; Medicinska Understödsföreningen Liv & Hälsa; and Svenska Kulturfonden. The funders had no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript. The first and last author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Consent to participate
There was no need for patient consent.