The usefulness of a trauma probability of survival model for forensic life-threatening danger assessments

Clinical forensic medical examinations constitute an increasing proportion of our institution’s tasks, and, concomitantly, the authorities are now requesting forensic life-threatening danger assessments based on our examinations. The aim of this retrospective study was to assess if a probability of survival (PS) trauma score could be useful for these forensic life-threatening danger assessments and to identify a cut-off PS score as a supporting tool for the forensic practice of assessing life-threatening danger. We compared a forensic database and a trauma database and identified 161 individuals (aged 15 years or older) who had both a forensic life-threatening danger assessment and a PS score. The life-threatening danger assessments comprised the following statements: was not in life-threatening danger (NLD); could have been in life-threatening danger (CLD); or was in life-threatening danger (LD). The inclusion period was 2012–2016. A statistically significant difference was found in the PS scores between NLD, CLD and LD (chi-square test: p < 0.0001). The usefulness of the PS score for categorizing life-threatening danger assessments was determined by a receiver-operator characteristic (ROC) curve. The area under the curve was 0.76 (95% CI, 0.69 to 0.84) and the ROC curve revealed that a cut-off PS score of 95.8 would appropriately identify LD. Therefore, a PS score below 95.8 would indicate life-threatening danger. We propose a further exploration of how the evidence-based PS score, including a cut-off value, might be implemented in clinical forensic medical statements to add to the scientific strength of these statements.


Introduction
Clinical forensic medical (CFM) examinations may include an assessment of the life-threatening danger of the documented injuries. This also applies to the Danish CFM examination [1], and the forensic assessments may have an impact on the police investigation and the legal aftermath of a case. The application of protocols should ensure that board-certified forensic medical specialists follow standardized approaches; therefore, a protocol regarding the assessment of life-threatening danger was implemented in 2016 at our institution, the Department of Forensic Medicine, University of Copenhagen. Following this protocol, our forensic specialists may come to one of the following conclusions: the examined individual (1) was not in life-threatening danger (NLD) due to stable vital parameters, sparse haemorrhage, no blood transfusion, no treatment except suturing etc.; (2) could have been in life-threatening danger (CLD) because of the necessity for treatment of the injuries; or (3) was in life-threatening danger (LD) as the injuries required emergency treatment, surgery, blood transfusion etc.
The forensic life-threatening danger assessments are based on an assessment of the prior-to-treatment anatomical injuries and the subsequent health state. However, while the forensic life-threatening danger assessments are empirically grounded, they are not evidence based. Due to the nature of forensic medicine, conducting randomized clinical trials is not possible (and may not even be the proper study design) [2]. Several forensic studies have examined the applicability of trauma scoring for postmortem documentation of injuries by quantifying the injury severity at autopsy [3][4][5][6][7][8]. However, few studies have examined the potential of trauma scoring for the prediction of mortality in the CFM setting. A Swedish study with forensic participation concluded that predicting short-term mortality was possible in victims of violent assaults based on age, sex, the International Classification of Diseases Injury Severity Score (ICISS), the individual ICD-10 injury diagnoses, the anatomical location of the injuries and the cause of injury [9].
In Eastern Denmark, only CFM examinations performed at the Trauma Center at Copenhagen University Hospital (TC-CUH) are given a trauma score, with the majority having penetrating injuries (i.e. sharp force injuries and gunshot wounds). TC-CUH is one of the four trauma centres in Denmark and the only one in Eastern Denmark. Since 1999, TC-CUH has participated in the European Trauma Audit and Research Network (TARN), which was established in 1989 and is the largest European trauma database [10,11]. In 2004, TARN presented a probability of survival (PS) model, based on data from the European trauma centres [12].
Comparison of the forensic life-threatening danger assessments and TARN-derived PS scores is interesting for two reasons. First, the PS scores are evidence based. Second, it must be a key aim in clinical forensic medicine to establish objective and rigorous methods for estimation of injury severity. Thus, our aim in the present study was to assess whether the PS scores would differ in the three forensic conclusions regarding life-threatening danger. We hypothesized that the PS score could be useful for forensic life-threatening danger assessments and that appropriate cut-off PS scores could be identified.

Materials and methods
We identified all Eastern Danish CFM-examined individuals who were 15 years or older and whose CFM examination took place at Copenhagen University Hospital from January 1, 2012, to December 31, 2016. Exclusion criteria were other kinds of forensic examinations (e.g. age evaluations, torture cases and individuals examined solely for sampling of biological materials) and cases without a PS score, without a forensic life-threatening danger assessment or without penetrating injuries (Fig. 1).
We used the Danish civil registration number [13] to identify forensically examined patients registered in the TARN database at TC-CUH. In cases without a match, we manually looked up the hospital record to find a match based on age, sex, arrival date and time, and type of violence (blunt or penetrating force). A chief physician from TC-CUH verified the matches. Not all patients included in the TARN database had a PS score due to rejection by the central TARN coder according to the TARN inclusion flow [12]. In addition to the PS score, we registered the level of consciousness according to the Glasgow Coma Scale (GCS) and the Injury Severity Score (ISS); both of these variables, together with the preexisting medical comorbidities (PMC) [14], are used for the PS score estimation [11]. Each of the TARN variables for PS scoring carries a weighting derived from a retrospective analysis of the TARN database, which includes more than 700,000 cases and is continuously updated with data from the European trauma centres, and the variables are regularly recalibrated [12,15]. In order to compare the PS scores with the current forensic practice regarding life-threatening danger assessments, three forensic specialists applied the implemented protocol to prior-protocol cases and reassessed the life-threatening danger. The forensic specialists had the original forensic case material available: anamnesis, objective examination, obtained hospital records and police report. Thus, only the forensic report conclusion was removed. In a few cases, the forensic specialists stated that a reassessment was not possible (NP), most often because the hospital record had not been obtained, or that the examined individual had died shortly after the forensic examination but prior to the forensic report (D). These cases were excluded (cf. exclusion criteria) (Fig. 1).
The dataset was not subdivided based on the specific type of penetrating injury as a decision tree and sensitivity analysis showed no difference in the association between sharp force injuries and gunshot wounds and the life-threatening danger assessments.

Statistical analyses
Continuous variables were reported as median values with interquartile ranges (IQRs). Categorical variables were reported as frequencies. A non-parametric Kruskal-Wallis (KW) H test was used [16], and in cases of a statistically significant result, a post hoc Dunn's test was used for the pairwise comparison of the independent, categorical, life-threatening danger assessment conclusions (NLD, CLD and LD) and the dependent PS score (0-100%) [17].
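The rank-based comparison described above can be illustrated with a minimal sketch of the tie-corrected Kruskal-Wallis H statistic. It is written in plain Python for illustration only and is not the SAS procedure used in the study; the function name `kruskal_wallis_h` and the PS values are hypothetical, and the sketch assumes that the pooled values are not all identical.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign the average rank to every block of tied values.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (n (n + 1)) * sum(R_k^2 / n_k) - 3 (n + 1)
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = Counter(pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n**3 - n)
    return h / correction


# Hypothetical PS scores (%), invented for illustration only.
nld = [99.8, 99.7, 99.6, 99.5]
cld = [99.6, 99.4, 99.3, 99.1]
ld = [98.9, 98.0, 96.0, 80.0]
print(kruskal_wallis_h([nld, cld, ld]))
```

With three groups, H is compared against a chi-square distribution with 2 degrees of freedom; the post hoc pairwise comparison (Dunn's test) is not covered by this sketch.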
The usefulness of the PS score for categorizing life-threatening danger assessments was determined by a receiver-operator characteristic (ROC) curve with an area under the curve (AUC) to evaluate the performance of the forensic protocol regarding life-threatening danger assessments of penetrating injuries [18][19][20][21]. The dichotomous outcome for the ROC analysis was NLD + CLD or LD. The most appropriate cut-off PS score was identified by determining the lower 95% fiducial limit [22,23].
We performed all statistical analyses in SAS (SAS Enterprise Guide 7.1, 2017, SAS Institute Inc., Cary, NC, USA), and we considered a p value below 0.05 as statistically significant. An AUC = 0.7-0.8 was considered acceptable, an AUC = 0.8-0.9 was considered excellent, and AUC > 0.9 was considered outstanding performance [24].
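As an illustration of what the AUC measures for this dichotomous outcome, the following minimal sketch computes it as the proportion of (LD, NLD + CLD) pairs in which the LD case has the lower PS score, counting ties as one half. This is a plain Python illustration and not the SAS analysis used in the study; the function name `auc_low_ps_indicates_ld` and the example values are hypothetical.

```python
def auc_low_ps_indicates_ld(ps_ld, ps_other):
    """AUC as the probability that a random LD case has a lower PS score
    than a random NLD/CLD case (ties count as one half)."""
    wins = 0.0
    for p in ps_ld:
        for q in ps_other:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(ps_ld) * len(ps_other))


# Hypothetical PS scores (%), invented for illustration only.
ps_ld = [98.9, 98.0, 96.0, 80.0]
ps_other = [99.8, 99.6, 99.3, 98.5]
print(auc_low_ps_indicates_ld(ps_ld, ps_other))
```

An AUC of 0.5 corresponds to a score that does not separate the two groups, and an AUC of 1.0 to a score under which every LD case has a lower PS score than every NLD or CLD case.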

Results
We identified 486 forensically examined individuals at CUH in the 5-year study period. Of the 387 cases with a TARN submission number (TARN ID) (i.e. submitted to TARN), a central TARN coder excluded 188. The remaining 199 cases had a PS score, and 161 of these were included in the final analyses as they also had an NLD, CLD or LD conclusion and documented penetrating injuries (Fig. 1).
In total, 14 females (median age 39, IQR 30-47 years) and 147 males (median age 28, IQR 21-38 years) were included (Table 1). The median PS score was lower for LD than for NLD and CLD (Table 1 and Fig. 2). The median PS score decreased with increasing danger severity from 99.6 to 98.4%. The LD conclusions had the lowest observed PS score and the largest range (22.4-99.8%) (Table 1).
The mean ranks of PS scores showed statistically significant differences between NLD, CLD and LD, χ²(2) = 33.0, p < 0.0001. A post hoc Dunn's test identified LD as the reason for the statistically significant difference (Table 2). The latter supported our decision to merge NLD and CLD for the AUC-ROC analysis. The ROC curve of PS in relation to the forensic life-threatening danger assessment had an AUC of 0.76 (95% CI, 0.69 to 0.84), which was deemed acceptable [24] (Fig. 3). An appropriate cut-off PS score was identified as 95.8 (lower 95% fiducial limit).

Discussion
Based on our results, we found that the probability of survival trauma scoring could be useful for forensic life-threatening danger assessments. Furthermore, we suggest a cut-off PS score of 95.8, with scores below this value serving as a supporting tool for forensic determination of life-threatening danger.
The PS scores were statistically significantly lower for the LD conclusions; therefore, cases with increased mortality prediction by the PS model were forensically assessed as having been in life-threatening danger (LD). By contrast, the cases forensically assessed as could have been in life-threatening danger (CLD) did not have statistically significantly lower PS scores than the cases assessed as having not been in life-threatening danger (NLD). This means that the probability of survival model cannot replace the current forensic protocol as no differentiation of the forensic NLD and CLD cases was achieved using the PS score. The PS score is the probability of survival for patients receiving the expected, proper treatment at an average trauma centre. By contrast, the forensic life-threatening danger assessments are based on prior-to-treatment anatomical injuries and subsequent health state. This distinction may explain the high PS scores, even for some of the LD cases. In addition, CLD is a hypothetical scenario and has only a forensic and legal scope of interest. Thus, it is not a relevant situation for physicians in trauma centres, which may explain the lack of PS score differences between the NLD and CLD cases; from the trauma centre perspective, they are identical. It is difficult to say whether the PS scores for the CLD cases are high because of the severity of the injuries or because of a high average treatment performance; however, the admission to TC-CUH may indicate severe injuries that are treatable.
The forensic protocol performance regarding categorization of LD and NLD + CLD cases was statistically significantly better than chance (AUC 0.76, 95% CI, 0.69 to 0.84); therefore, we sought to find a cut-off PS score that could be used as a supporting tool for forensic specialists. The ideal model would have both a high sensitivity and specificity, but this is rarely the case. Therefore, the assessment of an optimal cut-off value depends on the intended use of the model; consequently, the cut-off value may vary to increase the sensitivity or specificity [21]. In our study, the assessment of an optimal cut-off PS score depended on weighing the importance of not missing an LD (i.e. high sensitivity) against the importance of not misclassifying an NLD + CLD case as an LD (i.e. high specificity) because of the potential legal consequences. All three cutpoint approaches (the highest correct classification rate, the minimal distance to the upper-left corner of the ROC plot, and the minimal difference between sensitivity and specificity) yielded a PS score of 99.3. Choosing the lower confidence limit (fiducial limit) at 95.8 gave the most conservative cut-off PS score that would predict/identify the CFM-examined individual as having been in life-threatening danger (LD).
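The three cutpoint criteria can be sketched as follows, assuming the decision rule "a PS score below the cut-off indicates LD" and taking every observed PS value as a candidate cut-off. The criteria are the highest correct classification rate, the minimal distance to the point (0, 1) on the ROC plot, and the minimal difference between sensitivity and specificity. This is a plain Python illustration with hypothetical function names and invented values; it does not reproduce the SAS output or the fiducial-limit calculation behind the 95.8 cut-off.

```python
import math


def candidate_cutpoints(ps_ld, ps_other):
    """Evaluate the rule 'PS < cut-off means LD' at every observed PS value."""
    rows = []
    for c in sorted(set(ps_ld) | set(ps_other)):
        tp = sum(p < c for p in ps_ld)        # LD cases flagged as LD
        tn = sum(p >= c for p in ps_other)    # NLD/CLD cases not flagged
        sens = tp / len(ps_ld)
        spec = tn / len(ps_other)
        accuracy = (tp + tn) / (len(ps_ld) + len(ps_other))
        rows.append((c, sens, spec, accuracy))
    return rows


def select_cutpoints(rows):
    """Pick one cut-off per criterion (first candidate wins on ties)."""
    return {
        "correct_classification": max(rows, key=lambda r: r[3])[0],
        "closest_to_corner": min(
            rows, key=lambda r: math.hypot(1 - r[2], 1 - r[1])
        )[0],
        "sens_equals_spec": min(rows, key=lambda r: abs(r[1] - r[2]))[0],
    }


# Hypothetical PS scores (%), invented for illustration only.
ps_ld = [98.9, 98.0, 96.0, 80.0]
ps_other = [99.8, 99.6, 99.3, 98.5]
print(select_cutpoints(candidate_cutpoints(ps_ld, ps_other)))
```

The three criteria need not agree in general; when they do, as reported here, the choice between them does not affect the empirical cutpoint.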
The inclusion of prior-protocol CFM cases necessitated a reassessment of the life-threatening danger because the criteria and conclusions changed after the protocol implementation in September 2016 [1]. Instead of comparing the previous assessment practice with the PS score, the study examined the up-to-date practice. This we consider a strength. However, due to the national legislation, forensic specialists are required to request permission to obtain a hospital record, and it is not a standardized retrieval. Thus, the information from the hospital is not always available for the forensic life-threatening danger assessments, resulting in NA or NP (Fig. 1). Another important strength is that the snapshots of the CFM-examined individual's health state, on which the forensic assessments are based, can be supported by the frequently recalibrated and evidence-based PS score, which predicts the patient outcome 30 days after the trauma [15]. This is important because of ongoing improvements in treatment [8,25]. However, the continuous updates and recalibrations make the PS score time dependent, as a forensic case from 2012 might have had a different PS score if it had been evaluated after 2014, when PMC was included [14]. Thus, the forensic specialists must be aware of TARN updates and address these when using the PS score as a supporting tool.
Lastly and perhaps most importantly, we consider the evaluation of the performance of the forensic protocol regarding life-threatening danger assessments to be a strength. At present, most of the forensic studies concerning trauma scoring have been focused on postmortem documentation and severity quantification of the injuries [3][4][5][6][7]. Since 2017, CFM examinations have accounted for the majority of the forensic regulatory tasks, compared to the number of autopsies. Because of this trend and the authorities' continuously expressed request for forensic life-threatening danger assessments, the time is ripe for focusing on evidence-based validation of the forensic protocol regarding life-threatening danger assessments. Instead of identifying predictors for a multivariable model, such as in the Swedish study from 2017 [9], we have focused on evaluating the performance of the current forensic protocol using AUC-ROC, and we have identified a conservative cut-off PS score that can be used as a forensic supporting tool. In addition, the identified lack of a difference in PS scores between NLD and CLD raises an important question: Should the forensic protocol comprise only NLD and LD conclusions and thereby refrain from the hypothetical CLD outcome?
One limitation of the present study is its use of highly selected data, which introduces selection bias. The included CFM cases with a PS score may not be representative of all CFM examinations, which are performed in many places [1]. The inclusion criteria may also explain the second limitation of this study: the relatively small number of included cases. The CFM examination may take place at a random time during the hospitalization, and because of the study inclusion flow, we missed patients transferred from other hospitals when the CFM examination was performed before this transfer (i.e. when the examination location was not CUH). We also only included cases with penetrating injuries as they represent the majority of the CFM-examined individuals in TC-CUH. However, even with the small number of included cases, we consider it a strength that we were able to find that the forensic protocol has an acceptable and statistically significantly better performance than an inconclusive model with AUC = 0.5.
In conclusion, we compared the forensic life-threatening danger assessments and the TARN-derived PS score and found that LD cases had statistically significantly lower mean ranks of PS scores than were obtained for the NLD and CLD cases (which showed no PS score differences). The TARN probability of survival model cannot replace the current forensic protocol, but we suggest a conservative cut-off PS score of 95.8 that can be used as a forensic supporting tool, where a PS score below this score indicates a life-threatening danger.
In perspective, the suggestion of using a cut-off PS score in the CFM setting requires a scientific evaluation of its value as a supporting tool. A future prospective study should examine how the PS score might influence forensic specialists' assessments of life-threatening danger and potentially decrease those specialists' uncertainty regarding these assessments. Another reasonable investigation might be to examine the impact of forensic life-threatening danger assessments on the legal aftermath.
Acknowledgements The researchers would like to thank the Section of Forensic Pathology's Computer Security Officer Peter Kastmand Larsen for technical assistance, and board-certified forensic specialists Anne Birgitte Dyhre Bugge and Julie Munkholm for their contribution regarding reassessment of the life-threatening danger.
Author contributions Lykke Schrøder Jakobsen, Niels Lynnerup, Jacob Steinmetz and Jytte Banner contributed to the study conception and design. All the authors performed material preparation, data collection and analysis. Lykke Schrøder Jakobsen wrote the first draft of the manuscript and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Table and figure notes NLD, was not in life-threatening danger; CLD, could have been in life-threatening danger; LD, was in life-threatening danger. Kruskal-Wallis and Dunn's test H0: equal PS scores between the forensic NLD, CLD and LD conclusions. A p < 0.05 was considered statistically significant (*). Cutpoint C has the highest correct classification rate, cutpoint D has the minimal distance to the "perfect" point at the upper-left corner of the plot (0, 1) and cutpoint = has the minimal difference between the sensitivity and specificity.