Introduction

In the Emergency Department (ED) about 40% of patients admitted to hospital [1] may be suspected of infection. For these patients the decisions of immediate interest for the diagnostic work-up are whether a blood culture should be drawn and how many resources the microbiology lab should spend on providing a rapid answer. Rapid microbiology diagnostics decrease time to identification of pathogens and potentially enable earlier initiation of targeted antimicrobial therapy, improving antimicrobial stewardship programs [2]. For example, when a blood culture becomes positive, identification of pathogens by MALDI-TOF MS directly from positive blood cultures is now routine in many labs. Sub-species typing and detection of drug resistance determinants, besides microbial identification from isolated colonies, are also being explored for MALDI-TOF MS [3]. Another decision is whether blood cultures should be supplemented by a more expensive, but much faster, method based on direct-from-blood PCR (dfbPCR) [4]. Mangioni et al. proposed the use of multiparameter scores to triage patients for rapid diagnostic procedures [5], where scores which can predict the likelihood of a useful answer (probability of bacteraemia) and the need for a rapid result (high probability of mortality) may be useful. Ordering blood cultures without considering the pretest probability may be both wasteful and harmful [6].

Clinical scores which can predict the probability of bacteraemia such as those described in two reviews [7, 8] can help make these decisions, including whether dfbPCR should supplement blood culture in some patients [9].

Several clinically validated scores have been used to predict mortality in ED patients, such as the National Early Warning Score (NEWS) [10] and the Mortality in Emergency Department Sepsis (MEDS) score [11]. The Systemic Inflammatory Response Syndrome (SIRS) [12] was established to define operational criteria for a sepsis diagnosis. SIRS has been replaced by the Sequential Organ Failure Assessment (SOFA) score or by the quick SOFA (qSOFA) score in the Sepsis-3 consensus definition of sepsis [13]. The Shapiro Decision Rule (SDR) predicts bacteraemia for ED patients [14], as does SepsisFinder (SF) [15].

The primary objective of this study is to retrospectively compare predictions of 30-day mortality and bacteraemia from all of these scores: SF, NEWS, SOFA, MEDS, qSOFA, SDR and SIRS. All scores will be assessed based on their Area Under the Receiver Operating Characteristic (AUROC) curves.

The review by Coburn et al. [7] focuses on overuse of blood culture in low-risk patients, which may be due to an overestimation of the probability of bacteraemia by physicians [16]. They conclude that both SIRS and SDR perform well in identifying a low risk group which may not need blood culture. Pawlowicz et al. [17] found a 33.5% reduction in the number of ordered blood cultures after implementation of SDR. Another evaluation of SDR found that it was able to select a group of 45% of all patients that had a bacteraemia rate of only 0.9% [18].

In line with these studies, a secondary objective of this study will be to compare how well each of the scores can identify a low-risk group, consisting of about one third of the patients, where blood culture may be of limited value. In addition we will identify a high-risk group, consisting of 10% of the patients, where dfbPCR may be justifiable, despite its relatively high cost.

Methods

Patient data

The three test datasets will be referred to as HvH, SLB and TREAT04.

HvH

263 patients with suspected sepsis at Hvidovre Hospital, Hvidovre, Denmark; November 2011 to April 2012 [19].

SLB

199 patients with suspected sepsis at Lillebælt Hospital, Vejle, Denmark; July to August 2012 [20].

TREAT04

1354 patients admitted to a department of medicine with suspected community acquired infections at Rabin Medical Center, Petach Tikva, Israel. Data were collected in an interventional study of TREAT from May to November 2004 [21].

SF predictions

SF [15] is a CPN (Causal Probabilistic Net or Bayesian Net) model of part of the inflammatory response. It uses age, temperature, heart rate, calculated mean arterial pressure, mental status, neutrophil fraction, platelets, CRP, lactate, creatinine and albumin as input variables. The outputs from SF are 30-day mortality and the probability of bacteraemia. It is an inherent part of the CPN technology that SF tolerates missing values well. Input data for calculation of the SF prediction of bacteraemia will therefore be considered “complete” if any three out of the 11 possible input variables are available. Age is not used for the prediction of bacteraemia. The prediction of mortality uses the same input variables as the bacteraemia prediction, plus age as an independent factor [22]. The SF CPN was implemented in Hugin (version 8.7, Hugin Expert A/S), commercially available software for constructing and using CPNs. SF was trained on one dataset and tested on three independent datasets [15]. These three datasets (HvH, SLB and TREAT04) will also be used in this study in the comparison of performance between SF and the other clinical scores.

Clinical scores

Scores commonly used to aid diagnosis and/or prognosis in patients with suspected sepsis were included: NEWS, SOFA, MEDS, qSOFA, SDR and SIRS. The data items required for calculation of the clinical scores are given in Table 1. To best accommodate the data requirements of the different scores, data were mapped when required and possible.

Table 1 Data items used to calculate the clinical scores

Calculated/mapped variables

Glasgow coma scale (GCS) was not available in the datasets. However, mental status was recorded as normal, confused or comatose. Normal was mapped to alert (GCS = 15); both confused and comatose were mapped to not alert (GCS < 15). PaO2 was less widely recorded than SaO2, so to give additional availability for SOFA, which requires PaO2, a mapping was made from SaO2. Respiratory distress as used in MEDS was calculated if at least one of respiratory rate and SaO2 was present in the dataset.
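The mental-status mapping can be illustrated with a short Python sketch. The function and variable names are hypothetical and are not taken from the study code. The qSOFA thresholds used to show how the mapped variable enters a score are the published Sepsis-3 criteria (respiratory rate of at least 22/min, systolic blood pressure of at most 100 mmHg, altered mentation). The SaO2-to-PaO2 mapping is not sketched because its form is not stated here.

```python
from typing import Optional

# Recorded mental status -> alert flag (normal ~ GCS 15, otherwise GCS < 15).
MENTAL_STATUS_IS_ALERT = {"normal": True, "confused": False, "comatose": False}


def is_alert(mental_status: Optional[str]) -> Optional[bool]:
    """Return True/False for a recorded status, or None if it is missing."""
    if mental_status is None:
        return None
    return MENTAL_STATUS_IS_ALERT[mental_status.strip().lower()]


def qsofa(resp_rate: float, systolic_bp: float, mental_status: str) -> int:
    """qSOFA for a complete case, with altered mentation from the mapped status."""
    alert = is_alert(mental_status)
    if alert is None:
        raise ValueError("qSOFA is only calculated for complete cases")
    points = 0
    points += int(resp_rate >= 22)     # tachypnoea
    points += int(systolic_bp <= 100)  # hypotension
    points += int(not alert)           # GCS < 15 after mapping
    return points
```

For example, a confused patient with a respiratory rate of 24/min and a systolic pressure of 95 mmHg receives all three qSOFA points under this mapping.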

Adjustments to the scores

Other than the use of the mapped variables described, no adjustments to the scoring methods were made for NEWS, qSOFA and SIRS. Terminal illness and immature neutrophils were not recorded in any of the datasets. Therefore MEDS was calculated assuming these variables did not contribute to the scores in patients where they were missing. Adjustments were also made to the SOFA score: we did not require evidence of mechanical ventilation for the respiratory component, the maximum score of the cardio component was 1 due to lack of information on vasopressors, the maximum CNS score was 1, using alert/not alert as the GCS score was not available and the renal component was calculated without use of urine output.

Completeness

For NEWS, MEDS, SOFA, qSOFA and SIRS the scores were only calculated if the data for the patients were complete. Data were considered complete where all of the variables used in the adjusted scores were present. SF was used with incomplete data, provided at least three of the 11 possible variables for SF were available.
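The two completeness rules can be sketched as follows. This is a minimal illustration, assuming each patient is held as a dictionary with None for unrecorded values; the key names are hypothetical, and only the list of 11 SF inputs and the threshold of three come from the description above.

```python
from typing import Iterable, Mapping, Optional

# The 11 SF input variables named in the Methods (key names are illustrative).
SF_INPUTS = (
    "age", "temperature", "heart_rate", "mean_arterial_pressure",
    "mental_status", "neutrophil_fraction", "platelets", "crp",
    "lactate", "creatinine", "albumin",
)


def n_available(record: Mapping[str, Optional[object]],
                variables: Iterable[str]) -> int:
    """Count how many of the named variables are recorded (not None)."""
    return sum(record.get(v) is not None for v in variables)


def score_is_complete(record: Mapping[str, Optional[object]],
                      required: Iterable[str]) -> bool:
    """Clinical scores: every variable used by the adjusted score must be present."""
    required = tuple(required)
    return n_available(record, required) == len(required)


def sf_is_usable(record: Mapping[str, Optional[object]], minimum: int = 3) -> bool:
    """SF: at least `minimum` of the 11 possible inputs must be present."""
    return n_available(record, SF_INPUTS) >= minimum
```

A record holding only temperature, heart rate and CRP would therefore be usable for SF but incomplete for any clinical score that also needs, say, respiratory rate.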

Microbiology

Bacteraemia was defined as positive blood cultures with one or more clinically significant pathogens. Bacillus spp. (except B. anthracis), coagulase-negative staphylococci (CoNS), Corynebacterium spp. and Micrococcus spp. were considered contaminants in the absence of other clinical evidence.

Outcomes and statistical analysis

The primary outcomes were bacteraemia and all-cause 30-day mortality. Predictive performance was assessed by AUROC. AUROCs were compared using the method of DeLong [23] as implemented in the pROC package of R (R version 3.5). To simulate possible clinical scenarios, two cut-offs were determined for each score that would result in a low-risk group of approximately one third of patients and a high-risk group of approximately 10% of patients. Outcomes in each risk group were assumed to be binomially distributed. Confidence intervals for binomial proportions were calculated under the assumption of normality. Analyses were performed in R (version 3.5) and Python (version 3.7); visualizations were constructed using Matplotlib [24].
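The risk-group selection and the normal-approximation interval can be sketched in Python as below. This is an illustration under stated assumptions rather than the study code: the function names are hypothetical, groups are formed by rank instead of by an explicit cut-off, and the interval is assumed to be the usual Wald interval with z = 1.96.

```python
import math
from typing import List, Sequence, Tuple


def wald_ci(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def risk_groups(scores: Sequence[float], low_frac: float = 1 / 3,
                high_frac: float = 0.10) -> Tuple[List[int], List[int]]:
    """Return patient indices for the lowest- and highest-scoring groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_low = round(low_frac * len(scores))
    n_high = round(high_frac * len(scores))
    return order[:n_low], order[len(order) - n_high:]
```

With integer-valued scores many patients share the same value, so a real cut-off yields groups that only approximate one third and 10% of patients; the rank-based split above breaks such ties arbitrarily.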

Results

Descriptive statistics

Table 2 presents the demographics for each of the included datasets as well as for the data material as a whole.

Table 2 Demographic description of datasets

Data availability

Table 3 gives the data availability for the three datasets, as well as for the combined dataset, consisting of all three datasets. The table reflects local differences in clinical practice. For example PaO2 and SaO2 were well recorded for SLB and HvH, but not for TREAT04. In general, vital signs such as blood pressure, heart rate and temperature were well recorded, while mental status and respiratory rate were recorded less often. If we require that all data involved in a clinical score must be recorded, most of the scores could only be calculated for very few patients. MEDS could not be calculated for any patients because none of the datasets contained data on immature neutrophils or on terminal illness.

Table 3 Data availability for the data sets (%)

Similarity between datasets

The predictive performance for mortality, measured by the area under the ROC curve (AUROC), was calculated for SF and for the 5 clinical scores for each of the three datasets, as well as for the combined dataset. Tables 4 and 5 show the percentage of complete cases and AUROCs for all datasets and scores for mortality and bacteraemia, respectively.

Table 4 Percent complete cases and AUROC for mortality for all datasets and scores
Table 5 Percent complete cases and AUROC for bacteraemia for all datasets and scores

To determine the suitability of combining the datasets, the AUROCs for each dataset were compared to the AUROC for the remainder of the combined dataset. Out of these 36 comparisons none were significantly different (p < 0.05) from the SOFA score for the combined dataset. It thus seems that SF and the 6 clinical scores perform similarly for all data sets. This indicates that the comparisons performed in the next section between the performance of SF and the 6 clinical scores can be done, using the combined dataset.
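The dataset-versus-remainder comparison can be sketched as below. The AUROC is computed as the probability that a randomly chosen case has a higher score than a randomly chosen non-case, with ties counted as one half. The function names and the label list are hypothetical, and the sketch stops at the two AUROCs; the significance test itself (DeLong's method in pROC) is not reimplemented here.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC as the probability that a case outranks a non-case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_one_vs_rest(scores: Sequence[float], outcomes: Sequence[int],
                      dataset_labels: Sequence[str],
                      dataset: str) -> Tuple[float, float]:
    """AUROC within one dataset and within the remainder of the combined data."""
    def subset(keep: bool) -> Tuple[List[float], List[int]]:
        idx = [i for i, d in enumerate(dataset_labels) if (d == dataset) == keep]
        return [scores[i] for i in idx], [outcomes[i] for i in idx]

    return auroc(*subset(True)), auroc(*subset(False))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for datasets of this size; a rank-based formulation would be preferable for much larger cohorts.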

The performance of SF and the clinical scores

The performance of SF and the 6 clinical scores will be assessed from their ROC curves for mortality and bacteraemia. In Tables 4 and 5 the columns labelled AUROC(SF) contain AUROCs for SF, but only calculated for complete cases for the test considered in the row. Pairwise comparisons of mortality AUROCs for SF with AUROCs for each of the clinical scores showed that the AUROCs for SF were significantly higher than the AUROCs for MEDS, qSOFA, SDR and SIRS (p < 0.005) and were not significantly different from the AUROCs for NEWS and SOFA. Figure 1a shows the mortality ROC curves for each of the scores.

Fig. 1

ROC curves for complete cases for each score for the combined dataset. A ROC curve for 30-day mortality, B ROC curve for bacteraemia

The bacteraemia AUROCs for SF were significantly better than the AUROCs for MEDS, qSOFA and SIRS (p < 0.005) and were not significantly different from the AUROCs for NEWS, SOFA and SDR. Figure 1b shows the bacteraemia ROC curves for each of the scores.

SF performed better than the scores by having a significantly larger number of complete cases, due to SF's tolerance of missing data. Table 6 offers an alternative way of looking at the clinical scores’ vulnerability to missing data. In Table 6 mortality and bacteraemia AUROCs were calculated for SF, NEWS, SOFA and SDR both for complete cases and for all cases in the combined dataset. In the calculation for all cases, the missing variables were assumed to be non-pathological for the clinical scores. When calculated for all cases, the AUROCs for NEWS and SOFA became significantly smaller than the AUROC for SF, both for mortality and bacteraemia. For SDR this also applied to the mortality AUROC, but not to the bacteraemia AUROC.
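The difference between the complete-case and the all-cases calculation can be sketched as below. It assumes, as one reading of "non-pathological", that a missing component contributes zero points to an additive score; the function name and the representation of a score as a list of per-component points are hypothetical.

```python
from typing import Optional, Sequence


def total_score(component_points: Sequence[Optional[int]],
                missing_as_normal: bool = False) -> Optional[int]:
    """Sum the points of an additive score; None marks a missing component."""
    if missing_as_normal:
        # All-cases analysis: a missing component is treated as normal (0 points).
        return sum(p if p is not None else 0 for p in component_points)
    if any(p is None for p in component_points):
        # Complete-case analysis: the score is not calculated.
        return None
    return sum(component_points)
```

Under this assumption a patient with one unrecorded component can only be under-scored, never over-scored, which is consistent with the drop in AUROC seen when all cases are included.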

Table 6 Comparison of AUROC assuming missing = normal for clinical scores

Low risk and high-risk groups for bacteraemia

Two examples of clinical decisions where it may be useful to use the scores to stratify patients suspected of infection will be considered. The first scenario concerns the omission of blood culture in low risk patients. The second scenario concerns a potential introduction of dfbPCR in high risk patients.

Omission of blood cultures in low risk patients

In the low risk scenario, the scores will be used to select a low risk group, consisting of about a third of the patients, those with the lowest predicted probability of bacteraemia. For SF the percentage of patients with bacteraemia in the low risk group can be read as 1.67% from Fig. 2 (green cross on the curve labelled PPV low-risk). Assuming a cost of € 33 for a blood culture [25], the cost of obtaining one positive blood culture will be € 33/1.67% = € 1976 in the low risk group. The mortality for the low risk group was 4.2% (green cross on dotted line in Fig. 2). Table 7 shows that compared to the other scores, SF gives the lowest probability of bacteraemia. In this low risk group there were 22 false positive blood cultures, corresponding to a contaminant rate of 3.7%.

Fig. 2

PPV for SF’s bacteraemia prediction and mortality vs. the size of the high risk group

Table 7 Performance of scores in the low risk patients

dfbPCR in high risk patients

In the high risk scenario, the scores will be used to select a high risk group, consisting of about 10% of the patients with the highest predicted probability of bacteraemia. For SF, the percentage of patients with bacteraemia in the high risk group can be read as 25.3% from Fig. 2 (red cross on the curve labelled PPV high-risk). The costs of the two dfbPCR tests that have been on the market, SeptiFast and Iridica, are €127 and €373 [26], respectively. For the cheaper of these, the cost of obtaining one dfbPCR positive bacteraemia case (excluding contaminants) will be €127/25.3% = € 502, assuming that the rate of DNAaemia is the same as the rate of bacteraemia. The mortality of the patients in the high risk group can be read as 25.3% (red cross) from the graph labelled mortality low risk. Table 8 shows that for all the clinical scores the cost of detecting one bacteraemia case with dfbPCR in the high risk group is smaller than the cost (€ 1976) of detecting one bacteraemia case by blood culture in the low risk group, as defined by SF in Table 7.
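The cost per detected case used in both scenarios is the unit cost of the test divided by the proportion of positive results in the group. A minimal Python sketch, with a hypothetical function name and the unit costs and rates quoted in the text:

```python
def cost_per_positive(test_cost_eur: float, positive_rate: float) -> float:
    """Expected cost of obtaining one positive result at a given positive rate."""
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must be in (0, 1]")
    return test_cost_eur / positive_rate


# Blood culture in the SF low-risk group: EUR 33 per test, 1.67% bacteraemia.
low_risk_blood_culture = cost_per_positive(33, 0.0167)

# SeptiFast dfbPCR in the SF high-risk group: EUR 127 per test, 25.3% bacteraemia.
high_risk_dfbpcr = cost_per_positive(127, 0.253)

print(round(low_risk_blood_culture), round(high_risk_dfbpcr))
```

The calculation covers the test cost only; downstream costs such as those of contaminated cultures are not included.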

Table 8 Performance of scores in the high risk patients

Discussion

For the combined dataset SF obtained mortality AUROCs, calculated from cases with complete data, of 0.775 which was higher than for NEWS (0.734) and SOFA (0.721) and significantly higher than for MEDS, qSOFA, SIRS and SDR.

For the combined dataset SF obtained bacteraemia AUROCs of 0.745, higher than for SDR (0.743), SOFA (0.719) and NEWS (0.694) and significantly higher than for MEDS, qSOFA and SIRS.

SF could identify a low risk group, consisting of about one third of the patients. In that group the bacteraemia rate was 1.7% and the average price of obtaining one positive blood culture was quite high, € 1976.

SF could also identify a high risk group, consisting of 10% of the patients. In that group the bacteraemia rate was 25.3%. The cost of obtaining one positive identification of a pathogen by dfbPCR was estimated at € 502, despite the relatively high cost of dfbPCR. Interestingly, this cost is substantially lower than the cost of obtaining a positive blood culture in the low risk group.

The study was based on three data sets: HvH, SLB and TREAT04. These data sets have the strength that they are diverse. They were collected over almost a decade, in countries with high and low antimicrobial resistance, with a large variation in the amount and type of data collected and with substantial differences in mortality. This demonstrated the robustness of both SF and the clinical scores in the sense that they all showed uniform performance across these differences.

These differences also gave some weaknesses of the study: Although the scores seem to be able to stratify the patients across the differences, it may prove necessary to adjust cut-off values to adapt to the dataset at hand. Another weakness of the datasets was that in many patients only some of the scores could be calculated. This weakened the data, which already suffered the limitation of the small size of the Danish datasets. It does, however, highlight the tolerance of SF to missing data, since SF could be applied for virtually all data in the data sets.

The age of the data is also a weakness, since data on sepsis markers such as procalcitonin and CRP were either absent or scarce in the oldest of the datasets, TREAT04. CRP is one of the stronger sepsis markers in the dataset used to train the SF model, and although SF performs better than any single data item [15], it is to be expected that more CRP measurements would have improved the performance of SF. This may be even more true of procalcitonin.

In the literature AUROCs were found for SOFA and qSOFA for in-hospital mortality in a large validation dataset: AUC = 0.74 (all) and 0.79 (non-ICU) for SOFA and 0.66 (all) and 0.81 (non-ICU) for qSOFA [27]. Similar results are observed for recent studies outside the ICU, with AUROC ranging from 0.77–0.83 for SOFA [28,29,30,31,32] and 0.63–0.77 for qSOFA [29, 30, 33,34,35]. MEDS is also a predictor of mortality. It had an AUC of 0.82 and 0.76 for its derivation and validation cohorts, respectively [11], although significant variability has been seen in the literature, with AUC ranging from 0.67–0.77 in five recent studies [36,37,38,39,40,41]. NEWS also performs well as a predictor of mortality, with reported AUROC between 0.67–0.78 [30, 32, 35, 42].

Use of standard clinical scores as predictors of bacteraemia is not well reported in the literature. A review identified several validated models, including SDR, although noted that very few scores for bacteraemia were prospectively validated and performed well, and none were in routine clinical use [8]. The other scores included in the analysis have not been evaluated specifically for prediction of bacteraemia outside of isolated studies. In one study, qSOFA showed some potential in a subgroup of elderly patients, however the overall AUROC was 0.64 [43]. The same study reported an overall AUROC of 0.60 for SIRS.

The clinical applications discussed in the paper may deserve a health-economic evaluation. The cost estimates for the low risk group indicate that blood culture from a low risk group may not be cost-effective, in particular because the testing of this group gave rise to 3.7% false positive blood cultures, which is higher than the 1.7% true positive blood cultures. As noted by Bates et al. [6], these are presumably associated with substantially increased cost due to increased length of stay (4.5 days) and increased consumption of antibiotics (39%), and the true costs of contaminants may greatly exceed those of the test itself.

In contrast, dfbPCR from high risk patients may be cost effective in terms of a rapid diagnosis. Reallocation of resources currently spent on blood cultures from low risk patients to dfbPCR from high risk patients may be a cost neutral way of improving the quality of microbiological services. However, a prospective randomized clinical outcome study is warranted before any risk assessment tool is routinely applied to eliminate a currently used diagnostic intervention in any patient group, including the omission of blood culture in a patient population scoring low on sepsis risk.

Conclusions

SF performed better than the clinical scores for prediction of mortality and bacteraemia, significantly so for MEDS, qSOFA and SIRS. For mortality predictions SF was also significantly better than SDR.

In a low risk group consisting of one third of the patients the cost of one positive result from blood culture was € 1976, which was higher than the cost of € 502 of one positive dfbPCR from a high risk group consisting of 10% of the patients. This may motivate a health economic study of whether resources spent on low risk blood cultures might be better spent on high risk dfbPCR.