Introduction

Sepsis is a health burden in various healthcare settings1, especially in emergency departments (EDs). Quick sequential (sepsis-related) organ failure assessment (qSOFA) is a prediction model for mortality following sepsis in patients suspected of having an infection outside the intensive care unit2,3. qSOFA was developed and validated in 20162 to replace the systemic inflammatory response syndrome (SIRS) criteria, which were originally designed to identify a systemic inflammatory syndrome. Under the previous definitions of sepsis, meeting the SIRS criteria was a prerequisite for determining whether patients had sepsis4,5; however, the performance of these criteria was reportedly poor for positive prediction and insufficient for negative prediction6,7.

Several studies involving patients with suspected infection in EDs8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25 and systematic reviews26,27,28 have externally validated the prognostic accuracy of qSOFA compared with SIRS criteria in predicting mortality. Their results were inconsistent, although qSOFA generally showed better prognostic accuracy for in-hospital mortality than SIRS criteria. However, most of these studies were retrospective in nature or based on retrospective analyses of prospectively collected data8,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. These designs had multiple flaws, such as the use of the worst values of predictor variables during ED stay8,9,10 and the use of complete case analysis (excluding subjects with missing variables) for the index tests9,10,13,14,16,17,18,19,20,21,23,24.

Our multicentre prognostic study aimed to prospectively test the hypothesis that qSOFA, calculated from variables obtained at the time when a patient was first suspected of having an infection in the ED, could predict in-hospital mortality in patients with suspected infection more accurately than SIRS criteria.

Results

Characteristics of study subjects

This study was discontinued in February 2018, when the number of participants had reached 1060 and therefore exceeded the required sample size of 439 participants recalculated at the interim analysis. The study participants were mostly older individuals (median age 78 years; interquartile range [IQR] 68–85 years) with physical impairments (median clinical frailty scale score 4; IQR 3–6) and comorbidities (median Charlson comorbidity index 2; IQR 0–3) (Table 1). The most frequent site of infection identified in the ED was the respiratory tract (47.1%), followed by the abdomen (18.7%) and the urinary tract (14.2%).

Table 1 Baseline characteristics of the cohort before and after multiple imputation.

Main results

Missing data for qSOFA, SIRS criteria, and both were found in 1 (0.1%), 408 (38.5%), and 7 (0.7%) participants, respectively (Fig. 1). No in-hospital mortality data were missing. In the multiply imputed population, 402 (37.9%) and 915 (86.3%) participants met the thresholds for qSOFA ≥ 2 and SIRS criteria ≥ 2, respectively. A total of 157 (14.8%) participants died in the participating hospitals. The primary analysis demonstrated greater diagnostic accuracy for in-hospital mortality with qSOFA than with SIRS criteria (area under the receiver operating characteristic curve [AUROC] 0.64 versus 0.52; difference +0.13, 95% confidence interval [CI] +0.07 to +0.18) (Table 2, Fig. 2). Sensitivity and specificity for predicting in-hospital mortality at the given thresholds (qSOFA ≥ 2 and SIRS criteria ≥ 2) were 0.55 and 0.65 with qSOFA and 0.88 and 0.14 with SIRS criteria, respectively. The secondary analysis also demonstrated a positive net reclassification improvement (NRI) for qSOFA over SIRS criteria (+0.39, 95% CI +0.15 to +0.57).
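The reported figures can be cross-checked against one another. Given the in-hospital mortality (157 of 1060) and the sensitivity and specificity of each score, the expected proportion of score-positive participants follows from simple arithmetic and can be compared with the reported 37.9% (qSOFA) and 86.3% (SIRS criteria). The following Python sketch is illustrative only; the function names are hypothetical and it is not part of the study's analysis, which was performed in R.

```python
def positive_fraction(sensitivity, specificity, prevalence):
    """Expected share of participants who test positive.

    Positives are true positives among those who died plus
    false positives among those who survived.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos + false_pos


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values implied by the inputs."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values reported in the text (rounded to two decimals in the source).
PREVALENCE = 157 / 1060
QSOFA_POSITIVE = positive_fraction(0.55, 0.65, PREVALENCE)
SIRS_POSITIVE = positive_fraction(0.88, 0.14, PREVALENCE)

print(f"qSOFA >= 2: {QSOFA_POSITIVE:.3f} expected vs 0.379 reported")
print(f"SIRS  >= 2: {SIRS_POSITIVE:.3f} expected vs 0.863 reported")
```

Because the inputs are rounded, agreement to within about one percentage point is all that can be expected. The predictive values returned by `predictive_values` are derived quantities, not results reported by the study.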

Figure 1

Participant selection tree. All participants recruited in the current study were divided into four groups according to the positivity of the tested scores. The multiply imputed study population and naïve (not imputed) population were included in the primary analysis and sensitivity analysis, respectively. qSOFA quick sequential organ failure assessment, SIRS systemic inflammatory response syndrome.

Table 2 Comparison between qSOFA and SIRS criteria prognostic accuracy in predicting in-hospital mortality.

Figure 2

Prediction of in-hospital mortality using the tested scores. Receiver operating characteristics analysis and the prediction of in-hospital mortality using the tested scores (line and dotted line) with the given threshold (circle). qSOFA quick sequential organ failure assessment, SIRS systemic inflammatory response syndrome, AUROC area under the receiver operating characteristic curve.

Sensitivity analysis of the naive dataset similarly demonstrated greater diagnostic accuracy for in-hospital mortality with qSOFA than with SIRS criteria (AUROC 0.64 versus 0.54; difference +0.10, 95% CI +0.03 to +0.17), where the sensitivity and specificity for predicting in-hospital mortality at the given thresholds were 0.58 and 0.62 with qSOFA and 0.94 and 0.10 with SIRS criteria, respectively. Subgroup analyses did not demonstrate significant interactions with age, sex, comorbidities, or frailty (Fig. 3).

Figure 3

Explanatory subgroup analysis. Study participants were dichotomised according to age (median), sex (female or male), Charlson index (median), and clinical frailty scale score (median) and were subjected to subgroup analysis. The association between positivity of the tested scores (qSOFA ≥ 2 and SIRS criteria ≥ 2) and in-hospital mortality was indicated as an odds ratio with its 95% confidence interval (95% CI). qSOFA quick sequential (sepsis) organ failure assessment, SIRS systemic inflammatory response syndrome, 95% CI 95% confidence interval.

Discussion

A clinical prediction score is a mathematical model used to estimate the probability of a future event in patients with specific medical conditions. A clinical prediction score used in the ED must not only predict the outcome well but also be computable with few missing data. The current study demonstrated that the simple qSOFA provided better prognostic value and better reclassification for in-hospital mortality in patients with suspected infection in the ED, with less frequent missing data, than the SIRS criteria. However, the prognostic accuracy of qSOFA for mortality (AUROC of 0.64) is nevertheless insufficient, especially in terms of sensitivity (0.55). Heterogeneity of patients' baseline characteristics, including age, sex, comorbidity index, and frailty score, did not significantly interact with the prognostic ability of either qSOFA or SIRS criteria.

Studies comparing the prognostic accuracy of qSOFA and SIRS criteria in ED settings, except for a single prospective study9, have mostly been either purely retrospective in nature8,12,13,15,16,17,18,19,20,21,23,24,25 or have been retrospective studies based on prospectively collected data10,11,14,22. These studies had several flaws, including acquisition of predictor variables in wide time windows8,9,10 and/or the lack of assessment of computability of missing values9,10,13,14,16,17,18,19,20,21,23,24.

In practice, the qSOFA and SIRS criteria may be applied at the time of the initial evaluation after the patient’s visit to the ED. However, several studies acquired the predictor variables within a wide time window, which means that they could have used the worst values recorded during the ED visit; such multiple measurements could bias the prognostic analyses8,9,10. Using the worst values obtained from multiple measurements over time increases the number of participants with positive test results and decreases the number with negative test results. This might improve the sensitivity and negative predictive value (fewer false negatives) while worsening the specificity and positive predictive value (more false positives)15,29. Furthermore, wide time windows may narrow the time between the prediction and the outcome, which can lead to an apparent improvement in the positive prediction result30. Finally, a prolonged delay in calculating the score in the ED is inappropriate in clinical settings. To avoid these and any other biases arising from multiple measurements, the current study used single baseline values obtained at the time the infection was first suspected.

Computation of qSOFA requires three variables that are readily available at the bedside (respiratory rate, systolic blood pressure, and the Glasgow coma scale [GCS] score). Thus, the score can be rapidly calculated, and the frequency of missing values is low. In contrast, computation of SIRS criteria is complicated and time-consuming, and missing data are more likely because six variables are required: bedside information (respiratory rate, heart rate, and body temperature) and laboratory data (partial pressure of carbon dioxide, white blood cell count, and stab cell percentage). In the current study, stab cell data were missing for almost half of the study population, which increased the proportion of participants with an incomputable SIRS score (39%) and with undetermined SIRS positivity (12%). For qSOFA, these percentages were 1% and 0%, respectively. Furthermore, the large number of missing stab cell data would also lead to selection bias if patients with a score of 1 point and a normal white blood cell count who lacked stab cell data were excluded. The current study estimated the prevalence of missing scores and compared the score performance based on both multiply imputed data and naive data, unlike previous studies, which did not sufficiently assess the missing scores9,10,13,14,16,17,18,19,20,21,23,24.
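The distinction between an incomputable score and undetermined positivity can be made concrete. A score with missing components has no exact value, but its positivity is still determined whenever the known components already reach the threshold, or whenever the threshold cannot be reached even if every missing component were met. The Python sketch below illustrates this reasoning; it is one plausible reading of the distinction rather than the study's actual procedure, and all function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Each criterion is True (met), False (not met), or None (cannot be judged).


def any_met(items: Sequence[Optional[bool]]) -> Optional[bool]:
    """Combine sub-items of one criterion, e.g. white cell count OR stab cells.

    The criterion is met if any sub-item is met; it is undetermined if no
    sub-item is met but at least one is missing.
    """
    if any(item is True for item in items):
        return True
    if any(item is None for item in items):
        return None
    return False


def score_bounds(criteria: Sequence[Optional[bool]]) -> Tuple[int, int]:
    """Lowest and highest score compatible with the observed criteria."""
    known_points = sum(1 for c in criteria if c is True)
    missing = sum(1 for c in criteria if c is None)
    return known_points, known_points + missing


def positivity(criteria: Sequence[Optional[bool]], threshold: int = 2) -> Optional[bool]:
    """True/False when positivity is determined, None when it is not."""
    low, high = score_bounds(criteria)
    if low >= threshold:
        return True
    if high < threshold:
        return False
    return None


# Hypothetical patient: fever and tachycardia present, respiratory criterion
# absent, normal white cell count with stab cell percentage missing.
leukocyte_criterion = any_met([False, None])
print(positivity([True, True, False, leukocyte_criterion]))  # score incomputable, positivity known
```

In this hypothetical case the exact SIRS score is unknown (2 or 3), yet the patient is positive at the threshold of 2 either way, which is why the proportion with undetermined positivity can be smaller than the proportion with an incomputable score.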

To predict in-hospital mortality in ED patients with suspected infection, qSOFA was generally better and more specific, whereas SIRS criteria were generally worse and more sensitive based on the systematic reviews26,27,28. However, a prominent inconsistency in the estimation of these diagnostic indices has been observed in previous ED studies8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. As discussed, biases in the acquisition of multiple predictor variables and statistical approaches, which ignored missing values, might have led to differences in the estimation of specificity and sensitivity indices for these two methods. Furthermore, even in a prospective study, the authors retrospectively excluded patients with suspected infection at baseline who were later diagnosed with a non-infectious disease9. This retrospective exclusion might have led to the apparent improvement in diagnostic accuracy, similar to that in the per-protocol design9.

The strength of the current study is its prospective design, which eliminates possible biases in relation to multiple measurements, missing values, and per-protocol analysis. However, there are several limitations to the current study, which should be addressed in the future. First, this study was conducted in a single developed country, which limits the generalisability of its results to other patients worldwide with suspected infection. In particular, the study participants were frail older patients with comorbidities, which might make the results of our study inapplicable to younger populations. Second, even though this was a prospective study, a large percentage of missing stab cell values prevented a realistic estimation for the SIRS criteria and required the use of multiple imputation. However, we believe the results could reflect those from a real-world setting.

Conclusions

This prospective, multi-centre study conducted for the external validation of qSOFA and SIRS criteria demonstrated that, for patients in the ED who had suspected infection, qSOFA had modestly better prognostic accuracy than SIRS criteria in predicting in-hospital mortality and improved reclassification, although its sensitivity was inadequate.

Methods

Study design and setting

In 2016, we designed the Sepsis Prognostication in Intensive Care Unit and Emergency Room (SPICE) study, a prospective observational study that consisted of two sub-studies, the SPICE-ER and SPICE-ICU, which were based in the ED and the intensive care unit, respectively. The current study is a primary study of the main SPICE-ER study, which involved a prospective prognostic analysis comparing qSOFA and SIRS criteria and externally validating the prognostic accuracy of these tools for in-hospital mortality among patients in the ED with suspected infection. The design and reporting of the study adhere to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines31. The study included 34 EDs from 6 secondary and 28 tertiary emergency care centres and was conducted between December 2017 and February 2018. All procedures in these studies that involved human participants were performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study protocol for this observational study, including a waiver of the informed consent requirement, was first approved by the institutional review board of Hokkaido University (approval number 016-0385) and subsequently by the ethics committees of all participating hospitals.

Selection of participants

For this study, we included patients who visited the EDs of the participating hospitals; who were suspected of having an infection by the emergency physicians; who received any kind of antibiotics, underwent any fluid culture test, or underwent imaging for the detection of infection sites during ED stay; and who were hospitalised or died in the ED. Patients were excluded if they were transferred to another hospital without hospitalisation at the participating hospital.

Measurements

Index tests (qSOFA and SIRS criteria) were assessed using the data collected on clinical variables at the time when the infection was first suspected. Both qSOFA and SIRS criteria were considered positive at a score of ≥ 2 points based on the original definitions of sepsis2,4. GCS scores < 15 were used to satisfy the altered mental status criterion in qSOFA2. The reference standard for the study was in-hospital mortality. The application of multiple imputation to all the study variables enabled 100% calculation of the study index tests and 100% assessment of associations between the index tests and the study outcome.
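For reference, the two index tests can be written as short scoring functions. The numeric cut-offs below are not stated in this text; they are taken from the widely used published definitions (qSOFA: respiratory rate ≥ 22/min, systolic blood pressure ≤ 100 mmHg, altered mentation; SIRS: temperature > 38 °C or < 36 °C, heart rate > 90/min, respiratory rate > 20/min or PaCO2 < 32 mmHg, white blood cell count > 12,000/µL or < 4,000/µL or > 10% stab cells) and should be read as assumptions of this sketch. The Python code is illustrative, assumes complete data, and uses hypothetical function and parameter names.

```python
def qsofa_score(resp_rate, systolic_bp, gcs):
    """qSOFA score (0-3); altered mentation is taken as GCS < 15, as in the study."""
    return (
        int(resp_rate >= 22)      # tachypnoea (assumed cut-off)
        + int(systolic_bp <= 100)  # hypotension (assumed cut-off)
        + int(gcs < 15)            # altered mental status
    )


def sirs_score(temp_c, heart_rate, resp_rate, paco2_mmhg, wbc_per_ul, stab_pct):
    """SIRS score (0-4) from six variables grouped into four criteria."""
    temperature = temp_c > 38.0 or temp_c < 36.0
    tachycardia = heart_rate > 90
    respiratory = resp_rate > 20 or paco2_mmhg < 32
    leukocytes = wbc_per_ul > 12000 or wbc_per_ul < 4000 or stab_pct > 10
    return sum(int(c) for c in (temperature, tachycardia, respiratory, leukocytes))


def is_positive(score, threshold=2):
    """Both index tests are considered positive at a score of >= 2 points."""
    return score >= threshold


# Hypothetical patient values, for illustration only.
print(is_positive(qsofa_score(resp_rate=24, systolic_bp=95, gcs=15)))
print(is_positive(sirs_score(38.5, 110, 18, 40, 8000, 2)))
```

The contrast in input requirements is visible in the signatures: three bedside values for qSOFA against six values, three of them laboratory results, for SIRS criteria.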

Outcomes

The baseline characteristics examined included the patients’ statuses before index infection, i.e., age, sex, clinical frailty scale score32, and Charlson comorbidity index33, as well as the clinical data obtained at the ED, i.e., the suspected site of infection and the physiological status (respiratory rate, heart rate, systolic blood pressure, the GCS score, body temperature, white blood cell count, stab cell percentage, lactate level, and partial pressure of carbon dioxide in the blood gas analysis). The suspected sites of infection were classified into 12 regions: the respiratory tract, urinary tract, abdomen, central nervous system, skin and soft tissue, bones and joints, wounds, intravascular catheter, endocardium, any kind of implant other than an intravascular catheter, other regions, and unknown origin. The study outcome was defined as in-hospital mortality during ED stay or hospitalisation. All study data from the participating hospitals were entered electronically into the data capture server provided by the University Hospital Medical Information Network Internet Data and Information Center for Medical Research.

Statistical analysis

The statistical parameters required for estimating the sample size in this study were not fully available from previous publications; therefore, this study employed an adaptive design for sample size estimation. Initial sample size estimation was done using the receiver operating characteristics (ROC) curve power calculation method based on the hypothesis that SIRS criteria significantly predicted in-hospital mortality34. Parameters needed for the initial sample size estimation included an AUROC of 0.64 for SIRS criteria in predicting in-hospital mortality, a probability of in-hospital mortality of 0.042, a power of 0.8, and a significance level of 0.05. The required initial sample size was estimated at 807 participants but was increased to 900 participants to allow for the decline in statistical power owing to missing values. An interim analysis was pre-planned for when the number of study participants exceeded the initial sample size; it determined the final study sample size based on power calculations for two ROC curves to detect differences in the AUROCs of the tested scores35. The study was to be discontinued after the number of participants exceeded the sample size estimated at the interim analysis. Additionally, the study was to be discontinued if the estimated sample size exceeded the upper limit of 2000 participants.

To compensate for the missing values, mainly in laboratory variables, we used multiple imputation by chained equations, which generated 25 multiply imputed datasets with 20 iterations of calculations36.

The statistical analysis plan consisted of the comparison of AUROC (primary analysis) and NRI analysis37 (secondary analysis). Point estimates and 95% confidence intervals from each analysis were integrated across the multiply imputed datasets by bootstrapping, which was repeated 800 times for each of the 25 multiply imputed datasets, for a total of 20,000 replicates.
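The text does not specify exactly how the bootstrap replicates were combined, so the following Python sketch shows only one plausible scheme: the AUROC difference is computed on each bootstrap resample of each imputed dataset, all replicates are pooled, and the mean and the 2.5th and 97.5th percentiles are reported. The pooling rule, the function names, and the row layout `(qsofa, sirs, died)` are assumptions of this sketch; the study itself was analysed in R.

```python
import random


def auroc(scores, died):
    """AUROC as the probability that a randomly chosen non-survivor has a
    higher score than a randomly chosen survivor (ties count as 0.5)."""
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("AUROC needs at least one death and one survivor")
    wins = 0.0
    for x in deaths:
        for y in survivors:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def pooled_auroc_difference(imputed_datasets, n_boot=800, seed=0):
    """Mean and percentile 95% interval of AUROC(qSOFA) - AUROC(SIRS),
    pooled over n_boot resamples of every imputed dataset (assumed scheme)."""
    rng = random.Random(seed)
    diffs = []
    for rows in imputed_datasets:  # rows: list of (qsofa, sirs, died)
        if {bool(r[2]) for r in rows} != {True, False}:
            raise ValueError("each dataset needs both deaths and survivors")
        n = len(rows)
        done = 0
        while done < n_boot:
            sample = [rows[rng.randrange(n)] for _ in range(n)]
            qsofa, sirs, died = zip(*sample)
            if len({bool(d) for d in died}) < 2:
                continue  # resample lacks one outcome class; draw again
            diffs.append(auroc(qsofa, died) - auroc(sirs, died))
            done += 1
    diffs.sort()
    m = len(diffs)
    lower = diffs[int(0.025 * m)]
    upper = diffs[min(m - 1, int(0.975 * m))]
    return sum(diffs) / m, lower, upper
```

With 25 imputed datasets and the default `n_boot=800`, the function pools 20,000 replicates, matching the count given above. The pairwise AUROC loop favours clarity over speed; for a cohort of about 1000 participants a rank-based implementation would be preferable.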

In consideration of possible inconsistencies in the results before and after multiple imputation, a sensitivity analysis was conducted to assess the robustness of the results from primary analysis, using the naive dataset prior to multiple imputation instead of the multiply imputed datasets.

It was assumed that heterogeneity in the baseline characteristics of the participants might interact with the predicted scores for in-hospital mortality. In a post-hoc subgroup analysis, participants were dichotomised by age, sex, the clinical frailty scale score32, and Charlson comorbidity index33, and the association between positivity of the tested scores and in-hospital mortality was reported as an odds ratio.
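As an illustration of the quantity reported in the subgroup analysis, an odds ratio and its 95% confidence interval can be obtained from a 2 × 2 table of score positivity against in-hospital mortality. The Python sketch below uses the standard log (Woolf) method; the text does not state which interval method the study used, so this choice, the function name, and the example counts are assumptions.

```python
import math


def odds_ratio_ci(pos_died, pos_survived, neg_died, neg_survived, z=1.96):
    """Odds ratio for death (score-positive vs score-negative) with a
    Wald-type confidence interval computed on the log scale."""
    cells = (pos_died, pos_survived, neg_died, neg_survived)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (pos_died * neg_survived) / (pos_survived * neg_died)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for one subgroup, for illustration only.
or_, lo, hi = odds_ratio_ci(pos_died=20, pos_survived=80, neg_died=10, neg_survived=90)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Comparing the intervals obtained in the two halves of each dichotomised characteristic gives an informal view of whether the association differs between subgroups; a formal test of interaction would require a model with an interaction term.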

All statistical analyses were performed using R version 3.5.2 statistical software (The R Foundation for Statistical Computing, Vienna, Austria).

Ethics approval and consent to participate

The study was conducted after obtaining approval from the ethics committees of all participating hospitals. All procedures in these studies that involved human participants were performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The requirement for informed consent was waived by the ethics committees because of the observational nature of the study.