Sepsis is defined as a life-threatening organ dysfunction caused by a dysregulated host response to infection [1]. Despite recent developments in the management of sepsis patients, morbidity and mortality still remain high [2]. Presently, clinical findings, biological markers, and microorganism isolation comprise the basis for diagnosing sepsis. Recent guidelines emphasize that early diagnosis and timely administration of antimicrobial therapy are crucial in reducing morbidity and mortality in sepsis patients [3]. However, no single clinical or biological marker indicative of sepsis has been adopted unanimously.

Procalcitonin (PCT) is the inactive propeptide of calcitonin, which is released by C cells of the thyroid gland, hepatocytes, and peripheral monocytes. PCT is widely reported as a useful biochemical marker to differentiate sepsis from other non-infectious causes of systemic inflammation. However, recent evidence has yielded conflicting results [4,5,6], which is reflected by the weak recommendation and the low quality of evidence in the 2016 Surviving Sepsis Campaign (SSC) guideline [3].

Presepsin (P-SEP), the newly identified infection biomarker, is a 13 kDa fragment of the N-terminal of soluble CD14 and is released into the blood upon the activation of monocytes in response to infection [7,8,9,10]. Although P-SEP appeared to be comparable to other inflammatory biomarkers, i.e., C-reactive protein, interleukin-6, and PCT, in the diagnosis of sepsis [11], there has been limited meta-analytical evidence on the diagnostic performance of P-SEP with PCT.

We thus sought to summarize the current clinical evidence regarding the diagnostic test accuracy (DTA) for PCT and P-SEP and analyze the diagnostic performance of both biomarkers in distinguishing sepsis from non-infectious inflammation in the critical care setting more comprehensively.


Protocol registration

This study complied with the recommendations for the conduct and reporting of systematic reviews and meta-analyses, set forth by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [12,13,14], the Meta-Analysis of Observational Studies in Epidemiology proposal [15], and the Cochrane Diagnostic Test Accuracy Working Group [16]. We developed a protocol before conducting the analysis and registered it in PROSPERO (an international prospective register of systematic reviews [; Registration No. CRD42016035784]). The protocol for this study has been published previously [17].

Focused review questions

Primary objective: To determine the accuracy of PCT and P-SEP in diagnosing bacterial infection in critically ill adult patients.

Secondary objective: To determine which marker is superior in the diagnosis of bacterial infection in critically ill adult patients.

Search strategy

We searched the following databases for relevant studies: MEDLINE (via PubMed), EMBASE, and the Cochrane Central Register of Controlled Trials. We developed a search strategy using the combination of keywords and Medical Subject Heading (MeSH)/EMTREE terms, which were “(procalcitonin OR PCT OR presepsin OR “soluble CD14 subtype” OR “sCD14-ST” OR P-SEP) AND (sepsis OR “bacterial infection” OR “systemic inflammatory response syndrome” OR SIRS).” Searches of gray literature and bibliographies of relevant papers were used to complement the results of the search strategies. We did not apply any language restriction to the electronic searches. We also contacted the authors of ongoing or unpublished trials to obtain additional details and information on these trials.

Inclusion and exclusion criteria

Two investigators (YK and YH) conducted the study selection independently. Any disagreement was resolved by discussion and the participation of a third author (KY), when necessary. We included cross-sectional studies, cohort studies, case-control studies, and randomized controlled trials that evaluated the accuracy of PCT or P-SEP in plasma or serum (index test), when used to diagnose bacterial infection or sepsis in critically ill adult patients (reference standards). In this study, “sepsis” meant “life-threatening organ dysfunction caused by a dysregulated host response to infection,” according to the new definition proposed in 2016 (Sepsis-3) [1]. We also accepted various comparable definitions, such as the severe sepsis/septic shock, with the conventional definition (Sepsis-1, 2) [18]. We excluded the following studies with insufficient information when building a 2 × 2 contingency table: abstracts with inadequate information to enable the assessment of methodological quality and duplicates or sub-cohorts of already published cohorts. We also excluded all studies investigating animals; those predominantly comprising neonates or post-cardiac surgical, heart failure, or perioperative patients; and those comprising healthy participants as controls.

Data extraction and synthesis

The characteristics of all included studies were extracted by two authors (YK and YH). We used 2 × 2 tables to cross-tabulate the positive or negative numerical data from the index test results (positive or negative) against the target disorder. We displayed all the results in various tables. To visually assess the between-study variability, we presented the results in a forest plot, as well as with summary receiver operating characteristic curves (sROC) with 95% confidence interval (CI) using the MIDAS module in STATA software, V.14.0 (Stata Corporation, College Station, Texas, USA). Furthermore, we generated a Fagan’s nomogram, which is a user-friendly graphical depiction of the positive/negative likelihood ratio (LR) by prevalence. Statistical heterogeneity was evaluated informally from the forest plots of the studies’ estimates and more formally using the χ2 test (p < 0.1, significant) and I2 statistic (I2 > 50% = significant) with 95% CI.

We conducted sensitivity analyses to determine the robustness of the meta-analyses and to explore the sources of potential heterogeneity in sensitivity and specificity. We performed univariate meta-regression analysis using the following as covariates with 95% CI: risk of bias, year of publication (after 2016 or before 2015), prevalence (< 50% or ≥ 50%), sample size (< 100 or ≥ 100), setting (inside ICU or outside ICU), comorbidities (whether the studies excluded patients who had comorbidities that were likely to influence P-SEP levels), clinical diagnostic criteria (Sepsis-1, 2, or Sepsis-3), causal pathogens of infection (bacteria only or mixed with fungal, viral, or other pathogens), and the cutoff values for each biomarker (< 1.0 or ≥ 1.0 for PCT, < 500 or ≥ 500 for P-SEP).

Assessment of risk of bias

The qualities of the included studies were independently assessed by two authors (YK and YU) and verified by a third author (KY), when necessary. The study quality of each article was reported using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [19]. We specifically assessed the presence of spectrum, threshold, disease progression, and partial or differential verification bias.

Rating of the quality of effect estimates

We applied the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to rate the quality of the evidence [20]. The quality of evidence, which reflects the extent to which we are confident that an estimate of the effect is correct, is rated for each outcome across studies (i.e., for a body of evidence). Although the quality of evidence represents a continuum, the GRADE approach provides the rating for the quality of the body of evidence in one of the four grades: high, moderate, low, or very low, defined as follows:

  • High: We are very confident that the true effect lies close to that of the effect estimate.

  • Moderate: We are moderately confident of the effect estimate and that the true effect is likely to be close to the effect estimate, but there is a possibility that it is substantially different.

  • Low: Our confidence in the effect estimate is limited, such that the true effect may be substantially different from the effect estimate.

  • Very low: We have very little confidence in the effect estimate, such that the true effect is likely to be substantially different from the estimate of effect.

Confidence ratings may decrease when there is increased risk of bias, inconsistency, imprecision, indirectness, or concern about publication bias.

In contrast to the therapeutic intervention, the GRADE approach suggests different criteria when the evidence comes from studies on diagnostic accuracy. Valid diagnostic accuracy studies—cross-sectional or cohort studies, in patients with diagnostic uncertainty, and in direct comparison of test results with an appropriate reference standard—provide high-quality evidence. However, they are often downgraded to lower quality evidence, because they are liable to limitations, particularly indirectness of outcomes, i.e., uncertainty about the link between the test accuracy and outcomes that are important to patients.

For each outcome, the quality of evidence was started on a high grade, became downgraded by one level when there was a serious issue identified, and by two levels when there was a very serious issue identified in each of the factors used in judging the quality of evidence.

Importance of DTA outcomes

In DTA, the importance of outcomes, including absolute effects (true positive, true negative, false positive, or false negative), was ranked according to their importance in decision-making as follows: critical, important, or of limited importance. We ranked all four outcomes as critical because both accurate diagnosis and misdiagnosis could influence mortality in critically ill patients.


Results of the search

We identified 4203 potentially eligible articles at the initial search (Fig. 1), of which 4192 articles were retained after de-duplicating. After screening the titles and abstracts, 4031 articles were found to be clearly irrelevant and were excluded. We retrieved the full texts of the remaining 160 records (155 observational studies and 5 randomized controlled trials) and assessed them for eligibility. Finally, we included 19 studies (19 observational studies and no randomized controlled trials) that enrolled 3012 patients [11, 21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Among these, 18 [11, 21,22,23, 25,26,27,28,29,30,31,32,33,34,35,36,37] studies evaluated the diagnostic values of PCT in infection and 10 [11, 21, 24,25,26, 28,29,30, 33] determined the diagnostic accuracy of P-SEP in infection.

Fig. 1
figure 1

Flow chart of the identification and selection of studies for inclusion

Basic features of included studies

The characteristics of the included studies are summarized in Table 1. The earliest article [34] was published in 1999 while 18 were published after 2000, with 15 [11, 21, 23,24,25,26, 28,29,30,31, 33, 35,36,37] being published after 2010. Twelve studies [24, 26,27,28,29,30,31,32, 34,35,36] took place in Europe, 5 in Asia [11, 22, 25, 33, 37], and 1 each in North America [23] and Africa [21]. Seventeen [11, 21,22,23,24,25,26,27, 29, 30, 32,33,34,35,36,37] studies were conducted prospectively, and the others were retrospective. All studies described diagnostic cutoff thresholds for PCT or P-SEP. The cutoff thresholds widely varied between > 0.28 and > 4.5 ng/ml for PCT and between > 101.6 and > 1000 pg/ml for P-SEP.

Table 1 Characteristics of the included studies

Risk of bias of included studies

We illustrated the quality of the included 19 studies using the QUADAS-2 tool (Fig. 2). All studies had unclear or high risk of bias in at least one domain. Five studies [11, 21, 28, 29, 32] demonstrated unclear or high-risk patient selection bias, resulting mainly from inappropriate exclusion criteria and the absence of a clear definition for exclusion criteria. All studies demonstrated unclear or high risk of index test interpretation bias, owing to the lack of a clearly pre-specified cutoff threshold of PCT and P-SEP for a positive diagnosis.

Fig. 2
figure 2

Risk of bias and applicability concerns summary (a) and graph (b), review authors’ judgements about each domain, for each included study

We assigned a high concern for patient selection applicability in four studies [11, 22, 25, 36], which included critically ill adult patients, regardless of suspected bacterial infection. Only one study [25] focusing on severe burn patients was assigned an unclear concern for applicability with respect to the index test, because burns could be a possible source of increased PCT and P-SEP levels. None of the studies had high or unclear concerns for applicability with respect to the reference standard.

Diagnostic accuracy

The forest plot in Fig. 3 shows the sensitivity and specificity ranges for PCT and P-SEP for infection, across included studies. The pooled sensitivity for PCT and P-SEP were 0.80 (95% CI 0.75 to 0.84) and 0.84 (95% CI 0.80 to 0.88), respectively. The pooled specificities of these biomarkers were 0.75 for the former (95% CI 0.67 to 0.81, for PCT) and 0.73 for the latter (95% CI 0.61 to 0.82, for P-SEP). Fagan’s nomogram results for PCT indicated that the pooled pre-test probability ratio of 50% by positive/negative LR yielded positive/negative post-test probability ratio of 77% and 22%, respectively. Furthermore, Fagan’s nomogram results for P-SEP demonstrated positive/negative post-test probability ratio of 76% and 19%, respectively. More details can be found in Additional file 1: Figure S1.

Fig. 3
figure 3

Forest plots of PCT and P-SEP for the diagnosis of infection. The plot shows study-specific estimates of sensitivity and specificity with 95% confidence interval (CI). The studies were ordered according to the study names. PCT, procalcitonin; P-SEP, presepsin

We also constructed the sROC curves and calculated the area under ROC (AUROC) for included studies (Fig. 4). The overall diagnostic performance of PCT and P-SEP for infection were comparable (AUROC 0.84 [95% CI 0.81 to 0.87], and 0.87 [95% CI 0.84 to 0.90], respectively).

Fig. 4
figure 4

Summary ROC curves of PCT (the solid line) and P-SEP (the dashed line) for the detection of infection. Each pair of points represents the pair of sensitivity and specificity for each evaluation. PCT, procalcitonin; P-SEP, presepsin. The overall diagnostic accuracy of PCT and P-SEP for infection was moderate and comparable

Investigations of heterogeneity

Because there were substantial heterogeneities among pooled results of sensitivity and specificity for both PCT and P-SEP (Fig. 3), we performed several sensitivity analyses to explain the heterogeneities by investigating the study characteristics using meta-regression analysis. Univariate meta-regression analysis revealed that the sensitivity of heterogeneity among included studies might be attributable to several relevant factors, such as the risk of bias (low risk or high risk), publication years (until 2015 or after 2015), prevalence of infection (< 50% or ≥ 50%), sample size (< 100 or ≥ 100), study setting (inside ICU or outside ICU), clinical diagnostic criteria (sepsis-1, 2 or sepsis-3), and the cutoff value for each biomarker (< 1.0 or ≥ 1.0 for PCT, < 500 or ≥ 500 for P-SEP) (Fig. 5).

Fig. 5
figure 5

Univariate meta-regression analysis by several possible causes of heterogeneities. PCT, procalcitonin; P-SEP, presepsin. To correct for multiple comparisons with 10 relevant covariates, we considered a p value of < 0.05 as statistical significance. The sensitivity of heterogeneity among included studies might be attributable to several factors, such as risk of bias, publication years, and prevalence of infection

Head-to-head comparison of the two biomarkers

To compare the diagnostic performance of PCT and P-SEP in similar populations, we evaluated the nine studies which directly compared PCT and P-SEP in the same population [11, 21, 25, 26, 28,29,30, 33]. As a result, there were no statistically significant differences in both pooled sensitivities (p = 0.48) and pooled specificities (p = 0.57) between PCT and P-SEP. Besides, we conducted a head-to-head comparison of PCT and P-SEP in several subgroups stratified by study characteristics and found no statistically significant differences between the two biomarkers in any of the subgroups (Additional file 2).

Publication biases

We detected no evidence of publication biases, assessed by Deeks’ Funnel Plot Asymmetry Test (PCT: p = 0.67, P-SEP: p = 0.35) (Fig. 6).

Fig. 6
figure 6

Deeks’ funnel plot to estimate the presence of publication bias. PCT, procalcitonin; P-SEP, presepsin; ESS, effective sample size. We detected no evidence of publication bias (PCT: p = 0.67; P-SEP: p = 0.35)

Quality of DTA evidence using the GRADE system (or approach)

We summarized the main findings and the quality of evidence for each outcome across all the studies in the GRADE evidence profile (Table 2). Because, all included studies showed unclear or high risk of index test interpretation bias, the quality of evidence of the outcomes was downgraded by one level (from high to moderate). Furthermore, with the substantial heterogeneities among pooled estimates of the sensitivity and specificity in included studies, and because the source of heterogeneities could not be fully explained by the results of our sensitivity analysis, the quality of evidence of the outcomes was again downgraded by one level (from moderate to low). We found no serious indirectness and imprecision. Finally, the quality of the body of evidence supporting PCT and P-SEP for the diagnosis of infection were both graded as “low” for true positive, false negative, false positive, and true negative. Consequently, we graded the overall quality of evidence as “low.”

Table 2 GRADE evidence profile to determine the accuracy of PCT and P-SEP when used to diagnose bacterial infection in adult critically ill patients


Summary of the main results

On the basis of the pre-defined protocol [17], the present meta-analysis and systematic review, which included 19 studies from several regions of the world, assessed the diagnostic accuracy of the index tests in participants with established infection in the critical care setting. To the best of our knowledge, this is the first study using appropriate methodologies and quality assessment tools, which provide evidence that there is no difference in the diagnostic performance of PCT and P-SEP in critically ill patients. Analyses of AUROC revealed AUC values of 0.84 for PCT and 0.87 for P-SEP with modest sensitivity and specificity. Positive and negative LRs for both biomarkers were sufficiently relevant as additional diagnostic tools in cases of infection, which were often indistinguishable from non-infectious disorders in critically ill patients.

Our sensitivity analysis suggested that several factors might have been responsible for the substantial heterogeneity across included studies. There were no obvious threshold effects for both PCT and P-SEP, partly because almost all included studies claimed that in each dataset, the post-specified cutoff threshold calculated by ROC analyses maximizes the diagnostic performance of either biomarker.

Roles of procalcitonin and presepsin in the diagnosis of sepsis

Given that the SSC guideline emphasizes early diagnosis to improve the clinical outcomes in sepsis [3], developing diagnostic strategies for infection is still required for accurate bedside diagnosis of infections. Although PCT has been widely reported to be an optimal biomarker in the diagnosis of sepsis [38, 39], more recent studies have produced conflicting results [4,5,6]. P-SEP, which is released into circulation after the activation of the pro-inflammatory signaling cascade upon contact with infectious agents [40], is emerging as a novel circulating marker for sepsis [9]. However, the clinical value of these biomarkers, independently or in combination, is still at investigative stages. Indeed, there are limited meta-analyses and systematic reviews comparing the prognostic performance of P-SEP with PCT for the diagnosis of early-stage sepsis in critically ill patients. Thus, there is a lack of evidence to suggest the relevance of the triaging for these tests. Besides, it is still unclear whether testing for these biomarkers is an addition to the existing tests or a replacement, whether partial or complete.

Quality of the evidence

The quality of the body of evidence supporting PCT and P-SEP for the diagnosis of infection was judged as “low” for both markers. As almost all included studies did not pre-specify a clear cutoff threshold of PCT and P-SEP for a positive diagnosis (thus indicating an unclear or high risk of index test interpretation bias), we downgraded the quality of evidence by one level for the risk of bias. Consequently, we suggested the measurements of PCT and P-SEP as an optional diagnostic tool for infection and sepsis in critically ill patients. However, further researches are likely to have an important impact on our findings, which may result in changes to the recommendation.

Strength of the review

Several methodological strengths have enhanced the validity and applicability of our findings. This systematic review and meta-analysis included: (1) any study that measured PCT or P-SEP levels in critically ill adult patients with suspected sepsis, in whom the confirmation of sepsis was by clinical diagnosis or by microbiological confirmation of infection in cultures, or both; (2) comprehensive systemic search without any language restriction to the electronic searches (including 4203 published studies); (3) main results were discussed both by direct comparison and by subgroup analyses; (4) detailed subgroup analyses were performed to solve the heterogeneity concerns; and (5) quality of DTA evidence was rated through a transparent and structured process based on the GRADE approach.

Agreements and disagreements with other studies

Recently, one meta-analysis comparing P-SEP with other biomarkers (PCT and C-reactive protein) reported that P-SEP has similar diagnostic accuracy as PCT or C-reactive protein [41]. The most important feature distinguishing our analysis from this previous meta-analysis is that we focused only on studies evaluating participants with critical illnesses, such as acute respiratory distress syndrome and sepsis, but not healthy volunteers. The former study had high heterogeneity likely because it included normal healthy volunteers as controls. Thus, it should be interpreted more cautiously. As they reported, methods, to distinguish sepsis from non-infectious causes of inflammation, are necessary in clinical situations [41].

Limitations of the review

This study has several limitations. First, the microbiological approaches used for the definitive diagnosis of infection varied from one study to another; therefore, some degree of misclassification bias might have existed. Second, most included studies did not report the pre-specified cutoff thresholds for the biomarkers, which might have increased the risk of bias. Finally, since there was insufficient number of data available, we could not compare the bacterial, viral, and fungal infection population groups only. The usefulness of PCT and P-SEP in viral and fungal infection groups remains unknown.


Our meta-analysis suggests that both PCT and P-SEP are helpful biomarkers for the early diagnosis of sepsis in critically ill adult patients. However, considering the need to avoid misdiagnosis or delayed diagnosis, the use of PCT or P-SEP tests in combination with other clinical modalities for sepsis diagnosis is recommended to improve diagnostic accuracy and patient outcomes.