Take-home message
In this meta-analysis, blood NGAL and cystatin C as well as urinary TIMP-2 × IGFBP-7 were found to have some potential as biomarkers for prediction of RRT in AKI. Further studies should better clarify their incremental value as decision-support for starting RRT.

Introduction

Acute kidney injury (AKI) is a frequent complication in critically ill patients in intensive care units (ICU), increasing significantly both hospital mortality and morbidity [1, 2]. The disease management of these patients often comprises several conservative interventions (specifically prompt resuscitation of circulation, avoidance of nephrotoxins, and avoidance of volume overload [3]). If such interventions are not effective, these patients may end up being treated with renal replacement therapy (RRT).

Despite considerable research efforts, it is still unclear whether and when RRT should be commenced to improve the outcome of critically ill patients with AKI. Early initiation of RRT may reduce mortality, but it comes with a higher risk of treatment-related complications, such as catheter-related bloodstream infections [4, 5]. Moreover, clinicians have few tools that indicate with any certainty whether kidney recovery is imminent or likely and that can be integrated into decision-making on the use of RRT [6].

The growing number of biomarkers of kidney function and damage, often with contradictory findings, has made clinical inferences about their utility challenging [7, 8]. Importantly, few studies have specifically evaluated the value of biomarkers for estimating the likelihood that AKI will persist or worsen and that the patient will go on to receive RRT.

To begin to address this issue, we conducted a systematic review and meta-analysis to assess the ability of available molecular and physiologic biomarkers to predict the initiation of RRT in critically ill patients suffering from AKI. Preliminary results of this meta-analysis have previously been presented in part as an abstract [9].

The aim of this review is to provide a rigorous summary of currently available data of biomarker-guided decision-support for the initiation of RRT in critically ill patients with AKI.

Methods

Review design

This is a systematic review and meta-analysis of studies reporting biomarkers for the prediction of RRT. The protocol design follows the “Preferred Reporting Items for Systematic Reviews and Meta-Analysis” (PRISMA) guidelines (available in the Electronic Supplemental Material [ESM_5]) [10]. No modifications of the initial research process were applied.

Search strategy

Systematic searches of MEDLINE (through the PubMed interface), Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL) were conducted in consultation with a research librarian from inception to 18 September 2017. The search strategy included terms related to AKI, RRT, and AKI-associated biomarkers (e.g., NGAL, KIM-1, cystatin C, IL-18, L-FABP). The comprehensive search strategy is available in the electronic supplemental material (ESM_1). As a supplementary approach, manual searches were performed and comprised screening of reference lists to cross-check for additional eligible papers.

Study identification and eligibility criteria

After removal of duplicates, titles and abstracts were screened for potential eligibility independently by two reviewers (SJK and MJ) using abstrackr [11]. Retrieved full texts were screened for study inclusion in a standardized manner by two reviewers (SJK and MJ). Disagreement in study assignment at any step of the inclusion process was resolved by consensus. Primary, empirical, quantitative studies (randomized controlled trials, non-randomized trials with and without control group, cohort, cross-sectional or case–control studies) published in peer-reviewed journals in English and German language were included in qualitative and quantitative analysis.

Results from other meta-analyses of the field were not included. Eligible studies evaluated the predictive ability of one or more biomarkers measurable in blood or urine samples for initiation of RRT in critically ill patients with AKI. We did not restrict our inclusion criteria for sample size or minimal follow-up period. Accepted reporting of the outcomes was either the area under the receiver operating characteristic curve (AUC), risk ratio (RR), or odds ratio (OR). Studies reporting a combined outcome of interest (e.g., AUC for initiation of RRT AND mortality) as well as studies focusing on children with AKI were excluded.

Assessment of risk of bias

Following the adapted QUADAS-2 tool [12], two authors (SJK and AKB) independently assessed the risk of bias for the included studies. Since there is no reference/gold standard for determination of the ideal time point for initiation of RRT, “Domain 3: Reference Standard” was adapted to “Domain 3: RRT Initiation”, in which the criteria used to determine receipt for RRT were rated to correctly classify the need for RRT initiation (e.g., conventional clinical indications). The adapted version of the tool was pilot-tested on ten randomly selected studies. The applied version of the adapted QUADAS-2 tool is available in the ESM (ESM_4). Discrepancies were resolved by consensus.

Data extraction

Data elements from included studies were extracted using a predesigned extraction sheet, comprising study characteristics (i.e., study design, participant demographics, case-mix, intervention, comparators, outcomes) as well as setting and time intervals. Intervention-specific characteristics of the study population such as investigated biomarkers plus comparators and their corresponding assays were retrieved. Values for AUC, RR and OR, sensitivity and specificity, as well as positive predictive value and negative predictive value were extracted for quantitative analysis.

Data synthesis and statistical analysis

A meta-analysis was performed to create pooled values for the biomarkers. If reported in at least two studies, a biomarker was included in the meta-analysis using a random effects model. If a biomarker was measured at multiple time points in one study, the value closest to 24 h after study inclusion was chosen. As a result of heterogeneity in the selected studies, especially in the chosen threshold values for the biomarkers, we refrained from creating pooled sensitivities and specificities. Subgroup analyses were performed if feasible and can be found in the electronic supplemental material. To investigate homogeneity between the studies, we conducted Chi-squared tests and calculated an I² index for every pooled AUC. To address the between-trial heterogeneity, biomarker results were paired if reported in the same study. Funnel plots were constructed to provide a visual assessment of potential for publication bias when at least 10 studies were included in the meta-analysis [13].
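The pooling step described above can be illustrated with a minimal sketch. The analysis itself was run with the metafor package in R; the Python function below is only an illustration, and it assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify. The function name `pool_random_effects` and its return keys are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def pool_random_effects(aucs, ses):
    """Pool study-level AUCs with a random effects model.

    Assumption: DerSimonian-Laird estimator for the between-study
    variance (tau^2); the source does not state which estimator was used.
    """
    k = len(aucs)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching AUCs and SEs")

    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum_w

    # Cochran's Q: chi-squared statistic with k - 1 degrees of freedom
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))
    df = k - 1

    # Between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled_auc": pooled,
        "ci_low": pooled - Z_95 * se_pooled,
        "ci_high": pooled + Z_95 * se_pooled,
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }
```

A call such as `pool_random_effects([0.6, 0.8], [0.05, 0.05])` uses made-up inputs, not data from any included study. Comparing `q` against a Chi-squared distribution with `df` degrees of freedom gives the homogeneity test.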

Results are presented as pooled AUCs with a 95% confidence interval (CI). If no standard error was stated for an AUC, it was calculated from its confidence interval or using Hanley’s method [14]. Results were interpreted according to AUC values, with 0.90–1.0 indicating excellent, 0.80–0.89 indicating good, 0.70–0.79 indicating fair, 0.60–0.69 indicating poor, and 0.50–0.59 indicating no useful value [15]. p values < 0.05 were considered statistically significant.
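As a minimal sketch of how a missing standard error might be recovered and how the interpretation bands could be applied, consider the Python helpers below. They assume that reported intervals are symmetric normal-approximation intervals and that "Hanley's method" refers to the Hanley and McNeil variance formula for an AUC. The function names are hypothetical, and treating the bands as contiguous (for example, 0.895 counted as "good") and values below 0.50 as "no useful value" is an assumption of this sketch.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2.0 * z)


def se_hanley(auc, n_pos, n_neg):
    """Standard error of an AUC after Hanley and McNeil.

    n_pos: patients with the outcome (here, RRT initiated)
    n_neg: patients without the outcome
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def interpret_auc(auc):
    """Map an AUC to the qualitative bands used in this review."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "no useful value"  # assumption: also applied to AUC < 0.50
```

For example, `se_from_ci(0.638, 0.803)` back-calculates the standard error implied by a reported interval, and `se_hanley(0.8, 20, 80)` uses hypothetical group sizes.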

Meta-analysis was performed with the metafor package [16] for R (version 3.4.0, R Foundation for Statistical Computing, Vienna, Austria) [17] and MedCalc (version 17.4, MedCalc Software, Ostend, Belgium).

Results

Search results and study characteristics

The search identified 1501 studies. Of these, 215 were included in full text screening. A total of 152 studies were further excluded (ESM_1_Fig. 7). Sixty-three studies were selected for qualitative analysis. The included studies were published between 2004 and 2017. The majority were conducted in intensive care unit (ICU) settings. Nine studies were conducted on general wards [18,19,20,21,22,23,24,25,26]. Of the studies conducted in ICUs, 30 were conducted in patients with mixed etiologies for AKI [27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]. Ten trials included patients after cardiac surgery [57,58,59,60,61,62,63,64,65,66], three studies investigated patients after major non-cardiac surgery [67,68,69], two studies included patients after kidney transplantation [70, 71], and three studies included only patients with sepsis [72,73,74] while another two studies only comprised patients with malaria [75, 76]. Other etiologies of AKI investigated only by a single trial were contrast-associated AKI [77], Shiga toxin-mediated hemolytic uremic syndrome [78], burns [79], and crush syndrome [80]. Of included studies, only 9 (14.28%) had the primary aim focused on the prediction of RRT [18, 32, 41, 45, 50, 56, 72, 75, 78]. A total of 41 studies, including a total of 13 biomarkers, were included in the quantitative synthesis (meta-analysis). Qualitative synthesis of included studies allowed the appraisal of 62 biomarkers. Of these, 31 biomarkers were measured in blood and 31 in urine. Combinations of biomarkers and clinical scores (e.g., APACHE II, AKIN criteria) as well as biomarker combinations (e.g., TIMP-2 × IGFBP-7) were also evaluated.

Included studies showed considerable heterogeneity in sample sizes, ranging from 31 [62] to 1439 patients [46] (median 122.5).

Characteristics of included studies are presented in Table 1 of the ESM (ESM_2_Tab. 1).

Quality assessment

Overall quality of included trials was moderate. The results of the QUADAS-2 evaluation are provided in the ESM (ESM_1_Fig. 8; ESM_1_Tab. 2). While sample collection and biomarker analysis were quite similar throughout all studies, only 15 studies [18, 27, 32, 37, 41, 50, 52, 56, 57, 61, 64, 68, 71, 75, 76] specified the clinical indications for initiation of RRT. Of these studies, seven [19, 27, 50, 56, 61, 64, 76] were rated as having set predefined criteria for RRT initiation. A prespecified biomarker level threshold was used only in four trials [29, 44, 72, 74].

Meta-analysis of biomarker performance

Urinary biomarkers

Altogether nine urinary biomarkers were eligible for meta-analysis. The number of included studies and number of patients analyzed in each forest plot differed significantly, ranging from two to 12 analyzed studies and from 133 to 3412 patients. Urinary NGAL was analyzed in 12 studies [23, 27, 31, 33, 40, 45, 50, 52, 56, 66, 72, 75], with a pooled AUC of 0.720 (95% CI 0.638–0.803) if uncorrected urinary concentrations were used (Fig. 1a) and in seven studies [28, 33, 35, 50,51,52, 60] with a pooled AUC of 0.727 (0.678–0.776) if normalization to urinary creatinine was applied (Fig. 1b). Urinary interleukin 18 (IL-18) was analyzed in five studies [23, 33, 40, 46, 51], had a pooled AUC of 0.668 (0.606–0.729) (ESM_1_Fig. 9a), and if normalized to urinary creatinine had a pooled AUC of 0.761 (0.661–0.862) in two studies [33, 51] (ESM_1_Fig. 9b). There was evidence of a publication bias for urinary NGAL (concentration), as shown in the funnel plot in the ESM (ESM_1_Fig. 14a). For urinary NGAL, sensitivity analysis (different thresholds and population) revealed no significant findings and can be found in the ESM (ESM_1_Fig. 19a, ESM_1_Fig. 19b; ESM_1_Tab. 6).

Fig. 1

Forest plots of urinary NGAL predicting RRT. a Urinary concentration of NGAL. b Urinary NGAL normalized to urinary creatinine. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

Urinary cystatin C showed a fair predictive performance with a pooled AUC of 0.722 (0.575–0.868) (three studies) (ESM_1_Fig. 10) and 0.790 (0.645–0.934) when it was normalized to urinary creatinine, thereby adjusting for urine volume (four studies) (Fig. 2).

Fig. 2

Urinary cystatin C normalized to urinary creatinine. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

Urine output (UO), analyzed in two studies [41, 49] including 604 patients, showed the lowest predictive ability with a pooled AUC of 0.614 (0.389–0.840) (ESM_1_Fig. 13a). Kidney injury molecule-1 (KIM-1) had a pooled AUC of 0.648 (0.540–0.756) in two studies [33, 51] if it was normalized to urinary creatinine (ESM_1_Fig. 11b) and of 0.594 (0.499–0.689) in three studies [33, 40, 51] if it was not normalized (ESM_1_Fig. 11a).

Urinary N-acetyl-beta-d-glucosaminidase (NAG) had a pooled AUC of 0.709 (0.502–0.915) (two studies [19, 23], 176 patients) (ESM_1_Fig. 13b).

The combination of TIMP-2 × IGFBP-7, analyzed in four studies [40, 63, 67, 70] and 280 patients, showed a pooled AUC of 0.857 (0.789–0.925) (Fig. 3). These two markers were also analyzed separately by two studies [40, 70] comprising 133 patients: for TIMP-2 alone the pooled AUC was 0.780 (0.509–1.000) (ESM_1_Fig. 12a) and IGFBP-7 alone showed a pooled AUC of 0.716 (0.463–0.969) (ESM_1_Fig. 12b).

Fig. 3

Forest plot of urinary TIMP-2 × IGFBP-7 predicting RRT. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

The fractional excretion of sodium (FeNa) was assessed in two studies [40, 75] including 240 patients. Its pooled AUC showed a fair predictive performance of 0.718 (0.619–0.816) (ESM_1_Fig. 13c).

Blood biomarkers

Of biomarkers assessed in blood, four were included in this meta-analysis.

Plasma, serum, and/or whole blood NGAL, analyzed in 22 studies [18, 21, 24, 28,29,30,31, 35, 40, 42, 48, 52, 56, 62, 64, 66, 68, 71, 72, 77,78,79] and 4391 patients, had a pooled AUC of 0.755 (0.706–0.803) (Fig. 4). When comparing NGAL measured in plasma and serum, the difference of values across the included studies was 0.114 (p = 0.1593). There was evidence of publication bias, as can be noticed in the funnel plot (ESM_1_Fig. 14b). Two sensitivity analyses were performed investigating different thresholds, namely 150–350 ng/ml and > 600 ng/ml. Pooled AUCs were 0.742 (0.678–0.805) and 0.779 (0.689–0.870) for the lower and higher threshold, respectively, with no significant difference (p = 0.5613). Excluding studies with specific patient populations (e.g., after cardiac surgery) afforded a pooled AUC of 0.747 (0.685–0.808). Figures and tables can be found in the ESM (ESM_1).
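One simple way to compare two pooled AUCs, such as those of the two threshold subgroups, is a Wald-type z-test on their difference. The sketch below assumes that the subgroups are independent and that the reported intervals are symmetric normal-approximation intervals. The source does not state which test produced the reported p values, so the output of this approximation need not match them. The function name `compare_pooled_aucs` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def compare_pooled_aucs(auc_a, ci_a, auc_b, ci_b, z95=Z_95):
    """Wald z-test for the difference between two independent pooled AUCs.

    ci_a and ci_b are (lower, upper) 95% confidence limits.
    Returns (difference, z statistic, two-sided p value).
    """
    # Standard errors implied by the reported confidence intervals
    se_a = (ci_a[1] - ci_a[0]) / (2.0 * z95)
    se_b = (ci_b[1] - ci_b[0]) / (2.0 * z95)

    diff = auc_b - auc_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # assumes independence
    z = diff / se_diff

    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p


# Pooled AUCs of the lower and higher NGAL threshold subgroups
diff, z, p = compare_pooled_aucs(0.742, (0.678, 0.805), 0.779, (0.689, 0.870))
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```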

Fig. 4

Forest plot of plasma, serum, and whole blood NGAL predicting RRT. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

Plasma and serum cystatin C was investigated in eight studies [23, 32, 37, 41, 44, 50, 53, 57]; one was excluded for duplicate publication [32], and the remaining seven showed a pooled AUC of 0.768 (0.729–0.807) derived from 1079 patients (Fig. 5).

Fig. 5

Forest plot of plasma and serum cystatin C predicting RRT. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

Plasma and serum creatinine was investigated in 15 studies [21, 23, 41, 44, 49, 50, 56, 57, 62, 66, 71,72,73, 75, 77] totaling 2969 patients and showed a pooled AUC of 0.764 (0.732–0.796) (Fig. 6). A funnel plot showed low probability of publication bias for creatinine (ESM_1_Fig. 14c).

Fig. 6

Forest plot of plasma and serum creatinine predicting RRT. AUC area under the curve, RE Model random effects model, CI confidence interval, RRT renal replacement therapy (i.e., number of patients having received RRT)

Paired analysis was conducted for plasma, serum, and whole blood NGAL, which performed slightly worse than plasma/serum creatinine in the unpaired, pooled analysis. When only studies reporting both NGAL and creatinine were considered, the predictive performance of NGAL improved and was slightly better than that of creatinine, with an average AUC improvement of 0.013. Additional paired analyses as well as sensitivity analyses for creatinine and cystatin C did not yield significant findings (ESM_1).

Blood urea nitrogen (BUN) showed a fair predictive performance in two studies [41, 75] and 283 patients. The pooled AUC was 0.732 (0.661–0.802) (ESM_1_Fig. 13d).

When comparing biomarkers, which were evaluated both in blood and urine (e.g., NGAL), we observed no significant differences between pooled AUCs (results can be found in the ESM [ESM_1]).

Biomarkers not included in the meta-analysis

A forest plot could not be constructed for 27 biomarkers measured in blood and 24 urinary biomarkers, because each was reported in a single study only.

Of these biomarkers, the best predictive ability was shown for urinary FABP-1 and FABP-3 with AUCs of 0.9995 and 1, respectively [32]. In patients undergoing kidney transplantation, urinary vascular endothelial growth factor (VEGF) had the best predictive value shortly (4 h) after surgery with an AUC of 0.85 (0.72–0.99); the 8-h time point had an AUC 0.81 (0.64–0.98) and after 12 h VEGF had an AUC of 0.77 (0.59–0.94) [70].

Some studies reported combinations of biomarkers or combinations of a biomarker with clinical parameters. Plasma NGAL combined with creatinine showed an excellent predictive value with an AUC of 0.914 (0.827–1.000) [21].

Conducting a furosemide stress test (FST) had an AUC of 0.86 (0.78–0.94), while combining an FST with urinary TIMP-2 × IGFBP-7 resulted in an AUC of 0.89 (0.82–0.96). The combination of an FST and other biomarkers (IGFBP-7, NGAL, uromodulin) also showed slight AUC improvements [40].

A combined model of serum NGAL (cutoff ≥ 300 ng/ml) and AKIN ≥ stage I had an odds ratio (OR) of 20.00 (3.07–160.65), while AKIN ≥ stage I had an OR of 7.30 (1.01–66.07) [78].

A list of all biomarkers reported in the included 63 studies is available in the ESM (ESM_3).

Discussion

Key findings

We performed a systematic review to identify the most promising biomarkers which may potentially guide decision-making for timing of initiation of RRT among patients with AKI. We found 63 heterogeneous studies investigating 62 different blood and urine biomarkers for predicting the likelihood of starting RRT. Overall, study quality was moderate and most studies failed to integrate standardized criteria for RRT initiation, thereby potentially increasing the risk of bias or confounding by indication. By far the largest number of studies was available for NGAL. However, of the biomarkers measured in blood, cystatin C showed the best predictive value, followed by creatinine, which alone showed fair discrimination, and NGAL. While urinary TIMP-2 × IGFBP-7 showed the best predictive value in its pooled AUC, the number of studies and the total sample size were relatively small compared to other biomarkers, which limits the strength of this finding. Cystatin C was the second best urinary biomarker with a significantly larger body of evidence.

Context with existing literature

Since damage in AKI at first manifests in renal tubular cells, urinary biomarkers are considered most sensitive for AKI diagnosis. Whether this is also true for prediction of RRT has not been addressed extensively. In our systematic review the largest body of evidence was found for urinary NGAL, but overall it showed a very heterogeneous performance in included studies with AUCs ranging between 0.470 [52] and 0.884 [21]. Our results are quite similar to those of a previously published meta-analysis reporting a pooled AUC of 0.782 (0.648–0.917) for the prediction of RRT [81]. Urinary NGAL is present in different molecular forms (mono-, homo-, and heterodimeric) [35], depending on its origin either representing filtered serum NGAL released by neutrophils or that released directly from damaged tubular cells. Systemic NGAL levels may rise as activated neutrophils release it with their granular content, and increase the filtered quantity and thereby increasing urinary NGAL levels unrelated to any acute renal damage [82]. It is important to note that different commercially available NGAL assays measure various molecular forms depending on their antibody combination which may partly explain the large variation in the predictive performance of NGAL.

Except for one trial [40], TIMP-2 × IGFBP-7 [83] showed a good predictive performance with individual AUCs well above 0.8, with the pooled AUC confirming these values. The performance of the combined biomarkers was superior to the individual markers. Both biomarkers are involved in the G1 cell cycle arrest, which occurs in the early stages of cell injury [84]. Interestingly, TIMP-2 was found to perform slightly better than IGFBP-7 in patients with sepsis-induced AKI, while IGFBP-7 outperformed TIMP-2 in surgical patients. Clinical applicability of these findings may be limited, but it supports the combined use of those two biomarkers to provide more consistent results [84]. Despite the fact that those cell cycle arrest biomarkers showed the best predictive performance for the initiation of RRT, the total number of investigated patients is still small and further studies are warranted evaluating possible influences like pulmonary disease and diabetes mellitus on their levels as shown by Bell et al. [85, 86].

All other biomarkers measured in urine had pooled AUCs below 0.8, leaving considerable uncertainty whether they could sufficiently predict the initiation of RRT.

Of biomarkers measured in blood, plasma/serum cystatin C performed best, followed by plasma/serum creatinine and NGAL. However, differences in AUCs were marginal. Cystatin C is considered an established marker for glomerular filtration rate (GFR) in chronic kidney disease but has also been demonstrated to detect AKI earlier than creatinine does in critically ill patients [87, 88]. Still, creatinine showed a fair predictive performance, despite being potentially influenced by age, sex, body weight, muscle mass, and drugs [89,90,91]. One possible explanation for this finding is that these biomarkers are often used as a trigger for the initiation of RRT, which introduces a bias in favor of their predictive performance. Serum creatinine can be serially measured with relative ease and low cost, so multiple serum creatinine values over the course of an episode of AKI showing trends for worsening or not improving may also serve as an important trigger. This is the theory behind the concept of creating a kinetic estimation of GFR [92]. However, two studies [41, 56] used creatinine as a trigger for RRT and also evaluated the predictive performance of creatinine in the same cohort. Interestingly, the AUCs obtained in these two studies were still lower than the pooled AUC for creatinine in this meta-analysis, indicating that the impact of this bias might not be that important. The same pattern can be noticed for BUN, which was, along with others, a predefined trigger for RRT initiation in two studies [50, 56].

Investigated in a larger number of studies, plasma, serum, and whole blood NGAL showed quite similar performance compared to cystatin C and creatinine. As mentioned above, NGAL levels were also identified as being influenced by inflammatory processes or sepsis [93,94,95]. This is an important consideration, because sepsis is the most frequent cause of AKI in critically ill patients [1]. Interestingly, one study found no significant difference in plasma NGAL levels between septic shock patients with and without AKI [95]. Furthermore in patients with severe sepsis, NGAL showed only fair prediction of RRT with an AUC of 0.700 [72]. Overall, though NGAL has been reported by several studies as being a biomarker which predicts AKI earlier than serum creatinine does [81], we could not confirm that NGAL was superior to creatinine for predicting requirement for RRT. However, when we compared NGAL and creatinine and analyzed only those studies reporting results for both biomarkers, the AUC for NGAL improved, slightly outperforming creatinine.

For most biomarkers measured in blood, different sample types were analyzed, namely plasma, serum, and in some cases whole blood. When these sample types were compared, no significant differences were noted, so the sample type in which those biomarkers are measured does not seem to have a major impact.

Though single biomarker assessments may add incremental support to guide clinical decision-making, it becomes clear from the data that they should not be used in isolation. A promising approach therefore seems to be the combination of biomarkers and clinical parameters. Unfortunately, only a few studies investigated various combinations and none of them have been replicated; thus, a meta-analysis was not possible. However, some of the single studies showed remarkable results. For example, Koyner et al. [40] showed that an FST outperformed each of the additionally assessed urinary biomarkers. The FST may be considered a functional test revealing the loss of tubular functional capacity or the severity of AKI. As such it adds information to clinical criteria alone. This may explain why the combination of FST with individual biomarkers improved predictive value only in those patients with increased biomarker levels (urinary NGAL > 150 ng/ml, urinary TIMP-2 × IGFBP-7 > 0.3 [40]), but not when biomarker levels were not elevated.

A major limitation of biomarker studies evaluating the prediction of RRT is the fact that a gold standard for this end point is missing, because it is still unclear whether and when to commence RRT. As previously mentioned, only 15 studies stated the criteria for initiation of RRT and only seven studies had predefined criteria for RRT initiation. This limits the applicability and significance of published results, especially in the case of AUCs, since those rely heavily on the comparison with an established gold standard [96,97,98]. Two recent trials (the “Effect of Early vs Delayed Initiation of Renal Replacement Therapy on Mortality in Critically Ill Patients With Acute Kidney Injury” [ELAIN] trial [5] and the “Comparison of standard and accelerated initiation of renal replacement therapy in acute kidney injury” [STARRT-AKI pilot trial] [99, 100]) employed a preset NGAL threshold as an inclusion criterion, while another study used NGAL to guide the early initiation of RRT [101]. While in the ELAIN trial, NGAL was found to detect patients with progressively deteriorating AKI, in the STARRT-AKI pilot trial NGAL was found to be universally elevated, but did not show good discriminative value between patients requiring RRT or not [5, 100].

Implications for clinicians, policy, and research

The results of this meta-analysis may have significant implications for clinicians and researchers. Clinicians may be encouraged to utilize novel biomarkers to improve risk stratification for patients with AKI but this must be tempered by the fact that there is uncertainty as to the role of the additional information as well as the financial implications associated with this technology. The biomarkers showing fair to good prediction of RRT are actually markers of renal stress, damage, and/or (loss of) glomerular filtration rate. But clinically, the decision to start RRT is not simply based on the severity of kidney damage but rather on the imbalance between the patient’s remaining renal capacity and the demands characterized by the severity of acute disease, comorbidities, metabolism as well as solute and fluid load (i.e., “demand–capacity imbalance”) [102, 103]. Hence, it is clear that to enable prediction of the need for RRT more focused validation studies are needed, as well as studies investigating outcomes based on various biomarker thresholds. For NGAL, while statistically not significant, there was a trend that a threshold > 600 ng/ml improves prediction of RRT, as can be seen in our sensitivity analysis. As a result of insufficient data, we were not able to perform sensitivity analysis on TIMP-2 × IGFBP-7, for which two cutoffs, one of 0.3 and a high-sensitivity cutoff of 2.0, are available [104].

Of importance for further evaluation of biomarkers seems to be the time point of assessment. Not all biomarkers have the same “window of opportunity”. For example, urinary VEGF had the best predictive value early after the insult and thereafter it declined over the following 12 h [70]. The cell cycle arrest biomarkers urinary TIMP-2 and IGFBP-7 demonstrated the opposite kinetics, with their predictive ability rising over the first 12 h [70]. This is an important detail for future studies evaluating the prognostic ability of biomarkers. Generally, biomarker research evaluating the necessity of a certain treatment (e.g., RRT) should focus on the ability of a biomarker to discriminate patients that may potentially benefit from this therapy from those with low likelihood of benefit. However, the studies in this review aimed to predict a clinical diagnosis (e.g., AKI) or the initiation of a therapy (e.g., RRT) on the basis of biomarker profiles, without examining the impact this treatment has on the outcome. For the case of RRT in AKI patients, this would require a study investigating whether there is benefit in early RRT initiation guided by biomarker profiles.

Limitations

Our review has several limitations. First, there was significant heterogeneity among the included studies. Possible sources of this heterogeneity are variation in study design; differences in size (power); variation in case mix; undefined standards for RRT initiation, which leave the outcome susceptible to practice variation; variation in AKI etiology, diagnosis, and reference standard; differences in the timing of biomarker measurement relative to the kidney injury; and intrinsic variation in biomarker properties, assays, and sample handling. Since AKI can result from different injury pathways, a biomarker demonstrating good predictive value in patients after cardiac surgery may not have equal value in patients after major abdominal surgery. This heterogeneity precludes clear and clinically applicable conclusions. Second, the only performance measures available in enough studies to be included in a meta-analysis were the stated AUCs. Additionally, we did not limit the eligibility criteria of studies to the reported follow-up period for initiation of RRT. This is a consequence of a lack of harmonization of reporting criteria, possibly resulting in the false classification of the predictive ability of the evaluated biomarker [96]. We addressed this problem by screening included studies for the stated reasons for RRT initiation. Evaluation of possible publication bias by creating funnel plots was only possible for NGAL and creatinine, and these results showed a trend towards a publication bias for NGAL; however, we cannot exclude potential publication bias for biomarkers which had insufficient data to create funnel plots. Another important limitation was that most of the included studies were designed to evaluate biomarker performance for prediction of AKI. As a result, data on prediction of RRT were often incompletely reported. Additionally, many trials had small sample sizes, with only a few patients requiring RRT, possibly resulting in an underestimation of biomarker performance. This may have impaired the interpretation of results.

Conclusion

Only a few candidates have been identified with fair potential to aid clinical decision-making on when to start RRT among patients with AKI. However, the strength of evidence in our review largely precludes their routine use pending further validation. Future work should further characterize the role biomarkers may have in decision support for starting RRT among patients with AKI.