Background

Acute kidney injury (AKI) is associated with a higher risk of chronic kidney disease (CKD), end-stage renal disease (ESRD), and long-term adverse cardiovascular effects [1, 2]. Because effective treatments for impaired kidney function are lacking, the best strategy in clinical practice is to identify AKI as early as possible, reverse its cause, and mitigate its sequelae. In the past decades, several serum creatinine (SCr)-based classification systems have been proposed to define AKI [3]. Serum creatinine has traditionally served as a surrogate of kidney function, despite its limitations as a diagnostic surrogate of AKI [4]. These limitations include a lack of steady-state conditions in critically ill patients and variability in the determinants of SCr (rate of production, apparent volume of distribution, and rate of elimination). Therefore, there is an unmet need for other objective measures to help detect AKI in a timely manner. The role of several biomarkers in the early prediction or risk assessment of AKI has been proposed, including kidney tubular damage markers (e.g., neutrophil gelatinase-associated lipocalin (NGAL), kidney injury molecule-1 (KIM-1), liver-type fatty acid-binding protein (L-FABP)) [5,6,7,8,9], inflammation markers (e.g., interleukin-18 (IL-18)) [6, 10, 11], and stress markers (e.g., tissue inhibitor of metalloproteinases-2 and insulin-like growth factor-binding protein-7 (TIMP-2 × IGFBP-7)). The Acute Disease Quality Initiative (ADQI) expert group suggests that routine clinical assessments should be combined with stress, damage, and functional biomarkers to stratify risk, discriminate etiologies, assess severity, plan management, and predict the duration and recovery of AKI [12]. In addition, previous meta-analyses including patients with various clinical scenarios have suggested that these biomarkers hold promise as practical tools in the early prediction of AKI [5, 13,14,15,16,17].
However, few studies have compared the diagnostic accuracy of these AKI biomarkers, and systematic assessments of the quality of evidence, which can provide updated information for clinical guidelines, are lacking. Therefore, the aim of this study was to compare the reported predictive accuracy of AKI biomarkers in various clinical settings and appraise the quality of evidence using a pairwise meta-analysis. The findings of this study may be used to update guidelines and recommendations.

Methods

Search strategy and selection criteria

We conducted this pairwise meta-analysis according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [18] and used Cochrane methods [19]. We prospectively submitted the systematic review protocol for registration on PROSPERO [CRD42020207883].

Data sources and search strategy

The primary outcome was incident AKI. Electronic searches were performed on PubMed (Ovid), Medline, Embase, and Cochrane library from inception to August 15, 2022 (Additional file 1: Appendix). We screened references by titles and abstracts and included related studies for further analysis. Reference lists of related studies, systematic reviews, and meta-analyses were manually examined to identify any possible publications relevant to our analysis. Both abstracts and full papers were selected for quality assessment and data synthesis.

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) clinical studies that included participants over 18 years of age and of any ethnic origin or sex; (2) studies that reported candidate AKI biomarkers including NGAL, KIM-1, L-FABP, IL-18, and TIMP-2 × IGFBP-7; and (3) studies that assessed the occurrence of incident AKI. The exclusion criteria were as follows: (1) studies including patients who had previously received dialysis; (2) studies including pregnant or lactating patients; (3) letters, conference or case reports; and (4) studies that lacked data on sensitivity or specificity of biomarkers to predict the occurrence of AKI. Only regular full papers were selected for quality assessment and data synthesis. We contacted the authors of abstracts for further detailed information, if available.

Study selection and data extraction

Six investigators (Heng-Chih Pan, Terry Ting-Yu Chiou, Chih-Chung Shiao, Che-Hsiung Wu, Hugo You-Hsien Lin, and Ming-Jen Chan) independently reviewed the search results and identified eligible studies. Any resulting discrepancies were resolved by discussion with a seventh investigator (Vin-Cent Wu). All relevant data were independently extracted from the included studies by eight investigators (Heng-Chih Pan, Chih-Chung Shiao, Terry Ting-Yu Chiou, Yih-Ting Chen, Chun-Te Huang, Ya-Fei Yang, Shu-Chen Yu, and Zi-Ming Chen) according to a standardized form. Extracted data included study characteristics (lead author, publication year, population setting, biomarkers, study endpoint, sample size, events, timing of measurements) and participants’ baseline data (mean age (years), gender (%), comorbidities, severity of illness). When available, odds ratios and 95% confidence intervals (CIs) from cohort or case-control studies were extracted. Other a priori determined parameters included the type of intensive care unit (ICU) setting (surgical/mixed or medical), criteria used to diagnose AKI and severe AKI, cohort size, and the presence of sepsis. Any disagreements were resolved by discussion with two of the investigators (Heng-Chih Pan and Vin-Cent Wu).

Quality assessment

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the quality of each included study [20, 21]. The following 4 domains were assessed: patient selection, index test, reference standard, and flow and timing. Any disagreements in the quality assessment were resolved by discussion and consensus [15].

Pre-specified subgroup analysis

We hypothesized that the following factors could substantially affect the outcomes observed across studies: clinical setting (ICU/non-ICU), patient population (surgical versus mixed/medical), whether the studies included only patients with sepsis, and the AKI criteria used (risk, injury, failure, loss, ESRD (RIFLE); Acute Kidney Injury Network (AKIN); Kidney Disease: Improving Global Outcomes (KDIGO)).

Data synthesis and statistical analysis

For each study, a 2 × 2 table of the numbers of true-positive, false-positive, true-negative, and false-negative patients at the cutoff point reported by that study was used to calculate sensitivity, specificity, and the diagnostic odds ratio (DOR). The sensitivity, specificity, and DOR for all of the included studies were combined using a bivariate model. DOR was defined as the endpoint of primary interest in this study because it combines the strengths of sensitivity and specificity with the advantage of accuracy as a single indicator [22]. The sensitivity and specificity were defined as the endpoints of secondary interest in the study. The diagnostic performance for AKI among the 12 different biomarkers was compared using a bivariate model in which the type of biomarker was treated as a categorical covariate. Hierarchical summary receiver operating characteristic curves (HSROCs), which consider the threshold effect [23], were used to illustrate the overall diagnostic performance for each biomarker. The analysis was further stratified by the following pre-specified subgroups: surgical versus mixed/medical patients, ICU/non-ICU patients, sepsis/non-sepsis patients, and different AKI criteria (RIFLE/AKIN/KDIGO). In the subgroup analysis, biomarkers reported in only 1 study could not be compared and were therefore excluded. Potential publication bias was assessed visually using funnel plots. A two-sided P value < 0.05 was considered statistically significant. The bivariate model was fitted using SAS version 9.4 (SAS Institute, Cary, NC) with the “METADAS” macro (version 1.3), which is recommended by the Cochrane Diagnostic Test Accuracy Working Group. The HSROC analysis and funnel plots were performed using R software version 3.6.3 with the “meta4diag” package (version 2.0.8) based on Bayesian inference.
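
As an illustration of the per-study calculation described above, the following minimal Python sketch derives sensitivity, specificity, and the DOR with a Wald-type 95% CI from a single 2 × 2 table. It is not the bivariate model fitted with METADAS; the function name, the 0.5 continuity correction for zero cells, and the example counts are assumptions made for illustration only.

```python
import math


def diagnostic_summary(tp, fp, fn, tn):
    """Summarize one 2 x 2 table (hypothetical helper, not the authors' code)."""
    # Assumption: add 0.5 to every cell when any cell is zero, so that the
    # odds ratio and its standard error remain defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)  # proportion of AKI cases that test positive
    specificity = tn / (tn + fp)  # proportion of non-AKI cases that test negative
    dor = (tp * tn) / (fp * fn)   # diagnostic odds ratio

    # Wald-type 95% CI computed on the log scale.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - 1.96 * se_log_dor)
    upper = math.exp(math.log(dor) + 1.96 * se_log_dor)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": (lower, upper),
    }


# Hypothetical counts: 50 patients with AKI, 150 without.
example = diagnostic_summary(tp=40, fp=30, fn=10, tn=120)
print(example)
```

With these hypothetical counts, sensitivity and specificity are both 0.80 and the DOR is 16. A single-study DOR of this kind is only the input to the pooled analysis; the pooled estimates reported in this study come from the bivariate model.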

Results

Search results and study characteristics

The study selection process is summarized in Additional file 1: Appendix. A total of 23,882 articles were identified through the electronic search, and after excluding duplicate and non-relevant articles, the titles and abstracts of the remaining 1803 articles were screened. A total of 242 studies were eligible for full-text review, of which 110 studies including 38,725 patients reported data on the occurrence of AKI with any one of the biomarkers of interest and were included in the meta-analysis [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133]. The details of the included studies and population characteristics as well as definitions used for the diagnosis of AKI are shown in Tables 1 and 2.

Table 1 Characteristics of included comparative studies
Table 2 Summary of included comparative studies for outcome evaluation

All 110 studies provided quantifiable results for AKI. Seventy-nine studies exclusively enrolled ICU patients, and 31 studies enrolled non-ICU patients. Fifty-seven studies exclusively enrolled surgery patients, and 55 studies enrolled patients from mixed surgical/medical settings. Only 8 studies enrolled patients with sepsis, and therefore, analysis of sepsis was not conducted. Of the enrolled studies, 44 used the KDIGO classification as the only definition for AKI, 23 used AKIN, 21 used RIFLE, 6 used two or more definitions, 6 used a 50% increase in SCr, 1 used an increase in SCr from normal to > 3 mg/dL, 3 used a 0.5 mg/dL increase in SCr within 48–72 h, and 6 were at the discretion of the attending physicians.

Quality of the enrolled trials

The studies were published over 18 years and varied in sample size from 22 to 1635 patients (Tables 1, 2). The QUADAS-2 tool revealed that the quality of the enrolled studies varied. There was a low and/or unclear risk in each study in most domains of bias evaluation (Additional file 1: Figs. S1, S2). The risk of bias was low for patient selection in 84 studies (76.4%); index test in 26 studies (23.6%); reference standard in 30 studies (27.3%); and flow and timing in 96 studies (87.3%). The applicability concerns were low for patient selection in 89 studies (80.9%); index test in 106 studies (96.4%); and reference standard in 95 studies (86.4%). Therefore, according to the criteria of overall quality, 70 studies (63.6%) were rated as low risk, 15 studies (13.6%) as unclear risk, and 25 studies (22.7%) as high risk.

Primary outcomes

The occurrence of AKI was based on all of the included studies with a total of 38,725 patients, of whom 8,340 had incident AKI. Among the 11 candidate biomarkers, the diagnostic accuracy (defined as the DOR value) was numerically highest for NGAL/creatinine (NGAL/Cr) (DOR 16.2, 95% CI 10.1–25.9), which was reported in 9 studies. The results demonstrated that urinary NGAL had high diagnostic accuracy (DOR 13.8, 95% CI 10.2–18.8), which was significantly better than IL-18 (relative DOR 0.60, 95% CI 0.44–0.82), and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.42, 95% CI 0.22–0.81) for the occurrence of AKI (Table 3). The HSROCs depicting the overall discriminative accuracy of the biomarkers to diagnose AKI are shown in Fig. 1A. Of the biomarkers, urinary NGAL (HSROC 85.2%, 95% CI 80.4–89.4%), urinary NGAL/Cr (HSROC 91.4%, 95% CI 79.4–96.5%), serum NGAL (HSROC 84.7%, 95% CI 80.7–87.9%), IL-18 (HSROC 82.1%, 95% CI 70.2–88.9%), KIM-1 (HSROC 84.4%, 95% CI 72.7–95.5%), and L-FABP/Cr (HSROC 85.8%, 95% CI 74.9–93.8%) had HSROC values greater than 80%. Additional file 1: Figs. S3, S4 and Fig. 1B illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the whole population.
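
The relative DORs above are read against a reference biomarker (urinary NGAL): a 95% CI that lies entirely below 1 indicates significantly lower accuracy than the reference. A minimal sketch, assuming a normal approximation on the log scale (an assumption for illustration, not the authors' stated procedure), recovers an approximate standard error and two-sided P value from a reported estimate and its CI. The function name is hypothetical; the inputs are the values reported above.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate SE, z, and two-sided P value from a ratio and its 95% CI.

    Hypothetical helper; assumes the CI is symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Relative DORs versus urinary NGAL, as reported in the text.
reported = {
    "IL-18": (0.60, 0.44, 0.82),
    "TIMP-2 x IGFBP-7: 0.3": (0.42, 0.22, 0.81),
}

results = {name: p_from_ratio_ci(*values) for name, values in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE(log rDOR)={se:.3f}, z={z:.2f}, approximate P={p:.4f}")
```

Both intervals exclude 1, so the approximate P values fall below the 0.05 threshold used in this study.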

Table 3 Summary of the diagnostic meta-analysis in the whole population
Fig. 1

The discriminative accuracy of the biomarkers to diagnose AKI (A) HSROCs for the AKI biomarkers. The global HSROCs depicting the discriminative accuracy of the biomarkers to diagnose AKI. The red point represents the observation and the circle represents the sample size. The asterisk “*” represents the estimate of HSROC, and the blue dotted circle around it indicates the 95% confidence interval. Among the biomarkers, NGAL, NGAL/Cr, L-FABP/Cr, TIMP-2 × IGFBP-7: custom, and TIMP-2 × IGFBP-7: 2 had good HSROCs (> 85–90%). (B) Heatmap plot depicting pairwise comparisons (row vs. column) of relative DOR between the biomarkers in the whole population. The contents of the diagonal are the values of the relative DOR. Red depicts a positive DOR, while yellow depicts no correlation. NGAL and NGAL/Cr had the best relative DOR of the biomarkers. (C) Heatmap plot depicting pairwise comparisons (row vs. column) of relative DOR between the biomarkers in the surgical subgroup. The contents of the diagonal are the values of the relative DOR. Red depicts a positive DOR, while yellow depicts no correlation. NGAL/Cr had the best relative DOR of the biomarkers. (D) Heatmap plot depicting pairwise comparisons (row vs. column) of relative DOR between the markers in the studies that did not use UO criteria. The contents of the diagonal are the values of the relative DOR. Red depicts a positive DOR, while yellow depicts no correlation. NGAL had the best relative DOR of the biomarkers. Abbreviations: AKI, acute kidney injury; Cr, creatinine; DOR, diagnostic odds ratio; HSROC, hierarchical summary receiver operating characteristic curve; IL-18, interleukin-18; KIM-1, kidney injury molecule-1; L-FABP, liver-type fatty acid-binding protein; NGAL, neutrophil gelatinase-associated lipocalin; TIMP-2 × IGFBP-7: tissue inhibitor of metalloproteinases-2 × insulin-like growth factor-binding protein-7; and UO, urine output

Subgroup analyses

In the setting of ICU patients, the diagnostic accuracy was numerically highest for NGAL/Cr (DOR 12.6, 95% CI 7.8–20.2), followed by L-FABP/Cr and urinary NGAL. The diagnostic accuracy of urinary NGAL was significantly better than TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.51, 95% CI 0.28–0.92) (upper panel in Table 4). In contrast, urinary NGAL (DOR 17.1, 95% CI 7.8–37.5), urinary NGAL/Cr (DOR 99.3, 95% CI 7.7–1285.0), and serum NGAL (DOR 15.0, 95% CI 7.1–32.0) had better diagnostic accuracy for AKI than IL-18 (DOR 9.6, 95% CI 4.2–21.9) in the non-ICU patients (lower panel in Table 4). Additional file 1: Figs. S5–S7 illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the ICU patients.

Table 4 Summary of the diagnostic meta-analysis in the ICU and non-ICU population

On the other hand, urinary NGAL had the highest diagnostic accuracy (DOR 17.9, 95% CI 12.3–26.3), which was significantly better than IL-18 (relative DOR 0.31, 95% CI 0.21–0.47), IL-18/Cr (relative DOR 0.56, 95% CI 0.34–0.94), KIM-1 (relative DOR 0.57, 95% CI 0.40–0.82), L-FABP (relative DOR 0.46, 95% CI 0.30–0.71), and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.28, 95% CI 0.10–0.79) for the occurrence of AKI in the setting of medical/mixed patients (upper panel in Table 5). Furthermore, urinary NGAL had a low diagnostic accuracy in the setting of surgical patients. Urinary NGAL/Cr (DOR 34.3, 95% CI 9.0–130.6), KIM-1 (DOR 26.2, 95% CI 9.6–71.6), L-FABP (DOR 14.9, 95% CI 7.0–31.5), and IL-18 (DOR 11.8, 95% CI 6.1–22.9) had better diagnostic accuracy than urinary NGAL (lower panel in Table 5). Additional file 1: Figs. S8–S12 and Fig. 1C illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the medical/mixed and surgical patients.

Table 5 Summary of the diagnostic meta-analysis in the medical/mixed and surgical population

Only eight studies recruited patients with sepsis, and therefore analysis of sepsis was not conducted. The results of the non-sepsis patients were similar to those of the overall cohort: Urinary NGAL (DOR 16.3, 95% CI 11.8–22.4) had significantly better diagnostic accuracy for AKI than IL-18 (relative DOR 0.52, 95% CI 0.37–0.72), L-FABP (relative DOR 0.65, 95% CI 0.46–0.93), and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.36, 95% CI 0.19–0.67) (Additional file 1: Table S1). Additional file 1: Figs. S13–S15 illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the non-sepsis patients.

Only 10 studies recruited patients without using standard AKI criteria (RIFLE/AKIN/KDIGO), and therefore, the analysis was not conducted. In the 100 studies which adopted standard AKI criteria, NGAL/Cr had the highest diagnostic accuracy (DOR 15.4, 95% CI 9.6–24.4), followed by KIM-1 (DOR 12.8, 95% CI 8.7–18.7), and urinary NGAL (DOR 12.5, 95% CI 9.2–16.9). Urinary NGAL had significantly better diagnostic accuracy for AKI than IL-18 (relative DOR 0.62, 95% CI 0.45–0.85) and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.46, 95% CI 0.24–0.86) (Table 6). Additional file 1: Figs. S16–S18 illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the studies using standard AKI criteria.

Table 6 Summary of the diagnostic meta-analysis for the studies using standard AKI criteria (any of RIFLE, AKIN, and KDIGO)

Only 30 studies diagnosed AKI using urine output criteria, and the diagnostic accuracy was numerically highest for KIM-1 (DOR 14.6, 95% CI 5.9–35.9), followed by IL-18 (DOR 13.1, 95% CI 6.7–25.7), and TIMP-2 × IGFBP-7: 2 (DOR 12.0, 95% CI 5.2–27.8). Among the other 80 studies that diagnosed AKI without using urine output criteria, NGAL had the highest diagnostic accuracy (DOR 18.6, 95% CI 12.8–27.0), followed by urinary NGAL/Cr (DOR 17.6, 95% CI 10.7–29.1). Urinary NGAL had significantly better diagnostic accuracy for AKI than IL-18 (relative DOR 0.38, 95% CI 0.26–0.56), IL-18/Cr (relative DOR 0.60, 95% CI 0.37–0.98), KIM-1 (relative DOR 0.61, 95% CI 0.42–0.88), and L-FABP (relative DOR 0.61, 95% CI 0.41–0.88) (Table 7). Additional file 1: Figs. S19–S20 and Fig. 1D illustrate the pairwise comparisons of the biomarkers for pooled sensitivity, specificity, and DOR in the studies that did not use urine output criteria.

Table 7 Summary of the diagnostic meta-analysis according to AKI criteria with or without UO

Sensitivity analyses

To determine the robustness of the study results, we examined the extent to which the results were influenced by the quality of the enrolled study, the economic situation of the countries in which they were conducted, and the definition of the study outcome.

We first stratified the studies according to their quality. Seventy studies were of high quality and 40 studies were of low or middle quality. Among the high-quality studies, the diagnostic accuracy was numerically highest for urinary NGAL (DOR 12.95, 95% CI 8.88–18.87), followed by urinary NGAL/Cr (DOR 12.34, 95% CI 5.85–26.02), and serum NGAL (DOR 12.32, 95% CI 8.41–18.06). Urinary NGAL had significantly better diagnostic accuracy for AKI than IL-18 (relative DOR 0.56, 95% CI 0.39–0.78), L-FABP (relative DOR 0.66, 95% CI 0.45–0.97), and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.43, 95% CI 0.22–0.87). Among the low- or middle-quality studies, KIM-1/Cr had the highest diagnostic accuracy (DOR 35.33, 95% CI 9.87–126.47), followed by KIM-1 (DOR 34.60, 95% CI 17.16–69.77), and IL-18 (DOR 30.43, 95% CI 12.80–72.33). Both KIM-1 (relative DOR 3.00, 95% CI 1.53–5.87) and IL-18 (relative DOR 2.64, 95% CI 1.11–6.28) had significantly better diagnostic accuracy for AKI than NGAL, while IL-18/Cr had significantly worse diagnostic accuracy for AKI than NGAL (relative DOR 0.42, 95% CI 0.22–0.81) (Additional file 1: Table S2).

Seventy-eight studies were conducted in high-income countries, and the diagnostic accuracy was numerically highest for urinary NGAL/Cr (DOR 15.23, 95% CI 9.56–24.26), and urinary NGAL (DOR 14.13, 95% CI 10.03–19.89). Urinary NGAL had significantly better diagnostic accuracy for AKI than IL-18 (relative DOR 0.46, 95% CI 0.33–0.64), L-FABP (relative DOR 0.54, 95% CI 0.36–0.79), and TIMP-2 × IGFBP-7: 0.3 (relative DOR 0.40, 95% CI 0.21–0.74). Among the other 32 studies conducted in middle- or low-income countries, L-FABP had the highest diagnostic accuracy (DOR 45.15, 95% CI 14.56–140.05), which was significantly better than urinary NGAL (relative DOR 2.89, 95% CI 1.12–7.42) (Additional file 1: Table S3).

Thirty-seven studies focused on early onset AKI (AKI developed within 48 h), and the diagnostic accuracy was numerically highest for L-FABP (DOR 33.1, 95% CI 11.5–95.1), serum NGAL (DOR 21.4, 95% CI 10.5–43.7), L-FABP/Cr (DOR 21.4, 95% CI 2.9–158.8), and urinary NGAL (DOR 15.4, 95% CI 7.2–32.9) (Additional file 1: Table S4).

Twenty-four studies focused on severe AKI (AKI stage 2 or 3), and the diagnostic accuracy was numerically highest for TIMP-2 × IGFBP-7: custom (DOR 19.6, 95% CI 7.0–55.3), and serum NGAL (DOR 11.5, 95% CI 6.1–21.9) (Additional file 1: Table S5). Ten studies focused on renal replacement therapy, and both urinary NGAL (DOR 15.2, 95% CI 5.3–43.5) and serum NGAL (DOR 12.1, 95% CI 4.7–31.1) had good diagnostic accuracy (Additional file 1: Table S6).

The findings were not materially different from the standard analysis and remained robust in the sensitivity analyses.

Publication bias

Publication bias was assessed visually using funnel plots. The funnel plots were clearly asymmetrical for all the biomarkers except TIMP-2 × IGFBP-7: custom, TIMP-2 × IGFBP-7: 0.3, and TIMP-2 × IGFBP-7: 2.0. These results suggest that publication bias was present in this meta-analysis (Additional file 1: Appendix).

Assessment of quality of evidence and summary of findings

The quality of evidence was assessed using the GRADE system. We evaluated the primary outcomes and presented them as summary of findings in Additional file 1: Appendix.

Discussion

The current study is the most comprehensive systematic review to date and includes the largest number of studies of candidate AKI biomarkers. In this systematic review of 110 studies including 38,725 patients, the overall AKI rate was 21.5% (8,340/38,725). Serum NGAL and urinary NGAL were the most commonly used biomarkers for AKI (Table 3). In the whole population, both serum and urinary NGAL had the best diagnostic accuracy regardless of whether or not they were adjusted by urinary creatinine (Table 3). For the critically ill patients, all of the biomarkers had similar predictive performance for AKI (upper panel in Table 4). However, for the non-critically ill patients, NGAL, NGAL/Cr, and serum NGAL had better diagnostic accuracy for AKI than IL-18 (lower panel in Table 4). In the medical patients, NGAL had the best diagnostic accuracy (upper panel in Table 5), while in the surgical patients, NGAL/Cr and KIM-1 had the best diagnostic accuracy (lower panel in Table 5). Our data showed that NGAL/Cr had the best predictive performance when using an HSROC meta-analysis approach.

There is an unmet need for the early detection of AKI due to an increase in the incidence of AKI in hospitalized patients [134, 135]. In clinical practice, it is difficult to recognize AKI before the level of creatinine changes, at which time the damage may be irreversible [4]. Therefore, researchers are increasingly interested in identifying biomarkers that can identify AKI at an early stage. The 23rd ADQI consensus meeting proposed combining clinical assessments, traditional tests, and validated novel biomarkers to identify patients at risk of AKI [136]. In susceptible patients exposed to high-risk events, biomarkers can predict the development or progression of AKI and may guide targeted therapy [137]. In the literature, many biomarkers have performed better than SCr when histologic evidence of kidney injury was used as the reference standard [138]. Although various biomarkers have been associated with AKI and adverse outcomes, no single biomarker has demonstrated diagnostic performance for AKI comparable to that of troponin in myocardial infarction. The Translational Research Investigating Biomarker Endpoints in AKI (TRIBE-AKI) study [37, 111, 139] showed that the heterogeneity of AKI subtypes is a major limitation for large-scale population studies. In the present study, we demonstrated that several biomarkers had good predictive performance for AKI. In addition, the damage biomarkers had better predictive ability for AKI than the stress biomarker in various clinical settings. It is likely that the ability to identify different etiologies, mechanisms, and types of AKI will be critical in developing targeted therapies and designing pharmacological trials to enable more precise medicine or therapeutic interventions.

The pathogenesis of AKI is complex, involving factors such as hemodynamics, inflammatory status, genetic background, the use of nephrotoxic compounds, and interventions, so the clinical course of AKI differs between clinical situations [140]. In critically ill or surgical patients, the potential benefits of reducing kidney injury-related complications may outweigh the costs of over-monitoring the patient, such as a prolonged length of stay. Appropriate biomarkers should improve the detection rate of AKI with high sensitivity and good negative predictive value, thus enabling timely initiation of preventive strategies for AKI [141]. Previous investigations have reported that TIMP-2 × IGFBP-7 was a good biomarker to identify patients who will develop AKI and reduce the need for renal replacement therapy [136, 137, 142]. As demonstrated in the present study, NGAL/Cr, L-FABP/Cr, and TIMP-2 × IGFBP-7: custom seemed to have good predictive performance in the setting of critically ill patients, while NGAL/Cr and KIM-1 were the best biomarkers in surgical patients (Tables 4, 5).

In non-critically ill or medical patients, patient stratification for the risk of AKI should be applied to the entire hospital population before any scheduled elective intervention. In order to minimize unnecessary impacts due to these scheduled treatments, the specificity should outweigh the sensitivity [141]. In our study, the clinical performance of TIMP-2 × IGFBP-7 with a cutoff value of 2 was significantly better than that of TIMP-2 × IGFBP-7 with a cutoff value of 0.3 in the medical patients. Urinary NGAL, KIM-1, and serum NGAL seemed to be the best biomarkers in the setting of non-critically ill patients and medical patients (Tables 4, 5).

However, the sensitivity and specificity in the enrolled studies were heterogeneous because they depended on the circumstances and the threshold effects of the biomarkers. Taking into account the potential threshold effects and the correlation between sensitivity and specificity, HSROC analysis supported the good predictive performance of L-FABP/Cr and the NGAL series (Fig. 1A). There were differences in the applied diagnostic criteria for AKI between the enrolled studies. The subgroup analysis also demonstrated that the relative diagnostic accuracy of the AKI biomarkers remained consistent in the studies using current standard AKI criteria (RIFLE/AKIN/KDIGO) (Table 6). The NGAL series seemed to have the best predictive performance for AKI, especially in the high-quality studies and in the studies conducted in high-income countries. Other biomarkers outperformed the NGAL series only in low- or moderate-quality studies or in the studies conducted in middle- or low-income countries (Additional file 1: Tables S2-S3). Sensitivity analysis also demonstrated the good predictive performance of serum NGAL, urinary NGAL, and TIMP-2 × IGFBP-7: custom for early onset AKI (AKI developed within 48 h) and severe AKI (stage 2–3 or renal replacement therapy) (Additional file 1: Tables S4-S6). These findings support the robustness of the study results.
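
One common exploratory check for a threshold effect, which is not necessarily the approach used in this study, is the Spearman rank correlation between sensitivity and the false-positive rate (1 − specificity) across studies: a strong positive correlation suggests that studies differ mainly in the cutoff they applied. The sketch below uses hypothetical study values and a hand-written rank correlation; all names and numbers are illustrative assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (sensitivity, specificity) pairs from four studies that used
# progressively higher cutoffs for the same biomarker.
studies = [(0.90, 0.60), (0.80, 0.75), (0.70, 0.85), (0.60, 0.92)]
sensitivities = [sens for sens, _ in studies]
false_positive_rates = [1 - spec for _, spec in studies]

rho = spearman(sensitivities, false_positive_rates)
print(f"Spearman rho = {rho:.2f}")
```

In this constructed example, sensitivity falls as specificity rises from study to study, which is the pattern a pure threshold effect would produce. HSROC models account for this trade-off rather than pooling sensitivity and specificity independently.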

Although the damage and stress biomarkers in this study had good predictive performance, unlike troponin in acute coronary syndrome, none of the reported biomarkers are completely specific for AKI. Previous studies have reported that NGAL, IL-18, and KIM-1 may be elevated in the setting of sepsis and CKD [143,144,145,146]. Of note, these biomarkers can be used to recruit more homogeneous patient populations when implementing a clinical trial [147]. Biomarkers to identify and characterize AKI sub-types are necessary and may have the potential to provide individualized, timely, etiology-based management of AKI. In addition, considering the complex and multifactorial etiology of AKI, a panel of multiple biomarkers including stress, injury, and kidney reserve biomarkers could provide better discrimination for AKI. Furthermore, more kidney tissue-specific markers may help localize and quantify the severity of AKI and provide a deeper understanding of the pathophysiology of AKI. These biomarkers may offer opportunities for personalized management of AKI and support the call for a refinement of the existing AKI criteria.

Strengths and limitations

The strength of our analysis is the extensive literature search of related studies. We used standard Cochrane protocols and included the largest cumulative study sample size to date in comparison with previous reports. The strength of our meta-analysis also lies in the comprehensive data search with subgroup analyses across several clinical scenarios. We used the GRADE approach to rate the certainty of evidence [148].

Besides limitations in the meta-analysis, there were several limitations in the individual studies. First, most studies had a small sample size, and this contributed to the high heterogeneity of the meta-analysis. Second, our funnel meta-regression and Cochrane Collaboration tool analysis showed significant publication bias (Additional file 1: Appendix). Third, in some scenarios, the limited number of enrolled studies, such as trials focusing on sepsis, made subgroup analysis difficult. Of note, these new biomarkers are most effective in conditions where the time of renal insult is known, for instance, post-cardiac surgery or coronary angiography, compared to situations where the onset of kidney injury is less clear, for instance, in sepsis. To ensure the robustness of the findings, we did not emphasize the diagnostic accuracy of biomarkers extracted from fewer than three articles. Fourth, we did not perform additional analyses to assess the additional predictive value of SCr levels. Most of the included studies did not evaluate SCr levels alongside the biomarkers to predict AKI. In the literature, SCr has poor predictive performance for AKI because of its delayed rise and cannot accurately estimate the timing of injury [118, 127]. Traditionally, the diagnosis of AKI is based on a rise in serum creatinine, so creatinine cannot easily serve both as the reference standard that defines AKI and as a predictor evaluated against that standard. Furthermore, the use of SCr as a comparison has several limitations and limits the full interpretation of biomarker performance. For example, SCr may be elevated in pre-renal azotemia, which does not reflect true renal tissue damage, so biomarkers may not be elevated. On the other hand, in the setting of true renal injury with fluid overload, biomarkers may be elevated but SCr may remain unchanged, which may underestimate the predictive performance of biomarkers [149, 150].
Fifth, the kits used for specific biomarker analysis varied among the studies, so it was difficult to determine the optimal cutoff value of biomarkers to predict AKI. Sixth, the occurrence of AKI was diagnosed according to several different criteria in the enrolled studies. However, the KDIGO classification was the most commonly used, which has been proposed to provide a uniform definition of AKI, essentially combining the RIFLE and AKIN criteria. Finally, the definition of AKI varied between the studies, and this may have unduly influenced pooled effect estimates. Nonetheless, our conclusions were drawn from studies with different study designs and different clinical scenarios. Further research efforts are certainly needed for the pursuit of better precision medicine, especially with regard to the use of multiple biomarkers. It could be more fruitful to investigate whether different etiologies of AKI (pre-renal versus renal versus obstructive, cardiogenic shock, hypovolemic shock, sepsis-related, etc.) affect the predictive accuracy of biomarkers, and to evaluate whether the efficacy of biomarkers is affected by the severity of AKI. These issues can be incorporated into the design of future randomized controlled trials to evaluate the optimal biomarkers for different clinical settings in order to improve the timely diagnosis of AKI. Moreover, further investigations to improve the diagnosis and manage the underlying mechanisms of AKI may help to mitigate the current high mortality rate of patients with AKI.

Conclusion

Based on our pairwise meta-analysis of biomarkers to predict AKI, the NGAL series had the best diagnostic accuracy for the prediction of AKI, regardless of whether or not they were adjusted by urinary creatinine, especially in medical patients. However, the predictive performance of urinary NGAL was limited in surgical patients, and NGAL/Cr seemed to be the best biomarker in these patients. All of the biomarkers had similar predictive performance in critically ill patients. Future pragmatic clinical trials are warranted to evaluate the real-world predictive accuracy of AKI biomarkers.