Introduction

Acute kidney injury (AKI) is a common problem encountered in the intensive care unit (ICU), estimated to occur in up to 60% of all critically ill patients, depending on the definition [1]. When AKI progresses in severity, treatment with renal replacement therapy (RRT) may be initiated. RRT is currently applied in 23.5% of ICU patients with AKI (i.e., 13.5% of all patients admitted to the ICU), with utilization growing by over 10% per year over the past decade [2, 3]. Recent large randomized clinical trials (RCTs) in critical care nephrology have focused on the optimal timing of initiation of acute RRT [4,5,6]. However, less is known about the ideal circumstances in which RRT may be successfully discontinued. RRT is a complex and expensive therapy, with complications including catheter-associated infections, hemorrhage, hemodynamic instability, and delayed renal recovery [7,8,9,10,11,12,13,14,15]. It is thus imperative to recognize when a patient may be safely liberated from this treatment. The Kidney Disease Improving Global Outcomes (KDIGO) organization has stated in their 2012 Clinical Practice Guidelines for Acute Kidney Injury that RRT should be discontinued “when it is no longer required, either because intrinsic kidney function has recovered to the point that it is adequate to meet patient needs, or because RRT is no longer consistent with the goals of care” [16]. However, this recommendation was based on expert opinion and lacks specific guidance for how clinicians should assess patients for suitability to discontinue RRT.

Numerous parameters have been evaluated to help identify patients for whom RRT may be safely discontinued, including traditional biochemical markers of kidney function (creatinine, urea, and estimates of glomerular filtration rate (GFR) [17,18,19,20,21,22,23,24,25,26]), clinical findings such as urine output [21,22,23, 26, 27], and newer kidney biomarkers including neutrophil gelatinase-associated lipocalin (NGAL) [19, 28, 29] and serum cystatin C [28,29,30,31]. Despite many criteria being evaluated in the existing literature, the available evidence has yet to be rigorously synthesized. Our objective was to conduct a systematic review and meta-analysis to identify predictors of successful discontinuation of acute RRT among critically ill patients with AKI.

Methods

We performed a systematic review using methodological approaches outlined in the Cochrane Handbook for Systematic Reviews of Interventions [32] and reported it according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [33]. A PRISMA checklist is available as Additional file 1. Research ethics approval was not required. This systematic review was registered at PROSPERO (2018-03-06, CRD42018074615), and the protocol was published separately [34].

Search strategy

The search strategy was developed in consultation with a research librarian and independently peer-reviewed by a second librarian. We searched electronic databases: Ovid MEDLINE (1946-), Ovid Embase (1988-), and Wiley Cochrane Library (inception-) on October 10, 2017, with an updated search on April 8, 2019. Our search strategy combined concepts related to renal replacement therapy (i.e., intermittent hemodialysis (IHD), slow low-efficiency dialysis (SLED), continuous renal replacement therapy (CRRT)), intensive care (i.e., involving any intensive care unit (ICU) setting), and either discontinuation of therapy (i.e., clinical, physiological, or biochemical parameters of weaning from acute RRT) or treatment outcome (Additional file 2). Search results were limited to publications after 1990, when continuous venovenous RRT was introduced. No language limits were applied.

Additional search sources included the trial registry platforms (i.e., ClinicalTrials.gov) and Google Scholar. We also searched meeting abstracts where available using Conference Proceedings Citation Index (Clarivate Analytics) and by hand-searching published proceedings from the following associations and meetings: American Society of Nephrology, Canadian Society of Nephrology, “CRRTonline” (San Diego), European Renal Association – European Dialysis and Transplant Association, European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine (Brussels), National Kidney Foundation, and Society of Critical Care Medicine. Search results were exported and screened in EndNote X7 (Thomson Reuters, Philadelphia, Pennsylvania). See Additional file 2 for the complete search strategy.

Study selection

Eligible articles were identified through a two-phase process. In the first phase, two authors (AA, RJK) independently reviewed the titles and abstracts of all retrieved articles and documents. Disagreements were resolved through discussion or adjudication by a third author (OGR). In the second phase, the same two authors independently assessed the full texts of the selected articles for eligibility using standard, predefined criteria. Disagreements were resolved through a discussion with a third author (OGR).

Eligibility criteria

Studies were included if they mentioned all of the following themes: (1) intensive care (i.e., intended to refer to patients supported in an ICU setting capable of providing invasive mechanical ventilation or vasoactive therapy), (2) renal replacement therapy (i.e., IHD, SLED, CRRT), and (3) described parameters associated with weaning or discontinuation (i.e., clinical, physiological, and biochemical parameters). Additionally, we only included adult patients (i.e., age greater or equal to 18 years old) for this review. Studies which did not mention all of these themes were excluded.

Risk of bias assessment

Study methodological quality was independently rated by two authors (AA, RK) using the Newcastle-Ottawa Scale (NOS) for observational studies [35]. Observational studies were rated as high quality if they had a total score of 6–9, moderate quality with a score of 4 or 5, and poor quality if the score was 3 or lower. In order to account for potential bias due to population selection in observational studies, the NOS score for comparability was based on whether included studies accounted for patients with factors that would influence the predictability of RRT discontinuation, such as pre-existing chronic kidney disease (CKD) or prior RRT use in included cohorts. Disagreements were resolved through a discussion with a third author (OGR). The overall quality of evidence and certainty of outcome measures reported was further assessed according to the GRADE framework. A summary of findings table was prepared using the GRADEpro Guideline Development Tool (https://gradepro.org) for the pooled analysis. Full results of the risk of bias assessment are available in Additional file 3.
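The quality categories described above can be expressed as a simple mapping from the total NOS score. The following is a minimal illustrative sketch only; the function name `nos_quality` and the returned labels are hypothetical and were not part of the review's workflow.

```python
def nos_quality(total_score: int) -> str:
    """Map a total Newcastle-Ottawa Scale score (0-9) to a quality category.

    Cut-offs follow the text: 6-9 high, 4-5 moderate, 3 or lower poor.
    The function name and labels are illustrative assumptions.
    """
    if not 0 <= total_score <= 9:
        raise ValueError("NOS total score must be between 0 and 9")
    if total_score >= 6:
        return "high"
    if total_score >= 4:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    for score in (9, 5, 3):
        print(score, nos_quality(score))
```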

Data analysis

Two-by-two tables of true/false positives/negatives were constructed from exposures and outcomes of weaning parameters reported from studies where available. When these quantities were not explicitly described, they were computed from other available data (e.g., sensitivity/specificity) where possible. When four or more studies reported on the same parameter in a sufficiently homogeneous manner, the sensitivity and specificity were simultaneously pooled in a statistical meta-analysis using the bi-variate random effects method [36]. This model assumes that the logit-transformed values of the sensitivity and specificity are correlated and follow a bi-variate normal distribution, from which we can estimate not only simultaneous parameter estimates, but also a hierarchical summary receiver operating characteristic (HSROC) curve for the presented data. Review Manager (Version 5.3.5, Copenhagen, Denmark) was used to create the forest plots, while Stata (Version 14.2, College Station, Texas) was used to compute the bi-variate estimates.
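To illustrate the bookkeeping described above, the sketch below shows how a two-by-two table yields sensitivity and specificity, how a table can be back-calculated from a reported sensitivity and specificity when group sizes are known, and the logit transform on which the bi-variate model operates. It is an illustrative assumption rather than the code used in this review (the pooled estimates were computed in Stata); the names `TwoByTwo`, `table_from_accuracy`, and `logit` are hypothetical, a "positive" test is taken to mean that the parameter predicts successful discontinuation, and the numbers in the usage example are invented.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study; 'positive' = test predicts successful discontinuation."""
    tp: int  # predicted success, RRT successfully discontinued
    fp: int  # predicted success, RRT re-initiated
    fn: int  # predicted failure, RRT successfully discontinued
    tn: int  # predicted failure, RRT re-initiated

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_from_accuracy(sensitivity: float, specificity: float,
                        n_success: int, n_failure: int) -> TwoByTwo:
    """Reconstruct approximate counts from reported accuracy and group sizes.

    Rounding to whole patients means the result is an approximation.
    """
    tp = round(sensitivity * n_success)
    tn = round(specificity * n_failure)
    return TwoByTwo(tp=tp, fp=n_failure - tn, fn=n_success - tp, tn=tn)


def logit(p: float) -> float:
    """Logit transform applied to sensitivity and specificity before pooling."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Invented example: 50 successful and 40 failed discontinuations.
    table = table_from_accuracy(0.80, 0.75, n_success=50, n_failure=40)
    print(table)
    print(logit(table.sensitivity), logit(table.specificity))
```

The reconstruction requires the number of patients with and without successful discontinuation; where a study reports neither those counts nor the cell counts, the table cannot be recovered this way.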

Results

Search results

Our initial search yielded 3031 citations and our updated search yielded an additional 924 citations. Twenty-three articles fulfilled all inclusion criteria (Fig. 1). This consisted of 18 full-text articles and five conference abstracts, representing two case-control studies, 15 retrospective cohort studies, and six prospective cohort studies (Table 1). There were no randomized controlled trials. All studies were published in English.

Fig. 1

PRISMA flow diagram of retrieved and included records. Of the 23 included studies, 5 were abstracts and 18 were full text.

Table 1 Studies retrieved and parameters identified in a systematic review

Study quality

Study quality was generally rated as high for included observational studies, with a mean NOS score of seven (range 4–9) and no studies being rated as poor quality (see Additional file 3). Twenty studies were rated as high quality and two studies were rated as moderate quality. One conference abstract could not be rated due to insufficient information. Regarding the use of urine output as a predictor of successful RRT discontinuation, the overall certainty of the evidence was rated as very low given methodological limitations inherent to the included retrospective observational studies resulting in the risk of bias, and imprecision in the reported values (Fig. 2). There was consensus among authors as to the quality of the included studies.

Fig. 2

Summary of findings table for urine output prior to discontinuation of RRT

Markers of RRT weaning

A total of 16 variables predictive of RRT weaning and five multivariate models were assessed in 47 instances (Table 1). Weaning variables were grouped into four categories: conventional biochemical criteria (Table 2; n = 14; 29.8%), kidney biomarkers (Table 3; n = 13; 27.7%), physiologic criteria (Tables 4, 5, and 6; n = 13; 27.7%), and multivariate models (Table 7; n = 5; 14.9%). Urine output was the most commonly described variable. A total of nine studies described urine output criteria prior to RRT discontinuation and four studies described urine output after RRT discontinuation.

Table 2 Conventional biochemical criteria used to predict successful discontinuation of RRT
Table 3 Kidney biomarkers used to predict successful discontinuation of RRT
Table 4 Urine output after discontinuation of RRT to predict successful weaning
Table 5 Urine output prior to discontinuation of RRT to predict successful weaning
Table 6 Effect of diuretic use on urine output test characteristics to predict successful discontinuation of RRT
Table 7 Multivariate models used to predict successful discontinuation of RRT

There was significant heterogeneity across studies in the definitions of “successful” RRT discontinuation and in the thresholds used to define weaning criteria. Most studies defined successful discontinuation as a specified period during which the patient did not receive further RRT; however, the periods specified varied and ranged from 3 days [37] to 60 days [30], with 7 days being the most frequently used (seven studies). Where multiple studies reported on the same variable, the threshold values with optimal predictive accuracy varied, as did the timing of measurement in relation to RRT discontinuation. For cystatin C, two studies measured values prior to RRT discontinuation [29, 31], and three measured values following discontinuation [28,29,30] (Table 3). The optimal threshold values ranged from 1.85 to 2.98 mg/L. The predictive accuracy of serum creatinine was assessed in two studies at RRT initiation [17, 18], and in four studies at the time of discontinuation [19, 20], with threshold levels ranging from 224 to 300 μmol/L (Table 2). Due to variation in the timing of measurement and threshold values, pooled analysis was not feasible.

Urine output prior to discontinuation of RRT to predict successful weaning

Urine output was reported in numerous studies, both before and after discontinuation. We felt that the timing of measurement relative to discontinuation (before or after) was the most clinically important characteristic affecting the heterogeneity of reporting; accordingly, we pooled data separately based on the timing of measurement. Urine output prior to RRT discontinuation was sufficiently homogeneous across studies to perform pooled analysis (Fig. 2, Table 4). We found a pooled sensitivity of 66.2% (95% CI, 53.6 to 76.9), specificity of 73.6% (95% CI, 67.5 to 79.0), LR + 2.91, and LR − 0.43 for urine output (Fig. 2). Estimation of an optimal threshold to discriminate “successful” RRT discontinuation was not feasible due to variation across studies, with thresholds ranging from 191 mL/24 h [37] to 1720 mL/24 h [17]. Overall certainty of evidence regarding this parameter was graded as very low (Fig. 2).

Urine output after discontinuation of RRT to predict successful weaning

Urine output following RRT discontinuation was assessed in four studies and had a moderate AUC to predict successful RRT discontinuation (Table 5). Due to the low number of studies and heterogeneity, pooled analysis was not feasible.

Effect of diuretic use on urine output test characteristics to predict successful discontinuation of RRT

Diuretic use was associated with successful RRT discontinuation in three studies [21, 37, 42], although test characteristics were not reported. The effect of a diuretic challenge on the predictive accuracy of urine output was assessed in four studies [17, 21, 27, 37] (Table 6). In two small studies [17, 27], there was a suggestion of improved predictive accuracy in patients who had received diuretics. However, a larger study [21] found that urine output was a less reliable predictor of successful discontinuation when diuretics were given and another study did not compare the accuracy of urine output with or without diuretics [37]. Diuretics used include furosemide as an intermittent dose [21, 27, 37, 42] or infusion [27, 37], spironolactone [37], thiazides [37], or other diuretics [21, 42]. No comparative analysis has been performed on the accuracy of urine output to predict RRT discontinuation among diuretic responders versus non-responders. Because the effect that concomitant diuretic exposure may have on the predictive capacity of urine output for RRT discontinuation remains uncertain, our pooled analysis does not adjust for diuretic-induced urine output (Fig. 3).

Fig. 3

Pooled analysis for studies using urine output prior to discontinuation of RRT to predict successful weaning

Multivariate models used to predict successful discontinuation of RRT

Several multivariate models were studied, and those which described the operating characteristics of variables, such as sensitivity, specificity, and AUROC, are included in Table 7. The timing of these models varied, with one model integrating NT-proBNP, APACHE II score, urine output, and serum creatinine at RRT initiation (estimated AUROC of 0.70 [18]), and another model integrating age, sex, urine output, and serum creatinine during the first 24 h of admission (estimated AUROC of 0.73 [38]). The remaining three models included variables either at RRT discontinuation [17, 45] or at approximately 8 h after RRT discontinuation [42]. These models had good to excellent discrimination. In the study by Yoshida et al. [17], the model included urine output at day 0 (the day of CRRT discontinuation) and kinetic eGFR at day 1 (the day after CRRT discontinuation) and the AUROC was 0.93 for prediction of successful RRT discontinuation. None of the multivariable models has been externally validated.

Secondary outcomes

Analysis of the relationships between reported weaning parameters and several secondary outcomes (new CKD, RRT duration, RRT use post-ICU discharge) was planned, but not carried out, as the available data made such analysis unfeasible (Additional file 4). New CKD and RRT use post-discharge were infrequently reported in included studies, and when reported were variably defined [27, 38, 39]. While RRT duration was specified in most studies, it was only analyzed in relation to weaning parameters in four instances [21, 22, 42, 45]. In three of these studies, RRT duration was included as a component of multivariate models used to predict successful discontinuation, as opposed to being the predicted endpoint. Given the infrequent reporting, heterogeneous definitions, and confounding effect of being both predictor and endpoint, we felt that meaningful inferences regarding these parameters could not be derived from the included studies and chose to omit the planned secondary analysis.

Discussion

Key findings

We found 16 unique parameters that have been evaluated for their ability to predict successful discontinuation of RRT and classified our findings into four categories: physiologic findings (i.e., urine output), conventional biochemical markers of kidney function (creatinine, urea, and estimates of GFR), kidney biomarkers (cystatin C and NGAL), and multivariate models integrating a variety of clinical and biochemical data.

Of all the variables described, urine output was the most commonly studied, with a pooled sensitivity of 66.2% (95% CI, 53.6 to 76.9) and specificity of 73.6% (95% CI, 67.5 to 79.0) suggesting a modest predictive ability for successful RRT discontinuation. We found that there was substantial heterogeneity across studies in optimal thresholds for urine output, ranging from 191 mL/day [37] to over 1700 mL/day [17].

The effect of a diuretic challenge was variable among included studies, with some studies describing an improvement in the predictive ability of urine volume after diuretic administration [17, 27] and others describing inferior discrimination following diuretic administration [21, 37]. The association of diuretic use with successful discontinuation of RRT [21, 37, 42] suggests that augmented diuresis may be an important management strategy to mitigate the need for re-initiation of RRT due to fluid accumulation following initial RRT discontinuation. A randomized controlled trial has previously demonstrated enhanced urinary volume and sodium excretion with infusion of furosemide in patients with resolving acute renal failure (ARF); however, no overall improvement in renal outcome was seen, possibly due to infusion of fluids equal to the volume of diuresis in this trial, which would have mitigated the potentially beneficial effects of a negative fluid balance [46].

At present, there are insufficient data to recommend a specific approach or identify a specific urine output threshold that may reliably predict successful RRT discontinuation. This is because urine output was evaluated at different time points (i.e., preceding and following the exact timing of RRT discontinuation), with or without the actions of diuretics, and with cut-offs that varied greatly between studies. The pooled sensitivity and specificity represent the predictive ability of urine output in general, but the inability to determine an optimal threshold value based on available data currently limits the clinical utility of this parameter.

Kidney biomarkers have shown promise for prognostication in the setting of AKI and have been assessed in several studies evaluating RRT discontinuation and kidney recovery. Cystatin C was the most commonly studied and appears to have promising discrimination. However, because of substantial clinical heterogeneity in the timing of measurement and threshold values used in retrieved studies [28,29,30], data could not be pooled and definitive inferences on the accuracy of cystatin C measurement to predict successful RRT discontinuation could not be provided.

Numerous multivariate models have been proposed and have generally shown good predictive ability (Table 7). Common features include measures of illness acuity (e.g., SOFA score or APACHE II score), urine output, and variables related to RRT duration (e.g., total days on RRT; number of RRT cycles). The timing of measurement of various parameters comprising a model is important. Among those models whose measurements are taken at the time of ICU admission or RRT initiation [18, 38], there is inherent confounding by a competing risk of death as a reason for RRT discontinuation, and there may be lower reliability to predict RRT discontinuation in the intermediate term. Ideally, such models that integrate a spectrum of clinical information would have the capacity to inform and guide clinical decision-making on when to discontinue RRT. Those models that integrate variables taken at the time of or near the time of RRT discontinuation, along with other important time-varying variables (e.g., changes in acuity and non-kidney organ dysfunction) and which provide a standardized timeframe to assess re-initiation (with standardized criteria for re-initiation) are likely ideally suited to inform practice. Among the most promising models was the one described in the study by Yoshida et al., which combined the urine output on the day of discontinuation of RRT with the kinetic eGFR on the first day post-discontinuation (AUROC 0.93) [17]. Importantly, these models would ideally undergo external validation to further assess performance and generalizability. Finally, each of these models utilized only 2 to 4 variables, never exceeding one variable per 10 events in the multivariate models, thus minimizing the risk of model overfitting.

Context with prior literature

Observational studies have shown that early re-initiation of RRT after a failed weaning attempt is associated with greater mortality, although it is unclear if this was more a surrogate of increased or evolving illness severity rather than non-recovery of kidney function and RRT independence [21]. A failed attempt to discontinue RRT may contribute to worsened physiologic profiles and uncertainty in drug dosing, and may re-expose patients to the harmful sequelae of AKI, such as fluid accumulation, metabolic acidosis, and retention of metabolic waste.

Unlike weaning from mechanical ventilation, for which there are rigorously evaluated and published protocols [47, 48], the paucity of controlled trials to guide discontinuation of RRT has resulted in wide variations in practice [1, 49, 50], which may contribute to suboptimal care [51]. This may contribute to delays or premature discontinuation of RRT, which can have both patient-specific and health system-specific outcomes and resource implications.

Previous narrative reviews have suggested a weaning attempt of RRT in stable patients when the spontaneous urine output was > 400 mL/24 h and measured creatinine clearance was 15–20 mL/min [52], or when the urine output was > 30 mL/h and the 24-h urinary creatinine excretion was > 5.2 mmol/L [53]. Our study reaffirms the importance of urine output as a clinical marker to help guide RRT discontinuation; however, there is little evidence to determine an optimal threshold urine output value that can be reliably used by clinicians.

Implications for future research

Our study highlights important avenues of future research and reveals significant knowledge gaps in the existing literature. Several markers that have been evaluated appear to have reasonable discrimination, such as daily urinary urea excretion (AUROC of 0.96) [25], and kinetic eGFR (a method developed to reflect true GFR in situations where the serum creatinine is changing dynamically; AUROC of 0.87 in two different studies [17, 24]). However, these have generally been evaluated in small, isolated, retrospective studies and there is a need for replication in larger prospective cohorts before reliable inferences can be made. In addition, further work is needed to understand optimal urine output thresholds, the ideal measurement intervals to predict a successful RRT discontinuation, and further validation across clinical care settings and case-mix.

The predictive ability of urine output could be further augmented to aid clinical decision-making by a combination of patient-specific variables (i.e., age, CKD status) with time-varying and dynamic variables (i.e., daily SOFA score, kinetic GFR, RRT duration) [17, 45]. This would be an important advance in updating clinical practice guidelines with recommendations that are more detailed and clinically prescriptive for clinicians.

Strengths and limitations

Our review identifies and synthesizes a wide array of physiologic and biochemical markers of weaning success for acute renal replacement therapy. The strengths of our study include the peer-reviewed comprehensive search strategy and a rigorous methodology as outlined in the PRISMA guideline.

Despite these strengths, there are some important limitations in the studies retrieved that warrant consideration. First, although the overall quality of included studies was felt to be high, all data were derived from observational studies, as no randomized trials assessing liberation from RRT were found. Second, many different markers have been studied; however, most have only been evaluated in a single study, have not been replicated, and have not been externally validated. Where markers have been assessed in multiple studies, we found significant heterogeneity in the thresholds used to define positive or negative results, the timing of measurement with respect to the discontinuation of RRT, and in the operational definition used for weaning success. This limited the ability to generalize results and provide a pooled estimate of predictive ability for markers that have been studied in multiple cohorts.

Finally, we were unable to carry out a planned analysis of secondary endpoints such as new CKD, RRT duration, and RRT use post-ICU discharge. With the exception of RRT duration, these endpoints were infrequently reported and none of the included studies related these endpoints to parameters used to predict RRT discontinuation in a way that could be meaningfully aggregated. Subgroup analyses were also planned [34], stratified by age, RRT modality, and CKD status; however, we found that these subgroups were not well described and were heterogeneously defined across studies and thus did not permit further subgroup analysis.

Conclusions

Our systematic review identified 16 variables for the prediction of successful RRT discontinuation. Where multiple studies reported on the same parameter, the timing of measurement and threshold values used were heterogeneous, making pooled analysis not feasible for most. Urine output prior to discontinuation of RRT was the most-studied variable to predict RRT discontinuation (pooled sensitivity and specificity of 66.2% and 73.6%); however, an optimal threshold value could not be determined, also because of heterogeneity across retrieved studies. Future work should focus on refinement of a urinary output threshold value and the development and validation of a clinical prediction tool, incorporating urine output with other static and dynamic clinical variables, to better guide clinicians on when to discontinue RRT in ICU settings.