Take-home message

This scoping review revealed substantial heterogeneity in applied criteria for each component and observation periods to evaluate major adverse kidney events (MAKE) across existing articles. This heterogeneity potentially leads to inconsistent results as well as difficulties in comparing or integrating renal outcome data among studies, underscoring the need for dedicated discussion for establishing uniform and acceptable standards to operationalize MAKE for future trials.

Introduction

Acute kidney injury (AKI) is a common complication among critically ill patients and is associated with high morbidity and mortality [1,2,3]. The diagnosis and staging of AKI have been standardized through the consensus criteria of the Kidney Disease: Improving Global Outcomes (KDIGO) 2012 guidelines [4]. Such AKI diagnosis and staging criteria have helped clinicians understand and compare AKI epidemiology. Similarly, long-term outcomes in AKI patients regarding progressive renal functional decline or new onset of chronic kidney disease (CKD) have been investigated by many studies using consensus definitions of both AKI and CKD. Finally, to create a bridge from the definition of AKI to the definition of CKD, the Acute Disease Quality Initiative developed a consensus definition of acute kidney disease (AKD) [5]. Thus, several epidemiological studies have now reported the impact of AKD on outcomes [6, 7]. These studies have demonstrated that AKI, AKD, and CKD are closely connected and are part of a continuum of renal injury across a patient’s illness [8].

Linked with the above developments, the concept of major adverse kidney events (MAKE) has been proposed as analogous to that of major adverse cardiac events (MACE) [9, 10], a concept that has been successfully used in cardiovascular research [11, 12]. MAKE has been defined as a composite outcome comprising death, dialysis dependence, and persistent renal dysfunction, which are all patient-centered and related to AKI, AKD, subsequent CKD development, and mortality [13,14,15]. Importantly, MAKE components are relatively easily measured, subject to limited confounding, indicative of major morbidity, and potentially applicable in a consistent manner across all patient populations [16].

However, despite the above potential advantages, uniform and acceptable criteria for MAKE operationalization have yet to be established. Thus, different definitions of MAKE appear to have been used across studies, suggesting the need for a systematic assessment of current practice and evidence. The aim of such assessment is to quantify such heterogeneity, identify which MAKE elements are affected, and help develop a common language in relation to this outcome measure.

Accordingly, we conducted a systematic scoping review to investigate the heterogeneity in applied criteria in MAKE components and observation periods used to evaluate those components across existing studies in acute critical settings and to assess the impact of this heterogeneity on MAKE rates.

Methods

Study design

We conducted a systematic scoping review with adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline and its extension for scoping reviews [17]. The study protocol was developed following the Joanna Briggs Institute (JBI) Manual for Evidence Synthesis [18] and registered with the Open Science Framework (https://osf.io/thqp8).

Eligibility criteria

We included all randomized controlled trials (RCTs), non-RCTs, and prospective or retrospective cohort studies that employed MAKE as endpoints in acute critical care settings to systematically evaluate current heterogeneity in MAKE definitions. We included studies reported as full text in peer-reviewed journals. Reviews, editorials, conference articles, comments, standalone abstracts, and nonhuman studies were excluded.

Search strategy

The databases of Medical Literature Analysis and Retrieval System Online (MEDLINE), Excerpta Medica Database (Embase), Web of Science, and Cochrane Central Register of Controlled Trials were searched to retrieve relevant articles from the database inception to November 3, 2023. Supplementary Appendix I summarizes the search strategies used for each database. In addition, the reference lists of all included articles and recent relevant reports or reviews were manually searched.

Study selection and data extraction

Two authors (AM and RI) independently screened the titles and abstracts of all articles retrieved using Covidence. In the case of disagreements between the two authors, a consensus was reached via discussion with a third reviewer (KD). With eligible or potentially eligible/unclear articles after this initial screening, we retrieved full texts and assessed them for eligibility. We determined study eligibility and recorded reasons for exclusion of the studies. We extracted the title, year of publication, study design, the observation periods for MAKE evaluation, and the applied criteria for each MAKE component. Applied criteria for each MAKE component were determined through a consensus between the abovementioned two authors with any disagreements referred to the third reviewer for further discussion and resolution.

Description of the subsets of articles

We utilized several subsets of articles for further description. The first subset included articles that reported MAKE rates for multiple observation periods under consistent MAKE criteria within each article. This subset was used to evaluate the impact of observation periods on MAKE rates. The second subset included articles that only included patients treated with renal replacement therapy (RRT) and that employed chronic dialysis or dialysis dependence as the criteria for the dialysis component of MAKE and was used to evaluate the impact of the variation in the MAKE dialysis component on MAKE rates. Additionally, we utilized subsets of articles stratified by i) study type, ii) patient population, iii) single or multicentre, and iv) AKI for inclusion or as an outcome to describe the variation in MAKE operationalization based on study characteristics.

Statistical analysis

Descriptive statistics were used to characterize included studies and to summarize key findings. Frequencies and percentages were calculated to report the distribution of study characteristics. Visualizations, including histograms, bar charts, and alluvial plots, were also used to illustrate the ranges and distributions of publication years and applied criteria for each MAKE component. All statistical analyses were performed using R version 4.2.1 (R Core Team, 2022. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria) with the packages 'tidyverse', 'ggpubr', 'easyalluvial', 'ggstatsplot' [19], and 'tableone'.
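As an illustration of the descriptive approach, the following minimal sketch computes frequencies and percentages for one categorical study characteristic. The analyses in this review were performed in R; the use of Python and the helper name summarize_counts are assumptions made here for illustration only. The counts are the study-design counts reported in the Results.

```python
from collections import Counter


def summarize_counts(labels):
    """Return {category: (count, percentage)}, percentages rounded to one decimal.

    Hypothetical helper for illustration; not the authors' R code.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


# Study-design counts as reported in the Results (122 included articles).
designs = (
    ["retrospective cohort"] * 74
    + ["prospective cohort"] * 35
    + ["RCT"] * 13
)

summary = summarize_counts(designs)
for design, (n, pct) in summary.items():
    print(f"{design}: {n} articles ({pct}%)")
```

The same helper could be applied to any other extracted characteristic, such as the observation period or the dialysis criterion.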

Results

Characteristics of included articles

The database search retrieved 692 citations, and 204 full-text articles were assessed for eligibility. Among these, 82 articles were excluded for a variety of reasons and 122 articles met our eligibility criteria (Fig. 1).

Fig. 1

Flow chart showing the selection process. MAKE major adverse kidney events

The characteristics of the included articles are summarized in Table 1. The most common study design was the retrospective cohort study (74 articles, 60.7%), [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92] followed by prospective cohort studies (35 articles, 28.7%) [93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127] and RCTs (13 articles, 10.7%) [128,129,130,131,132,133,134,135,136,137,138,139,140]. Regarding each MAKE component, all three components of MAKE (death, dialysis, and persistent renal dysfunction) were included in the majority of articles (119 articles, 97.5%). However, death was not included in two studies (1.6%), [47, 86] and information on its inclusion was not reported in another study (0.8%). [76] Dialysis was not included in one MAKE study (0.8%), [81] and information on its inclusion was not reported in another study (0.8%). [76] Moreover, information on the inclusion of persistent renal dysfunction was not reported in five studies (4.1%). [28, 76, 94, 96, 112] Additionally, three articles incorporated hospitalization as a MAKE component. Two of these articles [131, 140] employed rehospitalizations for AKI and the remaining article [137] employed rehospitalization with cause unspecified. Finally, as shown in Fig. 2, the number of articles using MAKE as an endpoint has increased annually, surpassing 20 articles/year as of 2021.

Table 1 Characteristics of included studies
Fig. 2

Number of articles using MAKE as endpoints. This histogram shows the annual number of articles using MAKE as an endpoint, with bars representing the number of articles and colored by study design. Data are presented for full calendar years except 2023, which includes articles through November 3. Cluster randomized controlled trials are included under Randomized controlled trials and crossover trials are included under Prospective cohort studies

Three articles employed rehospitalization as a MAKE component: two articles with AKI-related rehospitalization and one with rehospitalization (with cause unspecified)

Observation period for MAKE evaluation

Table 2 describes the observation periods for MAKE evaluation. Most articles (104 articles) evaluated MAKE at a single moment in time, while 18 articles [22, 27, 43, 46, 48, 92, 94, 97, 103, 108, 109, 112, 117, 129, 133, 135, 138, 140] evaluated MAKE at multiple time points. Overall, thirteen different observation periods were employed across articles, with 30 days (27.9% in single period articles, 55.6% in multiple period articles) and 90 days (22.1% in single period articles, 77.8% in multiple period articles) as the most frequently used periods for evaluation.

Table 2 Variation in observation periods for MAKE evaluation

Time-to-MAKE analysis over a given observation period was also utilized in a proportion of single (18 articles, 17.3%) [20, 24,25,26, 38, 39, 47, 56, 58, 66, 68, 83, 86, 96, 107, 113, 119, 123] and multiple (2 articles, 11.1%) [46, 103] period articles. Observation periods were not reported in five articles (4.8%) that used a single period evaluation. [35, 76, 80, 90, 104] Fig. 3 describes observation periods in articles that used a single period evaluation based on the study type. RCTs were more likely to employ an observation period of 90 days or longer (7 articles, 87.5%) compared to prospective studies (9 articles, 32.1%) or retrospective studies (21 articles, 30.9%). The observation periods employed in articles with single or multiple period evaluations are described according to study characteristics in supplementary Tables S1 and S2.

Fig. 3

Variation in observation periods for MAKE evaluation according to study type. This bar plot describes the variation in observation periods in articles that used single period evaluation. MAKE major adverse kidney events, RCT randomized controlled trial

Additionally, we assessed another subset of articles from this scoping review cohort that reported MAKE rates for multiple observation periods (n = 13). [22, 27, 43, 48, 92, 94, 108, 109, 117, 129, 133, 135, 138] This analysis revealed that the median of the largest difference in reported MAKE rates resulting from different observation periods within each article was 7% [interquartile range (IQR): 1.7–16.7%] (supplementary Table S3).
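The summary statistic described above (the median, with IQR, of the largest within-article difference in MAKE rates) can be sketched as follows. The rates below are HYPOTHETICAL values chosen for illustration and are not the data in supplementary Table S3; the article labels, the helper name largest_difference, and the choice of the "inclusive" quantile method are likewise assumptions.

```python
from statistics import quantiles

# HYPOTHETICAL MAKE rates (%) per observation period for illustrative articles.
# These are NOT the values reported in supplementary Table S3.
make_rates = {
    "article_A": {"30d": 20.0, "90d": 27.0},
    "article_B": {"30d": 35.0, "90d": 36.0, "365d": 37.0},
    "article_C": {"28d": 10.0, "90d": 30.0},
    "article_D": {"30d": 15.0, "60d": 19.0},
    "article_E": {"7d": 40.0, "30d": 50.0},
}


def largest_difference(rates_by_period):
    """Largest gap between MAKE rates reported within a single article."""
    values = list(rates_by_period.values())
    return max(values) - min(values)


# One value per article: the widest gap across its observation periods.
diffs = sorted(largest_difference(r) for r in make_rates.values())

# Quartiles of the per-article differences; q2 is the median.
# The "inclusive" method is an assumption about how quartiles are defined.
q1, q2, q3 = quantiles(diffs, n=4, method="inclusive")
print(f"median largest difference: {q2}% (IQR {q1}-{q3}%)")
```

Because the difference is taken within each article, where the MAKE criteria are held constant, it isolates the effect of the observation period from that of differing definitions.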

Applied criteria for death component

Among 119 articles that included death as a MAKE component, 116 articles employed death from any cause or death with cause unspecified, whereas the remaining three articles [24, 38, 119] specifically defined death as “death from renal failure”, “death due to kidney disease”, or “death while receiving dialysis”, respectively.

Applied criteria for persistent renal dysfunction component

Figure 4 and supplementary Table S4 show the parameters used in applied criteria for persistent renal dysfunction, one of the MAKE components, in the included articles. The most commonly used parameters for defining persistent renal dysfunction were serum creatinine concentration (59 articles, 48%) [21, 23, 27, 31,32,33,34, 36, 37, 39, 40, 42, 43, 45, 49, 50, 52, 53, 55, 59,60,61, 63, 65, 70,71,72,73,74,75, 77, 79, 81, 84, 88, 89, 93, 95, 98, 99, 102, 105, 110, 111, 113,114,115,116,117,118, 122, 123, 126,127,128,129, 132, 136, 138] and estimated glomerular filtration rate (eGFR) (41 articles, 34%) [20, 22, 24,25,26, 29, 30, 35, 38, 44, 46, 47, 51, 56, 57, 62, 64, 67, 68, 78, 82, 83, 85, 87, 100, 101, 106,107,108,109, 119,120,121, 125, 133,134,135, 137, 139,140,141]. The criteria for persistent renal dysfunction were not described in 7 studies (6%). [28, 66, 76, 94, 96, 112, 131] In addition, as shown in supplementary Table S5, we identified 37 different definitions based on variations in the parameters used, cut-off values, and assessment periods, even when the same parameters were used in the criteria for this component. Furthermore, when defining baseline creatinine for the application of MAKE criteria, most of the included articles (95 articles, 77.9%) used pre-study creatinine values, although estimated creatinine and/or study-derived creatinine were also used, as shown in supplementary Fig. S1. Finally, in one study [139] in cardiac surgery patients with risk factors for AKI, the MAKE90 rate using eGFR based on creatinine was 20.1%, whereas employing eGFR based on cystatin C yielded a higher rate of 40.5%.

Fig. 4

Variation in the criteria used to describe persistent renal dysfunction in MAKE. This bar chart illustrates the criteria used to describe the persistent renal dysfunction component of MAKE. AKI acute kidney injury, eGFR estimated glomerular filtration rate, MAKE major adverse kidney events

Applied criteria for dialysis component

Figure 5 and supplementary Table S6 show the applied criteria for the dialysis component of MAKE across 122 articles with 150 different observation periods. While 43 articles (35.2%) applied criteria for this component using the performance of chronic dialysis or RRT dependence assessed at the specific study endpoint, 51 articles (41.8%) applied it as the receipt of any RRT at any time until the end of the observation period [20,21,22,23,24,25,26, 28,29,30,31,32,33,34, 37, 39,40,41,42, 45, 46, 48,49,50,51,52,53,54,55, 57,58,59,60,61,62, 64,65,66,67, 69,70,71,72,73,74,75, 77,78,79, 81,82,83,84,85,86,87,88, 92, 93, 95,96,97, 99,100,101,102,103, 105, 107,108,109,110, 115,116,117,118, 120,121,122,123, 125,126,127,128,129,130,131,132, 134,135,136, 138, 139, 141] and two articles (1.6%) employed combined criteria of both for this component. [137, 140] Moreover, the criteria for the dialysis component varied according to the duration of the observation period [20,21,22, 24,25,26,27,28, 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80, 83,84,85,86,87, 89,90,91,92,93,94,95,96,97,98, 100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120, 122,123,124,125,126,127,128,129,130,131,132,133,134,135,136, 138, 139, 141] such that among the 48 articles using an observation period of ≤ 30 days, 32 (66.7%) applied it as any RRT at any time until the study endpoint, while seven (14.6%) applied it as chronic dialysis or RRT dependence at the specific study endpoint. In contrast, in the 54 articles with an observation period of > 30 days, 18 (33.3%) applied the dialysis component as any RRT at any time until the endpoint, and 24 (44.4%) applied it as chronic dialysis or RRT dependence at the specific study endpoint. 
Similarly, in the 20 articles employing Time-to-MAKE analysis, only two (10%) applied the former criteria of any RRT at any time, while 12 (60%) applied the latter of dialysis dependence at specific endpoint. Finally, 13 articles with 20 observation periods did not describe whether RRT was limited to chronic dialysis or not and 23 articles with 30 observation periods did not report the assessment window for RRT.

Fig. 5

Variation in definitions for dialysis component. This alluvial plot illustrates the heterogeneity in the criteria and assessment window applied to evaluate the dialysis component of MAKE across study publications. Each flow represents publications applying a criterion for dialysis component in terms of RRT modality and assessment window, with flow color indicating the observation period for MAKE evaluation and flow width representing the number of publications. Publications evaluating MAKE at multiple observation periods are represented repeatedly in the plot, comprising 122 publications with 146 observation periods. MAKE definitions varied substantially across studies. Articles with observation periods of ≤ 30 days more often applied a criterion for this component as any use of RRT at any time until the study endpoint (66.7%) rather than chronic dialysis or RRT dependence at the endpoint (14.6%). In contrast, for articles with > 30 days observation periods or employing Time-to-MAKE analyses, a higher proportion used criteria of chronic dialysis or RRT dependence at the final assessment endpoint (44.4% and 60.0%, respectively) rather than any RRT until that endpoint (33.3% and 10.0%). Additionally, 10.7% of articles did not describe the RRT modality and 18.9% did not describe the assessment window. One article employed combined criteria of chronic and temporary dialysis that could not be distinctly classified within the categories shown, and was thus excluded from this visualization. ICU intensive care unit, MAKE major adverse kidney events, RRT renal replacement therapy

Finally, we assessed a subset of articles that only included patients treated with RRT and reported the rate of patients with chronic dialysis or RRT dependence at day 90 as the dialysis component criteria (n = 6). [33, 37, 51, 87, 100, 109] By definition, if the “any RRT at any time” criterion was used, 100% would have been MAKE positive. However, using the actual pre-set study MAKE criteria only 63.6% were reported as MAKE positive, and dialysis at day 90 was seen in only 8% (supplementary Table S7).
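To make this reasoning concrete, the sketch below classifies a small HYPOTHETICAL cohort of RRT-treated patients under the two dialysis criteria. The Patient fields, function names, and cohort composition are assumptions for illustration and do not reproduce supplementary Table S7. Because every patient received RRT, the "any RRT at any time" criterion necessarily labels all of them MAKE positive, whereas the dependence-at-endpoint criterion does not.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    died: bool                    # death by the end of the observation period
    any_rrt: bool                 # received any RRT at any time in the period
    rrt_dependent_at_end: bool    # still dialysis dependent at the endpoint
    persistent_dysfunction: bool  # persistent renal dysfunction at the endpoint


def make_any_rrt(p):
    """MAKE with the dialysis component defined as any RRT at any time."""
    return p.died or p.any_rrt or p.persistent_dysfunction


def make_rrt_dependence(p):
    """MAKE with the dialysis component defined as RRT dependence at the endpoint."""
    return p.died or p.rrt_dependent_at_end or p.persistent_dysfunction


def rate(patients, criterion):
    """Percentage of patients classified as MAKE positive."""
    return 100 * sum(criterion(p) for p in patients) / len(patients)


# HYPOTHETICAL cohort: all 10 patients were treated with RRT.
cohort = (
    [Patient(True, True, False, False)] * 4     # died
    + [Patient(False, True, True, False)] * 1   # alive, dialysis dependent
    + [Patient(False, True, False, True)] * 1   # alive, persistent dysfunction
    + [Patient(False, True, False, False)] * 4  # alive, recovered
)

rate_any = rate(cohort, make_any_rrt)
rate_dep = rate(cohort, make_rrt_dependence)
print(f"any RRT at any time: {rate_any}%")
print(f"RRT dependence at endpoint: {rate_dep}%")
```

In this toy cohort the gap between the two rates is driven entirely by patients who received RRT but recovered, which is the group whose classification differs between the two criteria.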

Discussion

Key findings

In this systematic scoping review, we found marked heterogeneity in applied criteria for MAKE across 122 eligible studies. While the number of published articles using MAKE as an endpoint continued to rise, heterogeneity remained, both in terms of the observation periods selected for MAKE evaluation and the criteria applied to each MAKE component. In particular, we identified 37 different criteria for persistent renal dysfunction, which can be further influenced by baseline creatinine definitions and/or eGFR calculations. Moreover, in some studies, descriptions of applied criteria for MAKE components were lacking, and criteria and observation periods for MAKE evaluation varied according to study characteristics. Furthermore, within individual articles, MAKE rates differed by a median of 7% across observation periods, and in RRT-treated cohorts by 36.4% depending on the dialysis component definition.

Relationship to previous studies

To the best of our knowledge, this is the first systematic scoping review to describe the heterogeneity in the applied criteria and observation periods used to detect MAKE. Observed heterogeneity and its impact on MAKE rates, with some studies even lacking a description of applied criteria, are noteworthy and even problematic in study interpretation or comparison and planning RCTs. [142, 143]

The concept of MAKE was originally proposed by Shaw in 2011 [144] as a patient-centered renal outcome including death, new dialysis, and incident or progressive chronic kidney disease. Subsequently, it was operationalized to include death, dialysis, and persistent renal dysfunction in most articles. Since the introduction of MAKE, death has been a key component. However, it remains unclear if death is a kidney injury related endpoint or a competing endpoint for renal dysfunction. This is because mortality mostly reflects the underlying acute illness. Although death from kidney failure has been employed as a component for proposed Major Adverse Renal Events for CKD patients [145], attributing mortality to kidney injury in intensive care unit patients is particularly challenging and possibly misleading. Yet, death is a fundamental patient-centered outcome. As a consequence, despite such concerns, death from any cause continues to be used in studies of MAKE.

The original concept of MAKE was to serve as the renal analog of MACE. In this regard, since 2008, MACE has been adopted by the United States of America Food and Drug Administration (FDA) to assess the cardiovascular risks of therapies for diabetes. [9] Subsequently, the European Medicines Agency (EMA) adopted it as the preferred cardiovascular safety endpoint in trials of new medicinal products. [146] Additionally, the Academic Research Consortium-2 Consensus Document recommended such composite outcomes in coronary intervention trials. [10] These guidelines have accelerated the adoption of MACE as a primary outcome measure in cardiovascular research. Despite early heterogeneity, [147,148,149] several statements describing uniform definitions of MACE components [150, 151] have now been published as collaborative consensus statements of relevant associations including the Standardized Data Collection for Cardiovascular Trials Initiative (SCTI) and FDA. Our observations suggest that MAKE may be in a position similar to that of MACE approximately 15 years ago.

Our findings suggest that, despite the lack of consensus, the use of MAKE as an outcome measure is increasing as it may help provide more statistical power in RCTs. This was suggested by the findings of the REVIVAL trial (Recombinant human alkaline phosphatase SA-AKI survival), [137] which reported a significant difference in MAKE rates between treatment groups despite non-significant differences in 28-day mortality in sepsis-associated AKI. In response, the need for standardized MAKE definitions and the risk of applying different definitions in RCTs were highlighted in the accompanying editorial. [142]

Implications of study findings

Our findings imply that the use of MAKE as an endpoint is increasing and, as such, its clinical relevance is also increasing. At the same time, our observations demonstrate that there is substantial heterogeneity in applied criteria for each MAKE component and observation period. This heterogeneity impacts on reported MAKE rates and makes its current use as an endpoint more challenging. Additionally, considering the difficulties in powering RCTs of AKI and AKD treatment, MAKE’s potential advantage of increasing event numbers and thus statistical power is likely to lead to more studies using it as the preferred primary outcome. In this regard, our observations suggest that establishing uniform and acceptable standards for operationalizing each MAKE component with the assistance of independent adjudication committees, a similar path to MACE, may well be fundamental for the advancement of trial medicine in critical care nephrology. Moreover, as MAKE faces additional specific challenges due to a lack of consensus on indications for initiating dialysis or on minimal clinically important differences for persistent renal dysfunction, dedicated discussion for unified principles and acceptable standards in operationalizing MAKE is warranted.

Finally, MAKE can be seen as part of the continuum of AKI, AKD, and CKD definitions, aligning with the original intent of MAKE. Toward this end, the evolution of MAKE may incorporate elements from proposed AKD or CKD definitions, such as structural markers of kidney damage [152]. We acknowledge that until more consensus is achieved, a degree of flexibility in the MAKE definition, accompanied by clear descriptions, will remain. This is particularly true for exploratory retrospective studies that utilize established databases or in studies with specific aims requiring a tailored selection of MAKE components.

Strengths and limitations

To our knowledge, the heterogeneity in applied criteria for each MAKE component employed in the literature has not been previously systematically evaluated. Thus, this study provides the first systematic assessment of such heterogeneity for component choice, observation periods, and component identification, as reported in the literature. Our findings offer a comprehensive overview of variability in the use of MAKE as study endpoint and provide perspective on the potential effect of different MAKE definitions on MAKE rates. This characterization generates fundamental insights into the current landscape surrounding MAKE and is important to future standardization of criteria for trials.

We acknowledge some limitations. First, the varied design and reporting of studies posed some difficulties in extracting fully comparable information. However, our formal methodology utilizing multiple independent reviewers helped minimize misinterpretation risks. Moreover, such heterogeneity was sufficiently large to minimize the impact of minor differences in the interpretation of individual articles. On the other hand, such heterogeneity made it necessary to conduct a scoping review rather than a formal meta-analysis and systematic review. This scoping design allowed us to comprehensively describe and quantify such heterogeneity. Moreover, given the wide variability demonstrated, our findings reinforce the challenges of performing a robust meta-analysis on this topic. Second, we did not describe the analytical challenges associated with employing composite endpoints like MAKE. We acknowledge that individual MAKE components may differently impact patient outcomes and any interpretability depends on how components are weighted or analyzed statistically. However, emerging methods like win-ratio modeling may help address this challenge. [153] Furthermore, MAKE emphasizes clinically relevant elements beyond simple creatinine/eGFR and aims to capture competing risks, supporting its usefulness despite these challenges. Third, we did not capture studies employing alternative composite outcomes with similar underlying endpoints but different naming conventions than MAKE. However, no competing widely used system exists to challenge MAKE’s pre-eminence in this field.

Conclusion

In this systematic scoping review, we found an annual increase in the number of articles employing MAKE as an endpoint. We also found substantial heterogeneity in the applied criteria and observation periods for MAKE evaluation across published articles, particularly regarding the components of persistent renal dysfunction and dialysis dependence. This heterogeneity leads to inconsistent results as well as difficulties in comparing or integrating renal outcome data among studies. Since employing composite outcomes like MAKE that incorporate patient-centered perspectives appears logical and clinically valid, our findings indicate the need for dedicated discussion on how to develop more uniform and acceptable standards for the operationalization of MAKE. Such discussion is important for the application of MAKE in future RCTs of patients with AKI or AKD.