Introduction

Acute kidney injury (AKI) occurs in up to 30 % of critically ill adults [1, 2] and is associated with increased mortality [3, 4] and morbidity [5, 6]. Clinical research evaluating the prevention and treatment of AKI has historically been hampered by the lack of consensus definitions for AKI and the unclear relationship between acute changes in kidney function and longer-term outcomes [7, 8]. To address these challenges, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) workgroup on Clinical Trials in Acute Kidney Injury recently recommended use of “a composite endpoint of death, provision of dialysis, or sustained loss of kidney function” for phase III trials related to AKI [9]. Analogous to the Major Adverse Cardiovascular Events composite for coronary artery disease [10], the proposed Major Adverse Kidney Events (MAKE) composite of death, new renal replacement therapy (RRT), or sustained loss of kidney function incorporates several clinically important outcomes and retains a reasonable event rate while shifting the focus from short-term, surrogate measures [8] to longer-term, more patient-centered endpoints [11, 12].

The proliferation of electronic health records (EHRs) and clinical information systems provides a novel opportunity to detect the development of AKI in hospitalized patients [13]. Several previous studies have successfully used EHRs to detect changes in serum creatinine for the purposes of generating provider alerts [14, 15]. Beyond clinical use, there is increasing interest in leveraging tools within the EHR to facilitate the conduct of large, pragmatic trials [16, 17]. In preparation for an upcoming clinical trial, we developed and tested an approach to identifying the MAKE composite endpoint from EHR data collected as part of routine care.

Methods

Study design and oversight

We conducted an observational study using data prospectively collected as a part of an ongoing pilot (NCT02345486). The protocol was approved by the institutional review board at Vanderbilt University with a waiver of informed consent.

Patient population

Among 466 consecutive adult (≥18 years old) admissions to the medical ICU at Vanderbilt University between February 3, 2015 and March 31, 2015, we used computer-generated simple randomization to select 200 cases for review with regard to the MAKE30 outcome (Fig. 1).
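The case-selection step can be illustrated with a minimal Python sketch of simple random sampling without replacement. The admission identifiers and the fixed seed are hypothetical placeholders; the study's actual randomization software is not described.

```python
import random
from typing import List, Sequence


def select_review_sample(admission_ids: Sequence[str], n: int = 200,
                         seed: int = 2015) -> List[str]:
    """Simple random sample of n admissions, drawn without replacement.

    The seed is a hypothetical choice that only makes the draw reproducible.
    """
    if n > len(admission_ids):
        raise ValueError("sample size exceeds the number of admissions")
    rng = random.Random(seed)
    return rng.sample(list(admission_ids), n)


# Hypothetical identifiers standing in for the 466 consecutive admissions
admissions = [f"ADM{i:04d}" for i in range(1, 467)]
review_sample = select_review_sample(admissions)
print(len(review_sample), len(set(review_sample)))  # 200 200
```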

Fig. 1

Flow of patients through the study. From 466 consecutive admissions to the medical intensive care unit (ICU) between February 3, 2015 and March 31, 2015, a sample of 200 cases was selected by computer-generated simple randomization. For these 200 cases, the presence of Major Adverse Kidney Events (MAKE) was determined by (1) two-physician manual chart review and (2) electronic data extraction. Discrepancies between the two physician reviewers were resolved by a third physician to generate a reference standard manual-review dataset. Electronic identification of MAKE (with and without targeted manual review of cases missing a serum creatinine value prior to hospital admission) was compared to MAKE identified by manual review

Study outcomes

The endpoint of interest was the proportion of patients meeting one or more criteria for Major Adverse Kidney Events within 30 days (MAKE30): in-hospital mortality; receipt of new RRT; or persistent renal dysfunction [1, 9] (Table 1). In-hospital mortality was defined as death from any cause prior to hospital discharge censored at 30 days after ICU admission. Receipt of new RRT was defined as receipt of any modality of RRT between ICU admission and the first of (1) hospital discharge or (2) 30 days in a patient not known to have received RRT prior to ICU admission. Persistent renal dysfunction was defined as a final serum creatinine value before hospital discharge (censored at 30 days after enrollment) that was ≥ 200 % of the baseline creatinine value. Patients who had received RRT prior to enrollment were ineligible for new RRT and persistent renal dysfunction endpoints, but could still meet MAKE30 criteria via the in-hospital mortality component. Secondary outcomes included death, new RRT, and persistent renal dysfunction by 90 days (MAKE90), including outcomes that occurred after hospital discharge.
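The component rules above can be expressed as a short Python sketch. It assumes each patient's inputs (death, RRT timing, baseline and final creatinine) have already been derived; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Make30Inputs:
    died_in_hospital_by_day30: bool       # death before discharge, censored at day 30
    rrt_before_icu: bool                  # any RRT before ICU admission
    rrt_after_icu_by_day30: bool          # any RRT from ICU admission to discharge/day 30
    baseline_creatinine: Optional[float]  # mg/dL
    final_creatinine: Optional[float]     # last value before discharge/day 30, mg/dL


def make30(p: Make30Inputs) -> Dict[str, bool]:
    """Evaluate each MAKE30 component and the composite."""
    death = p.died_in_hospital_by_day30
    # Patients with prior RRT can meet MAKE30 only via the mortality component.
    eligible = not p.rrt_before_icu
    new_rrt = eligible and p.rrt_after_icu_by_day30
    persistent = (
        eligible
        and p.baseline_creatinine is not None
        and p.final_creatinine is not None
        and p.final_creatinine >= 2.0 * p.baseline_creatinine  # >= 200% of baseline
    )
    return {
        "death": death,
        "new_rrt": new_rrt,
        "persistent_renal_dysfunction": persistent,
        "make30": death or new_rrt or persistent,
    }


# Hypothetical patient: baseline 1.0 mg/dL, final 2.1 mg/dL, no RRT, survived
print(make30(Make30Inputs(False, False, False, 1.0, 2.1))["make30"])  # True
```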

Table 1 Definition of Major Adverse Kidney Events within 30 days (MAKE30)

Study definitions

The value for baseline serum creatinine was determined in a hierarchical approach. The lowest serum creatinine between 12 months and 24 h prior to hospital admission was used when available. If no such creatinine value was available, the lowest creatinine value between 24 h prior to hospital admission and the time of ICU admission was used. If no creatinine value was available between 12 months prior to hospital admission and the time of ICU admission, a baseline creatinine value was estimated using a previously-described three-variable formula [creatinine = 0.74 − 0.2 (if female) + 0.08 (if African American) + 0.003 × age (in years)] [18].
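The three-tier hierarchy can be sketched as follows. This is a minimal illustration, not the study's code: it assumes "12 months" means 365 days, treats a value drawn exactly 24 h before admission as tier 1, and uses hypothetical function names and tier labels.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Measurement = Tuple[datetime, float]  # (collection time, serum creatinine in mg/dL)


def estimated_baseline(age_years: float, female: bool, african_american: bool) -> float:
    """Three-variable estimate quoted in the text [18]."""
    return 0.74 - 0.2 * female + 0.08 * african_american + 0.003 * age_years


def baseline_creatinine(values: List[Measurement], hospital_admit: datetime,
                        icu_admit: datetime, age_years: float, female: bool,
                        african_american: bool) -> Tuple[float, str]:
    """Return (baseline creatinine, tier used) following the hierarchy."""
    twelve_months_before = hospital_admit - timedelta(days=365)  # assumes 12 months = 365 days
    day_before = hospital_admit - timedelta(hours=24)

    # Tier 1: lowest value from 12 months to 24 h before hospital admission
    tier1 = [v for t, v in values if twelve_months_before <= t <= day_before]
    if tier1:
        return min(tier1), "pre-admission"

    # Tier 2: lowest value from 24 h before hospital admission to ICU admission
    tier2 = [v for t, v in values if day_before < t <= icu_admit]
    if tier2:
        return min(tier2), "peri-admission"

    # Tier 3: no measured value in either window, so estimate from demographics
    return estimated_baseline(age_years, female, african_american), "estimated"


admit = datetime(2015, 3, 1, 8, 0)
icu = admit + timedelta(hours=6)
vals = [(admit - timedelta(days=100), 1.1), (admit - timedelta(days=30), 0.9),
        (admit - timedelta(hours=2), 0.7)]
print(baseline_creatinine(vals, admit, icu, 60, True, False))  # (0.9, 'pre-admission')
```

Note that the 0.7 mg/dL value drawn 2 h before admission is ignored here because a tier-1 value exists.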

Chronic kidney disease (CKD) was defined as (1) a highest glomerular filtration rate (GFR) <60 mL/min/1.73 m² in the 12 months prior to enrollment as estimated by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [19] or (2) a clinical history of CKD stage 3 or greater among patients without an available serum creatinine value.
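For reference, the CKD rule can be sketched with the 2009 CKD-EPI creatinine equation. We assume this is the version cited as [19] and that age at enrollment is applied to every measurement; the function names are hypothetical.

```python
from typing import Sequence


def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool) -> float:
    """Estimated GFR (mL/min/1.73 m²) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def meets_ckd_definition(creatinines_prior_12_months: Sequence[float], age_years: float,
                         female: bool, black: bool, clinical_history_ckd3_plus: bool) -> bool:
    """Criterion (1) if any creatinine is available, otherwise criterion (2)."""
    if creatinines_prior_12_months:
        # The highest eGFR corresponds to the lowest creatinine in the window.
        highest_egfr = max(ckd_epi_2009(s, age_years, female, black)
                           for s in creatinines_prior_12_months)
        return highest_egfr < 60.0
    return clinical_history_ckd3_plus


print(round(ckd_epi_2009(1.0, 60, female=False, black=False)))  # 81
```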

Data collection by manual chart review

Two physicians independently reviewed the institutional EHR using a structured instrument. Reviewers abstracted the following variables, which had been collected as a part of routine clinical care, into a secure online database:

  (1) Demographics, diagnosis, and severity of illness;

  (2) Presence of CKD and any prior receipt of RRT;

  (3) Serum creatinine values, including lowest between 12 months and 24 h prior to hospital admission, lowest between 24 h prior to hospital admission and ICU admission, highest between ICU admission and hospital discharge or 30 days, and final before hospital discharge or 30 days;

  (4) Vital status and receipt of RRT through hospital discharge; and

  (5) Post-discharge survival, receipt of RRT, and serum creatinine values when available within the EHR (median duration of follow-up among patients surviving to hospital discharge: 183 days; IQR 63–242 days).

After completion of the independent two-physician review, a third physician examined every case in which the initial reviewers disagreed on one or more of the MAKE30 criteria. The reason for each discrepancy was recorded, and a final reference standard manual-review dataset was generated for comparison with the electronically extracted data.

Electronically-extracted data

Structured data from StarPanel, our enterprise EHR, are exported on a daily basis to our institution’s Enterprise Data Warehouse (EDW), along with data from our patient registration system, billing system, and laboratory clinical information system. We developed a process to detect transfers and admissions to the study ICU using data extracted from our EDW on a weekly basis. The combination of patient identifiers (medical record number and encounter number) and a timestamp for study enrollment (date and time of first ICU admission) was used to extract pre- and post-enrollment data elements, as described below.
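Splitting each patient's timestamped data at the enrollment time might look like the sketch below. It assumes records are grouped by medical record number, so that earlier outpatient encounters contribute pre-enrollment values, and that a record stamped exactly at enrollment belongs to the post-enrollment window. The names are hypothetical and do not describe the EDW's actual schema.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, datetime, str]  # (medical record number, timestamp, data element)
EnrollmentKey = Tuple[str, str]     # (medical record number, encounter number)
Windows = Dict[EnrollmentKey, Dict[str, List[Tuple[datetime, str]]]]


def split_by_enrollment(records: Iterable[Record],
                        enrollments: Dict[EnrollmentKey, datetime]) -> Windows:
    """Partition each enrolled patient's records into pre- and post-enrollment lists."""
    by_mrn = defaultdict(list)
    for mrn, ts, element in records:
        by_mrn[mrn].append((ts, element))

    windows: Windows = {}
    for (mrn, encounter), enrolled_at in enrollments.items():
        rows = sorted(by_mrn.get(mrn, []), key=lambda row: row[0])
        windows[(mrn, encounter)] = {
            "pre": [row for row in rows if row[0] < enrolled_at],
            "post": [row for row in rows if row[0] >= enrolled_at],
        }
    return windows
```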

Patients with a history of prior RRT were identified electronically using the American Medical Association’s Current Procedural Terminology (CPT) codes (3066F, 4054F, 4055F, 90963, 90964, 90965, 90966, 90967, 90968, 90969, 90970, 90989, 90993, G0257, G8714, G8956, G9013, G9014, G9231, 90935, 90937, 90945, 90947, 90921, 90925, 90999) as well as International Classification of Diseases, Clinical Modification (ICD) codes for ICD-9 (39.95, 54.98) and ICD-10 (5A1D00Z, 5A1D60Z, 3E1M39Z) [20]. The presence of any one of these codes in our patient registration system or billing system prior to the date and time of ICU admission resulted in the patient receiving the status of “RRT received prior to enrollment”. The same codes were used to determine which patients received RRT during the study period. For all patients who received RRT during the study period, a full text search of the pre-enrollment record was performed using terms related to receipt of RRT to identify patients who had received RRT prior to enrollment at an outside facility (such that CPT and ICD codes for RRT might not be available in our EHR). Search terms included “renal replacement”, “RRT”, “CRRT”, “dialysis”, “HD”, “PD”, “end-stage renal”, and “ESRD”. Patients who had not received RRT prior to enrollment and received RRT between enrollment and hospital discharge, censored at 30 days, were considered to have met the “new receipt of renal replacement therapy” component of the MAKE30 endpoint.
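The code-based and text-based RRT rules can be combined as in the sketch below. The code set is copied from the text. Whole-word, case-insensitive matching of the search terms and the function name are assumptions, not the study's implementation.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Procedure codes listed in the text (CPT Category II codes written without a space)
RRT_CODES = frozenset({
    "3066F", "4054F", "4055F", "90963", "90964", "90965", "90966", "90967",
    "90968", "90969", "90970", "90989", "90993", "G0257", "G8714", "G8956",
    "G9013", "G9014", "G9231", "90935", "90937", "90945", "90947", "90921",
    "90925", "90999", "39.95", "54.98", "5A1D00Z", "5A1D60Z", "3E1M39Z",
})
# Free-text terms from the text; whole-word, case-insensitive matching is an assumption
RRT_TERMS = re.compile(
    r"\b(renal replacement|RRT|CRRT|dialysis|HD|PD|end-stage renal|ESRD)\b",
    re.IGNORECASE,
)


def rrt_status(coded_events: Iterable[Tuple[datetime, str]], pre_enrollment_notes: List[str],
               icu_admit: datetime, discharge: datetime) -> Dict[str, bool]:
    """Classify prior and new RRT from timestamped codes plus a pre-enrollment text search."""
    events = list(coded_events)
    window_end = min(discharge, icu_admit + timedelta(days=30))
    prior = any(code in RRT_CODES and ts < icu_admit for ts, code in events)
    during = any(code in RRT_CODES and icu_admit <= ts <= window_end for ts, code in events)
    # Text search for outside-facility RRT, run only for patients with study-period RRT
    if during and not prior:
        prior = any(RRT_TERMS.search(note) for note in pre_enrollment_notes)
    return {"rrt_prior_to_enrollment": prior, "new_rrt": during and not prior}
```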

Using all inpatient, outpatient, and emergency department creatinine values from our institutional laboratory clinical information system, we determined (1) the lowest serum creatinine value between 12 months and 24 h prior to hospital admission, (2) the lowest creatinine value between 24 h prior to hospital admission and the time of ICU admission, and (3) an estimated baseline creatinine value using a previously-described three-variable formula [creatinine = 0.74 − 0.2 (if female) + 0.08 (if African American) + 0.003 × age (in years)] [18]. A baseline creatinine value for each patient was determined using the hierarchical approach described above. For each patient we compared the baseline creatinine value to the final creatinine value obtained between enrollment and hospital discharge, censored at 30 days. If the final creatinine value was at least twice the baseline creatinine value, the “persistent renal dysfunction” component of the MAKE30 outcome was considered present.
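The persistent renal dysfunction check compares only the last creatinine in the censored window with the baseline, not the peak. A minimal sketch follows. Returning None when no post-enrollment value exists is our assumption, because the text does not state how such cases were handled. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def persistent_renal_dysfunction(post_values: List[Tuple[datetime, float]], baseline: float,
                                 icu_admit: datetime, discharge: datetime) -> Optional[bool]:
    """True if the final creatinine in the window is at least twice the baseline.

    Returns None when no creatinine was measured in the window (an assumption).
    """
    window_end = min(discharge, icu_admit + timedelta(days=30))
    in_window = [(t, v) for t, v in post_values if icu_admit <= t <= window_end]
    if not in_window:
        return None
    _, final = max(in_window, key=lambda row: row[0])  # latest measurement in the window
    return final >= 2.0 * baseline
```

For example, a patient whose creatinine peaks at 3.0 mg/dL on day 1 but falls to 1.2 mg/dL by day 20 (baseline 1.0 mg/dL) does not meet this component.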

Mortality was determined by searching for a mortality-associated discharge disposition in our patient registration system within 30 days of the study enrollment date. Patients with a mortality-associated discharge disposition within 30 days of study enrollment were considered to have met the “mortality” component of the MAKE30 endpoint.
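The mortality rule amounts to a lookup plus a date comparison, sketched below. The disposition labels are hypothetical, because real values depend on the registration system. Treating a missing disposition as "not dead" is also an assumption, and it is exactly the kind of null value that is worth auditing.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical disposition labels; real values depend on the registration system
MORTALITY_DISPOSITIONS = frozenset({"EXPIRED", "EXPIRED IN HOSPICE"})


def met_mortality_component(disposition: Optional[str], discharge: datetime,
                            enrollment: datetime) -> bool:
    """Mortality-associated discharge disposition within 30 days of the enrollment date."""
    if disposition is None:
        # A null disposition cannot be classified as a death; audit such cases.
        return False
    within_30_days = discharge.date() - enrollment.date() <= timedelta(days=30)
    return disposition.strip().upper() in MORTALITY_DISPOSITIONS and within_30_days
```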

Patients who met any of the three components of the MAKE30 endpoint were considered to have experienced the MAKE30 composite endpoint.

Anticipating potential challenges associated with electronically identifying prior RRT receipt and baseline creatinine values among patients without previous care at the study institution, we tested the additive value of supplementing electronic data abstraction with a “targeted manual review” of the EHR for those cases without an available serum creatinine in the 12 months prior to hospital admission.
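The rule for selecting cases for targeted manual review can be written as a one-line check. It assumes that 12 months means 365 days and that only values drawn before hospital admission count. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def needs_targeted_review(creatinine_times: Iterable[datetime], hospital_admit: datetime) -> bool:
    """Flag a case when no serum creatinine exists in the 12 months before hospital admission."""
    window_start = hospital_admit - timedelta(days=365)  # assumes 12 months = 365 days
    return not any(window_start <= t < hospital_admit for t in creatinine_times)
```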

Statistical analysis

Because this study focused on comparing two approaches to measuring the same clinical endpoint, no formal power calculation was performed. Continuous variables were reported as mean ± standard deviation or median and interquartile range; categorical variables as frequencies and proportions. Bootstrapping with 1,000 resampling iterations was used to estimate 95 % confidence intervals. Between-group comparisons were made with the Mann–Whitney rank sum test for continuous variables and Chi-square or Fisher exact test for categorical variables. Inter-rater reliability was assessed using the kappa statistic. To provide an impression of how the electronically-extracted MAKE30 criteria would perform if considered a screening test for the presence of the manually-extracted MAKE30 outcome, the sensitivity and specificity were calculated. A two-sided P value < 0.05 was used to determine significance. All analyses were performed using SPSS Statistics v.22 (IBM Corp., Armonk, NY, USA) or R version 3.2.0 (R Foundation for Statistical Computing, Vienna, Austria).
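The agreement and accuracy statistics named above can be sketched in plain Python. This is an illustrative reimplementation, not the SPSS or R procedures the study used. The percentile bootstrap, the resampling of paired ratings per patient, the seed, and the toy data are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary ratings; NaN when chance agreement is 1."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return math.nan
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(test: Sequence[bool], reference: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of `test` against the reference standard."""
    tp = sum(t and r for t, r in zip(test, reference))
    fn = sum((not t) and r for t, r in zip(test, reference))
    tn = sum((not t) and (not r) for t, r in zip(test, reference))
    fp = sum(t and (not r) for t, r in zip(test, reference))
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_ci(stat: Callable[[List[bool], List[bool]], float], a: Sequence[bool],
                 b: Sequence[bool], iterations: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap 95% CI, resampling patients (paired ratings) with replacement."""
    rng = random.Random(seed)
    n = len(a)
    replicates = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([a[i] for i in idx], [b[i] for i in idx])
        if not math.isnan(value):  # drop undefined replicates
            replicates.append(value)
    replicates.sort()
    lower = replicates[int(0.025 * len(replicates))]
    upper = replicates[min(len(replicates) - 1, int(0.975 * len(replicates)))]
    return lower, upper


# Hypothetical toy data (not the study data): 200 patients, 32 reference events
reference = [True] * 32 + [False] * 168
electronic = list(reference)
electronic[0] = False                    # one missed event
electronic[40] = electronic[41] = True   # two false positives
print(cohen_kappa(electronic, reference))
print(sensitivity_specificity(electronic, reference))
print(bootstrap_ci(cohen_kappa, electronic, reference))
```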

Results

Baseline characteristics

Baseline characteristics of the study cohort (n = 200) are given in Table 2. Patients’ median age was nearly 60 and almost half were men. One in five had stage 3 or greater CKD, with around 15 % having previously received RRT. Sepsis and respiratory failure were the most common admitting diagnoses, with almost 20 % of patients receiving vasopressors and nearly a third of patients on mechanical ventilation. A total of 148 (74.0 %) patients had a serum creatinine value available from the 12 months prior to hospital admission (median 0.80 mg/dL; IQR 0.62 – 1.25 mg/dL). An additional 43 (21.5 %) patients had a creatinine measurement between hospital admission and enrollment that served as the baseline creatinine value. Only 9 (4.5 %) patients had no creatinine measurements available in the 12 months prior to enrollment and required a calculated estimate of baseline creatinine.

Table 2 Characteristics of patients with and without a Major Adverse Kidney Event in the 30 days after enrollment

Clinical outcomes determined by manual chart review

The two physician reviewers agreed in their assessment of the presence or absence of MAKE30 in 192 of the 200 cases (96.0 % agreement [95 % CI 93.5–98.5 %]; kappa 0.85 [95 % CI 0.73–0.94]; P < 0.001). Disagreement occurred in four cases because a history of RRT was missed by a reviewer and in four cases because a reviewer failed to incorporate an available pre-enrollment value into the baseline serum creatinine.

In the final reference standard manual-review dataset, 32 patients (16.0 %) experienced the MAKE30 composite outcome (Table 2). The incidence of each of the individual MAKE30 components was 8.5 % for in-hospital mortality before 30 days, 3.5 % for receipt of new RRT (2.2 % among survivors), and 8.5 % for persistent renal dysfunction (7.1 % among survivors and 6.1 % among survivors who did not require new renal replacement therapy). The incidence of MAKE by 90 days, including after hospital discharge, was 28.0 %. A total of 27 (16.1 %) patients who did not experience MAKE30 met criteria for MAKE90, 17 of whom died between hospital discharge and 90 days, 1 of whom experienced new RRT between hospital discharge and 90 days, and 9 of whom met criteria by the development of persistent renal dysfunction.

Comparison of electronically- and manually-extracted data

Correlation between electronically- and manually-extracted simple demographic data was perfect (r² = 1.00; P < 0.001 for age, date of hospital admission, date of ICU admission, and body mass index). Electronically- and manually-extracted baseline creatinine values are compared in Fig. 2. Post-hoc review of the three cases with a discrepancy greater than 0.25 mg/dL between electronically- and manually-collected values found that, in all three cases, manual review had erroneously classified a creatinine value from shortly after ICU admission as pre-enrollment.

Fig. 2

Bland-Altman plot of electronically- versus manually-extracted baseline creatinine values. Among the 200 patients in the current study, 172 had never received renal replacement therapy prior to ICU admission and were eligible to experience the creatinine-based component of the MAKE30 outcome. For these 172 patients, the difference between (Y axis) and the average of (X axis) the electronically- and manually-extracted baseline serum creatinine values (mg/dL) are plotted. Each point represents an individual patient and dotted lines are the 95 % limits of agreement. The three cases with a discrepancy greater than 0.25 mg/dL between electronically- and manually-collected values (red) were found to be due to errors in the manually-collected creatinine values
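The quantities plotted in a Bland-Altman figure of this kind can be computed as below. The use of bias ± 1.96 SD for the 95 % limits of agreement is the conventional choice and is assumed here. The 0.25 mg/dL threshold is taken from the text, and the four-patient data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def bland_altman(electronic: Sequence[float], manual: Sequence[float]
                 ) -> Tuple[List[float], List[float], float, Tuple[float, float]]:
    """Averages (X axis), differences (Y axis), mean bias, and 95% limits of agreement."""
    diffs = [e - m for e, m in zip(electronic, manual)]
    averages = [(e + m) / 2 for e, m in zip(electronic, manual)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return averages, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def flag_discrepancies(diffs: Sequence[float], threshold: float = 0.25) -> List[int]:
    """Indices of patients whose values differ by more than the threshold (mg/dL)."""
    return [i for i, d in enumerate(diffs) if abs(d) > threshold]


# Hypothetical baseline creatinine values (mg/dL) for four patients
electronic = [1.0, 0.8, 1.2, 0.9]
manual = [1.0, 0.8, 1.6, 0.9]
_, diffs, bias, limits = bland_altman(electronic, manual)
print(flag_discrepancies(diffs))  # [2]
```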

There was strong agreement between the electronic and manual assessment of the MAKE30 endpoint (98.5 % agreement [95 % CI 96.5–100.0 %]; kappa 0.95 [95 % CI 0.87–1.00]; P < 0.001) (Table 3). The electronic assessment correctly classified all patients with regard to the receipt of new renal replacement therapy. Two patients who died were misclassified by electronic identification as alive at discharge. Review of these two records revealed a programmatic error: a filter assigned a null value to the death source for patients whose data were retrieved from our Perioperative Data Warehouse but who had not undergone any operative procedure, even if they died before hospital discharge. Removing this filter resolved the error and correctly classified all 200 patients with regard to mortality. Presence of persistent renal dysfunction was correctly classified for 198 of the 200 patients. Two patients with CKD by clinical history but no serum creatinine at the study institution prior to enrollment were inappropriately classified by the electronic assessment as experiencing new persistent renal dysfunction. Supplementing the electronic MAKE30 assessment with targeted manual review of cases without an available serum creatinine value prior to hospital admission achieved appropriate classification of these two cases. Electronic MAKE30 assessments supplemented with targeted manual review performed similarly among patients with and without CKD and among those with and without a prior serum creatinine value in the EHR (Table 3). The final electronic algorithm, supplemented by targeted manual review of cases without a pre-admission creatinine, achieved 100 % sensitivity and specificity for the MAKE30 endpoint in the current dataset.

Table 3 Major Adverse Kidney Events by 30 days derived from electronically-extracted compared with manually-extracted data (referent)

Discussion

Establishing reliable methods for electronically collecting patient-centered outcomes is essential to leveraging the EHR for use in pragmatic trials [16, 17, 21]. This prospective, observational study demonstrated the feasibility of electronically identifying death, new RRT receipt, or persistent renal dysfunction among critically ill adults, using EHR data collected as a part of routine care.

Although the development of stage II or III AKI by KDIGO criteria [8] currently represents the most established definition of AKI in clinical research, there is increasing recognition of the need to examine outcomes meaningful both to clinicians and patients [9]. The MAKE composite endpoint captures, at a consistent time interval, mortality (the most important outcome to many patients) as well as receipt of RRT and persistent renal dysfunction (two kidney-specific events which may be more closely associated with long-term morbidity and quality-of-life than transient changes in creatinine [5, 6]). Uncertainty remains regarding how to best define the persistent renal dysfunction component of MAKE. A sustained doubling in creatinine at 30 days represents a large reduction in GFR and may prioritize specificity at the cost of decreased sensitivity. Even as work is ongoing to determine the best creatinine or estimated GFR criteria for persistent renal dysfunction [12], MAKE is increasingly being recommended [9] and used [12] as the endpoint of choice for AKI clinical trials and biomarker validation studies [1, 9, 11, 12, 22–26].
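To make "a large reduction in GFR" concrete, one can apply the CKD-EPI creatinine equation (assumed here to be the 2009 version cited as [19]) to a baseline creatinine and to its doubled value. This worked example holds age, sex, and race fixed and assumes creatinine is at steady state with both values at or above the sex-specific threshold κ (0.7 mg/dL in women, 0.9 mg/dL in men); c denotes the combined sex and race multiplier:

```latex
\begin{aligned}
\mathrm{eGFR} &= 141 \cdot \min\left(S_{Cr}/\kappa,\, 1\right)^{\alpha} \cdot \max\left(S_{Cr}/\kappa,\, 1\right)^{-1.209} \cdot 0.993^{\text{age}} \cdot c \\
\frac{\mathrm{eGFR}(2 S_{Cr})}{\mathrm{eGFR}(S_{Cr})} &= \frac{(2 S_{Cr}/\kappa)^{-1.209}}{(S_{Cr}/\kappa)^{-1.209}} = 2^{-1.209} \approx 0.43 \qquad (S_{Cr} \ge \kappa)
\end{aligned}
```

Under these assumptions, a doubling of creatinine corresponds to a fall in estimated GFR of roughly 57 %. The simpler steady-state approximation GFR ∝ 1/S_Cr gives a 50 % fall. For baselines below κ, the ratio also involves α and differs somewhat.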

A number of prior studies have assessed the feasibility of detecting AKI with data from the EHR [13, 27], primarily using laboratory clinical information systems to identify changes in serum creatinine concentration. Although the reference standard for AKI and the performance of electronic detection have varied significantly across studies [13–15, 27–29], some methods have achieved sensitivity and specificity in excess of 90 %. Notably, many of these studies were limited to a narrow spectrum of patients, frequently excluding those with CKD.

Numerous prior studies have also evaluated the use of administrative data to identify episodes of AKI in hospitalized patients [21, 30, 31]. Most AKI studies using administrative data have applied ICD-9 codes or CPT codes to capture the diagnosis of AKI and RRT. Commonly used administrative codes for AKI (e.g. ICD-9-CM 584, ICD-10 N17) generally demonstrate low sensitivity and high specificity [21]. Inclusion of only patients with AKI requiring dialysis may improve diagnostic performance [30], but sensitivity in some studies has remained as low as 40 % [31]. Use of billing codes may identify AKI with a more severe phenotype and may demonstrate better performance characteristics at higher stages of AKI.

The goal and technical approach of the current study differed substantially from those of these prior, related studies. We aimed to identify critically ill adults who experienced a Major Adverse Kidney Event between ICU admission and hospital discharge, using all data available within the hospital informatics systems. By merging date- and time-specific laboratory data from the inpatient and outpatient setting with administrative data on ICD-9 and CPT codes, we were able to accurately identify patients who experienced the MAKE30 composite outcome. The sensitivity and specificity of electronically extracted data for the manually-collected MAKE30 outcome were above 95 %, and increased incrementally with targeted manual review of charts known to be at higher risk for misclassification due to missing baseline creatinine data. Performance was similar among patients with and without evidence of CKD. Identifying the MAKE endpoint using all available laboratory and administrative data from an individual patient’s hospitalization avoids some of the challenges associated with detecting AKI via laboratory values or administrative datasets alone. A doubling of creatinine from baseline to discharge is easier to detect than smaller, time-dependent changes in creatinine. Death and RRT may be coded more consistently in administrative data than AKI diagnoses generally. The ability to reliably detect the MAKE composite endpoint from EHR data collected during routine care suggests that the MAKE endpoint is well suited for use in pragmatic AKI research.

Our study has several strengths. Manual chart review by two physicians is a well-recognized reference standard for kidney injury outcomes, and the 96 % agreement between the two initial reviewers in our study was similar to that observed in previous AKI studies [28, 31]. Including all ICU admissions allowed examination of performance characteristics over a wide spectrum of underlying illness, including patients with CKD (for whom risk of AKI is high but assessment of baseline creatinine may be challenging) and patients receiving RRT prior to admission (who remain at risk for the in-hospital mortality component of MAKE). Correction of the small number of systematic errors in electronic data extraction identified in the current study may produce even more accurate MAKE30 identification in future studies.

Our study also has limitations. We studied only 200 patients in a single ICU at a single center. Our approach to identifying MAKE in the EHR might perform differently in other populations of patients and providers or at centers that handle laboratory and administrative data differently. Replication might prove challenging in healthcare environments in which information systems differ between the outpatient and inpatient setting, or where patients are less consistently cared for at a single institution. Determining baseline creatinine values in studies of AKI is a recognized challenge [32] and alternative definitions might have produced different results. Censoring the primary MAKE assessment at hospital discharge or 30 days avoids biases related to differences in post-discharge follow-up, but offers less information about progression to CKD than complete patient follow-up to a later time-point [12]. Data from electronic medical records contain inherent noise compared to data deliberately collected by study personnel as a part of research. Reassuringly, however, the concordance between the electronically-assessed and reference standard MAKE30 outcome appeared to be similar to the concordance between the two physician reviewers for the same endpoint.

Conclusions

Accurately identifying critically ill adults who experience a Major Adverse Kidney Event using EHR data collected during routine care is feasible. Future research is needed to test the performance of the methods described here in other settings.