INTRODUCTION

Sodium glucose co-transporter 2 (SGLT2) inhibitors are commonly used for the treatment of type 2 diabetes mellitus.1,2 Their primary mechanism of action lowers plasma glucose by inhibiting its reabsorption at the nephron.1,2 Two of the SGLT2 inhibitors, empagliflozin and canagliflozin, also reduce the risk of myocardial infarction, stroke, and cardiovascular mortality.3,4 Because of the reduction in cardiovascular events, clinical trials are underway to test these medications in patients with cardiovascular disease who do not have diabetes.5 One trial, DAPA-HF, recently demonstrated a lower risk of cardiovascular death or heart failure with dapagliflozin compared to placebo, regardless of whether the patient had diabetes.6 As a result, SGLT2 inhibitors may soon be indicated for a much wider patient population, which highlights the importance of rare adverse events for this class of medications.

Important adverse events of SGLT2 inhibitors identified in clinical trials included mycotic genital infection and excessive volume depletion.1,3,4,7 One additional adverse event not initially detected in clinical trials, but subsequently identified after widespread use, was euglycemic diabetic ketoacidosis.8,9,10 Diabetic ketoacidosis is typically a complication of type 1 diabetes mellitus rather than type 2 diabetes and can be fatal.11,12 Because it is uncommon among adults with type 2 diabetes, initial cases of SGLT2 inhibitor-related ketoacidosis were unexpected and a diagnostic challenge for clinicians.8 In May 2015,13 the Food and Drug Administration (FDA) released a warning and subsequently updated the drug label to include diabetic ketoacidosis as a side effect of SGLT2 inhibitors.14

Four observational studies8,10,15,16 and two randomized trials17,18 have subsequently demonstrated that SGLT2 inhibitors are associated with diabetic ketoacidosis. However, these studies have not identified the risk factors for this adverse event. Risk factors for diabetic ketoacidosis generally fall into three categories: non-adherence to insulin, intercurrent illness (e.g., severe infection), and cardiovascular events (e.g., myocardial infarction, stroke). Prior to SGLT2 inhibitors, no specific medications were consistently shown to be associated with diabetic ketoacidosis. Because diabetic ketoacidosis with SGLT2 inhibitors is well established to be rare, identifying its risk factors could help mitigate this potential harm among patients prescribed an SGLT2 inhibitor. This is particularly important due to expanding indications for these medications.19 The objective of this study was to identify patient-level characteristics associated with an increased risk of diabetic ketoacidosis for patients receiving an SGLT2 inhibitor.

METHODS

Study Population

We conducted a population-based, new-user, cohort study using the nationwide US commercial insurance claims database Optum© Clinformatics® Data Mart.20 This database provides individual-level de-identified data on demographics, healthcare utilization, diagnoses, diagnostic tests and procedures, outpatient laboratory results, and pharmacy dispensing of drugs to over 13 million people in the USA. It has been widely used to understand the safety and effectiveness of medications used in routine care.20,21,22

We included adults (over age 18) with type 2 diabetes mellitus who were newly prescribed an SGLT2 inhibitor (empagliflozin, canagliflozin, dapagliflozin) between March 29, 2013 (date of approval of the first SGLT2 inhibitor) and September 30, 2017 (last available database update). Patients with type 2 diabetes mellitus were identified using International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10) codes. The cohort entry date was the date of the first prescription for an SGLT2 inhibitor. A new user of an SGLT2 inhibitor was defined as an adult without a prior prescription for an SGLT2 inhibitor in the preceding 180 days.

Patients with insufficient baseline data (i.e., less than 180 days of available data) or a diagnosis of type 1 diabetes were excluded. For our primary analysis, we included patients with a prior history of DKA (in a sensitivity analysis, these patients were excluded).
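As an illustration only, the cohort entry rules described above could be expressed roughly as follows. This is a minimal sketch rather than the study's actual code: the function name `cohort_entry_date` and its arguments are hypothetical, the first observed SGLT2 inhibitor fill is treated as the cohort entry date, and the age and type 2 diabetes diagnosis checks are omitted.

```python
from datetime import date

STUDY_START = date(2013, 3, 29)   # approval of the first SGLT2 inhibitor
STUDY_END = date(2017, 9, 30)     # last available database update
BASELINE_DAYS = 180               # required look-back before cohort entry


def cohort_entry_date(sglt2_fill_dates, enrollment_start, has_type1_diabetes):
    """Return the cohort entry date, or None if the patient is excluded.

    All argument names are hypothetical; real claims data would need its
    own extraction step to produce them.
    """
    if has_type1_diabetes or not sglt2_fill_dates:
        return None
    first_fill = min(sglt2_fill_dates)
    if not (STUDY_START <= first_fill <= STUDY_END):
        return None
    # With fewer than 180 days of observable data before the first fill,
    # new-user status cannot be confirmed, so the patient is excluded.
    if (first_fill - enrollment_start).days < BASELINE_DAYS:
        return None
    return first_fill


print(cohort_entry_date([date(2015, 6, 1)], date(2014, 1, 1), False))
```

Because the first observed fill must be preceded by at least 180 days of enrollment, no earlier SGLT2 inhibitor fill can fall inside the look-back window, which is how this sketch approximates the new-user definition.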

The Brigham and Women’s Hospital Institutional Review Board provided ethics approval and a valid data use agreement for the database was in place.

Cohort Follow-Up

Follow-up began on the day after the first SGLT2 inhibitor prescription was filled and continued until the end of the study period (i.e., September 30, 2017), end of continuous health plan enrollment, first study outcome, discontinuation of SGLT2 inhibitor, 365 days, or death (whichever came first). An SGLT2 inhibitor was considered discontinued if 30 days elapsed after the expiration of the last prescription’s supply without being refilled (data censored on day 31 onwards).
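The censoring rules above can be sketched as a small function. This is an assumption-laden illustration, not the study's implementation: it supposes each fill record carries a days' supply value, treats the supply as expiring `days_supply` days after the fill date, ignores stockpiling from early refills, and uses hypothetical names throughout.

```python
from datetime import date, timedelta

STUDY_END = date(2017, 9, 30)
GRACE_DAYS = 30            # allowed gap after the supply runs out
MAX_FOLLOW_UP_DAYS = 365


def discontinuation_date(fills):
    """Last day at risk before censoring for discontinuation.

    `fills` is a list of (fill_date, days_supply) tuples sorted by date.
    """
    supply_end = fills[0][0] + timedelta(days=fills[0][1])
    for fill_date, days_supply in fills[1:]:
        if fill_date > supply_end + timedelta(days=GRACE_DAYS):
            break  # refill came too late: treatment counts as discontinued
        supply_end = max(supply_end, fill_date + timedelta(days=days_supply))
    return supply_end + timedelta(days=GRACE_DAYS)


def follow_up_end(fills, enrollment_end, outcome_date=None, death_date=None):
    """Earliest of all censoring or outcome dates (whichever came first)."""
    entry = fills[0][0]
    candidates = [
        STUDY_END,
        enrollment_end,
        discontinuation_date(fills),
        entry + timedelta(days=MAX_FOLLOW_UP_DAYS),
    ]
    candidates += [d for d in (outcome_date, death_date) if d is not None]
    return min(candidates)


fills = [(date(2016, 1, 1), 30), (date(2016, 2, 5), 30), (date(2016, 6, 1), 30)]
print(follow_up_end(fills, enrollment_end=date(2017, 1, 1)))
```

In this example the third fill arrives more than 30 days after the second supply ran out, so follow-up ends at the discontinuation date rather than at the 365-day cap.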

Study Outcomes

The primary objective was to identify predictors of DKA among patients prescribed an SGLT2 inhibitor. A diagnosis of DKA was defined as one of (i) hospitalization with DKA as the primary diagnosis, (ii) hospitalization with DKA as a secondary diagnosis, or (iii) outpatient diagnosis of DKA. The primary analysis was restricted to hospitalizations with DKA (i.e., (i) or (ii)). Secondary analyses were restricted to (i) alone and expanded to (i), (ii), or (iii). In all cases, DKA was identified using ICD-9 or ICD-10 codes (Appendix). Prior studies have utilized claims data to estimate the risk of DKA in various clinical scenarios.10,23,24 In addition, two prior studies suggest these codes have a specificity of about 92%, a sensitivity of 68%, and a positive predictive value above 90%.25,26

Baseline Covariates

For the primary analysis, covariates were assessed during the 180 days before cohort entry. In a secondary analysis, covariates were assessed in the 60 days before cohort entry since diagnoses immediately preceding the prescription may be especially relevant. Data included chronic medical conditions (e.g., cardiovascular disease), proxies for diabetes severity (e.g., hemoglobin A1C), established risk factors for diabetic ketoacidosis (e.g., prior event, current insulin use, recent infection), healthcare utilization (e.g., recent hospitalization, emergency department visit, or surgery), prescriber information (e.g., endocrinologist, general practitioner), other diabetes medications (e.g., metformin, sulfonylureas) and non-diabetes-related medications (e.g., diuretics, steroids). These covariates were a priori selected based on prior literature, clinical experience, and expert opinion (Table 1).27,28 Importantly, some of the laboratory measurements (e.g., serum bicarbonate) may appear to be causal intermediates; however, we only included laboratory values that preceded the prescription for the SGLT2 inhibitor. Thus, the laboratory values by definition are not on the causal pathway between SGLT2 inhibitor use and diabetic ketoacidosis.

Table 1 Baseline Characteristics
Table 2 Risk Factors for Diabetic Ketoacidosis Using Variables Identified Up to 180 Days Prior to SGLT2 Inhibitors

Statistical Analysis

Each patient had 83 covariates identified during the baseline period before being prescribed an SGLT2 inhibitor. Because DKA was rare, including all covariates can cause over-fitting in a logistic regression model. Instead, we applied two machine learning techniques for identifying variables that might be associated with SGLT2 inhibitor-related DKA: least absolute shrinkage and selection operator (LASSO) regression and gradient boosted trees. These two approaches were selected because they are two of the most commonly applied supervised machine learning techniques.

LASSO regression can handle high-dimensional data (i.e., a large number of predictors relative to outcomes), even with substantial collinearity among covariates.29 Because LASSO cannot handle missing data, we included only baseline covariates with complete data (i.e., excluding serum creatinine, serum bicarbonate, serum hemoglobin A1C). We performed LASSO using the glmnet package available in R and standardized the predictors by their individual standard deviation (sd) so that the odds ratios (OR) produced by LASSO were on a consistent scale.
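To make the shrinkage behavior concrete, the following is a minimal pure-Python sketch of L1-penalized logistic regression on standardized predictors. It is not the glmnet implementation used in the study: it uses proximal gradient descent instead of glmnet's coordinate descent, the penalty value `lam=0.05` is arbitrary rather than cross-validated, and the toy data are invented so that one predictor is related to the outcome and the other is not.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Center a predictor and scale it by its standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(columns, y, lam, n_iter=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(y), len(columns)
    X = [standardize(c) for c in columns]
    step = 4.0 / (p + 1)  # conservative step size for standardized predictors
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        eta = [b0 + sum(b * col[i] for b, col in zip(beta, X)) for i in range(n)]
        resid = [1.0 / (1.0 + math.exp(-e)) - yi for e, yi in zip(eta, y)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        beta = [
            soft_threshold(b - step * sum(r * x for r, x in zip(resid, col)) / n,
                           step * lam)
            for b, col in zip(beta, X)
        ]
    return b0, beta


# Invented toy data: x1 is related to the outcome, x2 is not.
x1 = [1] * 20 + [0] * 20
x2 = ([1] * 10 + [0] * 10) * 2
y = ([1] * 6 + [0] * 4) * 2 + ([1] * 2 + [0] * 8) * 2

_, coefs = lasso_logistic([x1, x2], y, lam=0.05)
odds_ratios = [math.exp(b) for b in coefs]  # OR per 1-sd increase
print(odds_ratios)
```

The uninformative predictor is shrunk to a coefficient of exactly zero, which corresponds to an odds ratio of exactly 1; this is the sense in which LASSO performs variable selection.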

Gradient boosted trees can accommodate missing data, and they have been shown to have good predictive performance across a wide range of problems.29,30 Since gradient boosting effectively handles missing data, we retained variables with missing data. The model tuning parameters were selected using a grid search over the number of trees (0 to 12,000), interaction depth (1 or 3), shrinkage factor (0.001, 0.01, or 0.1), and bag fraction (0.4 or 0.5) that optimized the standard loss function for classification (i.e., Bernoulli deviance) (Appendix).29,30
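The structure of such a grid search can be sketched as below. This is only an outline under stated assumptions: the boosted model itself is not implemented, `deviance_for` is a hypothetical callable standing in for "fit a model with this setting and return its Bernoulli deviance", and the deviance is written as minus twice the mean log-likelihood, which is one common normalization.

```python
import math
from itertools import product

# Tuning values quoted in the text. The spacing of the tree counts between
# 0 and 12,000 is not stated, so it is left to the fitting routine here.
INTERACTION_DEPTHS = (1, 3)
SHRINKAGE_FACTORS = (0.001, 0.01, 0.1)
BAG_FRACTIONS = (0.4, 0.5)


def bernoulli_deviance(y, p):
    """Mean Bernoulli deviance: -2 times the average log-likelihood."""
    log_lik = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )
    return -2.0 * log_lik / len(y)


def tuning_grid():
    """Every combination of interaction depth, shrinkage, and bag fraction."""
    return [
        {"interaction_depth": d, "shrinkage": s, "bag_fraction": b}
        for d, s, b in product(INTERACTION_DEPTHS, SHRINKAGE_FACTORS, BAG_FRACTIONS)
    ]


def select_best(grid, deviance_for):
    """Return the setting with the lowest deviance.

    `deviance_for` is a hypothetical callable supplied by the analyst.
    """
    return min(grid, key=deviance_for)


print(len(tuning_grid()))
```

Each of the 12 settings would additionally be evaluated across the range of tree counts; the text does not say how the deviance was estimated (for example, by cross-validation), so that choice is left open here.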

To quantify the association between potential predictors and the risk of DKA, we included variables identified using LASSO regression and gradient boosted trees in a logistic regression model. We rank ordered the variables produced from LASSO by the size of their estimated coefficient (largest to smallest). With gradient boosted trees, a relative importance (RI) measure for each variable is provided rather than a coefficient.29,30 The relative importance measure was then sorted from largest to smallest to identify the variables with the largest association with diabetic ketoacidosis.29,30 We selected the top variables from LASSO (i.e., largest coefficient) and gradient boosted trees (i.e., largest relative importance) as candidates for subsequent logistic regression. The “top variables” were identified after visually reviewing the distributions of the effect estimates from LASSO and gradient boosted trees to identify where there was a substantial decrease in variable importance. There was no specific cut-off or analytic metric used because both approaches would require arbitrary cut-offs and there is no specific literature to indicate this approach is robust. Partial dependence plots were reviewed for the top variables that were continuous to aid in model interpretability.31 A partial dependence plot displays the marginal effect of the continuous variable on the outcome.31 An alternative approach is to arbitrarily dichotomize variables (e.g., hemoglobin A1C less than 7%), but doing so decreases statistical power and limits the ability to identify potentially clinically relevant cut-points.
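The idea behind a partial dependence plot can be illustrated with a short sketch: one feature is forced to each value on a grid while all other features keep their observed values, and the model's predictions are averaged. Everything below is hypothetical. `toy_predict` is an invented stand-in for a fitted model, the four rows are made-up data, and the jump at a hemoglobin A1C of 10% is built into the toy model purely to show what a transition point looks like.

```python
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid_values):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid_values:
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:] for row in rows
        ]
        curve.append(mean(predict(r) for r in modified))
    return curve


def toy_predict(row):
    """Invented risk model: features are [hemoglobin A1C (%), prior DKA flag]."""
    a1c, prior_dka = row
    return 0.002 + (0.004 if a1c > 10 else 0.0) + (0.01 if prior_dka else 0.0)


rows = [[7.5, 0], [8.0, 0], [11.2, 1], [9.1, 0]]
curve = partial_dependence(toy_predict, rows, 0, [8, 9, 10, 11, 12])
print(curve)
```

A flat curve followed by a step, as in this toy output, is the kind of pattern that would suggest a cut-point for a continuous variable without dichotomizing it in advance.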

Three predefined sensitivity analyses were performed. First, the original cohort was re-analyzed using only baseline characteristics in the 60 days rather than 180 days before being prescribed an SGLT2 inhibitor. A shorter baseline period was used under the assumption that perhaps variables identified closer to the index date (i.e., up to 60 days before) might be more relevant than ones identified further from the index date (e.g., up to 180 days before). Second, a new cohort of patients was constructed that excluded any patients with a prior diagnosis of diabetic ketoacidosis. Excluding patients with a prior diagnosis of diabetic ketoacidosis was considered in the event that prior diabetic ketoacidosis is the strongest predictor and thus overshadows other potentially relevant characteristics. Third, an additional cohort of patients was constructed to include those with type 1 diabetes mellitus, since some patients with type 1 diabetes mellitus have been prescribed an SGLT2 inhibitor off-label.

RESULTS

Study Population

A total of 111,442 patients satisfied study inclusion and exclusion criteria (Fig. 1). The mean age was 57 years and 44% were female. The mean hemoglobin A1C was 8.7% (sd = 1.8) and the mean creatinine was 0.89 (sd = 0.25). Overall, 62% were prescribed metformin and 24% were prescribed insulin (Table 1). Risk factors for diabetic ketoacidosis were infrequent (e.g., prior diabetic ketoacidosis). Over a mean follow-up of approximately 180 days, 475 patients were diagnosed with diabetic ketoacidosis (inpatient or outpatient).

Figure 1

Cohort entry criteria. Legend: SGLT2 = sodium glucose co-transporter 2 inhibitor; all criteria assessed in the preceding 180 days with the exception of a diagnosis of type 2 diabetes which was assessed up to 1000 days prior.

Predictors Identified by LASSO Regression

In the LASSO model of hospitalization with a diagnosis of DKA (N = 192), 54 of the 80 predictors had an odds ratio (OR) of 1, and 4 had an OR close to 1 (i.e., 1.01 to 1.09). Of the remaining 22 variables, the largest predictors were a prior diagnosis of DKA (OR = 4.8), use of dementia medications (OR = 4.2), hypoglycemia (OR = 2.4), digoxin use (OR = 2.0), insulin use (OR = 1.8), diabetic retinopathy (OR = 1.8), heparin use (OR = 1.7), and gastrointestinal bleed (OR = 1.62). Similar findings were observed in analyses restricted to DKA as the primary diagnosis (N = 125) and in analyses expanded to include outpatient diagnoses (N = 475). Similarly, limiting the baseline time period to 60 days yielded comparable results (Appendix Table 1).

When we restricted the cohort to only patients without a past episode of DKA, the strongest predictors for a hospitalization with DKA were use of dementia medications (OR = 3.7) and use of digoxin (OR = 2.1). When we broadened the cohort to include patients with a diagnosis of type 1 diabetes, the strongest predictors were intracranial hemorrhage (OR = 4.7), use of dementia medications (OR = 4.6), prior DKA (OR = 4.1), and a diagnosis of type 1 diabetes mellitus (OR = 3.3).

Predictors Identified by Gradient Boosted Trees

The selected model with the lowest Bernoulli deviance had 1257 trees, a shrinkage factor (lambda) of 0.1, an interaction depth of 1, and a bag fraction of 0.4, though differences in the Bernoulli deviance across the tuned model hyper-parameters were small (i.e., generally less than 0.01). In the fitted gradient boosted tree model that included a hospitalization with a diagnosis of DKA (N = 192), 63 of the 83 predictors had a relative importance of 0 and 9 predictors had a relative importance near 0 (i.e., between 0 and 0.1). Of the remaining 11 variables, the largest predictors were baseline hemoglobin A1C (RI = 55.9), baseline creatinine (RI = 40.0), use of dementia medications (RI = 1.1), prior diagnosis of DKA (RI = 1.0), and serum bicarbonate (RI = 0.6). Similar findings were observed in analyses restricted to hospitalizations with DKA as the primary diagnosis (N = 125) and in analyses expanded to include outpatient diagnoses (N = 475). In addition, analyses limited to a baseline time period of 60 days rather than 180 days yielded comparable findings (Appendix Table 2). When we restricted the cohort to only patients without a past episode of DKA, the strongest predictors were serum creatinine (RI = 51.4), hemoglobin A1C (RI = 46.4), and serum bicarbonate (RI = 0.92). When we broadened the cohort to include patients with a diagnosis of type 1 diabetes, the strongest predictors were baseline hemoglobin A1C (RI = 54.5) and baseline creatinine (RI = 28.9).

Predictors Analyzed Using Logistic Regression

A logistic regression model included the variables that were consistently identified using either LASSO or gradient boosted trees: prior diabetic ketoacidosis, hypoglycemia, digoxin, dementia medications, delirium, intracranial hemorrhage, hemoglobin A1C, creatinine, and bicarbonate (Table 2). The cut-offs for hemoglobin A1C, serum bicarbonate, and creatinine were identified using partial dependence plots that indicated clear transition points in the predicted probability of DKA. Results are also provided for the logistic regression model including only variables from the preceding 60 days (Table 3).

Table 3 Risk Factors for Diabetic Ketoacidosis Using Variables Identified Up to 60 Days Prior to SGLT2 Inhibitors

DISCUSSION

In this study of over 100,000 adults who started on an SGLT2 inhibitor, overall 475 (4 per 1000) were subsequently diagnosed with diabetic ketoacidosis in the inpatient or outpatient setting, with 192 in the inpatient setting (2 per 1000) over a mean follow-up of approximately 180 days. Using machine learning techniques, both anticipated (i.e., prior DKA, low serum bicarbonate, and a hemoglobin A1C above 10%) and unanticipated (i.e., digoxin, dementia medications) risk factors for DKA were identified. These findings were robust across various sensitivity analyses and highlight a role of machine learning for identifying potential risk factors for rare adverse events.
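The per-1000 figures above follow directly from the reported counts. The sketch below reproduces them and, under the added assumption that the mean follow-up was exactly 180 days, converts them into an approximate rate per 1000 person-years. That conversion is illustrative only and is not a figure reported by the study; the function names are hypothetical.

```python
N_PATIENTS = 111_442
DKA_ANY_SETTING = 475
DKA_INPATIENT = 192
MEAN_FOLLOW_UP_DAYS = 180  # "approximately 180 days", treated as exact here


def per_1000_patients(events, patients):
    """Cumulative incidence expressed per 1000 patients."""
    return 1000 * events / patients


def per_1000_person_years(events, patients, mean_follow_up_days):
    """Approximate incidence rate per 1000 person-years of follow-up."""
    person_years = patients * mean_follow_up_days / 365.25
    return 1000 * events / person_years


for label, events in (("any setting", DKA_ANY_SETTING), ("inpatient", DKA_INPATIENT)):
    print(
        label,
        round(per_1000_patients(events, N_PATIENTS), 1),
        round(per_1000_person_years(events, N_PATIENTS, MEAN_FOLLOW_UP_DAYS), 1),
    )
```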

Preventing SGLT2 inhibitor-related DKA is important because it can be life threatening and is easily overlooked for several reasons.8,12 First, DKA is typically associated with type 1 diabetes mellitus, rather than type 2 diabetes.12,32 Since SGLT2 inhibitors are prescribed to patients with type 2 diabetes mellitus, DKA is not always considered.5 Second, DKA was traditionally considered to be caused by profound insulin deficiency rather than an adverse event from medications.12 Third, patients with SGLT2 inhibitor-related DKA can have normal or mildly elevated blood glucose levels.8,9,32 This is atypical for DKA, because patients typically present with blood glucose levels that are five- to tenfold higher than normal. Indeed, many physicians do not initially recognize DKA due to the near-normal glucose levels.8 Educating clinicians about SGLT2 inhibitor-related DKA may help raise awareness and identify strategies to prevent it from occurring.

We identified prior DKA, digoxin, dementia medications, serum bicarbonate less than 18 mmol/L, and a hemoglobin A1C greater than 10% as some of the important risk factors. Prior DKA, a low serum bicarbonate, and an elevated hemoglobin A1C may seem intuitive since the first two suggest a metabolic acidosis may already be present and the last suggests poorly controlled diabetes. However, many associations in medicine can appear intuitive, but empirical data help to support informed care. Other predictors were surprising (i.e., digoxin, dementia medications, intracranial hemorrhage), and thus, it is unknown whether these findings are spurious, surrogate markers of underlying risk, or potentially directly related to an increased risk of DKA with SGLT2 inhibitors. Of note, both digoxin and SGLT2 inhibitors are substrates for p-glycoprotein.33,34 A recent pharmacokinetic study of healthy volunteers identified that digoxin modestly increased the serum concentration of empagliflozin, which may help explain the increased odds of DKA we observed compared to adults not prescribed digoxin.33,34 However, digoxin may instead be a surrogate for cardiovascular disease severity, rather than being causally linked to SGLT2 inhibitor-associated diabetic ketoacidosis.

It is unclear why dementia medications might be associated with a higher risk of DKA. SGLT2 inhibitors are generally not metabolized by cytochrome P450 enzymes and instead are eliminated by glucuronidation via UGT1A9 and UGT2B4. Neither donepezil nor memantine should affect glucuronidation, but memantine is eliminated by tubular secretion, which could alter plasma levels of SGLT2 inhibitors, which are renally cleared and act at the proximal convoluted tubule.35 An alternative explanation is that the increased risk of DKA might be related to dementia severity, with dementia medication use acting as a proxy for severity, or that dementia medications are a sign of polypharmacy. It is also unclear why prior intracranial hemorrhage was a seemingly important predictor. It may represent a surrogate of recent hospitalization or illness severity, or it may be a spurious finding.

Despite observational studies, some clinicians remain skeptical that DKA can be caused by an SGLT2 inhibitor.36,37 For example, two prior meta-analyses of clinical trials found no association between SGLT2 inhibitors and DKA.36,37 However, the average number of patients who received an SGLT2 inhibitor was about 500 in each of the trials. Since the overall rate is approximately 3–8 per 1000 person-years, the majority of those trials were underpowered to detect diabetic ketoacidosis. Moreover, our study identified that a hemoglobin A1C above 10% is a strong risk factor, yet the majority of included trials excluded patients who had a hemoglobin A1C above 10%.36,37 The cardiovascular outcome trial for dapagliflozin (N = 17,160) included patients who had a hemoglobin A1C up to 12%, and found a twofold higher rate of DKA (hazard ratio [HR] 2.18, 95% confidence interval [CI] 1.10–4.30).17 Furthermore, the renal outcome trial for canagliflozin (N = 4410) also included patients who had a hemoglobin A1C of up to 12% and also found an increased rate of DKA in patients randomized to canagliflozin (HR = 10.80, 95% CI 1.39–83.65).
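The power argument for trials of about 500 treated patients can be made concrete with a rough calculation. The sketch below is an illustration under assumptions that are not taken from the source: each patient is assumed to contribute one year of follow-up, and event counts are assumed to follow a Poisson distribution. The function names are hypothetical.

```python
import math


def expected_events(n_patients, follow_up_years, rate_per_1000_py):
    """Expected number of events at a given rate per 1000 person-years."""
    return n_patients * follow_up_years * rate_per_1000_py / 1000


def prob_no_events(expected):
    """Poisson probability of observing zero events."""
    return math.exp(-expected)


for rate in (3, 8):
    mu = expected_events(500, 1.0, rate)  # one year of follow-up is assumed
    print(rate, mu, round(prob_no_events(mu), 3))
```

Under these assumptions a treatment arm of 500 patients would expect only a handful of DKA events, too few for a single trial of that size to estimate a relative risk with any precision.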

Unlike the recent clinical trials identifying an increased risk of DKA with SGLT2 inhibitors, our study lacked diagnostic certainty in identifying DKA. While ICD codes are popular, they are imperfect and can result in misclassification. For example, we observed a higher odds ratio for prior DKA when our outcome definition included an outpatient diagnosis of DKA as opposed to only an inpatient diagnosis of DKA. The higher odds ratio with prior DKA may represent re-recording of prior events rather than a truly new DKA event. For this reason, the results from our models that defined outcomes based on inpatient diagnostic codes might be more accurate. We also lacked complete laboratory data (i.e., only one-third had a baseline hemoglobin A1C) in addition to not having data related to other potential risk factors for DKA including body mass index, dietary intake, alcohol use, and genetic markers.12 Furthermore, diagnostic codes for variables such as hypoglycemia, delirium, and smoking are imperfect measures and likely underestimate the prevalence of these conditions. Similarly, there were considerable amounts of missing data for laboratory measures, and thus, the mere fact that they were performed may be an indicator of underlying illness severity or concern by the attending physician. These gaps are an important area for future research.

For patients with multiple risk factors, further laboratory monitoring might help with risk stratification. For example, if laboratory testing identifies a low serum bicarbonate level, then this should be worked up accordingly and an SGLT2 inhibitor should be initially avoided. Of course, this recommendation is pragmatic and was not formally tested in our study.

CONCLUSION

SGLT2 inhibitors are an effective class of medications for adults with type 2 diabetes mellitus, but DKA remains an important risk. We applied machine learning methods and identified both anticipated (i.e., prior DKA, low serum bicarbonate, elevated hemoglobin A1C) and unanticipated (i.e., digoxin) risk factors to advance our understanding of SGLT2 inhibitor-related DKA. Additional studies are required to confirm our findings, but patients with multiple risk factors for SGLT2 inhibitor-related diabetic ketoacidosis may benefit from laboratory testing prior to initiation of an SGLT2 inhibitor, closer monitoring for diabetic ketoacidosis, or alternative medications to manage their diabetes.