Sodium glucose co-transporter 2 (SGLT2) inhibitors, also known as gliflozins, are a new class of oral medication for the treatment of hyperglycaemia in type 2 diabetes (T2DM). They improve glycaemic control by inhibiting SGLT2-mediated reabsorption of glucose in the proximal tubule of the kidney [1, 2]. This action is independent of insulin secretion and action and, therefore, these agents are not considered to predispose to hypoglycaemia [3]; they also promote modest, but sustained, weight loss [4]. Inhibition of SGLT2 also promotes urinary sodium loss [5, 6] which, along with weight reduction, may be responsible for the early blood pressure-lowering effects observed in trials [7]. The reductions in weight and blood pressure seen in clinical trials are likely to be clinically meaningful and are potentially useful additional benefits of these agents.

Three SGLT2 inhibitors are available in both Europe and the USA: canagliflozin, dapagliflozin and empagliflozin. In randomised controlled trials, SGLT2 inhibitors have provided clear improvements in glycaemic control when used as a single agent [8] and in combination with metformin [3, 4, 9], sulfonylureas [10, 11] or insulin [12, 13]. However, clinical trials are performed in selected patient groups and, therefore, their results may not be fully replicated in ‘real world’ clinical practice. For example, the Association of British Clinical Diabetologists nationwide exenatide audit found a real world efficacy profile that differed from that reported in clinical trials [14]. Whilst early data with dapagliflozin (the first drug available in the SGLT2 inhibitor class) have demonstrated good efficacy in the real world [15], additional analyses on a larger scale are still needed.

Recent findings from the first completed cardiovascular safety trial of an SGLT2 inhibitor, the BI 10773 (Empagliflozin) Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients (EMPA-REG OUTCOME), demonstrated a reduction in adverse cardiovascular outcomes in people at high cardiovascular risk (Table 1) who were treated with empagliflozin [16]. How these findings can be extrapolated to real world clinical practice remains unclear. Other cardiovascular safety trials are ongoing and will help to answer some of these questions, as their inclusion criteria are somewhat broader than those of EMPA-REG and include a lower cardiovascular risk population. In the meantime, comparing the characteristics of people treated with SGLT2 inhibitors in clinical practice with those of the EMPA-REG trial population is important for understanding how these results can be applied in practice. In particular, knowing the proportion of people with T2DM in clinical practice to whom the trial criteria apply is essential for correct interpretation of the results.

Table 1 A summary of the major cardiovascular safety trials in sodium glucose co-transporter 2 inhibitors

Aim and Methods

The study will be a cross-sectional analysis of all people with T2DM included in the Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) database, identifying those initiated on SGLT2 inhibitors and describing their cardiovascular risk profile. The proportion of people with a cardiovascular risk profile similar to that of participants in the EMPA-REG trial will also be reported.


The aim of this study will be to compare the clinical characteristics of people initiated on SGLT2 inhibitors with those of people included in the EMPA-REG OUTCOME trial.

Primary Objectives

  1. To identify how many people initiated on an SGLT2 inhibitor in clinical practice meet the inclusion criteria for the EMPA-REG trial.

  2. To provide a breakdown of this proportion by:

     (a) the number of people meeting each EMPA-REG inclusion criterion;

     (b) duration of diabetes;

     (c) number of concurrent diabetes agents and presence or absence of insulin use.

  3. To describe the clinical characteristics (age, gender distribution, weight, blood pressure, renal function and time since diagnosis) of people in each of the above groups.
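The breakdowns in objective 2 amount to cross-tabulating an EMPA-REG eligibility flag against patient groups. The Python sketch below illustrates this under stated assumptions: the `Patient` fields and the duration bands (<5, 5 to <10 and ≥10 years) are hypothetical, since the protocol does not define them, and eligibility is assumed to have been derived beforehand.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical duration bands in years; the protocol does not specify them.
DURATION_BANDS = [(0, 5, "<5 y"), (5, 10, "5-<10 y"), (10, float("inf"), ">=10 y")]


@dataclass
class Patient:  # hypothetical, simplified record
    duration_years: float   # time since T2DM diagnosis at SGLT2 initiation
    n_other_agents: int     # concurrent glucose-lowering agents besides the SGLT2 inhibitor
    on_insulin: bool
    meets_empa_reg: bool    # meets at least one EMPA-REG inclusion criterion (precomputed)


def duration_band(years: float) -> str:
    """Map a diabetes duration to its (assumed) band label."""
    for lower, upper, label in DURATION_BANDS:
        if lower <= years < upper:
            return label
    raise ValueError(f"negative duration: {years}")


def breakdown(patients):
    """Return {(dimension, group): (n_meeting_criteria, n_total)}."""
    totals, meeting = Counter(), Counter()
    for p in patients:
        groups = [
            ("duration", duration_band(p.duration_years)),
            ("agents", p.n_other_agents),
            ("insulin", p.on_insulin),
        ]
        for g in groups:
            totals[g] += 1
            meeting[g] += int(p.meets_empa_reg)
    return {g: (meeting[g], totals[g]) for g in totals}
```

Each key pairs a breakdown dimension with a group, so the same pass yields objectives 2(b) and 2(c); proportions follow by dividing the two counts.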

Secondary Objectives

  4. To identify how many people initiated on each individual SGLT2 inhibitor (canagliflozin, dapagliflozin and empagliflozin) meet the inclusion criteria for the EMPA-REG trial, comparing the SGLT2 inhibitors in a subgroup analysis, provided sufficient numbers are available.

  5. To identify how many people with T2DM in the entire cohort meet the inclusion criteria for the EMPA-REG trial.

  6. To provide a breakdown of this proportion by:

     (a) the number of people meeting each EMPA-REG inclusion criterion;

     (b) duration of diabetes;

     (c) number of concurrent diabetes agents and presence or absence of insulin use.

  7. To describe the clinical characteristics (age, gender distribution, weight, blood pressure, renal function and time since diagnosis) of people in each of the above groups.

Data Source

Routinely collected English general practice data will be used to perform the study. These data are suitable for this type of analysis for several reasons [17]. Firstly, English general practice is a registration-based system: people register with a single general practitioner (GP), and every individual has a unique national patient identifier, the National Health Service (NHS) number. This identifier allows patients to be followed when they move from one general practice to another and allows deaths to be linked to their records, making the population denominator reliable and valid. The NHS number also facilitates linkage of other data, for example pathology results, to the correct record. Secondly, many GPs computerised their practices in the 1990s, and most prescribing is carried out using computerised records. Coding of chronic disease data and laboratory links became nearly universal from around 2004. The UK uses the Read code system, which is not widely used internationally; it is an extensive coding system that allows detailed coding of diagnoses, symptoms, signs, investigations, therapy and health service management [18]. Diabetes data are particularly well recorded, although care is needed to find cases accurately and to account for the different ways data are recorded in the different computerised medical record systems used by GPs [19, 20]. Prescription data are sufficiently detailed to allow study of the use of, and persistence with, medicines in real world therapy [21].

The RCGP RSC database includes the primary care records from 128 primary care practices distributed across England (1.7% of all practices) and provides a broadly representative population sample [22]. All data are recorded using the Read code 5-byte version 2 coding hierarchy. The coded data include comprehensive diagnosis and treatment information, prescriptions and laboratory data. Recording of data for UK primary care pay-for-performance targets has led to a high level of data completeness in these records, particularly in people with T2DM [23].

We will use data collated after January 1, 2016 from all participating primary care practices and will include all patients with a diagnosis of T2DM who were older than 18 years on this date. Within this T2DM population we will identify all people initiated on an SGLT2 inhibitor (canagliflozin, dapagliflozin or empagliflozin) at any time before January 1, 2016. For people with T2DM and a prescription for an SGLT2 inhibitor, we will report their clinical characteristics and the proportion with a cardiovascular risk profile similar to that of the EMPA-REG trial population.
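As a minimal sketch of these selection rules, the Python below applies the January 1, 2016 index date to a hypothetical, simplified `Record` type. Two interpretations are assumptions rather than protocol definitions: "older than 18 years" is coded as aged 18 or over in completed years on the index date, and initiation is taken to be a first SGLT2 inhibitor prescription dated strictly before the index date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

INDEX_DATE = date(2016, 1, 1)  # reference date stated in the protocol
MIN_AGE_YEARS = 18             # assumption: "older than 18" read as age >= 18 in completed years
SGLT2_DRUGS = {"canagliflozin", "dapagliflozin", "empagliflozin"}


@dataclass
class Record:  # hypothetical, simplified patient record
    birth_date: date
    has_t2dm: bool  # output of the case-finding ontology
    first_sglt2: Optional[Tuple[str, date]]  # (drug, first prescription date) or None


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def in_t2dm_cohort(r: Record) -> bool:
    """Adult with T2DM on the index date."""
    return r.has_t2dm and age_on(r.birth_date, INDEX_DATE) >= MIN_AGE_YEARS


def initiated_sglt2(r: Record) -> bool:
    """In the T2DM cohort and first prescribed an SGLT2 inhibitor before the index date."""
    if not in_t2dm_cohort(r) or r.first_sglt2 is None:
        return False
    drug, start = r.first_sglt2
    return drug in SGLT2_DRUGS and start < INDEX_DATE
```

Keeping the two predicates separate mirrors the protocol's two populations: the whole T2DM cohort (secondary objectives) and the SGLT2-initiated subset (primary objectives).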

Data Analysis

Our aim is to define and make explicit our approach to using real world data to create real world evidence [24, 25]. To this end, we use a two-stage informatics ontology-based process to identify people with T2DM [21]. This is a concept-based approach to case and outcome identification [26]. In brief, the two stages are the identification of all people with diabetes (stage 1) and then categorisation by diabetes type (stage 2). People with diabetes are identified using one or more of the following: (1) a diabetes diagnosis code, (2) glucose and glycated haemoglobin (HbA1c) test results (two or more results consistent with diabetes) or (3) the use of diabetes therapies (excluding metformin). People are classified by diabetes type based on their medication usage history, diabetes type-specific diagnosis codes and other key clinical characteristics (including age at diagnosis, duration of oral medication use and body mass index at diagnosis).
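A simplified sketch of the stage 1 rule is shown below. It captures only the "one or more of the following" logic; the actual ontology is richer, and stage 2 (classification by type) is not shown. The `Candidate` record is hypothetical, and the thresholds of 48 mmol/mol for HbA1c and 7.0 mmol/L for fasting glucose are assumed WHO diagnostic cut-offs, as the protocol states only that results must be "consistent with diabetes".

```python
from dataclasses import dataclass, field
from typing import List

# Assumed thresholds (WHO diagnostic cut-offs); the protocol does not state them.
HBA1C_MMOL_MOL = 48.0
FASTING_GLUCOSE_MMOL_L = 7.0


@dataclass
class Candidate:  # hypothetical, simplified record
    has_diabetes_code: bool = False
    hba1c_results: List[float] = field(default_factory=list)            # mmol/mol
    fasting_glucose_results: List[float] = field(default_factory=list)  # mmol/L
    diabetes_drugs: List[str] = field(default_factory=list)


def stage1_has_diabetes(c: Candidate) -> bool:
    """Stage 1: flag anyone meeting one or more of the three criteria."""
    # (2) count test results consistent with diabetes across both test types
    abnormal_tests = sum(v >= HBA1C_MMOL_MOL for v in c.hba1c_results) + sum(
        v >= FASTING_GLUCOSE_MMOL_L for v in c.fasting_glucose_results
    )
    # (3) any diabetes therapy other than metformin
    non_metformin = [d for d in c.diabetes_drugs if d.lower() != "metformin"]
    return c.has_diabetes_code or abnormal_tests >= 2 or bool(non_metformin)
```

Metformin alone does not trigger inclusion because it is also prescribed for conditions other than diabetes, which is presumably why the protocol excludes it.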

The high cardiovascular risk inclusion criteria for the EMPA-REG study are given in Table 2. We will identify people with each of these cardiovascular risk factors using our ontological process to find the nearest matching clinical diagnostic codes, or other codes that identify the presence of the risk factor. The full description of this process and the final list of codes generated will be included in the final manuscript.
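Once each criterion has a code list, counting people per criterion (objective 2a) and people meeting at least one criterion reduces to set intersection. In the sketch below, the criterion names and code values are placeholders: they are not real Read codes and not the contents of Table 2, which will be those published with the final manuscript.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Placeholder code sets (hypothetical); one entry per inclusion criterion in Table 2.
CRITERION_CODES: Dict[str, Set[str]] = {
    "criterion_A": {"CODE_A1", "CODE_A2"},
    "criterion_B": {"CODE_B1"},
    "criterion_C": {"CODE_C1", "CODE_C2", "CODE_C3"},
}


def criteria_met(patient_codes: Iterable[str]) -> Set[str]:
    """Names of criteria whose code set overlaps the patient's recorded codes."""
    codes = set(patient_codes)
    return {name for name, code_set in CRITERION_CODES.items() if codes & code_set}


def summarise(cohort: Iterable[Iterable[str]]) -> Tuple[Counter, int, int]:
    """Return (people per criterion, people meeting any criterion, cohort size)."""
    per_criterion, n_any, n_total = Counter(), 0, 0
    for codes in cohort:
        met = criteria_met(codes)
        per_criterion.update(met)  # a person can count towards several criteria
        n_any += bool(met)
        n_total += 1
    return per_criterion, n_any, n_total
```

Because one person can satisfy several criteria, the per-criterion counts may sum to more than the number meeting any criterion; both figures are needed for the breakdown.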

Table 2 A comparison of the inclusion criteria of the EMPA-REG OUTCOME trial with the nearest match available from routine UK primary care data

Statistical Methods

Standard descriptive statistics (e.g. means and standard deviations) will be used to describe the characteristics of the various populations, including baseline HbA1c at initiation of SGLT2 inhibitor therapy. We will report the crude rates of each outcome measure for each cohort, as well as the proportion of people with each outcome of interest, together with 95% confidence intervals.
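The protocol does not name a method for the 95% confidence intervals around proportions. One common choice, assumed here purely for illustration, is the Wilson score interval, which behaves better than the simple normal approximation when proportions are close to 0 or 1 or samples are small, as may occur in the individual SGLT2 inhibitor subgroups.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (assumed method).

    Returns (proportion, lower, upper); z = 1.959964 gives a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

For example, `wilson_ci(50, 100)` gives approximately (0.5, 0.404, 0.596), and `wilson_ci(0, 10)` still yields a non-degenerate upper bound of about 0.278, where the normal approximation would collapse to zero width.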

Strengths and Limitations

The strengths of the dataset have been broadly alluded to in the "Data Source" section. The large size (>1.2 million records) and high completeness of this real world dataset are particular strengths. Despite the size of the dataset, only a minority of people with T2DM have been initiated on SGLT2 inhibitors to date; subgroup analyses may therefore lack the power to detect differences between groups.

An additional limitation is the potential for missing data on EMPA-REG inclusion criteria in the primary care record: some patients may have the condition of interest without it being documented. However, as all the EMPA-REG inclusion criteria are major cardiovascular risk factors, and monitoring and recording of these risk factors is part of current UK primary care pay-for-performance targets (the Quality and Outcomes Framework), this effect is likely to be small. Any additional limitations identified during the study will be discussed in the final study manuscript.

Compliance with Ethics Guidelines

All data to be used have been anonymised at the point of data extraction. No identifiable clinical information will be made available to researchers or included in any publications. The study has been assessed using the Health Research Authority (HRA)/Medical Research Council (MRC) “is this research” tool and is considered to be an audit of current practice against the best available evidence; it therefore does not require specific ethical approval. Approval for this work has been granted by the RCGP RSC study approval committee.


This real world cross-sectional analysis will show what proportion of people with T2DM in an unselected population are likely to benefit from the cardiovascular protection demonstrated in the EMPA-REG trial. These data will give clinicians valuable insight into how best to apply these important trial results.