Key Summary Points

Several new injectable GLP-1RAs have been approved in different countries since 2017.

However, evidence on the comparative effectiveness of these products is absent.

The study aimed to examine whether GLP-1RAs have differential efficacy and safety profiles.

Not all GLP-1RAs have equal efficacy, but some of the new GLP-1RAs are not inferior to the old ones.

The comparative effectiveness profiles should be taken into account when treating patients with type 2 diabetes.

Digital Features

This article is published with digital features to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.13606940.

Introduction

In 2019, over 9% of the global adult population was living with diabetes, of which type 2 diabetes mellitus (T2DM) accounted for 90% [1]. Moreover, the prevalence of T2DM is expected to increase over the next two decades [1]. Despite several encouraging advances in therapy, T2DM remains a major public health and clinical challenge.

Current treatment options for T2DM primarily aim to restore glucose homeostasis [2]. Towards this goal, several drug classes have been developed to target various biological pathways that eventually lead to glycemic control [3]. Among these, glucagon-like peptide 1 receptor agonists (GLP-1RAs) are a class of biologicals that lower blood glucose through several mechanisms, including enhanced glucose-dependent insulin secretion, slowed gastric emptying, and decreased postprandial glucagon and food intake [4].

Evidence of comparative effectiveness and safety is crucial for the selection of appropriate clinical pathways. When direct comparisons based on head-to-head trials are absent, indirect comparisons can be conducted using network meta-analysis (NMA) [5]. Although indirect comparisons of the relative efficacy of GLP-1RAs have been documented in several NMAs [2, 4, 6, 7], the sufficiency of the evidence from these analyses is potentially undermined by two real-world complexities. First, the GLP-1RA drug class has been continuously evolving [8], rendering even recent NMAs outdated. For example, several new injectable GLP-1RAs have been approved in different countries since 2017 [9, 10]. Second, the portfolio of approved GLP-1RA entities varies substantially across healthcare systems. For example, as of January 2020, loxenatide (also known as PEX168) had not been approved in the USA and semaglutide had not been approved in Australia. In particular, China, where the population of patients with T2DM is larger than in any other country [1], has approved benaglutide and loxenatide, which are not currently available elsewhere [10, 11]. To our knowledge, existing NMAs have not included these products. As such, there is a dearth of indirect-comparison evidence on GLP-1RAs to support pharmacological treatment choice for glycemic control among patients with T2DM in China, which necessitates a more thorough analysis that takes into account all approved GLP-1RAs. To fill this important evidentiary gap, the present analysis focused on GLP-1RAs approved in China as of June 30, 2019.

Methods

The reporting of the present study complies with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for network meta-analyses (PRISMA-NMA) [12, 13]. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Included Treatment Options

By June 30, 2019, six different GLP-1RA entities were approved in China, namely exenatide, liraglutide, benaglutide, lixisenatide, dulaglutide, and loxenatide [10]. Each of these entities was marketed with two or more regimens using different doses and administration frequencies. All regimens of the six GLP-1RAs that had eligible evidence in the literature were included in the present analysis and the comparison was conducted across different regimens.

Study Identification

A search strategy of potentially eligible clinical trials was prespecified for MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials (CENTRAL) databases from inception to June 30, 2019. The terms and rules used for the search strategy are summarized in Table 1.

Table 1 Summary of search strategy

The exact combinations and criteria that were specified in each of the three databases are listed in Tables S1–S3. Studies were filtered using all three layers of conditions in Table 1. The types of included literature were research articles, short communications, and abstracts with corresponding trial registrations. The search process was repeated using sentence-case and uppercase spellings.

After duplicate studies were dropped, two reviewers independently screened the titles and abstracts of the search results for initial inclusion. A third reviewer was to be involved in the event of a discrepancy between the first two reviewers. The full texts of the remaining studies were then examined for further inclusion in parallel by two groups, each of which consisted of two reviewers. Within-group discrepancies were resolved by a cross-group reviewer, and cross-group discrepancies were discussed by all reviewers to reach a unanimous decision.

The studies were considered eligible for inclusion if they (1) were randomized clinical trials; (2) were conducted on individuals with type 2 diabetes mellitus; (3) compared GLP-1RAs of interest with each other or with a control group (restricted to placebo or no treatment) with or without the same add-on therapy in all arms; (4) had a follow-up between 24 and 30 weeks; (5) contained results of at least one of the prespecified primary and secondary endpoints; and (6) were published in English. The final list of included studies was manually supplemented with two studies on loxenatide (Table S4) [14,15,16].

Data Extraction

Each of four researchers extracted roughly a quarter of the included studies using a pre-specified template form. The assigned studies were then switched to another reviewer for duplicate data extraction. The data extracted were (1) study characteristics including authors, year of publication, journal or conference name, trial name, trial registration, sponsorship, type of sponsor, RCT design (e.g., two-arm vs. three-arm, double blind vs. triple blind vs. open label), follow-up period, and total sample size; (2) patient characteristics including age, sex, diabetes duration, and baseline values of outcomes; (3) treatment characteristics including preparation (drug, administration route, and dosage) and administration frequency, sample size of the arm, other medications in the regimen if any; and (4) outcome data.

Two researchers independently assessed the risk of bias of all included studies using the Cochrane Collaboration risk of bias tool over five domains including selection, performance, detection, attrition, and reporting [17]. Network maps were used to visualize the availability of information on pairwise comparisons [18].

Outcomes

The present study mainly focused on the glycemic control efficacy of GLP-1RAs. Therefore, the primary outcome of our analyses was change in hemoglobin A1c (HbA1c) from baseline to study endpoint. The secondary endpoints were proportions of patients achieving HbA1c < 7% and < 6.5%; changes in fasting plasma glucose concentration (FPG), postprandial plasma glucose at 2 h (PPG2h), body weight, systolic and diastolic blood pressure (SBP and DBP), serum lipid concentrations [total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides (TG)]; and proportions of patients who reported hypoglycemia, severe hypoglycemia, and any adverse events (AE).

Data Analysis

To describe baseline characteristics, the weighted averages of age, weight, body mass index (BMI), and baseline HbA1c were calculated, as were their standard deviations. The description of the baseline characteristics was based on the studies involved in the analysis of the primary outcome.
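As an illustration of how arm-level summaries can be combined into a weighted mean and standard deviation, the following minimal Python sketch may help. It assumes sample-size weighting and the standard formula for combining group means and SDs; the source does not state the exact weighting scheme, so both the function name `pooled_mean_sd` and the formula are assumptions rather than the authors' procedure.

```python
from math import sqrt


def pooled_mean_sd(arms):
    """Combine arm-level summaries into one weighted mean and SD.

    arms: list of (n, mean, sd) tuples, one per trial arm.
    Assumes sample-size weighting (not confirmed by the source).
    """
    total_n = sum(n for n, _, _ in arms)
    grand_mean = sum(n * m for n, m, _ in arms) / total_n
    # Within-arm sum of squares, recovered from each arm's SD
    within = sum((n - 1) * sd ** 2 for n, _, sd in arms)
    # Between-arm sum of squares around the grand mean
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in arms)
    pooled_sd = sqrt((within + between) / (total_n - 1))
    return grand_mean, pooled_sd


# Hypothetical arms: (sample size, mean age in years, SD)
mean_age, sd_age = pooled_mean_sd([(100, 55.0, 10.0), (300, 57.0, 9.0)])
```

With a single arm, the function returns that arm's own mean and SD, which is a quick sanity check on the formula.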

For continuous outcomes, the mean value change of each arm was used along with its standard deviation (SD) and the sample size of the arm. When standard errors (SEs), least-square SEs or confidence intervals (CIs) were reported, they were combined with the sample size information to retrieve SDs. When the relevant uncertainty data were not available, the corresponding data of an arm with the same intervention and endpoint from a study of similar sample size were used for imputation [4, 19]. In addition, two or more treatment arms were merged into one if they shared the same GLP-1RA entity and the same dosage but were delivered at different hours of the day.
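The conversion from a reported SE or CI back to an arm-level SD can be sketched as follows. This is a minimal example under stated assumptions: it uses the normal approximation for the CI (a t-based multiplier may be more appropriate for small arms), and the function names `sd_from_se` and `sd_from_ci` are illustrative, not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_se(se, n):
    """SD of an arm from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """SD of an arm from a confidence interval for its mean.

    Assumes a symmetric, normal-approximation interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


# Hypothetical arm: mean HbA1c change with 95% CI -1.196 to -0.804, n = 100
sd = sd_from_ci(-1.196, -0.804, 100)
```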

The relative efficacy for each continuous outcome was represented by the mean difference in the change of value from baseline to endpoint across treatment groups. For dichotomous outcomes, the sample size and event count of each arm were used for comparison, and the relative efficacy was measured by the odds ratio (OR). Frequentist NMAs with random effects were performed for all outcomes. In addition, I2 was computed and reported to quantify heterogeneity across studies, but it was not used to choose between random-effects and fixed-effects models [20, 21]. The base-case analyses were based on consistency models [22]. Inconsistency in the analyses was tested using the “design-by-treatment interaction” approach [22]. In all analyses, placebo was used as the reference group. For the primary outcome, pairwise comparisons were also conducted.
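The building blocks of these analyses (mean difference, odds ratio, random-effects pooling, and I2) can be illustrated for a single pairwise comparison. The sketch below is not the network model fitted in Stata or R in this study. It assumes a DerSimonian-Laird estimator for the between-study variance and a 0.5 continuity correction for zero cells, both of which are common conventions chosen here for illustration, and all function names are hypothetical.

```python
from math import exp, log, sqrt


def mean_difference(n1, m1, sd1, n2, m2, sd2):
    """Mean difference (arm 1 minus arm 2) and its standard error."""
    return m1 - m2, sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def log_odds_ratio(e1, n1, e2, n2):
    """Log odds ratio (arm 1 vs arm 2) and its standard error."""
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    if 0 in (a, b, c, d):  # assumed convention: add 0.5 to every cell
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return log(a * d / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def random_effects_pool(effects, ses):
    """DerSimonian-Laird pooled effect, its SE, tau^2, and I^2 in percent."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, sqrt(1 / sum(w_re)), tau2, i2


# Hypothetical two-study example of mean differences versus placebo
pooled, se, tau2, i2 = random_effects_pool([0.0, 1.0], [0.1, 0.1])
odds_ratio = exp(log_odds_ratio(20, 100, 10, 100)[0])
```

In a full NMA the same quantities are estimated jointly across all treatment contrasts, so the pairwise version only conveys the intuition behind the reported effect sizes and heterogeneity statistics.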

All analyses were conducted using Stata 15, R 3.6 (with the “gemtc” package), and JAGS 4.2. Two-sided p values of 0.05 or less were considered statistically significant.

Sensitivity Analyses

Sensitivity analyses were conducted to test the robustness of the results to alternative inclusion criteria, study designs, and model specifications. “Design-by-treatment interaction” models were carried out for outcomes that were potentially subject to inconsistency. Also, two additional sets of sensitivity analyses were conducted for the primary outcome. First, Bayesian NMAs using a uniform prior distribution for the heterogeneity parameter and 5000 iterations were conducted for the primary outcome. Second, treatment arms that had total sample sizes of less than 100 were dropped in another set of sensitivity analyses.
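The small-sample exclusion can be sketched as a simple filter applied before the model is refitted. This example assumes that "total sample size" means the number of patients pooled across all arms of the same regimen; the data layout and the function name `drop_small_regimens` are hypothetical.

```python
from collections import defaultdict


def drop_small_regimens(arms, min_total=100):
    """Keep only arms whose regimen has a pooled sample size >= min_total.

    arms: list of dicts with 'study', 'regimen', and 'n' keys
    (a hypothetical layout, not the authors' extraction template).
    """
    totals = defaultdict(int)
    for arm in arms:
        totals[arm["regimen"]] += arm["n"]
    return [arm for arm in arms if totals[arm["regimen"]] >= min_total]


# Hypothetical arms: regimen A totals 64 patients and is dropped
arms = [
    {"study": "S1", "regimen": "A", "n": 64},
    {"study": "S1", "regimen": "placebo", "n": 200},
    {"study": "S2", "regimen": "B", "n": 60},
    {"study": "S3", "regimen": "B", "n": 70},
]
kept = drop_small_regimens(arms)
```

A study left with a single arm after filtering no longer contributes a comparison, so such studies would also need to be removed before rerunning the analysis.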

Results

Literature Identification

The initial search using keywords resulted in 8396 items that required further screening. After 2408 duplicate items were removed, studies that did not meet the requirements of treatment, population, trial designs, outcomes, and study duration were also dropped. Eventually, there were 54 studies that were eligible for NMA. The flowchart of study identification is displayed in Fig. 1.

Fig. 1 Flowchart of study identification and screening

Baseline Characteristics of Regimens

A total of 23,209 patients were included in these 54 studies, representing 18 different regimens for the six GLP-1RAs covered by the eligible literature. The full list of the 54 studies is provided in Table S4. However, only 13 regimens used by 21,397 patients were covered in the analysis of the primary outcome: dulaglutide 0.75 mg once weekly (QW), dulaglutide 1.5 mg QW, exenatide 5 µg twice daily (BID), exenatide 10 µg BID, exenatide 2 mg QW, liraglutide 0.3 mg QW, liraglutide 0.6 mg QW, liraglutide 0.9 mg QW, liraglutide 1.2 mg QW, liraglutide 1.8 mg QW, lixisenatide 20 µg once daily (QD), loxenatide 100 µg QW, and loxenatide 200 µg QW. The network plot of the studies on the primary outcome is illustrated in Fig. S1. Benaglutide was not investigated in any of the eligible literature and was therefore not included in subsequent analyses. The baseline characteristics of the regimens relevant to the primary outcome are summarized in Table 2. The number of trials ranged from one for liraglutide 0.3 mg QW to 16 for exenatide 10 µg BID, whereas the group sample size ranged from 64 for liraglutide 0.3 mg QW to 3327 for lixisenatide 20 µg QD. The majority of the studies were phase III clinical trials (76%). The weighted mean age varied only modestly across treatment groups (range 52.3–59.2 years), as did the weighted mean baseline HbA1c (range 7.8–8.6%). By contrast, the weighted mean body weight (range 68.0–101.1 kg) and the weighted mean BMI (range 25.1–35.9) presented some heterogeneity across groups. Several regimens were each investigated in only one study, which did not report the mean weight. The average weight is therefore missing for these regimens.

Table 2 Characteristics of trials by treatment in the analysis of the primary outcome

Risk of Bias

According to the results of the Cochrane Collaboration’s tool for assessing risk of bias, the overall bias was judged to be low risk, some concerns, and high risk in 26.6%, 68.4%, and 2.5% of the studies, respectively. Among the domains of bias, “some concerns” was most prevalent in the measurement of the outcome (54.4%). Selection of reported results was judged to be low risk for 91.1% of the studies.

Primary Outcome

Of the 54 studies, 50 examined the change in HbA1c as an endpoint. The base-case estimates of the efficacy of GLP-1RA regimens on reducing HbA1c are listed in Fig. 2. All GLP-1RA regimens except liraglutide 0.3 mg QW significantly lowered HbA1c after 24–30 weeks compared with placebo. Among these, liraglutide 1.8 mg QW had the strongest efficacy (− 1.19%; 95% CI − 1.40 to − 0.97%), followed closely by liraglutide 0.9 mg QW (− 1.09%, 95% CI − 1.59 to − 0.59%), dulaglutide 1.5 mg QW (− 1.09%; 95% CI − 1.41 to − 0.77%), and dulaglutide 0.75 mg QW (− 1.03%; 95% CI − 1.39 to − 0.68%). The effect of liraglutide 0.3 mg QW, liraglutide 0.6 mg QW, and liraglutide 1.2 mg QW on HbA1c change was − 0.36% (95% CI − 1.02 to 0.29%), − 0.73% (95% CI − 1.08 to − 0.38%), and − 0.64% (95% CI − 0.98 to − 0.31%) in relation to placebo, respectively. In addition, the magnitude of HbA1c-lowering effect associated with exenatide 5 µg BID, exenatide 10 µg BID, and exenatide 2 mg QW was − 0.56% (95% CI − 0.87 to − 0.25%), − 0.74% (95% CI − 0.95 to − 0.53%), and − 0.97% (95% CI − 1.31 to − 0.63%), respectively. Lixisenatide 20 µg QD lowered HbA1c by 0.55% (95% CI − 0.79 to − 0.31%). Finally, the efficacy estimates for loxenatide 100 µg QW and loxenatide 200 µg QW were − 0.82% (95% CI − 1.38 to − 0.27%) and − 0.96% (95% CI − 1.52 to − 0.40%).

Fig. 2 Base-case analysis results of the effects of GLP-1RA regimens on HbA1c

The p values of the pairwise tests are shown in Table S5. The notable results are that dulaglutide 0.75 mg QW, dulaglutide 1.5 mg QW, exenatide 2 mg QW, liraglutide 0.9 mg QW, liraglutide 1.8 mg QW, loxenatide 100 µg QW, and loxenatide 200 µg QW were not significantly outperformed by any of the other regimens, leaving the remaining regimens inferior to at least one of the aforementioned GLP-1RAs. In particular, exenatide 5 µg BID and lixisenatide 20 µg QD had a significantly weaker HbA1c-lowering effect than at least four other regimens yet did not significantly outperform any competitors.

The I2 of this analysis was 92.7%, and the p value of the test of inconsistency was < 0.001.

Secondary Outcomes

Thirty-nine studies reported the proportion of participants achieving HbA1c < 7%, covering 13 different regimens. All regimens in the analyses of the primary outcome except liraglutide 0.3 mg QW were included. The corresponding NMA results are displayed in Fig. 3. Compared with placebo, all regimens had statistically significantly higher odds of reducing HbA1c to below 7%. The I2 of this analysis was 71.9%, and the p value of the test of inconsistency was 0.001. Thirty-four studies reported the proportion of participants achieving HbA1c < 6.5%, representing 12 different regimens. The results of this set of analyses are given in Fig. 3. Compared with placebo, all regimens had statistically significantly higher odds of reducing HbA1c to below 6.5%. The I2 of this analysis was 59.1%, and the p value of the test of inconsistency was 0.189.

Fig. 3 Base-case analysis results of the effects of GLP-1RA regimens on reducing HbA1c to < 7%, reducing HbA1c to < 6.5%, FPG, and PPG2h. OR odds ratio, CI confidence interval, MD mean difference

Forty-three studies that included FPG as an endpoint were identified, covering the same 13 regimens as the analysis of achieving HbA1c < 7%. The results are shown in Fig. 3. All regimens significantly reduced FPG in relation to placebo. The I2 of this analysis was 60.2%, and the p value of the test of inconsistency was < 0.001.

Twelve studies were included in the NMA of PPG2h that compared exenatide 2 mg QW, liraglutide 1.2 mg QW, liraglutide 1.8 mg QW, lixisenatide 20 µg QD, loxenatide 100 µg QW, and loxenatide 200 µg QW. Among these, only liraglutide 1.2 mg QW and loxenatide 100 µg QW did not significantly reduce PPG2h. The I2 of this analysis was 63.0%, and the p value of the test of inconsistency was 0.971. The results are displayed in Fig. 3.

Fifty-two studies investigated body weight as an endpoint, which allowed a comparison of 18 regimens. In addition to all the regimens included in the analysis of the primary outcome, the analysis of body weight covered five more dulaglutide doses: dulaglutide 0.25 mg QW, dulaglutide 0.5 mg QW, dulaglutide 1 mg QW, dulaglutide 2 mg QW, and dulaglutide 3 mg QW. Nine of the 18 regimens significantly reduced body weight in relation to placebo; these are detailed in Fig. S2. The I2 of this analysis was 62.1%, and the p value of the test of inconsistency was < 0.001.

The estimates of the effects on blood pressure and serum lipids are displayed in Fig. S2. While most GLP-1RAs significantly lowered SBP, they did not demonstrate equally strong effects on DBP and serum lipid outcomes.

There were 34 studies that documented the event counts of any report of hypoglycemia. The I2 of this analysis was 46.7%, and the p value of the test of inconsistency was 0.080. Dulaglutide 0.75 mg QW, liraglutide 0.9 mg QW, loxenatide 100 µg QW, and loxenatide 200 µg QW were the only regimens that were not associated with significantly higher odds of hypoglycemia. However, none of the regimens was associated with significantly higher odds of severe hypoglycemia based on the analysis of 34 studies on 12 regimens, the I2 of which was 0%. The results of any hypoglycemia and severe hypoglycemia are illustrated in Fig. S3.

Thirty studies provided information on the occurrence of any adverse event. Only dulaglutide 1.5 mg QW (OR 1.58, 95% CI 1.26–1.98), exenatide 10 µg BID (OR 1.47, 95% CI 1.21–1.78), liraglutide 1.8 mg QW (OR 1.42, 95% CI 1.21–1.68), and lixisenatide 20 µg QD (OR 1.25, 95% CI 1.06–1.47) were associated with significantly higher odds of any adverse event. The I2 of this analysis was 0.9%, and the p value of the test of inconsistency was 0.669. The results related to adverse events are depicted in Fig. S3.

Sensitivity Analysis

According to the results of the “design-by-treatment” approach, the base-case analysis of HbA1c change was subject to inconsistency. The corresponding estimates of efficacy on reducing HbA1c were similar to those in the base-case analysis, except that (1) liraglutide 0.3 mg QW showed a significant HbA1c-lowering effect in the inconsistency model; and (2) lixisenatide 20 µg QD did not show a significant effect in the inconsistency model. Inconsistency was also significant in the analysis of three secondary outcomes. The results of the inconsistency estimates are illustrated in Figs. S4–S7. The Bayesian NMA of HbA1c generated estimates that were close to the frequentist estimates. The corresponding results are depicted in Fig. S8. Finally, removing the regimens with sample sizes of less than 100 had limited impact on the results of the primary outcome, which are displayed in Fig. S9.

Discussion

This NMA of 54 studies covering 23,209 individuals primarily compared the HbA1c-lowering efficacy across 13 GLP-1RA regimens in relation to placebo and with each other. As expected, most of the GLP-1RA regimens significantly reduced HbA1c in the base case. However, the HbA1c-lowering effect varied across the regimens, with dulaglutide 0.75 mg QW, dulaglutide 1.5 mg QW, exenatide 2 mg QW, liraglutide 0.9 mg QW, liraglutide 1.8 mg QW, loxenatide 100 µg QW, and loxenatide 200 µg QW on the relatively superior end of the spectrum. Most of the regimens were also found to have glycemic control effects on several other blood glucose endpoints. The effects on weight loss, blood pressure, and lipid levels were mixed across regimens, and some of the differences were appreciable. For example, nine regimens did not show a statistically significant weight loss effect, although most of the same nine regimens had a significant HbA1c-lowering effect. However, weight loss is a key indicator of the effect of T2DM management [23]. These results underscore the need for patient-centric prescription practice based on the most urgent medical need of the patients.

Benefits of treatments should be weighed against potential harms. Therefore, several safety outcomes were also analyzed in the present study. While most regimens were associated with higher risks of hypoglycemia, none of them was associated with a higher risk of severe hypoglycemia. Most of the regimens also had comparable safety profiles in relation to placebo.

No previous analyses have specifically focused on GLP-1RAs approved in China. Therefore, the indirect comparison evidence established through the present study provides clinically important evidence to physicians in China who treat over a quarter of the world’s diabetes population [1]. In addition, the present analysis included two regimens that were not analyzed in any previous NMAs of GLP-1RAs to our knowledge, which was likely related to the unique portfolio of GLP-1RAs in China. As such, at least part of the evidence on comparative effectiveness from the present study had not been highlighted previously. Such comparative evidence is critical for clinicians to make patient-centered treatment decisions that need to comprehensively take into account efficacy, safety, and costs to local patients.

Whereas part of the evidence from the present analysis is novel, part of it confirms previous findings from the literature. For example, an NMA of GLP-1RAs by Zaccardi et al. estimated that dulaglutide 0.75 mg QW, dulaglutide 1.5 mg QW, and exenatide 2 mg QW changed HbA1c by − 1.2%, − 1.4%, and − 1.3%, respectively [7]. The corresponding estimates are − 1.0%, − 1.1%, and − 1.0% in the present study, which are generally in line with the existing evidence. However, the present analysis suggested no statistically significant difference between dulaglutide 0.75 mg QW and dulaglutide 1.5 mg QW in terms of HbA1c reduction, whereas the Zaccardi et al. study did find a significant difference. This was probably attributable to differences both in the studies included in the analyses and in the restrictions on the follow-up periods of eligible trials.

The findings of the study must be interpreted with caveats. On the basis of the descriptive statistics of the baseline characteristics as well as the heterogeneity tests, there might be some systematic differences across trials. Although frequentist and Bayesian random-effects models were used to analyze the data, they might not have fully accounted for the heterogeneity due to patient characteristics that were unbalanced across regimens, such as body weight and BMI. Also, the values of the outcomes were dependent on their baseline levels, yet some of the studies did not report baseline-adjusted change. For example, some studies reported least-squares (LS) means of HbA1c change while others reported the means of simple pre–post differences. The latter would not have accounted for the baseline, but both were considered acceptable and were pooled together for evidence synthesis. In addition, data imputation was inevitable in the present study, which could undermine the validity of the results if not conducted appropriately. Unfortunately, the validity of the imputation could not be verified or tested. Moreover, the number of studies and the sample sizes of some regimens were limited. For example, data on liraglutide 0.9 mg QW were from two studies in Japan. Consequently, all indirect comparisons between liraglutide 0.9 mg QW and other regimens were anchored on these studies, potentially yielding non-robust results. It is important to take such uncertainty into account when using the evidence. Notwithstanding these limitations, the present study provides useful insights into the comparative effectiveness of GLP-1RAs.

Future clinical and observational studies should try to establish evidence on additional attributes of the comparative effectiveness of GLP-1RAs. For example, the present study only examined endpoints between 24 and 30 weeks. Long-term outcomes may be equally important. The number of studies that examined GLP-1RA agents for a year or longer was much smaller than the number with a follow-up of 24–30 weeks, such that comparisons of longer-term outcomes were unlikely to be robust. However, such comparisons should be conducted when the information in the literature becomes sufficient. Moreover, most of the clinical trial reports did not analyze the relative effectiveness by demographic subpopulations, which also limited the feasibility of NMAs by subgroups. Future studies should shed light on the effect modification by demographic parameters if possible.

Conclusion

Not all GLP-1RAs approved in China have equal glycemic control effects, nor do they have equal effects on other cardiometabolic indicators. Clinicians should choose therapies on the basis of the comparative effectiveness profiles of the agents as well as the needs of patients.