Study design
We used MR-BMA to identify which glycaemic trait(s) (fasting insulin, fasting glucose and/or HbA1c) gave the best model, overall and by sex. We used univariable or multivariable MR, as appropriate, to assess, in both directions, overall associations between the selected trait(s) and both CKD and eGFR in Chronic Kidney Disease Genetics (CKDGen) and the UK Biobank, and sex-specifically using UK Biobank individual data [18].
The role of glycaemic traits in CKD and kidney function
Genetic predictors for fasting insulin, fasting glucose and HbA1c in MR-BMA
For fasting insulin, we used ten SNPs (rs4846565, rs10195252, rs2943645, rs17036328, rs3822072, rs6822892, rs4865796, rs459193, rs2745353 and rs731839) identified from a genome-wide association study (GWAS) conducted in 108,557 people of European ancestry without diabetes (mean age 50.6 years; ~53% men) [19], and tested for prediction of a comprehensive measure of insulin resistance (both euglycaemic-hyperinsulinaemic clamp- and OGTT-based measures) [20]. The genetic associations with fasting insulin (without adjustment for BMI) were obtained from the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC), where fasting insulin averaged ~56.4 pmol/l across studies [19]. For fasting glucose (mean ~5.2 mmol/l), we used 17 SNPs identified in a published GWAS of glycaemic traits from MAGIC [19]. For HbA1c, we used 38 SNPs identified in a meta-analysis of GWAS of HbA1c in 159,940 people from 82 cohorts, of whom 123,665 were of European ancestry (mean age ~53 years, ~48% men, mean HbA1c ~5.4%) [21]. The SNPs for each glycaemic trait were independent (r2 < 0.05, as used previously [22]). To satisfy the MR assumptions, we dropped SNPs associated with potential confounders (Townsend index, smoking, alcohol drinking and physical activity) at genome-wide significance in the UK Biobank summary statistics, or in three comprehensive curated genotype-to-phenotype cross-reference systems, Ensembl (http://www.ensembl.org/index.html), the GWAS Catalog (https://www.ebi.ac.uk/gwas/) and PhenoScanner (www.phenoscanner.medschl.cam.ac.uk). The details are given in electronic supplementary material (ESM) Table 1 and ESM Table 2. For MR-BMA, we included all SNPs predicting any of the glycaemic traits (fasting insulin, fasting glucose or HbA1c), then removed duplicate SNPs and strongly correlated SNPs, using the ‘clump_data’ function, at a distance of 10,000 kb and an r2 cut-off of 0.8, as done previously [17].
For the remaining SNPs, we obtained their associations with fasting insulin, fasting glucose and HbA1c from the relevant GWAS. For unavailable SNPs, we used a proxy SNP (r2 ≥ 0.8).
Genetic predictors for sex-specific MR-BMA in CKD and eGFR
For the sex-specific analysis, of the ten SNPs predicting fasting insulin, we used those sex-specifically predicting fasting insulin in 47,806 men and 50,404 women of European ancestry without diabetes in MAGIC [23] after Bonferroni correction (p < 0.05/10 = 0.005), giving six SNPs each in men and women (ESM Table 1). Of the 17 SNPs predicting fasting glucose, we used those sex-specifically predicting fasting glucose in the UK Biobank (http://www.nealelab.is/blog/2019/9/16/biomarkers-gwas-results) and in 67,506 men and 73,089 women of European ancestry without diabetes in MAGIC, after Bonferroni correction (p < 0.05/17 = 0.003). Of the 38 SNPs predicting HbA1c, we used the SNPs reaching Bonferroni-corrected significance (p < 0.05/38 = 0.001) sex-specifically in the UK Biobank (ESM Table 1). We assessed the strength of these genetic instruments using the F-statistic, calculated as the square of the SNP-exposure association divided by the square of its standard error [24]; SNPs with an F-statistic >10 were selected.
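The instrument-strength filter described above can be illustrated with a short Python sketch (the study itself was conducted in R). It computes the per-SNP F-statistic as the squared SNP-exposure association divided by its squared standard error, and keeps SNPs that pass both a Bonferroni-corrected p value threshold and F > 10. The class name, function names, SNP labels and numerical values are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpAssociation:
    rsid: str        # SNP identifier (hypothetical labels in the example below)
    beta: float      # SNP-exposure association
    se: float        # standard error of beta
    p_value: float   # p value of the SNP-exposure association


def f_statistic(beta: float, se: float) -> float:
    """Approximate per-SNP F-statistic: beta^2 / se^2."""
    return (beta / se) ** 2


def select_instruments(snps: List[SnpAssociation], n_tested: int,
                       alpha: float = 0.05, f_min: float = 10.0) -> List[SnpAssociation]:
    """Keep SNPs with p < alpha / n_tested (Bonferroni) and F-statistic > f_min."""
    threshold = alpha / n_tested
    return [s for s in snps
            if s.p_value < threshold and f_statistic(s.beta, s.se) > f_min]


# Hypothetical sex-specific associations for three candidate SNPs.
candidates = [
    SnpAssociation("snp_a", beta=0.020, se=0.004, p_value=6e-7),  # F = 25
    SnpAssociation("snp_b", beta=0.010, se=0.004, p_value=0.012), # F = 6.25, fails both
    SnpAssociation("snp_c", beta=0.015, se=0.004, p_value=2e-4),  # F ~ 14.1
]
selected = select_instruments(candidates, n_tested=10)
print([s.rsid for s in selected])
```

With ten SNPs tested, the p value threshold is 0.05/10 = 0.005, matching the fasting insulin example in the text.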
Genetic associations with CKD and kidney function
We obtained overall associations with CKD and kidney function (eGFR) from CKDGen summary statistics and overall and sex-specific associations from UK Biobank individual data (application number 42468). Sex-specific associations are not available in CKDGen.
CKDGen is a large, trans-ancestry GWAS meta-analysis comprising 60 GWAS for CKD in 625,219 people [25], 480,698 of European ancestry (41,395 CKD cases), and 121 GWAS for eGFR in 765,348 people, 567,460 of European ancestry, with median age 54 years, 50% men and median eGFR 89 ml min−1 1.73 m−2 (IQR 81, 94) [25]. To avoid population stratification, we only included people of European ancestry. Genetic associations were obtained using logistic regression for CKD, and linear regression for log eGFR, controlling for age, sex, genetic principal components, relatedness and other study-specific characteristics as appropriate [25].
The UK Biobank is a large, ongoing, prospective cohort study, with median follow-up of 11.1 years [18]. The UK Biobank recruited 502,713 people (aged 40–69 years, mean age 56.5 years, 45.6% men) from Great Britain from 2006 to 2010, 94% of self-reported European ancestry. CKD events were identified from a nurse-led interview at recruitment, from record linkage to all hospital admissions and deaths during follow-up [18], and from an eGFR below 60 ml min−1 1.73 m−2. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula based on serum creatinine [26] (median 94 ml min−1 1.73 m−2; IQR 86, 101). Genotyping was performed using two similar arrays, i.e. the UK Biobank Lung Exome Variant Evaluation (UK BiLEVE) array and the UK Biobank Axiom array. To control for population stratification, we only included participants of white British ancestry, based on self-report and genetic quality control. For quality control, we also excluded participants with (1) inconsistent self-reported and genotyped sex; (2) excess relatedness (>10 putative third-degree relatives); (3) an abnormal number of sex chromosomes; or (4) poor-quality genotyping based on heterozygosity and missingness rates. After quality control, 179,917 white British men (6016 CKD cases) and 212,079 white British women (5989 CKD cases) remained. We used logistic regression to obtain SNP-specific associations with CKD, controlling for age, sex, 20 principal components, assay array, smoking and BMI. Smoking and BMI are shared risk factors for CKD and many chronic diseases; controlling for them partly addresses selection bias from inevitably only recruiting survivors of genotype and competing risk into the underlying GWAS [27]. In contrast, we did not control for blood pressure because it may mediate associations with CKD [28, 29]. Similarly, we used linear regression to assess genetic associations with log-transformed eGFR.
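The eGFR calculation and the eGFR-based CKD threshold mentioned above can be sketched in Python as follows. The text cites the CKD-EPI formula [26] without giving its terms; this sketch assumes the 2009 CKD-EPI creatinine equation and omits its race coefficient, which is an assumption made here because the analysis was restricted to white British participants. The function names and the example inputs are hypothetical.

```python
def creatinine_umol_to_mg_dl(creatinine_umol_l: float) -> float:
    """Convert serum creatinine from umol/l to mg/dl (1 mg/dl = 88.4 umol/l)."""
    return creatinine_umol_l / 88.4


def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (ml/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumption: the race coefficient of the original equation is omitted.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    return egfr


def has_low_egfr(egfr: float, threshold: float = 60.0) -> bool:
    """Flag eGFR below 60 ml/min/1.73 m^2, the threshold used in the text."""
    return egfr < threshold


# Hypothetical participant: 50-year-old man with serum creatinine 88.4 umol/l.
egfr_example = ckd_epi_2009(creatinine_umol_to_mg_dl(88.4), age_years=50, female=False)
print(round(egfr_example, 1), has_low_egfr(egfr_example))
```

The interview and record-linkage sources of CKD events described in the text are not represented here; the sketch covers only the creatinine-based component.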
The role of kidney function in glycaemic trait(s)
Genetic predictors for eGFR
To meet the MR assumptions, we used independent (r2 < 0.01) genetic predictors of eGFR reaching genome-wide significance (p < 5 × 10−8) in the CKDGen GWAS meta-analysis [25]. We selected independent SNPs using the ‘clump_data’ function of MR-Base (http://www.mrbase.org/), and by checking the r2 between the selected genetic variants using LD-Link (https://ldlink.nci.nih.gov/) in Europeans.
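The idea behind selecting independent SNPs can be illustrated with a simplified greedy procedure in Python. This is not the ‘clump_data’ implementation, which works against a linkage disequilibrium reference panel; the sketch assumes that pairwise r2 values are already available in a lookup table and treats any pair missing from the table as uncorrelated. All names and values are hypothetical.

```python
from typing import Dict, FrozenSet, List


def greedy_clump(p_values: Dict[str, float],
                 r2: Dict[FrozenSet[str], float],
                 r2_max: float = 0.01) -> List[str]:
    """Keep the most significant SNP first, then add each further SNP only if
    its r2 with every SNP already kept is below r2_max.

    Assumption: pairs absent from the r2 table are treated as r2 = 0.
    """
    kept: List[str] = []
    for snp in sorted(p_values, key=p_values.get):  # ascending p value
        if all(r2.get(frozenset((snp, other)), 0.0) < r2_max for other in kept):
            kept.append(snp)
    return kept


# Hypothetical genome-wide significant SNPs and pairwise r2 values.
p_values = {"snp_a": 1e-20, "snp_b": 1e-10, "snp_c": 1e-9}
r2_table = {frozenset(("snp_a", "snp_b")): 0.6}

print(greedy_clump(p_values, r2_table))
```

In this example snp_b is dropped because it is correlated with the more significant snp_a, whereas snp_c is retained.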
Genetic associations with glycaemic trait(s)
We examined the role of eGFR in the glycaemic trait(s) identified by MR-BMA using genetic associations from the relevant MAGIC GWAS [19, 21], obtained using linear regression adjusted for age, sex, study site and geographic covariates in an additive genetic model [19]. For SNPs predicting eGFR not in MAGIC, proxy SNPs (r2 ≥ 0.8) were used.
Statistical analysis
Overall and sex-specific MR-BMA
We used MR-BMA, a novel approach extending multivariable MR that ranks different combinations of exposures on model fit (Bayesian posterior probability) [17], to select between glycaemic traits overall and sex-specifically. We standardised genetic associations with glycaemic traits from their original units to effect sizes, for ease of comparison and model selection. Standard deviations for these glycaemic traits were calculated based on the MAGIC GWAS [19, 21]. As previously described [30], we used a prior probability of 0.1 and a prior variance of 0.25. MR-BMA was also used to rank individual glycaemic traits on marginal inclusion probability (the sum of the posterior probabilities of all models in which the glycaemic trait is present) [17]. We identified and excluded outliers, i.e. influential SNPs with very high heterogeneity (Cochran’s Q statistic >100) or SNPs with both a large Cook’s D (above 0.1) and high heterogeneity (Cochran’s Q statistic >10), as previously described [30]. The analysis was repeated after removing outliers until no outlier was detected.
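The relationship between model posterior probabilities and marginal inclusion probabilities can be shown with a small Python sketch. It does not fit MR-BMA; it only assumes that posterior probabilities for each combination of traits have already been estimated, and sums them per trait. The posterior values below are invented for illustration and are not results from this study.

```python
from typing import Dict, FrozenSet


def marginal_inclusion_probabilities(
        model_posteriors: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Sum the posterior probabilities of all models that contain each trait."""
    mip: Dict[str, float] = {}
    for traits, posterior in model_posteriors.items():
        for trait in traits:
            mip[trait] = mip.get(trait, 0.0) + posterior
    return mip


def top_ranked_model(model_posteriors: Dict[FrozenSet[str], float]) -> FrozenSet[str]:
    """Return the trait combination with the highest posterior probability."""
    return max(model_posteriors, key=model_posteriors.get)


# Hypothetical posterior probabilities for four candidate models.
posteriors = {
    frozenset({"fasting_insulin"}): 0.60,
    frozenset({"fasting_insulin", "hba1c"}): 0.25,
    frozenset({"fasting_glucose"}): 0.10,
    frozenset({"hba1c"}): 0.05,
}

mip = marginal_inclusion_probabilities(posteriors)
for trait in sorted(mip, key=mip.get, reverse=True):
    print(trait, round(mip[trait], 2))
print(sorted(top_ranked_model(posteriors)))
```

A trait can therefore have a high marginal inclusion probability either because one model containing it dominates, or because it appears in several moderately supported models.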
Overall and sex-specific associations of selected glycaemic trait(s) with CKD and eGFR
Based on the MR-BMA model ranking, we used the top-ranked model to assess the role of the selected glycaemic trait(s). We obtained SNP-specific Wald estimates (the genetic association with CKD or eGFR divided by the genetic association with the glycaemic trait[s]) in CKDGen, and then meta-analysed these estimates using inverse variance weighting (IVW) with multiplicative random effects. If a single glycaemic trait was selected, we used genetic associations in the original units (e.g. pmol/l for fasting insulin), rather than effect sizes, for ease of interpretation. To satisfy the MR assumptions, we identified and dropped SNP(s) that were associated with CKD or eGFR at genome-wide significance and detected as outliers, because such SNP(s) may be directly related to the outcome. Similarly, we conducted overall and sex-specific analyses in the UK Biobank, and meta-analysed the associations from both data sources. We used p < 0.05 as the cut-off for significance because correction for multiple testing is more suitable for a hypothesis-generating study, such as a GWAS, than for a confirmatory study [31, 32]. We assessed differences between sex-specific estimates (log ORs for CKD and β-coefficients for eGFR) using a z-test, and then obtained the two-tailed p value [33].
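The Wald estimates, the IVW meta-analysis with multiplicative random effects and the z-test for sex differences can be sketched in Python as follows. The study used the R package ‘MendelianRandomization’; this sketch is an independent illustration, not that package's code. It assumes a first-order standard error for each Wald ratio (outcome standard error divided by the absolute SNP-exposure association) and an overdispersion factor that is not allowed to fall below 1. All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def wald_ratios(bx: Sequence[float], by: Sequence[float],
                se_y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-SNP Wald ratio (by / bx) with a first-order standard error (se_y / |bx|)."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_y)]
    return ratios, ses


def ivw_multiplicative_random_effects(ratios: Sequence[float],
                                      ses: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted estimate; the fixed-effect standard error is
    inflated by sqrt(Q / (J - 1)) when that factor exceeds 1."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se_fixed = sqrt(1.0 / total)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))  # Cochran's Q
    dof = len(ratios) - 1
    phi = max(1.0, q / dof) if dof > 0 else 1.0
    return estimate, se_fixed * sqrt(phi)


def sex_difference_p(est_men: float, se_men: float,
                     est_women: float, se_women: float) -> float:
    """Two-tailed p value from a z-test comparing two independent estimates."""
    z = (est_men - est_women) / sqrt(se_men ** 2 + se_women ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical SNP-exposure (bx) and SNP-outcome (by, se_y) associations.
ratios, ses = wald_ratios(bx=[0.10, 0.20, 0.15],
                          by=[0.012, 0.018, 0.020],
                          se_y=[0.005, 0.005, 0.006])
estimate, se = ivw_multiplicative_random_effects(ratios, ses)
print(round(estimate, 4), round(se, 4))
print(round(sex_difference_p(0.30, 0.10, 0.05, 0.08), 3))
```

For a binary outcome such as CKD, the estimate is on the log OR scale and would be exponentiated for reporting.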
In a sensitivity analysis, we used different methods with different assumptions, i.e. a weighted median [34] and Mendelian Randomisation Pleiotropy Residual Sum and Outlier (MR-PRESSO) with 10,000 simulations [35]. The weighted median is robust to invalid instruments and provides consistent estimation even when up to 50% of the weight is from invalid SNPs [34]. MR-PRESSO can detect and, if necessary, correct for potentially pleiotropic outliers [35], but assumes the indirect effect is independent of the direct effect. We did not use MR-Egger because, with a limited number of SNPs, its estimates are difficult to interpret, it is sensitive to outliers [36] and it gives wider confidence intervals than the other methods. In a further sensitivity analysis, we used Bonferroni-corrected significance (p < 0.05/2000 [number of phenotypes in the UK Biobank] = 2.5 × 10−5) as the cut-off for pleiotropy. If fasting insulin was the best predictor for CKD or eGFR, as hypothesised, we also checked the pattern of associations for SNP(s) in the IRS1 gene, which encodes insulin receptor substrate 1 and is critical in insulin signalling and in the pathogenesis of type 2 diabetes [37].
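The point estimate of the weighted median can be illustrated with a short Python sketch. It follows the commonly described construction in which SNP-specific estimates are ordered, each is assigned the cumulative weight up to its midpoint, and the estimate is obtained by linear interpolation at the 50% point. It is an illustration rather than the code used in the study, it omits the bootstrap usually used for the standard error, and the inputs are hypothetical.

```python
from typing import List, Sequence


def weighted_median(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted median of SNP-specific estimates by linear interpolation.

    Assumption: at least two SNPs, all with positive weights
    (for example inverse-variance weights of the Wald ratios).
    """
    if len(estimates) < 2 or len(estimates) != len(weights):
        raise ValueError("need at least two estimates with matching weights")
    order = sorted(range(len(estimates)), key=lambda j: estimates[j])
    est = [estimates[j] for j in order]
    w = [weights[j] for j in order]
    total = sum(w)

    # Standardised cumulative weight evaluated at the midpoint of each SNP's weight.
    cumulative = 0.0
    position: List[float] = []
    for wj in w:
        cumulative += wj
        position.append((cumulative - 0.5 * wj) / total)

    below = max(j for j, pj in enumerate(position) if pj < 0.5)
    fraction = (0.5 - position[below]) / (position[below + 1] - position[below])
    return est[below] + (est[below + 1] - est[below]) * fraction


# Hypothetical Wald ratios; the third SNP behaves like a pleiotropic outlier.
print(weighted_median([0.10, 0.20, 0.90], [1.0, 1.0, 1.0]))
```

In this example the outlying estimate of 0.90 has little influence on the result, which is the property that motivates using the weighted median alongside IVW.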
Power calculations were conducted overall and by sex. The sample size needed for MR is approximately the sample size for the conventional observational study divided by the variance in the exposure explained by the SNPs [38]. Specifically, for binary outcomes, the required sample size was calculated based on the log OR, the ratio of cases to non-cases and the variance explained by the SNPs. For continuous outcomes, it was calculated based on the effect size and the variance explained by the SNPs.
Association of genetically predicted eGFR with selected glycaemic trait(s)
We obtained SNP-specific Wald estimates (the association of each eGFR-related SNP with the glycaemic trait[s] selected by MR-BMA divided by its genetic association with eGFR), and meta-analysed these estimates using IVW with multiplicative random effects [39]. As sensitivity analyses, we also used a weighted median [34] and MR-PRESSO with 10,000 simulations [35].
All statistical analyses were conducted using R version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria), the ‘clump_data’ function of MR-Base and the R package ‘MendelianRandomization’.
Ethics approval
This research has been conducted using the UK Biobank resource under application number 42468 and publicly available data. No original data were collected for the MR study. Ethical approval for each of the studies included in the investigation can be found in the original publications (including informed consent from each participant).