The Malmö Diet and Cancer Study
The Malmö Diet and Cancer Study (MDCS) is a population-based prospective cohort study established between 1991 and 1996 in Sweden. Detailed descriptions of the cohort and its representativeness have been published previously [18]. Between October 1991 and February 1994, every other MDCS participant was invited to join a sub-study on cardiovascular disease risk (MDCS-CC; N = 6103) [19]. The population samples used for analyses are shown in Fig. 1. Baseline blood samples were available from 4242 participants, of whom 220 individuals had missing data for covariates, leaving 4022 participants for descriptive analysis (mean age 57.6 ± 6.0 years; 2355 women and 1667 men). The MDCS was approved by the Ethics Committee at Lund University (Malmö, Sweden; LU 51–90, LU 204–00 and Dnr. 469/2006), and written informed consent was obtained from all participants in accordance with the Declaration of Helsinki.
MDCS-CC baseline measurements
Detailed descriptions of clinical characteristics and standard anthropometric and blood-based measurements can be found in the electronic supplementary material (ESM) Methods. Participants completed an extensive baseline questionnaire including questions on lifestyle, socioeconomic factors and medical history. Direct measurements included height (cm) and weight (kg), which were used to calculate BMI (kg/m²). Blood pressure (mmHg) was measured after 5 min of supine rest. Blood samples were donated at baseline after an overnight fast. Plasma creatinine (μmol/l) levels were measured and analysed with the Jaffé method [20]. eGFR was calculated based on the previously reported Chronic Kidney Disease Epidemiology Collaboration 2009 creatinine-based equation [21]. A factor of 0.0113 was included to convert creatinine levels measured in μmol/l into mg/dl.
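As an illustration of the eGFR calculation described above, the following minimal sketch implements the published CKD-EPI 2009 creatinine equation together with the 0.0113 conversion factor. The function name and arguments are hypothetical, and whether the race coefficient of the original equation was applied in this cohort is not stated in the text, so it is included only as an optional argument that defaults to off.

```python
def egfr_ckd_epi_2009(creatinine_umol_l, age_years, female, black=False):
    """Return eGFR (ml/min per 1.73 m^2) from the CKD-EPI 2009 creatinine equation.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    # Convert creatinine from umol/l to mg/dl with the factor given in the text.
    scr_mg_dl = creatinine_umol_l * 0.0113

    # Sex-specific constants of the CKD-EPI 2009 equation.
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411

    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    if black:
        # Race coefficient of the original equation; its use here is an assumption.
        egfr *= 1.159
    return egfr


if __name__ == "__main__":
    # Example: a 60-year-old woman with plasma creatinine of 70 umol/l.
    print(round(egfr_ckd_epi_2009(70, 60, female=True), 1))
```

With these constants, higher creatinine or higher age always yields a lower eGFR, which is the behaviour the CKD definition used later (eGFR below 60) relies on.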
Galectin-1 measurements
Galectin-1 was measured in human sera sampled at the study baseline with the Human Galectin-1 Quantikine ELISA Kit (R&D Systems, MN, USA) according to the manufacturer’s instructions. Intra-assay and inter-assay coefficients of variation were 7.1% and 9.5%, respectively.
Incidence of CKD, type 2 diabetes and secondary outcomes
Incidence of CKD was defined as having an eGFR <60 ml min⁻¹ [1.73 m]⁻² at the follow-up re-examination between 2007 and 2012 (mean follow-up of 16.6 ± 1.5 years) [22]. Diabetes status at baseline and during follow-up was ascertained through linkage to regional and national registries until 31 December 2014 and through the baseline screening (mean follow-up of 18.4 ± 6.1 years). Secondary outcomes included coronary artery disease and all-cause and cause-specific mortality. A more detailed description of the ascertainment of all outcomes is found in the ESM Methods: ‘Type 2 diabetes ascertainment’ and ‘Secondary outcomes’.
Genetic analyses and selection of instrumental variables
Genotyping was performed using Illumina HumanOmniExpress BeadChip v. 1 (Illumina, CA, USA) at Broad Institute, MA, USA, and the dataset was imputed to the 1000 Genomes reference panel (phase 1, version 3) after standard quality control procedures (for more details, see ESM Methods ‘Genotyping quality control’). Galectin-1 levels were natural log (logₑ)-transformed and adjusted for age, age², sex and the first three principal components of ancestry from multi-dimensional scaling in a linear regression model. The age² term was included to account for non-linear age effects. The residuals were rank inverse normalised and used as the phenotype for association testing. PLINK (version 1.9; http://pngu.mgh.harvard.edu/~purcell/plink/, available from 15 May 2014) was used to fit linear regression models using an additive genetic model [23]. Results are presented using a Manhattan plot and a locus zoom plot of the galectin-1 gene region (LGALS1) (Fig. 2). The Q–Q plot and the genomic inflation factor (λ) were used to assess goodness of fit. The galectin-1 gene region was defined using Ensembl BioMart (GRCh37 Build). SNPs within 300 kb of the galectin-1 gene (LGALS1) were assessed for inclusion as genetic instruments. The sentinel SNP (rs7285699) was defined as the SNP with the lowest p value at the genome-wide significant locus. Two additional independent variants were identified using a stepwise conditional analysis with a conditional p value threshold of 0.01 (see ESM Methods ‘Construction of multi-SNP instrument’). Details of the variants included as genetic instruments can be found in ESM Table 1.
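The rank inverse normalisation of the regression residuals can be sketched as follows. This is an illustrative example only: it takes the residuals as given, and the Blom offset (c = 3/8) and the average-rank handling of ties are assumptions, since the text does not specify either.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.375):
    """Rank-based inverse normal transform of a list of numbers.

    Tied values receive the average of their ranks. The offset c = 3/8
    (Blom) is an assumption; the source does not state which offset was used.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


if __name__ == "__main__":
    residuals = [0.3, -1.2, 2.5, 0.1, -0.4]  # made-up residuals for illustration
    print([round(z, 3) for z in rank_inverse_normal(residuals)])
```

The transformed values depend only on the ordering of the residuals, so outlying galectin-1 measurements cannot dominate the association test.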
The All New Diabetics In Scania (ANDIS) study
ANDIS is an ongoing study that aims to include all individuals with newly diagnosed diabetes mellitus in the south of Sweden (the Scania region). Using this study, it was recently suggested that diabetes mellitus can be clustered into five subgroups. One of the identified clusters corresponded to the classical type 1 diabetes phenotype, and individuals with type 2 diabetes could be further sub-stratified into four clusters: severe insulin-deficient diabetes (SIDD); mild obesity-related diabetes (MOD); mild age-related diabetes (MARD); and severe insulin-resistant diabetes (SIRD) [3]. Between 1 February 2007 and 30 September 2016, 9367 participants were enrolled and had both eGFR measurements and genome-wide association study (GWAS) genotyping data available. In this group, 1116 participants were clustered as SIRD. Within each of the four type 2 diabetes clusters, 44 participants had data on plasma galectin-1 levels (arbitrary units [AU]) quantified using the Olink Target 96 Immuno-Oncology proximity extension assay (Olink Proteomics, Sweden) and analysed at SciLife Laboratory (Uppsala, Sweden). The mean levels of galectin-1 across the four type 2 diabetes clusters were examined, and differences were tested using ANOVA and linear regression with adjustment for age, sex and BMI. Further, the association between galectin-1 and eGFR at the time of diabetes diagnosis was assessed using linear regression, adjusting for age, sex, BMI, HOMA2-IR (calculator downloaded 10 December 2016 [24]), HOMA2-B and HbA1c (all measured at time of diagnosis). The sentinel SNP from the galectin-1 GWAS in the MDCS-CC (rs7285699) was examined in relation to eGFR among all individuals with type 2 diabetes and within the specific clusters.
Statistical analyses
We examined the baseline characteristics of participants across quartiles of baseline galectin-1 levels. A linear regression model was used to test differences in galectin-1 levels by baseline characteristics, adjusting for age, sex and BMI. Longitudinal change in eGFR (absolute change and annual relative change) per SD increase in galectin-1 was examined using a linear regression model. The risk of incident CKD at the re-examination was assessed using a Cox regression model with follow-up time until the date of re-examination as the timescale. Longitudinal associations of baseline galectin-1 levels in relation to incidence of type 2 diabetes as well as secondary outcomes were examined using Cox proportional hazards regression with follow-up time until 31 December 2014 as the timescale. We estimated HRs and 95% CIs per quartile of galectin-1 using the first quartile as reference, as well as the HR per SD increase in galectin-1 levels. No deviations from linearity were detected based on fitting restricted cubic splines and testing for non-linearity using the likelihood ratio test. The proportional hazards assumption was fulfilled for all presented models based on the Schoenfeld residuals test. All covariates were selected a priori and included established risk factors for CKD and type 2 diabetes, respectively. To assess the discriminative usefulness of galectin-1 in addition to established risk factors, we calculated Harrell’s concordance index and category-free net reclassification improvement (cNRI). Differences between models with and without galectin-1 were assessed using the likelihood ratio test.
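To make the discrimination measure concrete, the sketch below computes Harrell's concordance index from follow-up times, event indicators and model risk scores. It is a simple O(n²) illustration with hypothetical names and made-up data, not the routine used in Stata or R for the reported analyses, and it ignores pairs with tied follow-up times.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when participant i had the event and a
    shorter follow-up time than participant j. The pair is concordant when
    the participant with the earlier event also has the higher risk score;
    tied risk scores count as half. Pairs with tied times are skipped,
    which is a simplification made for this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up data: follow-up in years, 1 = event, 0 = censored.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [1, 1, 0, 1]
    scores = [0.9, 0.7, 0.5, 0.1]
    print(harrell_c_index(times, events, scores))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, so the added value of galectin-1 shows up as the difference in this index between models with and without the biomarker.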
We performed two-sample MR using previously published summary statistics of genetic associations with outcomes. Genetic associations with CKD and eGFR were obtained from the Chronic Kidney Disease Genetics (CKDGen) Consortium meta-analysis (41,395 cases and 439,303 control participants with European ancestry) [25]. Further, we examined the association between galectin-1 and eGFR stratified by diabetes mellitus status in the CKDGen Consortium study based on summary statistics from Pattaro et al [26]. Genetic associations with type 2 diabetes were obtained by including summary statistics from the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) consortium [27]. The meta-analysis included 32 GWAS of type 2 diabetes (cases = 74,124, control participants = 824,006 with European ancestry). We used the BMI-unadjusted results to avoid collider bias. The causal effects of galectin-1 on the examined outcomes were assessed using the sentinel SNP (rs7285699) as the instrumental variable and estimated using the Wald ratio method. For CKD and type 2 diabetes, we further estimated causal effects using the fixed-effect inverse variance-weighted (IVW) method, including three variants associated with galectin-1 at p < 0.01 after stepwise conditional and joint analysis.
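The two MR estimators named above can be sketched from summary statistics as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the Wald ratio standard error uses the common first-order approximation that ignores uncertainty in the SNP-exposure estimate, the IVW formula treats the variants as uncorrelated, and all numbers in the usage example are invented.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP causal estimate: SNP-outcome effect over SNP-exposure effect.

    The standard error is the first-order approximation se_outcome / |beta_exposure|.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_fixed_effect(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance-weighted estimate across several SNPs.

    Equivalent to a weighted average of the per-SNP Wald ratios with
    weights beta_exposure^2 / se_outcome^2; assumes uncorrelated variants.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(betas_exposure, betas_outcome, ses_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(betas_exposure, ses_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Invented summary statistics for three variants (not study data).
    bx = [0.50, 0.20, 0.15]       # SNP effect on galectin-1 (SD units)
    by = [0.10, 0.05, 0.02]       # SNP effect on the outcome (log odds)
    se_y = [0.02, 0.03, 0.03]     # standard error of the outcome effect
    print(wald_ratio(bx[0], by[0], se_y[0]))
    print(ivw_fixed_effect(bx, by, se_y))
```

With a single variant the IVW estimate reduces to the Wald ratio, which is why the sentinel-SNP analysis and the multi-SNP analysis are directly comparable.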
All tests were two-sided, and the statistical significance level was set at p < 0.025 after Bonferroni correction to account for the two main outcomes examined. All analyses were performed using Stata version 14.2 (TX, USA), SPSS Statistics version 25 (IBM, New York, NY, USA) and the statistical program R version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria; www.R-project.org, available from 26 April 2019). MR analyses were performed using the ‘TwoSampleMR’ package [28].