A novel rare CUBN variant and three additional genes identified in Europeans with and without diabetes: results from an exome-wide association study of albuminuria

Aims/hypothesis Identifying rare coding variants associated with albuminuria may open new avenues for preventing chronic kidney disease and end-stage renal disease, which are highly prevalent in individuals with diabetes. Efforts to identify genetic susceptibility variants for albuminuria have so far been limited, with the majority of studies focusing on common variants.
Methods We performed an exome-wide association study to identify coding variants in a two-stage (discovery and replication) approach. Data from 33,985 individuals of European ancestry (15,872 with and 18,113 without diabetes) and 2605 Greenlanders were included.
Results We identified a rare (minor allele frequency [MAF]: 0.8%) missense (A1690V) variant in CUBN (rs141640975, β = 0.27, p = 1.3 × 10⁻¹¹) associated with albuminuria as a continuous measure in the combined European meta-analysis. The presence of each rare allele of the variant was associated with a 6.4% increase in albuminuria. The rare CUBN variant had an effect that was three times stronger in individuals with type 2 diabetes compared with those without (p_interaction = 7.0 × 10⁻⁴, β with diabetes = 0.69, β without diabetes = 0.20) in the discovery meta-analysis. Gene-aggregate tests based on rare and common variants identified three additional genes associated with albuminuria (HES1, CDC73 and GRM5) after multiple testing correction (p_Bonferroni < 2.7 × 10⁻⁶).
Conclusions/interpretation The current study identifies a rare coding variant in the CUBN locus and other potential genes associated with albuminuria in individuals with and without diabetes. These genes have been implicated in renal and cardiovascular dysfunction. The findings provide new insights into the genetic architecture of albuminuria and highlight target genes and pathways for the prevention of diabetes-related kidney disease.
Electronic supplementary material The online version of this article (10.1007/s00125-018-4783-z) contains peer-reviewed but unedited supplementary material, which is available to authorised users.

Study characteristics stratified by lead (discovery) SNP genotypes
SNP-albuminuria associations adjusted for systolic blood pressure
Detailed results for the genes identified in the gene aggregate tests
Association of CUBN rs141640975 with type 2 diabetes and kidney function (eGFR)

ESM Figures
ESM Figure 1. Quantile-quantile (QQ) plot for the discovery stage albuminuria ExWAS.
ESM Figure 2. Bar plot of the CUBN rs141640975 genotypes versus mean albuminuria levels (mg/g) in the discovery set.
ESM Figure 3. Bar plot of the KCNK5 rs10947789 genotypes versus mean albuminuria levels (mg/g) in the discovery set.
ESM Figure 4. Bar plot of the LMX1B rs140177498 genotypes versus mean albuminuria levels (mg/g) in the discovery set.
IMI-SUMMIT Consortia participants list

Study Description
Study Populations
Description of the Danish study samples applied in primary association analysis (discovery phase). Clinical and biochemical characteristics for all study sample groups are shown in Main Table 1. All individuals participating in the discovery studies were of Danish nationality. Informed written consent was obtained from all study participants. The studies were conducted in accordance with the Declaration of Helsinki II and were approved by the local Ethical Committees.

ESM Methods 1.1 Discovery Phase
ADDITION DENMARK STUDY [1,2]
Addition-DK is the Danish arm of the ADDITION-Europe Study (Anglo-Danish-Dutch Study of Intensive Treatment in People with Screen-Detected Diabetes in Primary Care), an intervention study based on screening for individuals at high risk of type 2 diabetes. Sampling is performed at the Department of General Practice, University of Aarhus, Denmark (ClinicalTrials.gov ID-no: NCT00237549). A total of 8,662 individuals from the initial screening cohort had DNA available.
A total of 2,013 individuals with genotype and phenotype data participated in the current study, of whom 1,643 had diabetes (screen-detected and untreated type 2 diabetes) and 370 did not.

HEALTH-2006 [3]
Health2006 is a general population-based cohort designed to investigate lifestyle-related chronic diseases, including diabetes and coronary heart disease. The participants, aged 18-69 years, were selected randomly from the south-western part of the greater Copenhagen area, with a participation rate of 44.7% (N=3,471 individuals participating). Health2006 was conducted at the Research Centre for Prevention and Health in Glostrup, Denmark.
In the current study 2,658 individuals with genotype and phenotype data and with no diabetes participated.

HEALTH-2008 [4]
Health2008 is a general population-based cohort designed to investigate lifestyle-related chronic diseases, including diabetes and coronary heart disease. The participants, aged 30-60 years, were selected randomly from the south-western part of the greater Copenhagen area, with a participation rate of 35.8% (N=795 individuals participating). Health2008 was conducted at the Research Centre for Prevention and Health in Glostrup, Denmark.
In the current study 642 individuals with genotype and phenotype data and with no diabetes participated.

INTER99 [5,6]
The Inter99 cohort is a randomized, non-pharmacological intervention study for the prevention of ischaemic heart disease, conducted on 6,784 randomly ascertained participants aged 30 to 60 years at the Research Centre for Prevention and Health in Glostrup, Denmark (ClinicalTrials.gov: NCT00289237). Detailed characteristics of Inter99 have been published previously.
In the current study 5,971 individuals with genotype and phenotype data participated. Of these, 311 had diabetes whereas 5,660 had no diabetes (normal glucose tolerance).

VEJLE BIOBANK [7]
Vejle Biobank, situated at Vejle Hospital, comprises clinical-onset type 2 diabetes cases and non-diabetic control individuals with matching age and sex distribution who were examined during a three-year period. Non-diabetic status of control individuals was confirmed both by self-report and by a fasting plasma glucose test fulfilling WHO 1999 criteria.
In the current study 1,942 type 2 diabetes cases with genotype and phenotype data participated.

ESM Methods 1.2 Replication Phase
DANFUND [8]

GENESIS/GENEDIAB [9,10]
This is a French study comprising individuals of European ethnicity with type 1 diabetes. It consists of two sub-studies, GENEDIAB (genes nephropathy and diabetes) and GENESIS (genes nephropathy and sib pair study), distinguished by sub-classification criteria (diabetes duration).
All type 1 diabetes patients attending diabetes clinics were invited to be included in the GENEDIAB study, provided they had severe diabetic retinopathy (proliferative or severe non-proliferative), regardless of their nephropathy status. All type 1 diabetes patients with retinopathy and diabetes duration longer than 15 years were included in the GENESIS study.
Diabetes classification was made using American Diabetes Association diagnostic criteria, as defined in 1997. Type 1 diabetes was defined as age at diabetes onset before 35 years. 1,249 individuals with albuminuria measures and genotype data participated in the current study.
The Malmö Diet and Cancer Study (MDCS) [11,12]
MDCS is a population-based cohort study in the city of Malmö, Sweden. All women and men (n=74,138) born 1923-1950 and 1923-1945, respectively, were invited to participate in the baseline examination between 1991 and 1996. Mental incapacity and inadequate Swedish language skills were the only exclusion criteria. The total participation rate was 40.8%. A detailed description of the cohort has been previously published [11]. The study was approved by the ethical committee at Lund University (LU 51-90). Written informed consent was provided by all participants.
For the current study we included participants who were among the 6,103 randomly selected individuals who underwent additional phenotyping during 1991-1994 as part of the MDCS-Cardiovascular Cohort (MDCS-CC), designed to study the epidemiology of carotid artery disease. Of the MDCS-CC participants who were alive and had not emigrated from Sweden (n=4924), 3,734 attended a follow-up re-examination between 2007 and 2012, which has been described previously [12]. Urinary albumin-creatinine concentration was quantified from overnight urine using Cobas at the Department of Clinical Chemistry. We included 2,641 participants who had complete data on urinary albumin-creatinine concentration at the follow-up re-examination and non-missing genotype data. Of these, 547 had a history of either prevalent or incident diabetes at the follow-up re-examination.

SUMMIT Consortia Studies
As part of preliminary screening for a multicentre, randomized controlled trial of statins/ACE inhibitors, we measured albumin-creatinine ratio (ACR) in six early morning urine samples from 3,353 adolescents (10-16 years) and calculated tertiles based on an established algorithm. From those subjects deemed to be at higher risk (upper ACR tertile) we recruited 400 into the intervention study (Trial cohort). At baseline, vascular measurements (carotid intima-media thickness, pulse wave velocity (PWV), flow-mediated dilatation, digital pulse amplitude tonometry), renal markers (symmetric dimethylarginine, cystatin C, creatinine) and CVD markers (lipids and apolipoproteins (Apo)A-1 and B, C-reactive protein, asymmetric dimethylarginine) were assessed. The trial cohort was genotyped on the Illumina OmniExpress SNP genotyping array as part of the SUMMIT study of renal complications in subjects with diabetes [13].

EURODIAB [13]
A total of 3250 men and women with type 1 diabetes were recruited from 31 centers in 16 European countries, and were aged between 15 and 60 at the baseline investigation phase (1989 to 1991). The diagnosis of type 1 diabetes was a clinical one; diagnosis had to have occurred before age 36, and the patient had to have a continuous need for insulin within a year of diagnosis. Re-examination occurred on average six to eight years after baseline investigations. The trial was designed to investigate the causes of complications in subjects with type 1 diabetes. Renal phenotypes were defined using the criteria detailed in van Zuydam et al., 2018 [13].

FINNDIANE [14-16]
The Finnish Diabetic Nephropathy (FinnDiane) Study is an ongoing nationwide multicenter study of individuals with type 1 diabetes, which aims to identify clinical, environmental and genetic risk factors for diabetic complications, diabetic nephropathy in particular. The study was established in 1997, and new individuals with type 1 diabetes continue to be added; currently, more than 5000 individuals are included. The study protocol has been previously described [14]. At their baseline visit, all patients underwent a comprehensive clinical examination, and each patient's attending physician completed standardized questionnaires regarding health and medical history. Patients also gave urine and blood samples, used to measure e.g. albumin excretion rate (AER). A subset of patients also attended one or more follow-up visits with a similar setting. Further information on clinical events has been retrieved from the national hospital discharge register (National Care Register of Health Care). In addition, over 2000 Finnish individuals with type 1 diabetes were added to the FinnDiane study through collaboration with the National Institute of Health and Welfare (Finland). For these, health-related information was retrieved from the patients' medical records as well as from the national registries.
The genome-wide genotyping of 3651 FinnDiane individuals was performed as previously described [15,16]. Following the common analysis plan for the SUMMIT cohorts, AER was measured from 24-hour urine collections and taken from the latest visit, excluding visits after ESRD (defined as dialysis, transplant or eGFR<15). If multiple 24-hour AER measurements were available within one year, the geometric mean of these values was calculated to reduce the effect of day-to-day variability. The AER values were log-transformed. Only patients with a minimum diabetes duration of 10 years were included in the analysis. A total of 2840 individuals fulfilled the inclusion criteria, had the required genetic and phenotypic data available, and were included in the analysis. Analysis was adjusted for use of anti-hypertensive medication, sex, age at diabetes onset, diabetes duration, and principal components.
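
The handling of repeated AER measurements described above can be sketched as follows. This is a minimal illustration, not the FinnDiane analysis code: the function name is hypothetical, and the natural logarithm is assumed for the log transformation.

```python
import math
from statistics import geometric_mean


def ln_aer_from_repeats(aer_values_mg_per_24h):
    """Collapse repeated 24-hour AER values into one log-transformed value.

    The values are assumed to come from the same individual within one year
    and must all be positive. The natural log is an assumption here.
    """
    # The geometric mean dampens the effect of day-to-day variability.
    collapsed = geometric_mean(aer_values_mg_per_24h)
    return math.log(collapsed)


# Two measurements of 10 and 40 mg/24h have a geometric mean of 20 mg/24h.
example = ln_aer_from_repeats([10.0, 40.0])
```

Because the geometric mean is the exponential of the mean of the logs, averaging on the log scale gives the same result as log-transforming the geometric mean.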

GODARTS [17-19]
The Genetics of Diabetes and Audit Research in Tayside Scotland (GoDARTS 1 and GoDARTS 2) is a case-control study of type 2 diabetes that includes ~10,000 subjects with diabetes and ~6000 subjects with no history of diabetes. A full description of study recruitment is given in Morris et al., 1997 [17]. Subjects with T2D included in the study of kidney-related phenotypes were genotyped on either the Affymetrix 6.0 SNP array or the Illumina OmniExpress array. All subjects were linked to electronic medical records, including hospital admission and blood biochemistry. These data were used to derive renal phenotypes as described in van Zuydam et al., 2018 [13].

BENEDICT (PHASE A and B) [20, 21]
The multicenter, double-blind, randomized Bergamo Nephrologic Diabetes Complications Trial-A (BENEDICT) was designed to assess whether angiotensin-converting-enzyme inhibitors and nondihydropyridine calcium-channel blockers, alone or in combination, prevent microalbuminuria in subjects with hypertension, type 2 diabetes mellitus, and normal urinary albumin excretion. We studied 1204 subjects, who were randomly assigned to receive at least three years of treatment with trandolapril (at a dose of 2 mg per day) plus verapamil (sustained-release formulation, 180 mg per day), trandolapril alone (2 mg per day), verapamil alone (sustained-release formulation, 240 mg per day), or placebo. The target blood pressure was 120/80 mm Hg. The primary end point was the development of persistent microalbuminuria (overnight albumin excretion ≥20 µg per minute at two consecutive visits) [20].

SCANIA DIABETES REGISTRY (SDR) [16,19]
Patients in SDR were randomly recruited from the Department of Endocrinology, Malmö, Sweden, and surrounding clinics in Skåne (Scania), Sweden. The total cohort included 7414 individuals with all types of diabetes. Classification into type 1 and type 2 diabetes was based on the presence of GAD antibodies and C-peptide levels or, where this information was incomplete, on the diagnosis given by the treating physician. Patients were selected for genotyping based on the presence of complications (kidney disease or retinopathy), or the absence of complications despite a diabetes duration of more than 15 years (T1D) or 10 years (T2D) [16]. Patients of known non-Scandinavian origin were excluded from the analysis.
These comprise a sample of clinical-onset type 2 diabetes patients and non-diabetic control individuals who were examined at the outpatient clinic at Steno Diabetes Center, Copenhagen. An OGTT in all control individuals confirmed the exclusion of individuals with unknown diabetes or states of pre-diabetes according to WHO 1999 criteria.

Replication in Greenlanders
GREENLANDERS [22]
The current study comprises a total of 2,605 individuals (IHIT=2,519; B99=86) with complete genotype and albuminuria measurements available.

ESM Methods 1.3 Danish Exome Sequencing based SNP Selection
A total of 16,340 coding SNPs selected from the Danish Exome sequencing study, comprising 1000 T2D patients and 1000 matched controls, were genotyped as part of the custom-designed Illumina iSelect array [23,24]. SNPs were selected from the exome sequencing study based on the following three criteria: 1) annotated to the most likely deleterious categories (nonsense, nonsynonymous, located in splice sites, and in untranslated regions, n=14,654); 2) nominally associated with T2D in the exome sequencing based association (n=995); 3) additional coding variants and lead SNPs in 192 loci associated with common metabolic traits at genome-wide significance (n=700) [23]. Genotype calling was performed using the Gentrain 1.0 clustering algorithm, based on a custom cluster file created using 1032 samples with high-quality data selected from various batches as described [23]. SNP quality filters included minor allele frequency (MAF>0.01), genotype call rate >95% and Hardy-Weinberg equilibrium (P > 10⁻⁷). Individuals with extreme inbreeding coefficients (>0.1 or <-0.1), a low call rate (<95%), or mislabeled sex were removed. More details have been reported previously [23].
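
The SNP quality filters above can be illustrated with a short sketch. The thresholds (MAF>0.01, call rate >95%, Hardy-Weinberg P > 10⁻⁷) come from the text. Everything else is an assumption made for illustration: the function names, the genotype-count inputs, and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium (the original pipeline may have used an exact test).

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used here for simplicity.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_maf=0.01, min_call_rate=0.95, min_hwe_p=1e-7):
    """Return True if a SNP passes the MAF, call-rate and HWE filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (maf > min_maf
            and call_rate > min_call_rate
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p)
```

For example, a SNP with genotype counts 810/180/10 and no missing calls has MAF 0.1 and sits at Hardy-Weinberg proportions, so it passes; the same counts with 100 missing calls fail on call rate.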

ESM Methods 1.4 Analyses Plan: Albuminuria Exome Chip based Exome Wide Association Study
Introduction
AIM: This analysis plan aims to coordinate the collection of summary statistics from Exome Chip based analyses (exome-wide association study) for the trait "Urinary Albumin Creatinine Ratio/ACR" or "Urinary albumin excretion rate/AER" (mg/g or mg/24 hours) as a continuous variable.
We aim to carry out replication analyses of the provided list of SNPs (Exome Chip based). Please contact Tarunveer S. Ahluwalia (veertarun@gmail.com or Tarun.veer.singh.ahluwalia@regionh.dk) if you have any questions regarding trait definitions or analysis implementation.

General guidelines for analyses
Each individual study will perform data quality control (QC) and analysis and provide summary results for meta-analysis.

Exclusions
• All individuals less than 18 years of age.
• For twin studies/family data, we recommend that data from one individual from each twin pair/family (preferably with a higher genotyping call rate and preferably if the sample is a case) be used.
• We will refer to principal components (PCs), which are used to control for population stratification and other confounders. Here, we do not distinguish between PCs generated using software like smartPCA/EIGENSOFT and components calculated using multidimensional scaling (MDS) as implemented in PLINK. For your analyses, you should include either PCs or MDS components as covariates in the models. In studies with related individuals, principal components are to be calculated in founders.
• Limit your dataset to individuals with valid data for phenotype, genotype, covariates, and PCs.
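
As an illustration of including PCs (or MDS components) as covariates, the following sketch fits a single-marker linear model of the log-transformed trait on genotype dosage plus covariates by ordinary least squares. It is not the analysis software used by the cohorts: the additive genotype coding, the function names and the toy data are all assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear covariates?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_single_marker(ln_trait, dosage, covariates):
    """OLS fit: ln_trait ~ intercept + dosage + covariates (e.g. age, PCs).

    Returns coefficients in the order [intercept, dosage, covariates...].
    """
    rows = [[1.0, float(g)] + [float(c) for c in cov]
            for g, cov in zip(dosage, covariates)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ln_trait)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (hypothetical): covariates are [PC1, age] per individual.
dosage = [0, 1, 0, 2, 1, 0]
covs = [[0.1, 30], [-0.2, 45], [0.3, 50], [0.0, 62], [-0.1, 38], [0.2, 55]]
ln_acr = [1.0 + 0.5 * g + 2.0 * c[0] + 0.01 * c[1] for g, c in zip(dosage, covs)]
beta = fit_single_marker(ln_acr, dosage, covs)  # beta[1] is the per-allele effect
```

The coefficient on dosage is the per-allele effect on the log-transformed trait, adjusted for the listed covariates. In practice a dedicated statistics package would also report standard errors and p-values.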

Trait Transformation
Take the natural logarithm of ACR/AER and use this as the outcome variable (ln_acr or ln_aer).
Models to be run, single-marker and gene-based:
NOTE: Some SNP filtering based on functionality has been performed at our center, so we will provide you with the same list of markers to be used for these analyses (which can be used to create a new binary file before proceeding to step 1 of METAskat).

Summary Data for single-marker analyses
An Excel sheet requesting summary data will be provided to all participating cohorts; it asks for a data summary, a genotyping summary, an author list, a brief study description and acknowledgements for each study. Each study group should complete this file and upload it to the server provided for data upload as "Exome.descriptive.STUDYNAME.xls".

Data upload format for single SNP analyses with BUILD
Albumin excretion rate measures were all performed in the clinical biochemistry laboratory at Ninewells Hospital, Dundee.

Cambridge
Urinary albumin and creatinine were measured using the turbidimetric/Jaffe method.

BENEDICT STUDY PHASE A AND B
Albuminuria was measured from overnight urine collection.
Values below the detection limit, or zero values: because ln(0) = -Inf, zero values must be converted to nonzero values. Suggestion: use a fixed value that is slightly smaller than the smallest value otherwise observed: 24h-AER: 0.1 mg/24h; nU-AER: 0.01 µg/min; ACR: 0.01 mg/mmol. Alternatively, use the smallest value available in the data set, or the detection limit.
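
A minimal sketch of this zero-handling step followed by the natural-log transformation is shown below. The fixed replacement values are the ones suggested in the text; the dictionary keys and function name are hypothetical labels chosen for this example.

```python
import math

# Suggested fixed replacement values for zero measurements (from the text).
# Keys are hypothetical labels: 24h-AER in mg/24h, nU-AER in µg/min,
# ACR in mg/mmol.
ZERO_REPLACEMENT = {"aer_24h": 0.1, "aer_nu": 0.01, "acr": 0.01}


def ln_transform(values, trait):
    """Replace zero values with the trait's fixed floor, then take ln."""
    floor = ZERO_REPLACEMENT[trait]
    return [math.log(v if v > 0 else floor) for v in values]


# A zero ACR value becomes ln(0.01) instead of -Inf.
ln_acr = ln_transform([0.0, 1.0, 5.0], "acr")
```

Cohorts that prefer the smallest observed value or the assay detection limit could pass that value in place of the fixed floor.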