Introduction

Gestational diabetes mellitus (GDM) is one of the most common complications of pregnancy, affecting approximately 7% (range 1–14%) of all pregnancies in the USA [1]. Globally the prevalence of GDM has increased by more than 30% in the past three decades, following worldwide trends of increasing obesity [2, 3]. GDM is associated with short- and long-term complications for women and their offspring. Compared with healthy women, those with GDM have an increased risk of gestational hypertension and pre-eclampsia during pregnancy, and a significantly higher risk of impaired glucose tolerance and type 2 diabetes after pregnancy [4, 5]. Moreover, offspring born to women with GDM are more likely to be obese [6, 7] and to have impaired glucose tolerance and diabetes in childhood and early adulthood [8, 9]. Several modifiable lifestyle factors have been identified for the prevention of GDM, including maintaining a normal body weight, eating a healthy diet, being physically active and abstaining from cigarette smoking [10,11,12,13,14,15].

Although compelling data suggest that genetic factors play a role in GDM [10, 16], relatively few studies have been published on the genetic susceptibility to GDM [17, 18]. Insulin resistance and defects in insulin secretion play a pivotal role in the development of GDM [19]. More than 53% of the variation in peripheral insulin sensitivity and 75% of the variation in insulin secretion can be explained by genetic components [20, 21]. Previous studies have shown that genetic variants of KCNJ11 [22], TCF7L2 [23, 24], KCNQ1 [25], MTNR1B [24] and IRS1 [24] are associated with GDM risk. However, these studies examined only a single genetic variant or a small number of variants selected a priori. Only one genome-wide association study (GWAS) of GDM in an Asian population has been conducted, and two genetic variants, rs10830962 near MTNR1B and rs7754840 in CDKAL1, were identified [26]. However, this study was limited by a relatively small sample size (1399 women with GDM and 2025 control women) in a Korean population. Therefore, we conducted a comprehensive candidate gene analysis to identify genetic variants of GDM among 8722 white women (2636 with GDM and 6086 non-GDM control women) from the Nurses’ Health Study II (NHSII) and the Danish National Birth Cohort (DNBC). As insulin resistance and defects in insulin secretion play a central role in the pathogenesis of both GDM and type 2 diabetes, we genotyped 112 susceptibility variants that had been identified in previous GWASs of type 2 diabetes as candidate SNPs for GDM [27,28,29,30,31,32].

Methods

Study population

The current analysis used genotyping data from two sources: (1) genomic data from 7655 women nested within the existing NHSII, which is a US population; and (2) candidate genotyping in a sample of 1227 women from the DNBC [33]. All participants in the NHSII and DNBC gave informed consent to participate in the study.

The NHSII was established in 1989 and consists of 116,429 female registered nurses who were aged 25–42 years at baseline. Detailed questionnaire data were collected at baseline and every other year thereafter, and included medical history, lifestyle, usual diet and the occurrence of chronic diseases. In each biennial questionnaire through 2001, women were asked whether they were diagnosed as having GDM by a physician. In 2009, a questionnaire was administered to ascertain NHSII participants’ pregnancy and reproductive history. From 1996 to 2001, 29,611 NHSII participants aged 32–52 years provided blood samples. Among them, genome-wide data were available for participants of European ancestry within previous nested case–control studies of kidney stones, ovarian cancer, post-traumatic stress disorder, venous thromboembolism, endometriosis and breast cancer [34, 35]. Among all participants with genome-wide data, we restricted the current analysis to 5803 women with at least one pregnancy between 1989 and 2009, of whom 325 women reported a clinician diagnosis of GDM during pregnancy. Candidate genotyping was performed on DNA samples from an additional 1852 women with GDM collected as part of the Diabetes & Women’s Health (DWH) study during 2012–2016 [33]. A flow diagram of sample selection is shown in Fig. 1. As shown in the figure, the DWH study was part of the NHSII and of the DNBC. The study protocols were approved by the institutional review boards of Brigham and Women’s Hospital and the Harvard T. H. Chan School of Public Health. In a validation study among a subgroup of NHSII participants (n = 120), 94% of self-reported GDM events were confirmed by medical records [10, 36]. The majority of NHSII participants were screened for GDM during pregnancy. A supplemental questionnaire was sent to a random sample of parous women who did not report GDM (n = 114). 
Of these women, 83% reported undergoing a 50 g glucose screening test during pregnancy and 100% reported frequent prenatal urine glucose screening [7, 32].

Fig. 1 A flow diagram of sample selection in the NHSII and DNBC. The characteristics of women with GDM and control participants included in the final analysis population were similar to those of their corresponding source populations

The DNBC (1996–2002) was a longitudinal cohort of 91,827 pregnant women in Denmark who were recruited during their first antenatal visit to a general practitioner [37]. All women living in Denmark who could speak Danish and were planning to carry to term were eligible for the study. Prospective data on maternal sociodemographics, lifestyle and environmental exposures, as well as clinical and perinatal conditions, were collected from the DNBC through four telephone interviews at gestational weeks 12 and 30, and at 6 and 18 months postpartum.

Of the 91,827 DNBC participants, 1274 were identified as having GDM. Among 90,553 women who did not have GDM, a random sample of 1457 women (control participants) were selected. For the current analysis, we identified 607 women with GDM and 620 control participants who participated in the DNBC clinical examination and provided bio-specimens as part of the DWH study (2012–2014) (Fig. 1). The study was approved by the Regional Scientific Ethical Committee (VEK) of the Capital Region of Denmark (record no. H-4-2013-129). Study procedures were followed in accordance with the Declaration of Helsinki.

The methods and procedures undertaken to ascertain GDM in the DNBC have been previously described in detail [37]. Briefly, in the DNBC, questions related to GDM were asked at gestational week 30 and at 6 months postpartum. Women who either self-reported GDM in the interviews or had a GDM diagnosis recorded in the National Patient Registry were considered as having GDM. Women who had a diabetes diagnosis recorded in the National Patient Registry prior to the index pregnancy were excluded. Medical records were retrieved for all women suspected of having GDM as well as the randomly selected control group, and the sensitivity of self-reported GDM against medical records was high (96%). An expert panel developed criteria and guidelines for extracting the relevant data and for ascertaining GDM diagnoses according to WHO criteria [37, 38].

Genotyping

The genome-wide genotyping methods used by the NHSII have been described in detail elsewhere [39]. Genome-wide genotyping was conducted using high-density SNP marker platforms including Illumina (San Diego, CA, USA) HumanHap, Infinium (Natick, MA, USA) OncoArray and Infinium HumanCoreExome. Genotypes were imputed using the 1000 Genomes Project ALL Phase I Integrated Release v3 (www.internationalgenome.org) haplotypes excluding monomorphic and singleton sites (2010–2011 data freeze, 2012-03-14 haplotypes; http://csg.sph.umich.edu/abecasis/mach/download/1000G.2012-03-14.html) as the reference panel. SNPs for which Hardy–Weinberg equilibrium testing produced a p value of less than 1 × 10⁻⁶ were excluded. Most of the SNPs were genotyped (sample call rate = 97%) or had a high imputation quality score (r² ≥ 0.8), as assessed with the use of MACH software (Cincinnati, OH, USA). Moreover, the effect allele frequency and imputation quality score of all SNPs genotyped in different platforms were similar (see electronic supplementary material [ESM] Table 1).
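As an illustration of the Hardy–Weinberg filter described above, the following is a minimal sketch that assumes a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The text does not state which test was used (an exact test is also common), and the function name and the example counts are hypothetical.

```python
import math

HWE_P_THRESHOLD = 1e-6  # exclusion threshold stated in the text


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a simple goodness-of-fit test; not necessarily the test
    used in the study.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts for one SNP (AA, AB, BB).
p_value = hwe_chi_square_p(420, 460, 120)
exclude_snp = p_value < HWE_P_THRESHOLD
```

A SNP would be dropped from the analysis when `exclude_snp` is true.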

Among women with GDM whose data were collected as part of the DWH study (i.e. NHSII participants whose genome-wide data were unavailable and DNBC participants), genotyping was performed using the TaqMan quantitative PCR method (Applied Biosystems, Foster City, CA, USA). Uniplex quantitative real-time PCR amplification and genotyping by allelic discrimination were performed with TaqMan reagents and protocols as per the manufacturer’s instructions (for complete details, see the TaqMan SNP Genotyping Assays Protocol; Applied Biosystems). We excluded participants with poor sample quality (i.e. where genotyping failed for >100 SNPs). In total, 117 participants were excluded from the NHSII (all with GDM) and 43 participants were excluded from the DNBC. The final analysis population of the present study was therefore composed of 7538 participants (2060 women with GDM and 5478 control women) from the NHSII and 1184 participants (576 women with GDM and 608 control women) from the DNBC.

The distributions of major characteristics of these women were similar to those of the corresponding source populations of women with and without GDM (data not shown).

Candidate SNP selection

We initially selected a total of 130 SNPs that were significantly associated with the risk of type 2 diabetes based on previous GWASs [27,28,29,30,31,32]. We excluded 18 SNPs because they had minor allele frequencies of less than 1% (rs60980157, rs2233580, rs3842770, rs7560163 and rs9552911), because they were not imputed in genome-wide genotyping in the NHSII (rs5945326 and rs12010175) or because they could not be genotyped in candidate gene genotyping (rs163182, rs10965250, rs1470579, rs312457, rs343092, rs6467136, rs7656416, rs7901695, rs34160967, rs6968865 and rs713598). In total, 112 SNPs were available for further analysis (ESM Table 2).

Assessment of covariates

Covariates for the NHSII and the DNBC were selected a priori. Covariates in the NHSII were ascertained from the baseline questionnaire and included age (years), smoking (never smoker vs smoker), family history of type 2 diabetes and BMI calculated from self-reported height and weight. Covariates in the DNBC were ascertained from questionnaires administered during the index pregnancy and included age (years), smoking during pregnancy (yes vs no) and pre-pregnancy BMI calculated from self-reported height and pre-pregnancy weight. In the DNBC, information on family history of diabetes (yes vs no) was collected as part of the DWH study follow-up.

Statistical analysis

We identified the risk allele of each SNP associated with risk of type 2 diabetes based on previous GWASs of type 2 diabetes (ESM Table 2). Logistic regression models were fitted to evaluate the association between each SNP and the risk of GDM by using an additive model in the NHSII and DNBC. The results from the two cohorts were meta-analysed using a fixed-effect inverse variance model [40]. The false discovery rate (FDR) was used to account for multiple testing, and the Benjamini–Yekutieli procedure was adopted [41]. The Benjamini–Yekutieli procedure stringently controls the proportion of false positives among rejected hypotheses, and performs well in the presence of correlation among genetic variants.
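The pooling and multiple-testing steps described above can be sketched as follows. This is a minimal illustration, not the authors' SAS code: it assumes that each cohort supplies a per-allele log odds ratio and its standard error for a SNP, and the function names and example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return pooled_beta, pooled_se, p


def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p values, in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-cohort estimates for one SNP (log OR, SE).
beta, se, p = fixed_effect_meta([0.12, 0.20], [0.04, 0.09])
pooled_or = math.exp(beta)
```

In the study's setting, `benjamini_yekutieli` would be applied to the 112 pooled p values, and SNPs with an adjusted value below 0.05 would be carried forward.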

We created unweighted and weighted genetic risk scores (GRSs) based on SNPs that were significantly associated with the risk of GDM after FDR correction (p < 0.05). Specifically, unweighted GRSs were determined by summing the risk alleles of the identified SNPs, where the risk allele was the allele associated with a higher risk of type 2 diabetes based on a literature search (ESM Table 2). Weighted GRSs were determined by summing the risk alleles of the identified SNPs, each multiplied by a weight equal to the pooled coefficient for the association of that SNP with risk of GDM across both cohorts. Using a similar method, we determined unweighted and weighted GRSs based on all candidate SNPs included in our study. In addition, we created two sub-GRSs according to biological function, a GRS based on 66 SNPs related to beta cell function (GRS-BC) and a GRS based on 17 SNPs related to insulin resistance (GRS-IR) [42, 43], and examined the potential differences in their associations with risk of GDM. Participants were categorised into quartiles defined by the 25th, 50th and 75th percentiles of the GRS (i.e. quartile 1: ≤25%, quartile 2: 25–50%, quartile 3: 50–75% and quartile 4: >75%). Logistic regression models were then fitted to examine the associations of GRSs with risk of GDM using quartile 1 as the reference in the NHSII and DNBC, and results from both cohorts were pooled using a fixed-effects model.

Given that our study did not include a replication cohort, we additionally created GRSs and examined the association with risk of GDM using tenfold cross-validation [44]. We drew a subsample with replacement from the pooled sample of the NHSII and the DNBC, and divided the subsample into ten approximately equal bins. The association of GRS with risk of GDM was evaluated ten times, each time using nine bins to estimate the weight of each SNP (the coefficient for its association with risk of GDM) and the remaining tenth bin to examine the association of GRS with risk of GDM. We averaged the association of GRS with risk of GDM in the held-out bin across the ten analyses. We repeated the extraction of the subsample 1000 times to obtain the non-parametric 95% CI of the association of GRS with risk of GDM.
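The score construction and quartile assignment described above can be sketched as follows. This is an illustration under stated assumptions: genotypes are coded as 0, 1 or 2 copies of the risk allele, percentiles are taken by linear interpolation (the text does not specify the method), and all function names, weights and genotypes are hypothetical.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts (0, 1 or 2 per SNP), optionally weighted."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)  # unweighted GRS
    return sum(w * g for w, g in zip(weights, risk_allele_counts))


def quartile_cutpoints(scores):
    """25th, 50th and 75th percentiles by linear interpolation."""
    ordered = sorted(scores)

    def percentile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return percentile(0.25), percentile(0.50), percentile(0.75)


def assign_quartile(score, cutpoints):
    """Quartile 1 is at or below the 25th percentile, quartile 4 above the 75th."""
    return 1 + sum(score > c for c in cutpoints)


# Hypothetical example: three SNPs, four participants.
snp_weights = [0.10, 0.25, 0.05]  # illustrative pooled log odds ratios
genotypes = [[0, 1, 2], [2, 2, 1], [1, 0, 0], [2, 1, 2]]
scores = [genetic_risk_score(g, snp_weights) for g in genotypes]
cuts = quartile_cutpoints(scores)
quartiles = [assign_quartile(s, cuts) for s in scores]
```

Because a weight is a log odds ratio for GDM, a SNP whose type 2 diabetes risk allele is inversely associated with GDM would enter the weighted score with a negative weight.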

We conducted analyses stratified by family history of type 2 diabetes, smoking status, BMI and age at baseline. We tested for potential effect modification by these stratification variables by including interaction terms between the exposure and the potential effect modifier in a multivariable-adjusted model, and conducted a likelihood ratio test comparing the models with and without the interaction terms. All statistical tests were two-sided and performed using SAS (v9.4, SAS Institute, Cary, NC, USA). The tenfold cross-validation was conducted using R v3.2.5 [45].

Results

Our study population comprised 8722 participants, including 7538 women (2060 with GDM and 5478 control women) from the NHSII and 1184 (576 with GDM and 608 control women) from the DNBC (Table 1). Compared with control women, women with GDM tended to be heavier and were more likely to have a family history of type 2 diabetes. By pooling the results from the DNBC and NHSII, we identified 11 SNPs that were significantly associated with risk of GDM after FDR correction (Table 2). Of the 11 SNPs, eight were newly identified as being associated with GDM, namely rs7957197 (HNF1A), rs10814916 (GLIS3), rs3802177 (SLC30A8), rs9379084 (RREB1), rs34872471 (TCF7L2), rs7903146 (TCF7L2), rs11787792 (GPSM1) and rs7041847 (GLIS3). Consistent with previous literature [23, 24], rs10830963 (MTNR1B), rs1387153 (MTNR1B) and rs4506565 (TCF7L2) were also associated with risk of GDM in our study. The risk allele of type 2 diabetes was associated with a higher risk of GDM for all identified SNPs except rs9379084 (RREB1) and rs11787792 (GPSM1), and the results were in general consistent between cohorts. ESM Table 2 shows the association of all measured individual SNPs with risk of GDM in the NHSII and DNBC.

Table 1 Baseline characteristics of 8722 study participants, including 7538 women from the NHSII and 1184 from the DNBC
Table 2 SNPs significantly associated with risk of GDM after FDR correction among 8722 women from the NHSII and the DNBC

The weighted GRS based on the 11 SNPs was significantly associated with a higher risk of GDM in both the NHSII and DNBC, and the results were consistent between cohorts (p for heterogeneity >0.05) (Table 3). Compared with participants in the lowest quartile of the weighted GRS, the ORs for GDM were 1.07 (95% CI 0.93, 1.22), 1.23 (95% CI 1.07, 1.41) and 1.53 (95% CI 1.34, 1.74) for participants in the second, third and fourth (highest) quartiles, respectively (p for trend <0.001). Specifically, each one-allele increase in GRS was associated with a higher risk of GDM (OR 1.04, 95% CI 1.03, 1.05). The associations with risk of GDM persisted and remained significant when using two other types of GRS as the main exposure: unweighted GRS and weighted GRS created using tenfold cross-validation. Furthermore, we created GRSs by additionally including four SNPs that had been found to be associated with the risk of GDM in previous studies (rs7756992 [CDKAL1], rs7754840 [CDKAL1], rs9939609 [FTO] and rs1801278 [IRS1]) [24, 26], by including all 112 candidate SNPs, and by determining GRS-BC and GRS-IR. As expected, these GRSs were associated with a higher risk of GDM (Table 3). However, the per-allele association of GRS with risk of GDM was strongest for the GRS based on the 11 SNPs identified in our study. The magnitude of the association of GRS-BC with the risk of GDM was stronger than that of GRS-IR. In addition, we examined associations of weighted GRS with risk of GDM by pooling both cohorts rather than using meta-analysis, and the results were unchanged.

Table 3 Associations of GRS with risk of GDM among 8722 women from the NHSII and the DNBC

The positive associations between weighted GRS and risk of GDM did not materially change across different strata of stratification variables (i.e. family history of type 2 diabetes, smoking, BMI and age), although the magnitudes of the associations were stronger among participants without a family history of type 2 diabetes (ORs across increasing quartiles of GRS were 1.00, 1.10, 1.33 and 1.64 for participants without family history of type 2 diabetes vs 1.00, 1.01, 0.98 and 1.25 for participants with family history of type 2 diabetes) (all p values for interaction were >0.10) (Table 4).

Table 4 Associations of weighted GRS based on the 11 SNPs with risk of GDM stratified by family history of type 2 diabetes, age, smoking status and BMI at baseline among 8722 women from the NHSII and the DNBC

Discussion

In this study with 8722 participants including 2636 women with GDM, we identified 11 SNPs that were significantly associated with the risk of GDM after FDR correction, of which eight SNPs were identified for the first time. The GRS based on the 11 SNPs was significantly associated with risk of GDM, and the positive associations remained significant within most of the subgroups stratified by family history of type 2 diabetes, smoking, BMI and age at baseline. In addition, we found that most of the 11 identified SNPs were related to beta cell function, and the association of GRS-BC with risk of GDM was stronger than that of GRS-IR, indicating that type 2 diabetes SNPs related to insulin biosynthesis and secretion play an important role in the development of GDM.

Among the previously identified type 2 diabetes SNPs, those within the TCF7L2 gene have been the strongest genetic predictors of type 2 diabetes to date [46]. For the first time, we identified two other TCF7L2-related SNPs (rs34872471 and rs7903146) whose type 2 diabetes-associated risk alleles were also associated with a higher risk of GDM. rs34872471 and rs7903146 are located in the intron region of TCF7L2 and are in linkage disequilibrium with rs4506565, whose risk allele has been reported to increase the risk of GDM by 44–49% in previous studies [23, 47] and was confirmed to be associated with a higher risk of GDM in our study. These three SNPs might impair the expression of GLP-1 in enteroendocrine cells by interfering with β-catenin–mediated transcriptional activation of the gene encoding glucagon [48]. This could, in turn, result in a defective or poorly expressed glucagon protein and lead to decreased insulin secretion and consequently hyperglycaemia [48].

The current study identified, for the first time, several type 2 diabetes-associated risk alleles associated with a higher risk of GDM, namely rs7957197 (HNF1A), rs10814916 (GLIS3), rs3802177 (SLC30A8) and rs7041847 (GLIS3). The associations of these SNPs with GDM are biologically plausible. rs7957197 is located in the intronic region of HNF1A, which encodes a transcription factor required for the expression of GLUT1 and GLUT2 in pancreatic beta cells [49]. Furthermore, defects in HNF1A are a cause of MODY type 3 [50]. rs3802177 is located in the intronic region of SLC30A8, which encodes a zinc transporter that is expressed solely in the secretory vesicles of beta cells and is involved in the final stages of insulin biosynthesis and secretion [46]. Previous studies have shown that reduced zinc transport activity increases type 2 diabetes risk [51], while overexpression of SLC30A8 in pancreatic cells increases glucose-stimulated insulin secretion [52]. rs10814916 is located in the intronic region of GLIS3, a member of the GLI-similar zinc finger protein family that encodes a nuclear protein with five C2H2-type zinc finger domains. This protein is highly expressed in pancreatic beta cells, and variants in this gene have been associated with neonatal diabetes [53]. Of note, we also observed that the type 2 diabetes-associated risk alleles of rs9379084 (RREB1) and rs11787792 (GPSM1) were associated with a lower risk of GDM. The reason for the inverse association between these two SNPs and GDM risk needs further investigation.

Our finding that SNPs rs10830963 and rs1387153 were significantly associated with GDM risk is consistent with the results of several other studies on GDM [24, 26, 54]. A meta-analysis involving 8204 women with GDM and 15,221 control participants demonstrated that out of six type 2 diabetes risk variants, rs10830963 near MTNR1B was most strongly associated with GDM risk [24]. In addition, a variant near MTNR1B (rs10830962) was one of the two GDM-associated variants identified in a prior GWAS conducted among Korean women [26]. Furthermore, rs10830963 and rs1387153 have been associated with higher levels of glucose during pregnancy in Greek and Chinese women [55, 56]. MTNR1A and MTNR1B are receptors of melatonin, which is best known as a regulator of seasonal and circadian rhythms [57]. rs10830963 is located within the intron region of MTNR1B, and carriers of the risk allele of rs10830963 exhibit increased expression of MTNR1B in pancreatic beta cells, which leads to impaired insulin secretion [57].

Our study has several unique strengths. First, our study is the largest study of genetic variants of GDM to date, which allowed greater statistical power to detect potential associations after correcting for false discoveries. Second, for the identified SNPs, we observed a consistent association of weighted GRS with risk of GDM in two independent cohorts of white women, the NHSII in the USA and the DNBC in Denmark, further demonstrating the validity of the identified SNPs and the GRS. In addition, we were able to examine whether the SNP–GDM associations were modified by other major risk factors for GDM.

Our study also has several potential limitations. First, we included only candidate SNPs that were known to be associated with the risk of type 2 diabetes, which limited our capacity to discover novel variants of GDM beyond these candidate SNPs for type 2 diabetes. However, the current effort represents only the initial step of our endeavour in investigating genetic variants of GDM, and further demonstrates at least a partially shared aetiology of GDM with type 2 diabetes. Second, given that only candidate SNPs were genotyped in the DNBC and in the majority of women with GDM in the NHSII, our study did not adjust for population stratification resulting from a systematic difference in allele frequencies between populations. However, all of the participants included in the DNBC and the NHSII were self-reported whites, and the population in Denmark has been shown to have high population homogeneity [58]. Third, potential misclassification of women with GDM might exist in both cohorts, attenuating associations between SNPs and the risk of GDM and limiting our study power. However, the validation study conducted in the NHSII showed that the majority of NHSII participants were screened for GDM during pregnancy and most of the self-reported GDM events could be confirmed by medical records [10, 36]. Fourth, given that there are insufficient a priori data from genetic studies of GDM to derive weights of GRS, we used our own data to calculate GRS. However, we created a weighted GRS using a cross-weight method, and the associations between cross-weighted GRS and risk of GDM were significant and consistent in both cohorts. Finally, the generalisability of our findings to other populations with differing genetic and other characteristics needs further investigation. However, the homogeneity of our population minimises bias related to population stratification.

In summary, among two independent populations of women, we identified eight novel SNPs for GDM and confirmed three previously known GDM SNPs. In addition, the GRS based on identified SNPs was significantly and positively associated with GDM risk. These findings potentially provide novel information to improve our understanding of the aetiology of GDM, and particularly of biological mechanisms related to GDM and insulin biosynthesis and secretion.