The current analysis used genotyping data from two sources: (1) genomic data from 6873 women nested within the existing NHSII, which is a US population; and (2) candidate genotyping in a sample of 1227 women from the DNBC. All participants in the NHSII and DNBC gave informed consent to participate in the study.
The NHSII was established in 1989 and consists of 116,429 female registered nurses who were aged 25–42 years at baseline. Detailed questionnaire data were collected at baseline and every other year thereafter, and included medical history, lifestyle, usual diet and the occurrence of chronic diseases. In each biennial questionnaire through 2001, women were asked whether they were diagnosed as having GDM by a physician. In 2009, a questionnaire was administered to ascertain NHSII participants’ pregnancy and reproductive history. From 1996 to 2001, 29,611 NHSII participants aged 32–52 years provided blood samples. Among them, genome-wide data were available for participants of European ancestry within previous nested case–control studies of kidney stones, ovarian cancer, post-traumatic stress disorder, venous thromboembolism, endometriosis and breast cancer [34, 35]. Among all participants with genome-wide data, we restricted the current analysis to 5803 women with at least one pregnancy between 1989 and 2009, of whom 325 women reported a clinician diagnosis of GDM during pregnancy. Candidate genotyping was performed on DNA samples from an additional 1852 women with GDM collected as part of the Diabetes & Women’s Health (DWH) study during 2012–2016. A flow diagram of sample selection is shown in Fig. 1. As shown in the figure, the DWH study was part of the NHSII and of the DNBC. The study protocols were approved by the institutional review boards of Brigham and Women’s Hospital and the Harvard T. H. Chan School of Public Health. In a validation study among a subgroup of NHSII participants (n = 120), 94% of self-reported GDM events were confirmed by medical records [10, 36]. The majority of NHSII participants were screened for GDM during pregnancy. A supplemental questionnaire was sent to a random sample of parous women who did not report GDM (n = 114). Of these women, 83% reported undergoing a 50 g glucose screening test during pregnancy and 100% reported frequent prenatal urine glucose screening [7, 32].
The DNBC (1996–2002) was a longitudinal cohort of 91,827 pregnant women in Denmark who were recruited during their first antenatal visit to a general practitioner. All women living in Denmark who could speak Danish and were planning to carry to term were eligible for the study. Prospective data on maternal sociodemographics, lifestyle and environmental exposures, as well as clinical and perinatal conditions, were collected from the DNBC through four telephone interviews at gestational weeks 12 and 30, and at 6 and 18 months postpartum.
Of the 91,827 DNBC participants, 1274 were identified as having GDM. Among 90,553 women who did not have GDM, a random sample of 1457 women (control participants) were selected. For the current analysis, we identified 607 women with GDM and 620 control participants who participated in the DNBC clinical examination and provided bio-specimens as part of the DWH study (2012–2014) (Fig. 1). The study was approved by the Regional Scientific Ethical Committee (VEK) of the Capital Region of Denmark (record no. H-4-2013-129). Study procedures were followed in accordance with the Declaration of Helsinki.
The methods and procedures undertaken to ascertain GDM in the DNBC have been previously described in detail. Briefly, in the DNBC, questions related to GDM were asked at gestational week 30 and at 6 months postpartum. Women who either self-reported GDM in the interviews or had a GDM diagnosis recorded in the National Patient Registry were considered as having GDM. Women who had a diabetes diagnosis recorded in the National Patient Registry prior to the index pregnancy were excluded. Medical records were retrieved for all women suspected of having GDM as well as the randomly selected control group, and self-reported GDM showed high sensitivity (96%) when compared against medical records. An expert panel developed criteria and guidelines for extracting the relevant data and for ascertaining GDM diagnoses according to WHO criteria [37, 38].
The genome-wide genotyping methods used by the NHSII have been described in detail elsewhere. Genome-wide genotyping was conducted using high-density SNP marker platforms including Illumina (San Diego, CA, USA) HumanHap, Infinium (Natick, MA, USA) OncoArray and Infinium HumanCoreExome. Genotypes were imputed using the 1000 Genomes Project ALL Phase I Integrated Release v3 (www.internationalgenome.org) haplotypes excluding monomorphic and singleton sites (2010–2011 data freeze, 2012-03-14 haplotypes; http://csg.sph.umich.edu/abecasis/mach/download/1000G.2012-03-14.html) as the reference panel. SNPs for which Hardy–Weinberg equilibrium testing produced a p value of less than 1 × 10⁻⁶ were excluded. Most of the SNPs were genotyped (sample call rate = 97%) or had a high imputation quality score (r² ≥ 0.8), as assessed with the use of MACH software (Cincinnati, OH, USA). Moreover, the effect allele frequency and imputation quality score of all SNPs genotyped in different platforms were similar (see electronic supplementary material [ESM] Table 1).
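The Hardy–Weinberg exclusion rule described above can be illustrated with a minimal sketch. It assumes a 1 df chi-square goodness-of-fit test on genotype counts; the text does not state which test (chi-square or exact) was actually used, and the function names `hwe_chi2_pvalue` and `passes_hwe` are hypothetical.

```python
import math

HWE_THRESHOLD = 1e-6  # SNPs with p below this value are excluded (from the text)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium.

    Assumption: a goodness-of-fit chi-square test on observed genotype
    counts; the original analysis may have used an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0  # no genotypes observed, nothing to test
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site, test is undefined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=HWE_THRESHOLD):
    """Keep a SNP unless its HWE p value falls below the threshold."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold
```

For example, genotype counts of 25, 50 and 25 match Hardy–Weinberg proportions exactly and would be retained, whereas counts of 500, 0 and 500 (no heterozygotes) would be excluded.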
Among women with GDM whose data were collected as part of the DWH study (i.e. NHSII participants whose genome-wide data were unavailable and DNBC participants), genotyping was performed using the TaqMan quantitative PCR method (Applied Biosystems, Foster City, CA, USA). Uniplex quantitative real-time PCR amplification and genotyping by allelic discrimination were performed using TaqMan reagents and protocols as per the manufacturer’s instructions (for complete details, see the TaqMan SNP Genotyping Assays Protocol; Applied Biosystems). We excluded participants with poor sample quality (i.e. where genotyping failed for >100 SNPs). In total, 117 participants were excluded from the NHSII (all with GDM) and 43 participants were excluded from the DNBC. The final analysis population of the present study was therefore composed of 7538 participants (2060 women with GDM and 5478 control women) from the NHSII and 1184 participants (576 women with GDM and 608 control women) from the DNBC.
The distributions of major characteristics of these women were similar to those of the corresponding source populations of women with and without GDM (data not shown).
Candidate SNP selection
We initially selected a total of 130 SNPs that were significantly associated with the risk of type 2 diabetes based on previous GWASs [27,28,29,30,31,32]. We excluded 18 SNPs because they had minor allele frequencies of less than 1% (rs60980157, rs2233580, rs3842770, rs7560163 and rs9552911), because they were not imputed in genome-wide genotyping in the NHSII (rs5945326 and rs12010175) or because they could not be genotyped in candidate gene genotyping (rs163182, rs10965250, rs1470579, rs312457, rs343092, rs6467136, rs7656416, rs7901695, rs34160967, rs6968865 and rs713598). In total, 112 SNPs were available for further analysis (ESM Table 2).
Assessment of covariates
Covariates for the NHSII and the DNBC were selected a priori. Covariates in the NHSII were ascertained from the baseline questionnaire and included age (years), smoking (never smoker vs smoker), family history of type 2 diabetes and BMI calculated from self-reported height and weight. Covariates in the DNBC were ascertained from questionnaires administered during the index pregnancy and included age (years), smoking during pregnancy (yes vs no) and pre-pregnancy BMI calculated from self-reported height and pre-pregnancy weight. In the DNBC, information on family history of diabetes (yes vs no) was collected as part of the DWH study follow-up.
We identified the risk allele of each SNP associated with risk of type 2 diabetes based on previous GWASs of type 2 diabetes (ESM Table 2). Logistic regression models were fitted to evaluate the association between each SNP and the risk of GDM by using an additive model in the NHSII and DNBC. The results from the two cohorts were meta-analysed using a fixed-effect inverse variance model. The false discovery rate (FDR) was used to account for multiple testing, and the Benjamini–Yekutieli procedure was adopted. The Benjamini–Yekutieli procedure stringently controls the proportion of false positives among rejected hypotheses, and performs well in the presence of correlation among genetic variants.
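The pooling and multiple-testing steps can be sketched as follows. This is an illustration under stated assumptions, not the SAS code used in the study: per-cohort estimates are taken to be log odds ratios with standard errors, the pooled p value uses a two-sided normal approximation, and the function names `fixed_effect_pool` and `benjamini_yekutieli` are hypothetical.

```python
import math


def fixed_effect_pool(betas, ses):
    """Inverse-variance fixed-effect pooling of per-cohort log odds ratios.

    Returns the pooled estimate, its standard error and a two-sided
    p value (normal approximation, an assumption of this sketch).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli adjusted p values, returned in input order."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = p_values[idx] * m * c_m / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted
```

In this sketch, a SNP would be carried forward when its adjusted p value is below 0.05. The harmonic factor `c_m` is what distinguishes the Benjamini–Yekutieli procedure from Benjamini–Hochberg and makes it valid under arbitrary dependence among tests.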
We created unweighted and weighted genetic risk scores (GRSs) based on SNPs that were significantly associated with the risk of GDM after FDR correction (p < 0.05). Specifically, unweighted GRSs were determined by summing the number of risk alleles across the identified SNPs, where the risk allele of each SNP was the allele associated with a higher risk of type 2 diabetes based on a literature search (ESM Table 2). Weighted GRSs were determined by summing the number of risk alleles of each identified SNP multiplied by the corresponding weight, which was estimated from the pooled coefficient for the association of that SNP with risk of GDM in both cohorts. Using a similar method, we determined unweighted and weighted GRSs based on all candidate SNPs included in our study. In addition, we created two sub-GRSs according to the biological functions of the SNPs, a GRS based on 66 SNPs related to beta cell function (GRS-BC) and a GRS based on 17 SNPs related to insulin resistance (GRS-IR) [42, 43], and examined the potential differences in their associations with risk of GDM. Participants were categorised into quartiles defined by the 25th, 50th and 75th percentiles of the GRS (i.e. quartile 1: ≤25%, quartile 2: 25–50%, quartile 3: 50–75% and quartile 4: >75%). Logistic regression models were then fitted to examine the associations of GRSs with risk of GDM using quartile 1 as the reference in the NHSII and DNBC, and results from both cohorts were pooled using a fixed-effects model. Given that our study did not include a replication cohort, we additionally created GRSs and examined the association with risk of GDM using tenfold cross-validation. We extracted a subsample with replacement from the pooled sample of the NHSII and the DNBC, and the subsample was divided into ten approximately equal bins. The association of GRS with risk of GDM was evaluated ten times, each time using nine bins to estimate the weight of each SNP (the coefficient for the association of that SNP with risk of GDM) and the tenth bin to examine the association of GRS with risk of GDM. We averaged the association of GRS with risk of GDM for the tenth bin across the ten analyses. We repeated the extraction of the subsample 1000 times to obtain the non-parametric 95% CI of the association of GRS with risk of GDM.
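The score construction and quartile assignment can be sketched as follows. This is a minimal illustration, assuming that genotypes are coded as risk-allele counts (0, 1 or 2; imputed dosages would work the same way), that weights are per-allele pooled log odds ratios, and that cut points use linear interpolation between order statistics. The function names are hypothetical, and the percentile definition used in the original SAS analysis is not stated.

```python
import statistics


def unweighted_grs(risk_allele_counts):
    """Sum of risk-allele counts across the selected SNPs."""
    return sum(risk_allele_counts)


def weighted_grs(risk_allele_counts, weights):
    """Sum of risk-allele counts, each multiplied by its SNP weight.

    Assumption: each weight is the pooled per-allele log odds ratio
    for GDM for the corresponding SNP.
    """
    return sum(c * w for c, w in zip(risk_allele_counts, weights))


def assign_quartiles(scores):
    """Map each score to quartile 1-4 using the 25th/50th/75th percentiles.

    Scores at or below a cut point fall in the lower quartile, matching
    the convention quartile 1: <=25% and quartile 4: >75%.
    """
    cuts = statistics.quantiles(scores, n=4, method="inclusive")
    return [1 + sum(score > cut for cut in cuts) for score in scores]
```

For example, a participant carrying 0, 1 and 2 risk alleles at three SNPs has an unweighted GRS of 3; with illustrative weights of 0.1, 0.2 and 0.3 the weighted GRS is 0.8. Quartile 1 would then serve as the reference category in the logistic regression model.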
We conducted analyses stratified by family history of type 2 diabetes, smoking status, BMI and age at baseline. We tested for potential effect modification by these stratification variables by including interaction terms between the exposure and the potential effect modifier in a multivariable-adjusted model, and conducted a likelihood ratio test comparing the models with and without the interaction terms. All statistical tests were two-sided and performed using SAS (v9.4, SAS Institute, Cary, NC, USA). The tenfold cross-validation was conducted using R v3.2.5.