
Introduction

Type 2 diabetes is a growing public health challenge, affecting approximately 14.6% of the US population [1] and expected to double in prevalence in the next two decades [2,3,4]. Investigating the genetic architecture of quantitative traits, including fasting glucose, fasting insulin and HbA1c, that serve as early markers of type 2 diabetes progression may lead to a deeper understanding of type 2 diabetes aetiology. For example, prior genome-wide association studies (GWAS) of glycaemic traits identified novel loci in genes and pathways related to glucose metabolism, circadian rhythm regulation, and cell proliferation and development [5, 6], as well as erythrocyte characteristics that can influence HbA1c [7].

Despite the success of prior glycaemic trait GWAS, which have identified nearly 600 loci [5, 6, 8,9,10,11], most of these findings were identified in populations primarily of European ancestry. Such limited ancestral diversity reduces our ability to map novel loci [12,13,14,15,16,17,18]. Additionally, locus characterisation and fine-mapping can be improved through multi-ethnic studies that increase sample size and leverage differences in linkage disequilibrium (LD) structure between diverse populations [19,20,21,22].

This study examined the genetic architecture of fasting glucose, fasting insulin and HbA1c in participants of the diverse Population Architecture using Genomics and Epidemiology (PAGE) Study [23]. We aimed to identify novel genetic loci and independent secondary association signals at previously identified regions and characterise these loci through transethnic fine-mapping.

Methods

Ethics statements

Approval by the Institutional Review Boards was obtained for each participating cohort. Informed consent was obtained from all participants, and the study was conducted in accordance with the principles of the Declaration of Helsinki.

Study population

This study included adults without diabetes who self-identified as African American (AA), Hispanic/Latino (HA), Asian (ASN), Native Hawaiian (HI), Native American (NAm), European (EA) or other race/ethnicity, enrolled in the Atherosclerosis Risk in Communities (ARIC) study, the Icahn School of Medicine at Mount Sinai’s BioMe Biobank (BioMe), the Coronary Artery Risk Development in Young Adults Study (CARDIA), the Multiethnic Cohort (MEC) Study, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) and the Women’s Health Initiative (WHI) (see electronic supplementary material [ESM] Methods for details). These studies were part of the PAGE Study consortium, an NIH-funded effort to characterise the genetic architecture of complex traits among historically underrepresented populations through large-scale genetic epidemiology research [23].

In this paper, we stratified populations based on self-identified race/ethnicity due to historical reasons (e.g. genotyping datasets and study recruitment) and in recognition of the shared lived experiences of people based on self-identified grouping. To address confounding by population stratification, we included ancestral principal components in our models. We conducted two main analyses: transethnic analyses in the entire population; and analyses stratified by self-identified race/ethnicity. Participants who self-identified as ‘other race/ethnicity’ were included in all transethnic analyses but because of lack of power due to small sample sizes, no population-specific analyses for this group are presented.

Trait measurement

Fasting glucose and fasting insulin concentrations (fasting > 8 h) were measured using standard assays at baseline visits; for all cohorts except HCHS/SOL, HbA1c was measured at a subsequent visit. Glycaemic trait measurements among individuals with type 2 diabetes reflect their current glycaemic control, which is influenced by their access and adherence to medical treatment; therefore, individuals were excluded from analysis if they reported a previous diabetes diagnosis or fasting glucose concentrations consistent with diabetes (≥ 7.0 mmol/l). Because HbA1c was not measured at the same time point as fasting glucose and fasting insulin in most cohorts and was only added as a diagnostic criterion for diabetes in 2009 [24], after the majority of data were collected, individuals with HbA1c ≥ 48.0 mmol/mol (6.5%) were not excluded from the study population. However, for HbA1c analyses, individuals with extreme HbA1c values (HbA1c ≥ 65.0 mmol/mol [8.1%]) were excluded. Individuals with BMI >70 kg/m2 were also excluded for all traits.
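The exclusion rules above, together with the relationship between the two HbA1c units quoted in the text, can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function and argument names are hypothetical, the unit conversion uses the standard IFCC-to-NGSP master equation, and the treatment of missing values (a missing measurement does not trigger an exclusion) is an assumption made for illustration only.

```python
def ifcc_to_ngsp(hba1c_mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return 0.09148 * hba1c_mmol_mol + 2.152


def is_eligible(prior_diabetes_dx, fasting_glucose_mmol_l, bmi_kg_m2,
                hba1c_mmol_mol=None, trait="fasting_glucose"):
    """Apply the exclusion rules described in the text to one participant.

    Hypothetical helper: missing values (None) are assumed not to exclude.
    """
    # Reported previous diabetes diagnosis
    if prior_diabetes_dx:
        return False
    # Fasting glucose consistent with diabetes (>= 7.0 mmol/l)
    if fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0:
        return False
    # Extreme BMI (> 70 kg/m2), applied for all traits
    if bmi_kg_m2 is not None and bmi_kg_m2 > 70:
        return False
    # Extreme HbA1c (>= 65.0 mmol/mol), applied to HbA1c analyses only
    if trait == "hba1c" and hba1c_mmol_mol is not None and hba1c_mmol_mol >= 65.0:
        return False
    return True
```

Under these assumptions, a participant with HbA1c of 50 mmol/mol and normal fasting glucose is retained for every trait, whereas a participant with HbA1c of 66 mmol/mol is dropped from the HbA1c analysis only.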

Contributing samples were genotyped using multiple platforms (ESM Methods, ESM Table 1). A total of 53,426 samples were genotyped on the MEGA array, which was specifically designed to increase variant coverage across multiple ethnic groups [25, 26]. Additionally, 28,477 participants with fasting glucose measurements, 12,296 participants with HbA1c measurements and 26,965 participants with fasting insulin measurements from ARIC, BioMe, CARDIA, MEC and WHI were previously genotyped using either Illumina or Affymetrix arrays within each individual study/stratum. All studies used standard quality control filters (ESM Table 1). Ancestral principal component analysis was conducted to evaluate and adjust for population substructure, as previously described in Wojcik et al [26].

Statistical analyses

Fasting glucose concentrations, natural-log-transformed fasting insulin concentrations, and HbA1c measurements were each adjusted for age at trait measurement, sex, age × sex interaction, BMI (kg/m2), smoking status, self-reported race/ethnicity and study centre (see ESM Methods for details of covariate measurements), after which residuals were computed and inverse-normally transformed within each genetic dataset (e.g. population-specific for ARIC or substudy for WHI). In sensitivity analyses, models were estimated excluding BMI. Association analyses for each dataset were performed using SUGEN version 8.10 (https://github.com/dragontaoran/SUGEN), which implements a generalised estimating equation method that accounts for relatedness, while adjusting for ten ancestral principal components [27]. Subsequently, fixed-effects models with inverse variance weighting were used to pool dataset-specific variant effect estimates and their SEs across populations as well as within populations using METAL version 2011-03-25 (http://csg.sph.umich.edu/abecasis/Metal/download/), after applying genomic control correction [28]. Variants with an effective n < 30 or an imputation R2 < 0.4 within a given dataset were excluded from meta-analysis. To account for testing of multiple traits across multiple ancestries, we defined novel loci as those in which the lead variant reached a genome-wide significance threshold of p < 5.0 × 10−9, as done previously [26], and was located more than 500 kb from any previously established locus for the given glycaemic trait.
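For readers unfamiliar with the pooling step, the following Python sketch shows the arithmetic of a genomic control correction followed by fixed-effects inverse-variance-weighted meta-analysis for a single variant. It illustrates the general technique rather than the METAL implementation. All function names are hypothetical, and the choice to leave SEs unchanged when the inflation factor is at or below 1 is an assumption of this sketch.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over its expected median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def apply_genomic_control(ses, lam):
    """Inflate SEs by sqrt(lambda); unchanged when lambda <= 1 (assumption)."""
    scale = math.sqrt(max(lam, 1.0))
    return [se * scale for se in ses]


def ivw_fixed_effects(betas, ses):
    """Pool per-dataset effect estimates for one variant.

    Returns (pooled beta, pooled SE, z score, two-sided p value).
    """
    weights = [1.0 / (se * se) for se in ses]   # inverse-variance weights
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p value
    return beta, se, z, p
```

In this scheme, datasets with smaller SEs (typically larger samples or more common alleles) contribute more to the pooled estimate, which is why statistical significance can differ between populations even when effect estimates are similar.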

Fine-mapping

To identify independent secondary signals, stepwise conditional analyses were performed for the transethnic meta-analysis results, conditioning on the most significant variants (known and novel) identified in our GWAS and applying genomic control correction. After conditioning on the top genome-wide significant (p < 5 × 10−9) variant, variants within a 1 Mb region of that variant with a conditional p value < 5.0 × 10−8 were considered significant, independent signals. These conditional analyses were repeated, adding in the conditional lead variants until no variant had a conditional p value less than the locus-specific significance threshold (p < 5.0 × 10−8). To determine whether identified secondary signals at known loci were independent from known secondary signals, we also conditioned on known variants reported in the literature.
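The control flow of the stepwise procedure can be sketched as follows. This is a schematic illustration only: the callback conditional_pvalues is hypothetical and stands in for re-running the association analysis for all variants in the locus window while conditioning on the variants selected so far. It is not a function of SUGEN or METAL.

```python
def stepwise_conditional(lead_variant, conditional_pvalues, threshold=5.0e-8):
    """Return the lead variant followed by independent secondary signals.

    conditional_pvalues: hypothetical callback that takes a tuple of
    conditioning variants and returns {variant_id: conditional p value}
    for the variants in the locus window.
    """
    selected = [lead_variant]
    while True:
        pvals = conditional_pvalues(tuple(selected))
        # Ignore variants that are already being conditioned on
        candidates = {v: p for v, p in pvals.items() if v not in selected}
        if not candidates:
            break
        best = min(candidates, key=candidates.get)
        # Stop once no variant passes the locus-specific threshold
        if candidates[best] >= threshold:
            break
        selected.append(best)
    return selected
```

Each pass adds at most one variant, so the number of passes at a locus is one more than the number of secondary signals retained.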

We subsequently performed fine-mapping of novel primary analysis loci and independent secondary loci using FINEMAP version 1.4_x86_64 (http://www.christianbenner.com) [29]. All variants within ±1 Mb of each novel primary and independent secondary variant were included for fine-mapping, restricting to variants with a stratum-specific effective n > 30 and imputation R2 > 0.4. If variants demonstrated population-specific significance, a population-specific LD matrix was constructed; for all other variants with genome-wide significance in the transethnic meta-analysis, a combined ancestry LD matrix was constructed by computing population-specific LD matrices and subsequently weighting by population sample size. We then computed the posterior probabilities of k causal variants at each reported locus and constructed a 95% credible set (CS). LocusZoom plots [30] of the CS top variants were generated to visualise the signals identified at each locus.
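Two of the steps described above lend themselves to a short illustration: the sample-size-weighted combination of population-specific LD matrices, and the construction of a 95% CS from per-variant posterior probabilities. The Python sketch below is a simplified illustration under stated assumptions, not the FINEMAP algorithm. The function names are hypothetical, the LD matrices are assumed to hold pairwise correlations for the same variants in the same order, and the CS is built for a single causal signal by accumulating ranked posterior probabilities.

```python
def combined_ld(ld_matrices, sample_sizes):
    """Average population-specific LD matrices, weighted by sample size.

    ld_matrices: list of square matrices (lists of lists), one per population.
    sample_sizes: list of population sample sizes, in the same order.
    """
    total = sum(sample_sizes)
    n = len(ld_matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(ld_matrices, sample_sizes)) / total
         for j in range(n)]
        for i in range(n)
    ]


def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose posteriors reach coverage.

    posteriors: {variant_id: posterior probability of being causal}.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen
```

When many variants are in high LD, the posterior mass is spread thinly across them and the CS grows accordingly, which is the pattern expected for loci lying within large LD blocks.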

Replication

Replication of novel loci was performed under a common analysis plan; variant proxies in high LD (D′ and r2 > 0.9 in the population of interest) were used if the variant of interest was not genotyped or well-imputed in the following four multi-ethnic studies: Jackson Heart Study (JHS); Cameron County Hispanic Cohort (CCHC); Reasons for Geographical And Racial Differences in Stroke (REGARDS) Study; and Multi-Ethnic Study of Atherosclerosis (MESA). Additionally, published summary statistics from the China Health and Nutrition Survey (CHNS) cohort [31] and an analysis of individuals of EA ancestry from Lagou et al and the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) [32] were also included for replication analyses (ESM Methods). We used the R package MetaSubtract version 1.60 (https://cran.r-project.org/web/packages/MetaSubtract/) [33] to remove overlapping EA ARIC cohort results from the Lagou et al summary statistics before their inclusion in replication (ESM Methods). A maximum of n = 8459, n = 92,432, n = 3406 and n = 6476 AA, EA, HA and ASN participants, respectively, were identified for replication of fasting glucose, fasting insulin and HbA1c novel variants. Replication data were not available for HI and NAm populations. Significance was determined using Bonferroni correction (0.05/number of significant novel independent signals). All replication results were meta-analysed in transethnic and population-specific analyses, using METAL [28].
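The proxy criterion used for replication (D′ and r2 > 0.9) rests on the standard two-locus LD statistics, which can be computed from haplotype and allele frequencies as in the Python sketch below. This is a generic illustration, not code from the replication pipeline. The function names are hypothetical, and both variants are assumed to be polymorphic (allele frequencies strictly between 0 and 1).

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (each in (0, 1)).
    """
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


def is_acceptable_proxy(p_ab, p_a, p_b, cutoff=0.9):
    """Apply the proxy criterion from the text: D' and r2 both above cutoff."""
    d_prime, r2 = ld_statistics(p_ab, p_a, p_b)
    return d_prime > cutoff and r2 > cutoff
```

Requiring both statistics to exceed the cutoff is stricter than using D′ alone, because D′ can equal 1 between variants with very different allele frequencies, in which case r2 remains low and the proxy would carry little information about the variant of interest.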

Functional annotation

Finally, to characterise the putative functionality of variants, we performed bioinformatic follow-up for all novel primary and independent secondary variants, as well as the top variants identified in each fine-mapping CS. We used the UCSC Genome Browser Islet Regulome tracks [34,35,36], which include data on chromatin classes, cytokine-induced regulatory elements and enhancer hubs in both adult human islets and pancreatic progenitors. Additionally, we created a custom UCSC Genome Browser analysis hub of important regions (e.g. enhancer and repressor activities, DNase I hypersensitive sites [DHS] and transcribed regions) in the pancreas and insulin-responsive tissues, including skeletal muscle, liver and adipose tissue, using GTEx [37] and Roadmap Epigenome Project [38] data.

Results

Study overview

After exclusions, a total of 52,267, 23,357 and 48,395 participants were available for fasting glucose, HbA1c and fasting insulin GWAS, respectively (ESM Table 2), of which collectively over half were either self-reported AA or HA (maximum 23% AA, 46% HA, 40% EA, 4% ASN, 3% HI, 0.8% NAm). The mean age of participants was 54.5 years and, on average, they were overweight (mean ± SD BMI 28.0 ± 5.7 kg/m2), with a greater representation of female participants (72%). Glycaemic trait distributions were similar across studies and self-reported race/ethnic groups, with mean ± SD fasting glucose levels ranging from 4.5 ± 0.5 mmol/l to 5.5 ± 0.6 mmol/l, HbA1c levels ranging from 34.0 ± 3.5 mmol/mol (5.3%) to 38.6 ± 3.2 mmol/mol (5.7%) and fasting insulin levels ranging from 32.3 ± 19.7 pmol/l to 80.9 ± 59.0 pmol/l.

Identification of significant loci

In the transethnic meta-analysis, we identified a total of 13, 13 and 11 genome-wide significant (p < 5.0 × 10−9) loci for fasting glucose, HbA1c and fasting insulin, respectively (Fig. 1 and ESM Table 3, ESM Fig. 1). Several loci and, in some cases, several top variants were shared across glycaemic traits: G6PC2 for fasting glucose and HbA1c (shared top variant: rs560887); GCKR for fasting glucose and fasting insulin (shared top variant: rs1260326); SLC2A2 for fasting glucose and HbA1c (shared top variant: rs1879442); and GCK for fasting glucose and HbA1c. Effect estimates for significant variants were generally consistent across populations (Fig. 2 and ESM Fig. 1), although statistical significance varied, often in accordance with minor allele frequency (MAF) and/or sample size.

Fig. 1 Manhattan plots for glycaemic trait association analyses in PAGE, adjusting for BMI. (a) Fasting insulin transethnic meta-analysis results. (b) HbA1c transethnic meta-analysis results. (c) Fasting glucose transethnic meta-analysis results. (d) Fasting glucose AA-specific meta-analysis results. Known loci are shown in grey; novel loci with p value < 1 × 10−6 are shown in purple; novel loci with p value < 5 × 10−9 are shown in pink

Fig. 2 Forest plots of primary GWAS and replication transethnic and population-specific meta-analysis effect estimates and 95% CIs for the four novel variants identified in the PAGE Study. (a) Fasting glucose variant rs571025325 at LRRC37A5P locus, which was genome-wide significant (p < 5 × 10−9) only in AA-specific meta-analysis. Effective n < 30 for all other populations in the primary analysis, indicated by sample size n = NA in the primary analysis panel. (b) Fasting insulin variant rs9472142 at VEGFA locus. (c) Fasting insulin variant rs35131928 at CASC8/CASC21 locus; EA REGARDS replication data used proxy variant rs10956361 in lieu of rs35131928 (D′ = 1 and r2 = 1 with rs35131928 in EA PAGE data). (d) Fasting insulin variant rs10887773 at PTEN locus. PAGE Study GWAS results for transethnic and population-specific meta-analyses are shown against a white background; transethnic and population-specific meta-analyses of replication results are shown against a grey background. Replication data sources, by population, are as follows: AA, JHS, REGARDS; EA, REGARDS, MESA, MAGIC; HA, MESA, CCHC; and ASN, MESA, CHNS

Three of the 34 significant loci identified in transethnic GWAS were novel (located more than 500 kb from a known variant) at time of analysis (January 2020) and were associated with fasting insulin: the VEGFA (also known as MVCD1, VEGF or VPF) locus on chromosome 6 (lead variant rs9472142, p = 5.56 × 10−10); the CASC8/CASC21 (also known as CARLO1, CARLo-1, LINC00860, CARLO2, CARLo-2 or LINC01244) locus on chromosome 8 (lead variant rs35131928, p = 2.70 × 10−9); and the PTEN (also known as 10q23del, BZS, CWS1, DEC, GLM2, MHAM, MMAC1, PTEN1, PTENbeta or TEP1) locus on chromosome 10 (lead variant rs10887773, p = 4.55 × 10−10) (Table 1, Figs 1, 2). Wide variation in MAF was observed across populations for lead variants at these three novel loci, particularly for rs9472142 at the VEGFA locus (MAF range 0.12–0.36) and rs10887773 at the PTEN locus (MAF range 0.10–0.37). Effect estimates were generally directionally consistent across populations (Fig. 2). We also identified a fourth novel locus associated with fasting glucose in the population-specific meta-analysis of self-identified African Americans: the LRRC37A5P (also known as C9orf29) locus on chromosome 9 (lead variant rs571025325, pAA = 4.58 × 10−9) (Table 1, Figs 1, 2), with a MAF of 0.0037.

Table 1 Novel genome-wide-significant (p<5 × 10−9) loci discovered in genome-wide association study of fasting glucose, fasting insulin and HbA1c via transethnic and population-specific meta-analysis

Replication of lead variants at four novel loci

Replication of lead variants or proxy variants at the four potentially novel loci was performed through transethnic meta-analysis of independent AA (n range 1311–4986), ASN (n range 667–5809), EA (n range 1054–97,348) and HA (n range 1189–2217) cohorts, with EA fasting insulin results from published summary statistics from Lagou et al contributing the largest sample size. Lead variants for all three novel fasting insulin loci showed directionally consistent effects, although considerable effect attenuation was observed. The PTEN lead variant was significant at the Bonferroni-corrected significance level of p = 0.0125 (α = 0.05/4 signals) in independent transethnic meta-analysis and the other two fasting insulin loci showed suggestive significance, particularly CASC8/CASC21 (p = 0.0174) (Fig. 2 and ESM Table 4). The fourth locus (fasting glucose, LRRC37A5P), which was observed only in AA-specific meta-analysis, did not show evidence of replication (p = 0.62), although only 41 of the 5110 replication dataset participants were expected to carry at least one copy of the minor allele (ESM Table 4). Furthermore, in Chen et al’s [39] recently published glycaemic traits GWAS, our VEGFA, PTEN and CASC8/CASC21 lead variants showed significance in transethnic (VEGFA and PTEN), EA-specific (VEGFA, PTEN) and East Asian-specific (PTEN, CASC8/CASC21) meta-analyses; however, these results are not an independent replication as they contain overlapping data from the ARIC, BioMe, WHI, HCHS/SOL and several replication cohorts used here (ESM Table 4).

Secondary analyses at known glycaemic trait loci

Through stepwise conditional analysis, we identified seven significant secondary signals at known glycaemic trait loci, including two previously unreported secondary signals, one for fasting glucose (GCK [also known as FGQTL3, GK, GLK, HHF3, HK4, HKIV, HXKP, LGLK, MODY2 or PNDM1], rs55908146) and one for fasting insulin (PPP1R3B [also known as GL, PPP1R4 or PTG], rs330941), that remained significant after conditioning upon known variants (Table 2 and ESM Table 5). Wide variation in MAF was observed across populations for both novel independent secondary signals rs330941 (MAF range 0.22–0.49) and rs55908146 (MAF range 0.15–0.32) (Table 2).

Table 2 Significant (p<5 × 10−8) previously unreported secondary signals at known fasting insulin and fasting glucose loci

Fine-mapping

To identify the most likely causal variant(s) for the four putatively novel loci and two novel independent secondary signals, we subsequently utilised FINEMAP to estimate the number of causal variants per locus and generate a 95% CS for each causal variant. For three of the four novel loci (LRRC37A5P, CASC8/CASC21, PTEN) we estimated one causal variant at each locus (k = 1) (Table 3); at these loci, the top variants in our GWAS analyses (rs571025325, rs35131928, rs10887773) were identified as the variants most likely to be causal, although with varying posterior probabilities of being the top causal variant (range 0.06–0.79) (ESM Tables 6, 7, 8). The broad range of posterior probabilities by locus reflects the size of the LD block. For the fourth novel locus (VEGFA), the highest posterior probability was observed for k = 2 causal variants, with our top GWAS variant rs9472142 identified as the top variant in CSVEGFA1 (Table 3 and ESM Table 9); the top variant in CSVEGFA2 (rs6910726) was just under the significance threshold in our stepwise conditional analysis, with p = 4.20 × 10−6 (ESM Table 5).

Table 3 Fine-mapping posterior probabilities of k causal variants at novel primary GWAS and independent secondary signal loci

For the two novel independent secondary signals, the highest posterior probabilities were estimated for k = 2 (PPP1R3B) and k = 4 (GCK) causal variants (Table 3). Because we did not perform any LD pruning, we identified CSs containing many variants in high LD with each other, and therefore low individual posterior probabilities of being the top causal variant in each CS. For example, at the PPP1R3B locus, for the variants in CSPPP1R3B1, the posterior probabilities of being the top causal variant range between 0.11 and 0.26 (ESM Table 10). The top three variants in CSPPP1R3B2, including the most significant variant from our conditional analysis, rs330941, are in high LD with each other but not the CSPPP1R3B1 variants, and posterior probabilities for these three variants range from 0.24 to 0.37 (ESM Table 10). The novel GCK secondary signal rs55908146 was among the top five variants in CSGCK3, all of which had a probability of being the top variant in CSGCK3 of about 0.10, additionally suggesting an LD block (ESM Table 11). LocusZoom plots of the loci with more than one CS showed that the CSs have little shared LD (ESM Fig. 2).

Functional annotation

We performed bioinformatic follow-up of the novel primary loci and known loci with independent secondary signals using the UCSC Genome Browser Islet Regulome tracks [34,35,36] and a custom UCSC analysis hub of important regions (e.g. enhancer and repressor activities, DHS and transcribed regions) in the pancreas and insulin-responsive tissues including skeletal muscle, liver and adipose tissue. However, functional annotation of the top variants in the fine-mapping CSs for each locus did not indicate a clear potential mechanism through which variants may act; gene expression in the GTEx dataset [40] showed ubiquitous levels of expression across tissues for most of the loci, and human pancreatic islet chromatin state data showed chromatin state markers of expression in the general regions of many of the loci (data not shown).

Discussion

Examining the genetic architecture of glycaemic traits in a diverse study, we identified three novel (at time of analysis, January 2020) fasting insulin loci shared across populations and a fourth low-frequency fasting glucose locus specific to self-identified AAs. Additionally, we identified two previously unreported independent secondary signals in the PPP1R3B and GCK loci associated with fasting insulin and fasting glucose, respectively. These results emphasise the continued need for more GWAS in diverse populations to assess the genetic heterogeneity of complex diseases.

While this paper was under review, Chen et al and the MAGIC consortium published a large-scale transancestry analysis of glycaemic traits, aggregating GWAS data from up to 281,416 individuals without diabetes [39]. They identified the novel fasting insulin-associated PTEN locus identified here (r2 = D′ = 1 between our identified variant rs10887773 and Chen et al’s variant rs12769346), as well as a fasting insulin variant in the VEGFA locus. However, after conditioning on Chen et al’s top variant (rs998584), our identified VEGFA top variant remained genome-wide significant (p < 5 × 10−9). Additionally, there was low LD between the VEGFA variants (r2 = 0.03 and D′ = 0.35 between PAGE rs9472142 and MAGIC rs998584); we note that rs9472125, which was contained within both of our VEGFA fine-mapping 95% CSs, is located near rs998584, with r2 = 0.01 and D′ = 0.61 between the two variants, as calculated from the PAGE combined ancestry LD. The independent fasting insulin and fasting glucose secondary signals we identified in the PPP1R3B and GCK loci were not among the variants identified at these loci by Chen et al.

Although there was overlap in the cohorts in our PAGE data and in Chen et al, including ARIC, BioMe, WHI and HCHS/SOL, in the PAGE Study much of our contributing genetic data from these cohorts were newly genotyped on the MEGA array, which was specifically designed to increase variant coverage across multiple ancestry groups [25, 26]. Additionally, the distribution of ancestry groups varied across the two analyses: PAGE data had a higher percentage of non-EA participants (% non-EA range 60.0% [fasting insulin] to 62.4% [fasting glucose]) than Chen et al, in which approximately 30% of participants were non-EA. While the PAGE Study’s statistical power is diminished by a smaller sample size, due to the increased ancestral diversity and finer genotyping on the MEGA array, we identified two loci not identified by Chen et al and one that was reported by Chen et al [39]. Both approaches provide complementary information on the genetic architecture of glycaemic traits in diverse populations.

The three novel fasting insulin loci identified via transethnic meta-analysis (VEGFA, CASC8/CASC21 and PTEN) and the novel fasting glucose AA-specific locus (LRRC37A5P) harbour genes with biologically plausible roles in insulin signalling and beta cell function. VEGFA has been associated with type 2 diabetes [41], waist/hip ratio [42, 43] and erythrocyte traits [44, 45]. Novel variant rs9472142, in CSVEGFA1, is in high LD (r2EA = 0.97) with an identified VEGFA type 2 diabetes variant (rs9472138), supporting an early role of this signal prior to type 2 diabetes onset [22]. Mouse models have also demonstrated that VEGFA signalling is necessary for pancreas specification and differentiation and plays important roles in pancreatic islet blood vessel maintenance and blood flow [46]. CASC8/CASC21 are cancer susceptibility genes and have not been previously associated with insulin or type 2 diabetes, although the CASC8 locus has been associated with BMI-adjusted waist/hip ratio in individuals of African ancestry [47]. The low probability for any single variant identified in fine-mapping CS1 for CASC8/CASC21 indicates an LD block or haplotype for this locus. PTEN is involved in the negative regulation of insulin signalling [48] and has been associated with type 2 diabetes [41, 49]. A low probability for any single variant in fine-mapping CSPTEN1 also indicates a likely LD block or haplotype for this locus. Although several variants in our final novel locus, LRRC37A5P, have previously shown suggestive significance (p < 1.0 × 10−6) in association with diastolic BP in a transethnic meta-analysis of the metabolic syndrome [50], this locus has not previously been associated with fasting glucose. The pseudogene LRRC37A5P is next to the PTGR1 gene encoding an enzyme involved in the inactivation of chemotactic factor, leukotriene B4, which is associated with insulin resistance and obesity [51, 52].

Fine-mapping of known fasting insulin and fasting glucose PPP1R3B and GCK loci containing novel independent secondary signals yielded results consistent with our stepwise conditional analyses. Multiple CSs, including those containing our identified secondary signals, were predicted for each locus. PPP1R3B contributes to insulin signalling through an insulin–Akt–protein phosphatase 1 regulatory subunit 3G (PPP1R3G)–protein phosphatase 1 regulatory subunit 3B (PPP1R3B) regulatory axis, in which PPP1R3B binds to dephosphorylated glycogen synthase (GS), thus relaying insulin signals for hepatic glycogen synthesis [53]. Rare PPP1R3B missense variants may increase the risk of type 2 diabetes, possibly through altered GS function and altered lipid metabolism [54]. GCK encodes the enzyme glucokinase, which acts to maintain glucose homeostasis and has been previously associated with fasting glucose and type 2 diabetes [5, 11, 14, 55,56,57,58]. Specific GCK mutations also cause Mendelian disease phenotypes including MODY2 and permanent neonatal diabetes mellitus (PNDM) [59,60,61]. Continuing to identify the spectrum of natural variation across populations of genes that alter risk for glycaemic traits and type 2 diabetes will enable improvements in risk prediction models for diverse populations.

Strengths of this study include the large study size and representation of multiple ancestrally, ethnically and racially diverse populations, including HA and AA populations, which shoulder a large burden of hyperglycaemia and type 2 diabetes in the USA and historically have been understudied in genetic epidemiology research. However, because the greatest proportion of participants were from HA, AA and EA populations, this study was limited in its ability to detect associations specific to East Asian, South Asian, HI and NAm populations. Additionally, our transethnic fine-mapping approach utilised a combined ancestry LD matrix that was constructed by computing population-specific LD matrices and subsequently weighting by population sample size. This weighted LD matrix approach is limited by the fact that it ‘averages’ LD patterns across populations, thus potentially missing ancestry-specific LD differences. Nevertheless, we applied this approach because it accounts for potentially more than two causal variants at a given locus. Developing computationally scalable fine-mapping methods that leverage ancestry-specific LD patterns while accounting for more than two causal variants is an area of active research.

Furthermore, only the fasting insulin association at the PTEN locus replicated in a transethnic meta-analysis of several multi-ethnic studies, although both the VEGFA and CASC8/CASC21 loci showed suggestive significance. Our inability to replicate several identified loci likely reflects the increasing limitations of replication in large-scale ‘mega-biobank’ studies, since meta-analysis of multiple small independent replication studies, as performed here, may be underpowered [62]. Furthermore, replicating rare variants like the AA-specific LRRC37A5P variant is a known challenge, especially since rare variants tend to be population-specific [63]. To further interrogate rare loci identified in populations thus far underrepresented in GWAS, there must be a continued effort to increase the ancestral diversity of the populations studied in GWAS and all biomedical research.

In summary, this study of glycaemic traits in the diverse PAGE Study identified three novel fasting insulin loci, one AA-specific rare fasting glucose locus and two novel independent secondary signals at known fasting glucose and fasting insulin loci. These findings reinforce the need to conduct genetic association studies in participants of diverse backgrounds to yield new insights into the genetics of glycaemic traits.