Background

Genetic variation across worldwide populations reflects the widespread impact of human evolutionary history, including processes related to natural selection and demographic history [1]. Large-scale genome-wide association studies (GWAS) are disentangling the complex genetic architecture of human traits and diseases, providing insights into the molecular and cellular mechanisms underlying physiological and pathological conditions [2,3,4,5]. Leveraging genome-wide data from these studies, it is possible to investigate whether the SNP-based heritability (SNP-h2, i.e., the proportion of phenotypic variance explained by additive effects of common genetic variation) of human phenotypes is enriched for specific genomic features via a partitioned heritability analysis [6]. Genomic features related to natural selection are enriched for loci associated with complex traits [1, 7,8,9]. In particular, background selection (i.e., the selective removal of deleterious alleles across the genome) appears to play a primary role in shaping the highly polygenic architecture of human traits and diseases [1, 7,8,9]. Signatures of positive selection, a hallmark of adaptive evolution, were also previously detected for complex traits [10,11,12]. Introgression from Neanderthals and Denisovans, the only archaic humans sequenced to date, also contributes to the genetic pool of modern populations [13, 14] and consequently to the human phenotypic spectrum [15, 16]. The genomic segments that anatomically modern humans inherited from admixture events with extinct human species are hypothesized to have contributed to the adaptation processes of worldwide populations that occurred during the colonization of landmasses [17,18,19,20,21]. Additionally, signatures of archaic introgression have been reported in genes associated with hair and skin pigmentation, immunity [16, 17, 19, 20, 22], neoplasms and metabolic traits [19, 23, 24], and male sterility [17, 18].
In populations of European descent (EUR), a phenome-wide association study of Neanderthal-introgressed alleles showed a wide range of associations with physiological conditions related to the immune system, skin pigmentation, and metabolic pathways, and with pathological outcomes such as depression, actinic keratosis, hypercoagulation, and tobacco use [15]. Due to the well-known disparities of ancestry representation in biomedical research, the information currently available regarding the role of human evolutionary history in shaping the genetic architecture of traits and diseases is mostly for EUR individuals. Although a few studies have investigated archaic introgression in non-EUR-descent groups such as Pacific [25], East Asian [26], Tibetan [27], and Island Southeast Asian [28] populations, to our knowledge no investigation has systematically explored the impact of archaic introgression on human traits and diseases across multiple ancestry groups. This major gap has important implications for the characterization of the history of human populations and its phenotypic consequences on individuals of diverse ancestral backgrounds.

The present study aimed to investigate how archaic introgressions contribute to the polygenic inheritance of human diseases and traits across different ancestry groups. Leveraging data generated from large-scale GWAS conducted in Biobank Japan (BBJ) [29, 30], we analyzed the genetic background of individuals of East Asian descent (EAS). Populations in East Asia present an evolutionary history that is only partially shared with EUR populations. With respect to archaic introgression, earlier studies found that on average, an EAS individual carries a higher percentage of Neanderthal genome DNA than a EUR individual (1.4% and 1.1%, respectively) [17]. EAS populations also show evidence of introgression from Denisovans [18, 19, 31]. Accordingly, we explored how archaic introgression and other evolutionary processes contributed to the genetics of complex traits in EAS and EUR populations. We also conducted a phenome-wide association study (PheWAS) of Neanderthal- and Denisovan-introgressed alleles to characterize their contribution in EAS and EUR individuals and other ancestry groups (CSA: Central/South Asian, MID: Middle Eastern, AMR: Admixed American) available from the UK Biobank (UKBB) [32]. We did not investigate data from UKBB participants of African descent, because no archaic introgression is present in these human groups. Similarly, Denisovan archaic introgression was investigated only in EAS and CSA. Neanderthal introgression was investigated in EAS, EUR, CSA, MID, and AMR. Our findings expand the understanding of how human evolutionary history influenced the genetic liability to complex traits, also providing evidence of the contribution of Denisovan introgression to physiological and pathological conditions in EAS populations.

Results

Partitioned heritability analysis

For the partitioned heritability analysis based on baseline and evolutionary annotations of the human genome, we identified a total of 37 and 39 traits with adequate SNP-h2 estimates (z-score ≥ 7) among those available in both BBJ (EAS participants) and UKBB (EUR participants), respectively (Additional file 1: Table S1). As expected, we observed a strong correlation between effective sample size and heritability z-score in both EAS and EUR (ρ=0.75, p=1.86×10−13 and ρ=0.82, p=2.20×10−16, respectively) (Additional file 1: Table S2). We identified several differences between EAS and EUR enrichments of genome structure and functional annotations (Additional file 1: Table S3). Although some of them may be due to the difference in the statistical power of UKBB and BBJ GWAS, we identified several enrichments that were statistically significant in EAS but not EUR.

We also observed several enrichments for evolutionary features in the SNP-h2 of traits and diseases assessed in EAS and EUR individuals (Table 1, Additional file 1: Table S4). With respect to archaic introgression, we identified one FDR-significant SNP-h2 enrichment: Denisovan-introgressed loci for coronary artery disease in EAS (1.7-fold enrichment, p=0.003). No enrichment for archaic introgression was observed in EUR (Supplemental Table 4). In line with previous studies [10, 33], the strongest enrichments in both ancestry groups were observed for annotations related to genic and LoF intolerant regions. In EAS, 89% and 68% of the traits had significant SNP-h2 enrichments for genic and LoF intolerant regions, respectively (False Discovery Rate, FDR q<0.05; Table 1). Platelet count was the most significantly enriched trait in both genic and LoF intolerant regions (1.33-fold enrichment, p=7.64×10−12 and 2-fold enrichment, p=1.98×10−8, respectively). Additionally, we identified several significant enrichments related to B-statistic values (i.e., reduction in allelic diversity due to purifying selection) in EUR. Due to the much larger GWAS sample size, all phenotypes in EUR showed FDR-significant enrichment in at least one of the B-statistic value thresholds. Background selection was more significantly enriched in lymphocyte count in EUR compared to EAS (EUR: 1.82-fold enrichment, p=1.12×10−18; EAS: 1.30-fold enrichment, p=2.18×10−4; EAS-EUR difference: p=4.59×10−4). Similar to other studies conducted in EUR [34, 35], we did not identify SNP-h2 enrichment for positive selection signatures in our EAS and EUR analyses (Additional file 1: Table S4).

Table 1 Statistically significant enrichment for natural selection and functional annotations in SNP-based heritability of complex traits in East Asian and European populations. Details of all traits and annotations tested are available in Supplemental Table 4

The enrichment of three traits (i.e., blood sugar, mean corpuscular volume, non-albumin protein) for H3K27 acetylation marking active enhancers (H3K27ac) differed between EAS and EUR. The most significant difference was for non-albumin protein, which was more enriched for this functional annotation in EAS compared to EUR (EAS: 2.88-fold enrichment, p=1.22×10−18, EUR: 1.11-fold enrichment, p=0.080, EAS-EUR difference: p=2.96×10−12). Moreover, albumin/globulin ratio was depleted for the H3K27ac flanking region in EAS (6.14-fold depletion, p=0.001), but it was significantly enriched in EUR (2.21-fold enrichment, p=2.26×10−10; EAS-EUR difference: p=2.72×10−4). With respect to non-albumin protein, the super-enhancer annotation was enriched in EAS (4.46-fold enrichment, p=3.01×10−17), but not in EUR (1.14-fold enrichment, p=0.105; EAS-EUR difference: p=3.43×10−12). The enrichment of three traits (i.e., lymphocyte count, neutrophil count, non-albumin protein) for CpG content also differed between EAS and EUR. The most significant difference was for non-albumin protein, which was more enriched for this functional annotation in EAS compared to EUR (EAS: 1.51-fold enrichment, p=1.37×10−11, EUR: 1.09-fold enrichment, p=2.36×10−6, EAS-EUR difference: p=2.89×10−6).

Phenome-wide association study of archaic introgressed loci

Although we observed SNP-h2 enrichment only with respect to Denisovan-introgressed loci for coronary artery disease in EAS, single loci inherited from Neanderthals and Denisovans can still contribute to the phenotypic variation of human populations [15]. Therefore, we performed a PheWAS of Neanderthal- and Denisovan-introgressed loci across multiple ancestry groups available from the UK Biobank (Additional file 1: Table S5) and identified several associations surviving FDR multiple testing correction at 1%. In our analysis, we tested introgressed loci that (i) matched only the Neanderthal genome, (ii) matched only the Denisovan genome, and (iii) matched both the Neanderthal and Denisovan genomes.

In EAS, the Denisovan-introgressed SNP rs62391664 was associated with albumin/globulin ratio (beta=−0.17, p=3.57×10−7; Fig. 1A, Additional file 1: Table S6). Among Neanderthal-introgressed loci, the rs79043717*A, rs145929965*C, and rs76966342*A alleles showed the strongest associations, each with a negative effect, for “No bipolar or depression” (beta=−1.50, p=1.10×10−7), “handedness” (beta=−3.54, p=6.45×10−7), and “illnesses of father” (beta=−0.44, p=9.27×10−7), respectively (Fig. 1B, Additional file 1: Table S7). Introgressed alleles matching both Denisovan and Neanderthal genomes were associated with increased risk of “shortness of breath” (rs77589994*A beta=5.27, p=1.10×10−8) and “breast cancer” (rs12143332*A beta=1.56, p=1.69×10−7), and with shorter “duration of vigorous activity” (rs74962884*G beta=−0.26, p=3.20×10−7; Fig. 1C, Additional file 1: Table S8). Among Neanderthal and Neanderthal/Denisovan introgressed loci, we also observed a few associations related to dietary habits (e.g., “bread consumed”; Additional file 1: Tables S7 and S8).

Fig. 1

PheWAS Manhattan plots showing variant associations with UKBB phenotypes in EAS. Panel A shows associations with Denisovan-introgressed alleles, panel B depicts associations with Neanderthal-introgressed alleles, and panel C shows associations with introgressed alleles matching both Denisovan and Neanderthal genomes. Phenotype categories are shown on the x-axis, while -log10 (p-values) are shown on the y-axis. The dashed line shows the FDR-significant threshold (q < 0.01)

In EUR, Neanderthal-introgressed alleles were associated with 82 phenotypes, with red hair color (rs60733936*A beta=−0.86, p=1.81×10−157) and alkaline phosphatase (rs11244089*A beta=−0.10, p=1.44×10−109) being the most significant (Additional file 1: Table S9). Because of the large number of EUR-Neanderthal associations surviving multiple testing correction (FDR q<0.01), we tested whether these associations were specifically enriched for one or more of the phenotypic domains investigated (Additional file 1: Table S5), observing an over-representation of EUR-Neanderthal associations with metabolic traits (27.52-fold enrichment, p=6.61×10−7). Although only a limited sample size is available for the other ancestry groups in UKBB, we identified several associations with Neanderthal-introgressed alleles in CSA and MID (Additional file 1: Tables S10–S11). No Denisovan- or Denisovan/Neanderthal-introgressed locus was associated with any phenotype in CSA (Additional file 1: Tables S12 and S13). Interestingly, some of the associations were related to the use of certain medications, including pain-management medications (e.g., aspirin in CSA and opioids in MID) and antihypertensive medications (indapamide and alfuzosin in CSA). No association survived multiple testing correction in AMR (Additional file 1: Table S14).

Enrichment for biological processes, cellular components, and molecular functions

We tested the loci identified in our PheWAS for enrichment of biological processes, cellular components, and molecular functions. For the Neanderthal loci identified in the EUR PheWAS, we found 30 gene-set enrichments (FDR < 5%) related to genomic regulation (Additional file 1: Table S15). Among them, we observed genes targeted by several microRNAs (miRNA, e.g., Hsa-miR-374b, FDR q=9.27×10−5) and by different transcription factors (e.g., WT1 in human podocyte, FDR q=9.27×10−9). Due to the limited number of loci identified in other ancestries, no enrichment survived multiple testing correction in those groups.

Discussion

Previous studies showed that Neanderthal-introgressed loci are associated with immunological, neurological, psychiatric, metabolic, cardiovascular, and dermatological outcomes in EUR populations [10, 15,16,17]. In our study, we expanded this previous knowledge by testing for enrichment and depletion of SNP-h2 for loci related to Denisovan- and Neanderthal-introgression and several other evolutionary features across multiple traits in EAS and EUR populations. Additionally, we provide the first evidence of the consequences of Denisovan introgression across the human phenotypic spectrum in human groups of East Asian descent.

Leveraging EAS genome-wide information, we observed that Denisovan-introgressed loci are more enriched for the heritability of coronary artery disease than expected by chance. Two related cardiovascular phenotypes, myocardial infarction and coronary atherosclerosis, were previously associated with Neanderthal-introgressed loci in EUR [15]. In our EAS PheWAS of introgressed loci matching both the Denisovan and Neanderthal genomes, we identified an association with “shortness of breath walking on level ground”, which is a trait related to cardiovascular health [36]. The associated variant, rs77589994, maps to the TRAP1 gene, which encodes a protein regulating cellular stress responses [37]. The first PheWAS of Neanderthal-introgressed alleles in EUR found that Neanderthal alleles explained a significant proportion of variance in risk of coronary atherosclerosis [15]. In line with this previous evidence, we observed that “vascular/heart problems diagnosed by doctor” was associated with a Denisovan-Neanderthal introgressed SNP, the LINGO2 rs74597612 variant, in EUR.

Our EUR PheWAS of Neanderthal-introgressed SNPs was enriched for associations related to metabolic traits. This overrepresentation was not present in the previous Neanderthal-introgression PheWAS. This could be due to the different characteristics of the cohorts investigated. Our PheWAS was conducted in the UKBB, which is a middle-aged sample that is healthier and wealthier than the general population [38]. Conversely, the previous PheWAS was conducted in the Electronic Medical Records and Genomics (eMERGE) Network [39], which is a sample combining participants enrolled from multiple healthcare systems. Sample-specific characteristics may have affected the statistical power of detecting associations with respect to certain health domains. Similarly, another study investigating Neanderthal-introgressed SNPs in EAS and EUR found multiple associations with autoimmune diseases, prostate cancer, and type 2 diabetes [24]. This study used a different method to assign Neanderthal-introgressed alleles than the one applied in our study. In this previous analysis, Neanderthal-introgressed alleles were defined as those present in modern non-African populations that are shared with the Vindija Neanderthal genome using a linkage disequilibrium-based test for incomplete lineage sorting (ILS). Considering loci identified from multiple sources, this previous investigation tested whether they were Neanderthal-introgressed using the ILS method. Therefore, due to the different study designs, a different set of associations was identified. Another recent study investigating Neanderthal-introgressed alleles showed associations with hair color and hematological biomarkers that are consistent with our results [40].

In EAS, a Denisovan-introgressed allele was associated with a metabolic phenotype, albumin/globulin ratio. To our knowledge, this is the first evidence of the effect of Denisovan introgression on the phenotypic expression of modern EAS populations. In our EUR PheWAS, a Neanderthal-introgressed allele was associated with albumin. Although these are two different hematological parameters, the convergence on albumin-related biomarkers may suggest an evolutionary pressure on archaic introgression with respect to genes involved in albumin regulation. Among Neanderthal-introgressed alleles in EAS, we also identified an association with handedness. This trait is particularly interesting with respect to human evolution, because it arose after the chimpanzee and human lineages separated between 5 and 7 million years ago [41]. Neanderthals appear to have been right-handed, in line with the manual lateralization and brain functional asymmetry that are also present in modern humans [42]. Accordingly, the association of Neanderthal-introgressed loci with handedness in EAS may suggest that of the human populations. With respect to pathological conditions, breast cancer was also associated with a shared Denisovan- and Neanderthal-introgressed variant in EAS. Although this is a novel finding, Neanderthal-introgressed haplotypes were previously associated with prostate cancer [24]. This may suggest that variants introgressed from archaic genomes play a role in the pathogenesis of cancers linked to sex hormone regulation [43].

In our evolution-focused SNP-h2 enrichment analysis, we detected an overabundance of genic and LoF intolerant loci in both EAS and EUR, suggesting that functionally important regions of the genome contribute to SNP-h2 to a different extent compared to the other annotations tested [10, 33, 40, 44]. Most of the traits tested were also enriched for CpG content, which is known to be positively correlated with genic content [45]. Genic and LoF intolerant regions are under strong negative selection [46]. While most EUR phenotypes (76%) were highly enriched for B-statistic values, we only found one FDR-significant association in EAS (serum creatinine). A similar disparity between EUR and EAS findings was also present for the B-statistic continuous annotation. This is likely due to the much larger sample size available in EUR and may not reflect a general lack of evidence for background selection in EAS populations (Supplemental Table 2). Conversely, some functional annotations were significantly more enriched in EAS than in EUR. For example, the super-enhancer annotation was enriched in EAS, but not in EUR. Genomic regions including enhancers have been shown to present an accelerated evolutionary rate, which is a signature of positive selection [11]. However, similar to previous studies [34, 35], none of the positive-selection annotations tested was significant in either of the two populations. We also observed that several of the Neanderthal-introgressed loci identified in EUR were related to transcriptomic regulation via transcription factors (i.e., proteins that control transcription from DNA to mRNA) and miRNA (i.e., non-coding RNA responsible for RNA silencing and post-transcriptional gene expression regulation). MiRNA seed regions are under significant background selection [47] and their function can be affected by variants introgressed from Neanderthals [48].

Although our study provides novel insights into the role of human evolutionary history in the genetics of traits and diseases in worldwide populations, we acknowledge that the results generated are strongly affected by the well-known overrepresentation of EUR populations in human genetic research [49]. Accordingly, the analyses were statistically more powerful when conducted in EUR-based datasets than in EAS-based ones. However, we demonstrated that the majority of functional annotations were not statistically different between EAS and EUR in their enrichment for the SNP-based heritability of complex traits. When a stronger enrichment was observed in EUR, we cannot distinguish whether this is due to the larger sample size available in EUR or to genetic differences between the two populations investigated. Conversely, when a stronger enrichment was observed in EAS, this is more likely to reflect human genetic diversity, because it cannot be attributed to greater statistical power. Nevertheless, it is important to highlight that our findings are consistent with the fact that the fundamental biology of human traits and disease is shared among worldwide populations and that diversity among ancestral groups affects only a relatively small component of the genetic predisposition to complex traits. Additionally, further studies are needed to disentangle the role of environment, demography, and natural selection in the inter-population differences observed.

Conclusions

Our study expands the understanding of how archaic introgression contributed to the genetic architecture of human traits and diseases across worldwide populations. In particular, we present evidence that Denisovan and Neanderthal introgression contributed specifically to shape the genetics of complex traits in East Asian populations and other human groups currently underrepresented in genetic research. This highlights the need to expand the representation of human diversity in genetic research to ensure a comprehensive understanding of the complex dynamics by which the variation in the human genome is linked to the variation in the human phenome.

Methods

Datasets

GWAS statistics were accessed from BBJ [29, 30] and the UKBB [50]. BBJ is a registry of over 200,000 Japanese patients including information about 47 diseases and 59 quantitative traits (Supplemental Table 1) [29, 30]. The UKBB dataset provides information regarding more than 7000 phenotypes assessed in up to 500,000 participants from six ancestry groups [32]. We obtained genome-wide association statistics from a pan-ancestry genetic analysis of the UKBB (Pan-UKBB). A detailed description of this analysis is available at https://pan.ukbb.broadinstitute.org. Briefly, multi-ancestry genome-wide association analyses of 7,221 phenotypes were performed using a generalized mixed model association testing framework. We used ancestry-specific GWAS statistics available for five genetically-defined ancestry groups: EUR (N=420,531), CSA (N=8876), EAS (N=2709), MID (N=1599), AMR (N=980). We did not investigate data from UKBB participants of African descent, because no archaic introgression is present in these human groups. Similarly, Denisovan archaic introgression was investigated only in EAS and CSA. Neanderthal introgression was investigated in EUR, CSA, EAS, MID, and AMR.

Annotations measuring archaic introgression, positive and negative selection, and functionally important regions

SNP-h2 partitioning [6] was performed considering 95 baseline genomic annotations (baseline-LD model v2.2 downloaded from https://alkesgroup.broadinstitute.org/LDSCORE/) characterizing important molecular properties such as allele frequency distributions, conserved regions of the genome, and regulatory elements [9]. The full model included annotations from Finucane et al. (2015) [6], including coding, UTR, promoter, and intronic regions. Additional annotations were then added to the model, including four human promoter annotations (promoter, promoter from the Exome Aggregation Consortium [33], and two corresponding flanking annotations) [34], three human enhancer annotations (enhancer and corresponding flanking annotation + enhancer-enhancer conservation count) [34], two human promoter sequence age annotations (including one flanking annotation) [35], and two human enhancer sequence age annotations (including one flanking annotation) [35].

We created additional genome-wide annotations for Denisovan-introgressed [51, 52], Neanderthal-introgressed [18, 51,52,53], positively selected [12, 35, 54], negatively selected [1, 55], and genic and LoF intolerant [33] positions using the publicly available datasets from the original publications. Denisovan-introgressed (N=6515) and Altai Neanderthal-introgressed (N=49,793) positions were derived from the Sprime dataset [52], which identified these archaic-introgressed positions from the 1000 Genomes Project with respect to the Japanese population sample (i.e., Japanese in Tokyo, Japan). This reference population was selected because our primary analysis was conducted with respect to East Asian populations available from BBJ and UKBB. We defined Denisovan SNPs as those matching the Denisovan genome. The Neanderthal SNPs we selected were those matching uniquely to the Neanderthal genome. The contribution of archaic ancestry was also assessed by another method that identifies segments of archaic ancestry in modern human genomes without the need for archaic reference genomes [18, 53].
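The three categories of introgressed loci used in this study (matching only the Neanderthal genome, only the Denisovan genome, or both) can be illustrated with a minimal sketch. The function name, the enum, and the boolean inputs are hypothetical and are not taken from the Sprime dataset or from the analysis code of this study; how a "match" is called against each archaic genome is assumed to have been determined upstream.

```python
from enum import Enum


class ArchaicMatch(Enum):
    """Hypothetical labels for the three locus categories described in the text."""
    NEANDERTHAL_ONLY = "neanderthal_only"
    DENISOVAN_ONLY = "denisovan_only"
    BOTH = "both"
    NEITHER = "neither"


def classify_position(matches_neanderthal: bool, matches_denisovan: bool) -> ArchaicMatch:
    """Assign an introgressed position to one category.

    Assumption: each position already carries a yes/no call for whether the
    putatively introgressed allele matches each archaic genome.
    """
    if matches_neanderthal and matches_denisovan:
        return ArchaicMatch.BOTH
    if matches_neanderthal:
        return ArchaicMatch.NEANDERTHAL_ONLY
    if matches_denisovan:
        return ArchaicMatch.DENISOVAN_ONLY
    return ArchaicMatch.NEITHER


# Toy example with made-up positions (not real data).
toy_positions = {
    "pos_a": (True, False),
    "pos_b": (False, True),
    "pos_c": (True, True),
}
labels = {name: classify_position(n, d) for name, (n, d) in toy_positions.items()}
```

Keeping "both" as its own category, rather than counting such positions twice, mirrors the separate treatment of shared loci in the PheWAS.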

Positive selection was tested based on the integrated haplotype score (iHS) for Asian populations, which detects positive selection during the last ~30,000 years based on the presence of abnormally long haplotypes [56]. Cross-population extended haplotype homozygosity (XP-EHH) comparing EAS and EUR ancestries based on 1000 Genomes was also used to detect differential selective pressure since the two populations diverged [35]. The B-statistic for EAS was used to assess background selection. B uses phylogenetic information from other primates to estimate the reduction in allelic diversity in humans due to purifying selection [1]. The Exome Aggregation Consortium (ExAC) database was used to annotate genic and LoF intolerant regions of the genome. Each gene was assigned a probability of LoF intolerance (pLI) score [33]. Continuous evolutionary measurements were converted to binary annotations corresponding to the top 2%, top 1%, and top 0.5% of scores genome-wide, as previously recommended, due to the difficulty of setting specific thresholds to define regions under negative and positive selection [10, 44, 55]. The evolutionary annotations used in EUR are reported in Wendt et al. [10]. In addition to those previously reported, we created annotations for Denisovan- and Neanderthal-introgressed positions for EUR as explained above.
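The conversion of a continuous score into top-percentile binary annotations can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and treating higher scores as more extreme is an assumption, since which tail counts as the "top" depends on how each statistic is defined.

```python
def top_fraction_annotation(scores, fraction):
    """Return a list of 0/1 flags marking the highest-scoring positions.

    `fraction` is the share of positions to flag (e.g., 0.02 for the top 2%).
    Assumption: larger scores indicate stronger evidence; flip the sign of the
    scores beforehand if the opposite tail is the one of interest.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, round(len(scores) * fraction))
    # Indices ordered from highest to lowest score; ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flags = [0] * len(scores)
    for i in order[:n_top]:
        flags[i] = 1
    return flags


# Toy example: 1000 made-up scores, three nested binary annotations.
toy_scores = [float(i) for i in range(1000)]
annotations = {
    "top_2pct": top_fraction_annotation(toy_scores, 0.02),
    "top_1pct": top_fraction_annotation(toy_scores, 0.01),
    "top_0.5pct": top_fraction_annotation(toy_scores, 0.005),
}
```

The three annotations are nested by construction, so a position flagged in the top 0.5% is also flagged in the top 1% and top 2%.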

Statistical analysis

Linkage Disequilibrium Score Regression

The Linkage Disequilibrium Score Regression method (LDSC) was used to quantify the enrichment of evolutionary annotations in the SNP-h2 of each trait [5]. For each binary trait, the effective sample size was calculated as recommended previously [57]. The major histocompatibility complex region was excluded from the analysis due to its complex LD structure. To compare BBJ EAS participants with other ancestry groups, we selected 79 UKBB traits that were assessed similarly to those available in BBJ. SNP-h2 was calculated for each phenotype and, as recommended by the developers [6], the 69 traits with an estimated SNP-h2 z-score ≥ 7 were selected for the partitioned SNP-h2 analysis to test whether certain functional categories of the human genome contribute disproportionately to the heritability of the traits investigated. Due to the limited sample size in UKBB for other ancestry groups, we limited our partitioned SNP-h2 analysis to the data derived from BBJ EAS and UKBB EUR participants. Accordingly, we used LD scores generated from the 1000 Genomes Project Phase 3 EAS and EUR reference panels to analyze GWAS data generated from BBJ and UKBB, respectively [58]. We applied FDR multiple-testing correction (q ≤ 0.05) [59] accounting for the number of phenotypes tested. Partitioned SNP-h2 in LDSC fits a single large linear model that includes all annotations described in the previous section simultaneously, so that the enrichment estimated for a single annotation is conditional on all other annotations in the model.
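Two of the pre-processing steps above can be sketched in a few lines. The effective sample size formula shown here, 4 / (1/N_cases + 1/N_controls), is a commonly used definition and is an assumption; the text only states that the calculation followed [57]. The z-score is taken as the SNP-h2 estimate divided by its standard error. Function names and the example numbers are hypothetical.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control GWAS.

    Assumed formula: 4 / (1/n_cases + 1/n_controls), written in the
    algebraically equivalent form 4 * n_cases * n_controls / (n_cases + n_controls).
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4 * n_cases * n_controls / (n_cases + n_controls)


def passes_h2_zscore_filter(h2: float, h2_se: float, min_z: float = 7.0) -> bool:
    """Keep a trait for partitioning only if SNP-h2 / SE reaches the threshold."""
    if h2_se <= 0:
        raise ValueError("standard error must be positive")
    return h2 / h2_se >= min_z


# Made-up traits: (SNP-h2 estimate, standard error). Not results from this study.
toy_traits = {"trait_a": (0.30, 0.03), "trait_b": (0.10, 0.05)}
kept = [name for name, (h2, se) in toy_traits.items() if passes_h2_zscore_filter(h2, se)]
```

A balanced design with 1,000 cases and 1,000 controls gives an effective sample size of 2,000, whereas highly unbalanced designs are penalized toward four times the smaller group.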

Phenome-wide association study

To increase the resolution of our investigation (from heritability enrichment to single-variant contribution), we conducted a PheWAS of Denisovan-introgressed (N=6515) and Neanderthal-introgressed (N=49,793) loci, and of loci shared between Denisovan and Neanderthal (N=22,787), in EAS and CSA. As mentioned above, we only tested Neanderthal introgression in the other ancestry groups (AMR, EUR, MID). A PheWAS tests for association between single variants and a large number of different phenotypes. The association statistics of 7,221 phenotypes were derived from the Pan-UKBB analysis (details available at https://pan.ukbb.broadinstitute.org, Additional file 1: Table S5). Briefly, the genome-wide association analysis was conducted using the Scalable and Accurate Implementation of GEneralized mixed model (SAIGE), including a kinship matrix as a random effect and covariates as fixed effects. The covariates included age, sex, age × sex, age², age² × sex, and the top 10 within-ancestry principal components.

Our phenome-wide analysis included traits related to body structures, cardiovascular, cognitive, dermatological, ear-nose-throat, endocrine, environmental, gastrointestinal, hematological, immunological, medication, metabolic, musculoskeletal, neoplasms, neurological, nutritional, ophthalmological, psychiatric, respiratory, and urogenital domains (Supplemental Table 5). These phenotypic categories are similar to the ones used in the GWAS Atlas [60]. To control the false discovery rate at 1%, we applied FDR multiple testing correction (q < 0.01) [59], accounting for the number of phenotypes, variants, and ancestries tested. Variants with minor allele frequency (MAF) ≤ 0.05 and variants with the “low-confidence” flag in the Pan-UKBB analysis (i.e., variants with alternate allele count in cases ≤ 3, alternate allele count in controls ≤ 3, or minor allele count (cases and controls combined) ≤ 20) were excluded from the analysis. To define independent loci among those identified as significant by our PheWAS, we performed LD clumping using PLINK 1.9 [61] with an r2 threshold of 0.1 within 500 kb windows. The significant LD-independent variants were annotated to genes using the SNP Nexus variant annotation tool [62].
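The FDR step can be illustrated with a short sketch of the Benjamini-Hochberg step-up procedure. That the correction cited as [59] is Benjamini-Hochberg is an assumption, because the text does not name the procedure; the function name and the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Made-up p-values for a handful of variant-phenotype tests.
toy_pvalues = [0.5, 0.001, 0.00001, 0.03]
toy_qvalues = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_qvalues) if q < 0.01]
```

In the setting described above, the list of p-values would span all variant-phenotype-ancestry tests retained after the MAF and low-confidence filters, so that a single q < 0.01 threshold applies across the whole PheWAS.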

Gene Ontology Enrichment

The significant genes identified in each PheWAS were analyzed for gene ontology enrichment with the ShinyGO toolset [63], using all protein-coding genes in the genome as the background set and functional and molecular annotations (e.g., molecular pathways and gene ontology) from Ensembl [64]. Gene ontology enrichment interprets sets of genes according to the Gene Ontology classification system [65], which groups genes by their functional characteristics. We considered FDR q < 0.05 to identify enrichments surviving multiple testing correction.

Over-representation test

To test for over-representation of certain phenotypic classes among the associations observed in the PheWAS, we calculated the significance of the phenotypic enrichment by a hypergeometric distribution test (https://systems.crump.ucla.edu/hypergeometric/) where k is the number of phenotypes with at least one LD-independent association within the phenotype category of interest, s is the number of phenotypes with at least one LD-independent association, M is the number of phenotypes within the phenotype category of interest, and N is the number of phenotypes tested.
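Using the symbols defined above (k, s, M, N), the over-representation test can be sketched as an upper-tail hypergeometric probability together with the corresponding fold enrichment. This is an illustrative sketch, assuming that the reported p-value is P(X ≥ k) and that fold enrichment is the observed proportion k/s divided by the expected proportion M/N; the function names and the numbers in the example are hypothetical and are not the counts from this study.

```python
from math import comb


def hypergeom_upper_tail(k: int, s: int, M: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: phenotypes tested; M: phenotypes in the category of interest;
    s: phenotypes with at least one LD-independent association;
    k: associated phenotypes that fall in the category of interest.
    """
    if not (0 <= M <= N and 0 <= s <= N and 0 <= k <= min(s, M)):
        raise ValueError("inconsistent counts")
    total = comb(N, s)
    # Sum the probabilities of observing k, k+1, ..., min(s, M) hits.
    favorable = sum(comb(M, i) * comb(N - M, s - i) for i in range(k, min(s, M) + 1))
    return favorable / total


def fold_enrichment(k: int, s: int, M: int, N: int) -> float:
    """Observed proportion in the category divided by the expected proportion."""
    return (k / s) / (M / N)


# Made-up counts: 10 phenotypes tested, 5 in the category, 2 associated, both in the category.
toy_p = hypergeom_upper_tail(k=2, s=2, M=5, N=10)
toy_fold = fold_enrichment(k=2, s=2, M=5, N=10)
```

Exact integer binomial coefficients are used so that the tail probability stays accurate even for several thousand phenotypes, where floating-point factorials would overflow.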