Introduction

Ultraviolet B radiation (UVB; 290–320 nm) exposure is the primary factor influencing vitamin D status in humans, with environmental UVB levels varying considerably by latitude and season. Furthermore, vitamin D status is modulated by variance in vitamin D-associated genes [1, 2], with key genes relating to the production (DHCR7/NADSYN1), binding and transport (GC), metabolism (CYP2R1, CYP27A1, CYP27B1, CYP11A1 and CYP24A1), and activation of vitamin D (VDR and RXRα, RXRβ, RXRγ) [3]. Both UVB exposure and vitamin D-associated single nucleotide polymorphisms (SNPs) are risk factors for vitamin D insufficiency and many related diseases, such as cardiovascular disease, infectious diseases and cancers [1, 4, 5].

The impacts of UVB and vitamin D-related genetics are not merely additive, but may also be interactive. Indeed, there is evidence that the frequency of SNPs in vitamin D-associated genes reflects changes in UVB environment [6,7,8,9]. These findings indicate that the functionality of the vitamin D system varies between individuals of differing ethnicities or UVB environments. Genetic differences between populations may also modify vitamin D’s influence on related disease risk [1, 4], warranting further investigation in this area given the current lack of convincing evidence around vitamin D’s roles in many diseases [10]. However, despite an abundance of research into vitamin D-related variants, studies focusing on how the distribution of such variants differs between geographic populations are limited.

The relationship between vitamin D-associated SNPs and skin pigmentation is an important consideration regarding differences between geographically defined populations. Skin pigmentation is an apparent adaptation to differing UVB environments, with darker-pigmented populations originating in areas of high UVB, and lighter-pigmented populations in lower UVB areas [11,12,13]. However, the genetic architecture underlying skin pigmentation differs even between populations exposed to similar UVB regimes. A key example is that parts of Europe and East Asia share similar UVB conditions, yet lighter skin phenotypes in these populations evolved independently, via different genetic adaptations [14, 15].

Similar geographic patterns may exist in vitamin D-associated SNPs. Both vitamin D and skin pigmentation pathways respond to changes in UVB. Importantly, the vitamin D hypothesis proposes that the reduction of skin pigmentation in early humans migrating out of Africa to areas of lower UVB occurred to facilitate vitamin D production [11, 12]. This hypothesis is based on the UVB-induced synthesis of vitamin D being dependent on skin pigmentation levels, with pigments and the vitamin D cholesterol precursor competing for UVB absorption. Consequently, lighter-skinned individuals can synthesise up to 30 times more vitamin D than darker-skinned individuals following identical UVB exposure [16].

Our current understanding of how variation in vitamin D-associated genes differs between global populations is limited. Notably, there has been a significant focus on examining vitamin D genetics in Europeans [17,18,19], with little attention given to other global populations. Therefore, in the present study, a more comprehensive approach has been taken: genotypic data for variants within multiple vitamin D-related genes was collated from 60 sample sets (2633 subjects) of European, East Asian and Sub-Saharan African origin to examine potential patterns in the geographic distribution of vitamin D-associated SNPs.

Results

Validation of European, East Asian and Sub-Saharan African groups with skin pigmentation SNPs

The mean allelic frequencies of SLC24A5 rs1426654, SLC45A2 rs16891982 and OCA2 rs1800414 in derived geographic groups did not deviate from previously reported frequencies in populations of European (EUR), East Asian (EAS) and Sub-Saharan African (AFR) ancestry [20, 21]. rs1426654 and rs16891982 frequencies were highest in EUR (0.99 and 0.91, respectively). Conversely, rs1426654 and rs16891982 were nearly absent in EAS and AFR (mean frequencies 0.00–0.08; Table 1). Presence of rs1800414 was exclusive to the EAS group (mean frequency 0.59).

Table 1 Frequency of skin pigmentation variants in EUR, EAS and AFR groups

Annual UVB levels in European, East Asian and Sub-Saharan African sample set areas

Global mean annual UVB levels and sample set locations are shown in Fig. 1. As expected, the highest mean annual UVB levels were found in AFR locations, followed by EAS and EUR sample set locations (82.2 vs. 48.1 vs. 18.4 mW/m2/nm, respectively). Intergroup comparisons found significant differences between all geographic areas for annual UVB levels (p < 0.001).

Fig. 1

Mean annual UVB levels (surface irradiance at 305 nm) at locations of EUR, EAS and AFR sample sets

Distribution of vitamin D production/transport-related variants (NADSYN1/DHCR7 and GC) across European, East Asian and Sub-Saharan African groups

Sixteen variants in genes involved in vitamin D production (NADSYN1/DHCR7) and transport (GC) were examined, eight within the NADSYN1/DHCR7 loci and eight within GC (Table 2).

Table 2 Frequency of NADSYN1/DHCR7 and GC variants in EUR, EAS and AFR groups

The frequencies of all NADSYN1/DHCR7 variants varied by geographic group (p < 0.0001, r2 0.59–0.87). Patterns of distribution varied by SNP (Table 2). For NADSYN1/DHCR7 variants rs11603330, rs7944926 and rs3794060, allelic frequency differed between all geographic groups, with their distribution coinciding with changes in environmental UVB. rs7944926 increased in areas of increased environmental UVB (i.e. frequency highest in AFR, lowest in EUR), whilst rs11603330 and rs3794060 decreased with increased UVB levels (i.e. frequency lowest in AFR, highest in EUR).

For four other NADSYN1/DHCR7 variants (rs3750997, rs1790325, rs7928249 and rs12800438), frequencies in EUR differed from those in EAS and AFR. rs3750997, rs7928249 and rs12800438 frequencies were increased in EAS and AFR compared to EUR, with the inverse relationship observed for rs1790325. Another NADSYN1/DHCR7 variant, rs12280295, was near absent in EUR and EAS (mean frequencies of 0.00), with higher frequency in AFR (0.23). Considering these distribution patterns together, there was no apparent trend for NADSYN1/DHCR7 polymorphisms to be higher in one geographic region over another.

The allelic frequency of all examined GC variants varied by geographic group (p < 0.0001, r2 0.64–0.94). The largest effect was observed for rs705117 (p < 0.0001, r2 0.94), with the frequency of this variant differing between all geographic regions, and decreasing in geographic areas of increasing UVB (EUR 0.84, EAS 0.50 and AFR 0.17). Interestingly, five other GC variants followed this distribution pattern (rs7041, rs222047, rs222016, rs222020 and rs843006). Another GC variant, rs4364228, had reduced frequencies in EUR (0.09) and EAS (0.12) compared to AFR (0.45), and a further variant, rs3737549, was shown to be absent in the EUR group (0.00), but increasingly present in EAS and AFR (0.14 and 0.22, respectively; Table 2). Considered together, frequencies of examined GC variants were the highest in either EUR or AFR groups, with high frequencies in EAS uncommon.

Distribution of variants in vitamin D metabolism genes (CYP11A1, CYP24A1, CYP27A1 and CYP2R1) across European, East Asian and Sub-Saharan African groups

Fourteen cytochrome P450 (CYP) variants fit the inclusion criteria (two in CYP11A1, five each in CYP24A1 and CYP27A1 and two in CYP2R1). Allelic frequency of all 14 variants varied by geographic groups (p < 0.0001; Table 3).

Table 3 Frequency of CYP11A1, CYP24A1, CYP27A1 and CYP2R1 variants in EUR, EAS and AFR groups

Two CYP11A1 variants varied in frequency by geographic group (rs11632698 and rs2073475; p < 0.0001, r2 0.86 and 0.88, respectively) but displayed different distribution patterns across geographic groups. The distribution of CYP11A1 rs2073475 coincided with increasing UVB (EUR 0.16, EAS 0.45 and AFR 0.58). CYP11A1 rs11632698 frequency significantly differed in EUR compared to EAS and AFR (mean frequency of 0.57 in EUR and 0.20 in EAS and AFR).

The frequencies of five CYP24A1 variants varied by geographic group (rs3787557, rs927650, rs912505, rs2762929 and rs4809956; p < 0.0001, r2 0.82–0.85). For three variants, frequencies differed between all geographic groups (rs3787557, rs927650 and rs912505). A potential UVB-dependent trend in rs927650 was noted (frequencies of 0.47, 0.27 and 0.18 in EUR, EAS and AFR groups, respectively). For another two variants, rs2762929 and rs4809956, frequency was significantly higher in EUR (rs2762929 0.58, rs4809956 0.81) compared to EAS and AFR groups (rs2762929 0.22–0.33, rs4809956 0.36–0.47). Examining these distribution patterns together, frequencies of CYP24A1 variants tended to be the highest in EUR or EAS groups.

Two of the five examined CYP27A1 variants, rs691414 and rs692290, appeared to be fixed in EUR and EAS (mean allelic frequencies of 1.00). Conversely, frequencies were significantly reduced in AFR (rs691414, 0.78; rs692290, 0.60). These variants had the largest effect sizes of examined CYP27A1 variants (p < 0.0001; rs691414, r2 0.89; rs692290, r2 0.96). The remaining examined CYP27A1 variants displayed differing patterns in allelic frequency. rs7568196 had low frequencies in EAS and AFR (0.02–0.22), with increased frequency in EUR (0.40). Frequencies of rs13013510 and rs4674338 were significantly different in all geographic groups, with the highest frequency for rs13013510 reported in AFR (0.65), and for rs4674338 in EAS (0.93). Interestingly, despite differing distribution patterns observed for CYP27A1 variants, there was a trend for frequencies of these variants to be the highest in EUR and EAS over AFR.

The frequencies of CYP2R1 variants (rs16930625 and rs11023374) differed by geographic group (p < 0.0001; rs16930625, r2 0.41; rs11023374, r2 0.79), although there was no trend for CYP2R1 variants to be higher in one geographic region over others. rs16930625 had low frequencies in all groups (0.06–0.21), but was higher in AFR compared to EUR. rs11023374 had a lower frequency in EAS and AFR (0.01–0.11), compared to EUR (0.28).

Distribution of variants in genes relating to vitamin D activity (VDR, RXRα and RXRγ) across European, East Asian and Sub-Saharan African groups

Sixteen variants in vitamin D-related nuclear receptor genes were examined (five VDR, seven RXRα and four RXRγ; Table 4).

Table 4 Frequency of VDR, RXRα and RXRγ variants in EUR, EAS and AFR groups

The allelic frequencies of all examined VDR variants varied by geographic group (rs886441, rs2283342, rs2107301, rs4334089 and rs4516035; p < 0.0001, r2 0.71–0.93). The greatest effect size was for rs4516035 (p < 0.0001, r2 0.93), which had reduced frequencies in AFR and EAS (0.03), compared to EUR (0.43). VDR rs886441, rs2283342, rs2107301 and rs4334089 allelic frequencies differed between all geographic groups. Only rs4334089 appeared to have a UVB relationship, with frequency decreasing in areas of increasing UVB.

Six of the seven examined RXRα variants varied by the examined geographic groups (rs1805343, rs1805352, rs10881582, rs3118571, rs731516 and rs7040434; p < 0.0001; r2 0.95–0.99). Interestingly, these six RXRα variants followed the same distribution pattern, with differences in AFR when compared to EUR and EAS. For five variants (rs1805343, rs1805352, rs10881582, rs3118571 and rs731516), the allelic frequency was reduced in AFR compared to EAS and EUR. Notably, RXRα rs731516 was fixed in EUR and EAS (mean frequency of 1.0), with reduced frequency in AFR (0.59). rs7040434 was absent in EUR and EAS (0.00) but not AFR (0.53; r2 0.99).

Four RXRγ variants varied by geographic group (rs283695, rs12069160, rs10800098 and rs10489745; p < 0.0001, r2 0.80–0.92). Frequencies of three variants (rs12069160, rs10800098 and rs10489745) did not differ between EUR and AFR, but differed in EAS. The rs283695 variant had increased frequencies in EUR and AFR (0.77 and 0.85, respectively) compared to EAS (0.44).

There was no trend for examined VDR and RXRγ variants to be higher in specific geographic groups, although frequencies of examined RXRα variants appeared to be the highest in either EUR or EAS. However, genotypic data for the RXRα variants were only available for 44–46 of the 60 included sample sets, and unavailable data were mostly from EUR and AFR sample sets, so this may have influenced results.

Discussion

This study demonstrates that variant frequency in multiple vitamin D-associated genes (VDR, RXRα, RXRγ, GC, CYP2R1, CYP27A1, CYP24A1, CYP11A1 and DHCR7/NADSYN1) varies by environmental UVB and ancestry. For many SNPs, frequency tended to either decrease or increase in geographic regions of increasing environmental UVB. However, several SNPs displayed a population-specific pattern that cannot be explained by changes in UVB levels alone. This provides insights into the extent to which vitamin D regulation differs by cohort, and may have consequences for public health recommendations and disease outcomes.

The reported geographic patterns in the frequency of SNPs in CYP genes and RXRα are novel findings. Whilst such variants have been examined previously in differing cohorts, details of how their distribution differs by ancestry have not been highlighted. CYP2R1 and CYP27A1 enzymatically activate vitamin D, and formation of the excretory form is enzymatically regulated by CYP24A1. CYP11A1 is highly expressed in the skin and represents an important alternative vitamin D metabolism pathway [3, 22]. As such, genetic variance in these pathways may influence vitamin D status and homeostasis.

Multiple RXRα variants displayed similar frequencies in EUR and EAS populations, potentially related to a broad reduction in UVB in Europe and East Asia compared to Sub-Saharan Africa. RXR are the most common subunit forming heterodimers with VDR, but little is known about the influence of RXR variants on vitamin D activity [23]. Expression of the RXRα subtype is particularly high in skin, and therefore SNPs could be of functional relevance to UVB-induced vitamin D activity [24, 25]. However, other UVB-related roles of retinoids and vitamin A derivatives in the skin should be considered, including involvement in circadian rhythm and photo-protection [26].

DHCR7/NADSYN1, VDR, RXRγ, CYP2R1, CYP24A1 and CYP11A1 variants did not display clear patterns of geographic distribution, likely reflecting diverse functional consequences. However, the majority of examined variants reside within introns or untranslated regions. Therefore, linkage disequilibrium of these variants with nearby functional variants needs to be considered.

It was hypothesised that selection of vitamin D-related SNPs would parallel geographic selection for skin pigmentation. The reported associations support this and indicate vitamin D SNPs display population-specific patterns, with genetic differences observed between populations which did not reflect increases and/or decreases in ancestral UVB environments. These population-specific patterns could coincide with migration patterns, as in the case of variants underlying skin pigmentation [14, 15] and support a link between vitamin D and the evolution of lighter skin, with further examination into this association warranted. Notably, evidence of positive selection for DHCR7/NADSYN1 variants has been reported; however, evidence of selection was not found for other examined vitamin D-related genes (CYP2R1 and GC), possibly due to selection taking place at an earlier time than examined, and/or in other vitamin D-associated genes, such as CYP27B1, CYP24A1 or VDR.

Many of the reported associations support previously reported frequency patterns in GC, VDR and DHCR7/NADSYN1 variants [6, 7, 27, 28]. GC rs7041 is a genetic determinant of vitamin D status, with a negative association between frequency and latitude reported [28, 29]. Here, similar latitudinal/UVB clines for several additional GC variants were observed. Of these, rs705117 and rs222020 have been linked to vitamin D status [30, 31]. Latitudinal clines in VDR SNPs have been observed, although these associations were limited to the Africa-Europe axis [6,7,8]. Potential latitudinal clines exist for several VDR variants examined here along this axis, but not when considering the East Asian populations. Several examined DHCR7/NADSYN1 variants (rs12800438, rs7944926, rs3794060, rs12280295) are part of a large haplotype block previously noted to have high frequency in Europeans and North East Asians [27]. Here, multiple additional variants in this locus were identified that differed in frequency between populations and that may be functionally relevant.

Strengths of this study include the collation of numerous cohorts from three genetically distinct populations exposed to differing UVB regimes and the simultaneous examination of multiple vitamin D-associated variants. However, the analysis was limited by data availability. Furthermore, the inclusion of multiple cohorts from the same area (e.g. multiple Italian and Han cohorts) might have resulted in over-representation of sub-populations in derived geographic groups.

This data is interesting from a human evolution perspective but also has relevance for public health recommendations and understanding disease risk. Vitamin D insufficiency is more likely in darker-skinned individuals, attributed to diminished synthesis of the vitamin due to pigmentation [5, 32, 33]. However, variants displaying apparent interethnic differences in frequency may also contribute to population differences in vitamin D status, and therefore current global and national dietary recommendations for this vitamin may not meet the needs of all populations equally. Further, numerous SNPs in vitamin D pathways have been identified as risk factors for multiple adverse health conditions [1, 4]. Given that variant frequency appears to vary by ancestry, disease risk factors could be population specific. A further possibility is that risks conferred by vitamin D SNPs may change depending on environmental factors, such as UVB exposure, with these concepts requiring further examination.

Conclusions

This study reports population differences for gene variants within multiple vitamin D-related loci that have not been explored previously. A key finding was that the frequencies of many of these vitamin D variants are population-specific, and do not reflect changes in ancestral UVB environments. These population differences provide insight into the extent to which vitamin D metabolism and activity may vary between populations of different ancestry via genetic variance in numerous vitamin D-related genes. Given that multiple SNPs within examined loci have been identified as disease risk factors, further examination of identified gene variants displaying interethnic differences in frequency and their potential relevance to disease outcomes is warranted.

Methods

NCBI 1000 Genomes Browser [34] and ALFRED (Allele Frequency Database) [35] databases were searched for variants in vitamin D-related genes: VDR (encoding the vitamin D receptor); RXRα, RXRβ and RXRγ (retinoid X receptor subtypes); GC (vitamin D binding protein); CYP2R1, CYP24A1, CYP11A1, CYP27A1 and CYP27B1 (vitamin D hydroxylases); and DHCR7/NADSYN1 (7-dehydrocholesterol reductase/NAD(+) synthetase; examined together due to their close positioning on the genome). Genotypic data was available for 170 variants in these genes. Variants were ranked by population differentiation, using fixation indices (FST) provided by ALFRED. Variants with the highest FST (top 30%; FST ≥ 0.13) were included in analyses, resulting in the inclusion of 51 variants in nine loci. A further five variants with unknown functional consequences were excluded (as per dbSNP - www.ncbi.nlm.nih.gov/snp), resulting in the analysis of 46 variants in VDR, RXRα, RXRγ, GC, CYP2R1, CYP24A1, CYP11A1, CYP27A1 and DHCR7/NADSYN1 (Table 5). No RXRβ or CYP27B1 variants fit the inclusion criteria.
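The FST-based ranking step can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_top_fst`, the variant labels and the FST values are all hypothetical, and the sketch assumes that "top 30%" means the highest-ranked 30% of variants by count, with ties broken arbitrarily.

```python
def select_top_fst(fst_by_variant, top_percent=30):
    """Return variant IDs in the top `top_percent` by FST, highest first.

    Integer arithmetic avoids floating-point surprises when computing
    how many variants to keep (e.g. 170 variants at 30% gives 51).
    """
    ranked = sorted(fst_by_variant.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = len(ranked) * top_percent // 100
    return [variant for variant, _ in ranked[:n_keep]]


# Hypothetical FST values for illustration only (not taken from ALFRED).
example_fst = {
    "snp01": 0.05, "snp02": 0.21, "snp03": 0.13, "snp04": 0.01, "snp05": 0.40,
    "snp06": 0.09, "snp07": 0.02, "snp08": 0.11, "snp09": 0.07, "snp10": 0.03,
}
selected = select_top_fst(example_fst)
print(selected)
```

With 170 candidate variants, the same rule keeps 51, matching the count reported above; the subsequent exclusion based on dbSNP functional annotation is a separate manual step that this sketch does not cover.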

Table 5 Vitamin D-associated variants included in the study

Allelic frequencies of included variants were grouped based on European (EUR), East Asian (EAS) or Sub-Saharan African (AFR) ancestry and current residence (Table 6). The United Nations (UN) Geoscheme [37] was used to define the populations included in EUR, EAS and AFR groups. An adjustment was made to include only European Russia in Eastern Europe; defined as the western part of the Russian Federation bordered by the Ural Mountains range [38].

Table 6 Sample sets in derived EUR, EAS and AFR geographic groups

Multiple sample sets from a defined population (e.g. Han Chinese) were included provided there was sufficient evidence they were not duplicates. In the case of duplicates, the most recent data was used. Only sample sets with available genotypic data for all variants used to validate ancestry (described below) were included. Supplementary material 1 outlines available data for each vitamin D-related variant.

Latitude and longitude were recorded for all sample sets. EUR, EAS and AFR groups represented differing latitudinal ranges, with a range of 75° N–38° N for EUR, 49° N–21° N for EAS and 18° N–29° S for AFR (Table 6). These coordinates were used to collect information on the UVB levels in the three geographical regions following previously published methods [39]. Daily noontime surface irradiance data for 305 nm (mW/m2/nm) was collected from the NASA Nimbus-7 Total Ozone Mapping Spectrometer for the total available period (15 years; 1978–1993), accessed via NASA’s web application, Giovanni [40]. Data was collected for UVB cells covering a half-degree latitude by longitude area within sample set locations and then used to calculate mean UVB levels, first for each sample set location and then for each geographic region. The 305 nm wavelength was chosen as it was the shortest of the published available wavelengths and the most relevant to UVB-induced vitamin D synthesis [2]. Further information on UVB data collected for sample sets can be found in Supplementary material 1.
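The two-stage averaging of irradiance values (per sample set location, then per geographic region) might be sketched as follows. The function name `regional_uvb_means`, the record layout and the irradiance values are hypothetical, and the sketch assumes simple unweighted means at both stages, which the text does not specify.

```python
from collections import defaultdict
from statistics import mean


def regional_uvb_means(records):
    """Average irradiance per sample set, then average those means per region.

    `records` is an iterable of (region, sample_set, irradiance) tuples,
    one per daily grid-cell reading.
    """
    by_set = defaultdict(list)
    for region, sample_set, irradiance in records:
        by_set[(region, sample_set)].append(irradiance)

    # Stage 1: mean irradiance for each sample set location.
    set_means = {key: mean(values) for key, values in by_set.items()}

    # Stage 2: mean of the sample-set means within each region.
    by_region = defaultdict(list)
    for (region, _), set_mean in set_means.items():
        by_region[region].append(set_mean)
    return {region: mean(values) for region, values in by_region.items()}


# Hypothetical readings for illustration only (not Nimbus-7 data).
example_records = [
    ("EUR", "set_A", 10.0), ("EUR", "set_A", 20.0),
    ("EUR", "set_B", 25.0),
    ("AFR", "set_C", 80.0), ("AFR", "set_C", 90.0),
]
uvb_by_region = regional_uvb_means(example_records)
print(uvb_by_region)
```

Averaging in two stages prevents a sample set with many readings from dominating its regional mean, although it also gives equal weight to every sample set regardless of size.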

Allelic frequencies of three skin pigmentation-related variants, SLC24A5 rs1426654, SLC45A2 rs16891982 and OCA2 rs1800414, were used to assess whether derived geographic groups accurately represented geographic regions with distinct ancestral skin pigmentation [20, 21]. The mean allelic frequencies for derived geographic groups were compared against previously reported frequencies for populations of European, East Asian and Sub-Saharan African origin. rs1426654 and rs16891982, previously reported to be near fixed in Europeans (frequency > 0.90) and absent in East Asians and Africans [20], were used here to validate EUR. rs1800414 is limited to East Asian populations (frequency 0.50–0.60) and absent elsewhere, and was used to differentiate EAS from AFR [21].

Association between geographic groups and frequency of the variant allele for each polymorphism was analysed by least squares regression to generate p values and adjusted r2 values. Categorical comparisons of mean allele frequency between geographic groups were made using ANOVA (Tukey’s post hoc test). Analyses were weighted by cohort sizes. The p value threshold was adjusted for multiple testing using the Bonferroni method [41] to p < 0.001 for associations between variants and geographic region and p < 0.0001 for multiple comparisons between regions. Statistical analyses were performed using JMP (V13; SAS Institute Inc., Cary, NC, USA).
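For readers unfamiliar with the approach, the cohort-size-weighted regression of allele frequency on geographic group can be sketched as below. This is an illustrative assumption about how such an analysis may be set up, not a reproduction of the JMP procedure: with a single categorical predictor, the weighted least squares fit reduces to weighted group means, and the adjusted r2 follows from the weighted residual and total sums of squares. The function names and all example values are hypothetical. The Bonferroni helper assumes one test per analysed variant (46 variants), which gives a threshold close to the p < 0.001 cut-off stated above.

```python
def weighted_group_fit(freqs, groups, weights):
    """Weighted least squares of frequency on a categorical group.

    Returns (weighted group means, r2, adjusted r2).
    """
    n = len(freqs)
    total_weight = sum(weights)
    grand_mean = sum(w * f for f, w in zip(freqs, weights)) / total_weight

    # Accumulate weighted sums per group; the fitted value is the group mean.
    sums = {}
    for f, g, w in zip(freqs, groups, weights):
        acc = sums.setdefault(g, [0.0, 0.0])
        acc[0] += w * f
        acc[1] += w
    group_means = {g: acc[0] / acc[1] for g, acc in sums.items()}

    sse = sum(w * (f - group_means[g]) ** 2
              for f, g, w in zip(freqs, groups, weights))
    sst = sum(w * (f - grand_mean) ** 2 for f, w in zip(freqs, weights))
    r2 = 1.0 - sse / sst
    k = len(group_means)  # number of fitted parameters (one mean per group)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return group_means, r2, adj_r2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele frequencies and cohort sizes for illustration only.
freqs = [0.8, 0.9, 0.5, 0.5, 0.1, 0.2]
groups = ["EUR", "EUR", "EAS", "EAS", "AFR", "AFR"]
weights = [100, 100, 50, 50, 100, 100]
group_means, r2, adj_r2 = weighted_group_fit(freqs, groups, weights)
threshold = bonferroni_threshold(0.05, 46)
print(group_means, round(r2, 3), round(adj_r2, 3), threshold)
```

Statistical packages differ in how they interpret weights (for example, as frequencies or as inverse variances), so p values from this kind of fit may not match those produced by JMP even when the point estimates agree.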