Mapping and annotating genomic loci to prioritize genes and implicate distinct polygenic adaptations for skin color

Kim, Beomsu; Kim, Dan Say; Shin, Joong-Gon; Leem, Sangseob; Cho, Minyoung; Kim, Hanji; Gu, Ki-Nam; Seo, Jung Yeon; You, Seung Won; Martin, Alicia R.; Park, Sun Gyoo; Kim, Yunkwan; Jeong, Choongwon; Kang, Nae Gyu; Won, Hong-Hee

doi:10.1038/s41467-024-49031-4

Article
Open access
Published: 07 June 2024

Volume 15, article number 4874, (2024)
Abstract

Evidence for adaptation of human skin color to regional ultraviolet radiation suggests shared and distinct genetic variants across populations. However, skin color evolution and genetics in East Asians are understudied. We quantified skin color in 48,433 East Asians using image analysis and identified associated genetic variants and potential causal genes for skin color as well as their polygenic interplay with sun exposure. This genome-wide association study (GWAS) identified 12 known and 11 previously unreported loci and SNP-based heritability was 23–24%. Potential causal genes were determined through the identification of nonsynonymous variants, colocalization with gene expression in skin tissues, and expression levels in melanocytes. Genomic loci associated with pigmentation in East Asians substantially diverged from European populations, and we detected signatures of polygenic adaptation. This large GWAS for objectively quantified skin color in an East Asian population improves understanding of the genetic architecture and polygenic adaptation of skin color and prioritizes potential causal genes.

Introduction

Skin color is one of the few highly heritable phenotypes that vary between human populations because of strong selection for locally varying environments. At lower latitudes, where ultraviolet (UV) light is intense, dark skin protects against the photolysis of serum folate and has photoprotective properties^1,2, whereas at higher latitudes, light skin is advantageous for vitamin D synthesis in reduced UV-B light. This strong correlation between skin color and latitude is mirrored by signatures of positive selection around genetic variants that influence skin pigmentation³, which may reflect local adaptation to regional UV environments^4,5. UV-B is responsible for vitamin D formation and affects endocrine gland function and overall body homeostasis^6,7.

Genetic factors explaining skin color diversity according to ancestry and selection signals have been identified^4,8,9. For example, mutations in genes known to affect the function of melanocytes, such as MITF, MC1R, OCA2, and SLC45A2, are the target of natural selection in a novel environment with reduced UV exposure at higher latitudes along human migration pathways from sub-Saharan Africa. Notably, Europeans and East Asians have both shared and unique signatures of positive selection related to skin pigmentation. For instance, they share the same allele of KITLG that causes light pigmentation; however, alleles of SLC24A5 and MC1R are population-specific^4,9.

Recent genetic studies have identified substantial skin color diversity, even within the same population, and novel genetic determinants of skin pigmentation^10,11,12. However, previous genetic studies on skin color-related traits have been conducted mostly in African and European populations, and the genetic architecture of skin color in East Asian populations remains poorly understood (Supplementary Data 1). Since the evolution of lighter skin pigmentation is among the most tantalizing examples of human adaptive evolution¹³, a deeper investigation of the genetic determinants of skin color in East Asian populations is required to better understand the evolution of our own species.

Here, we conducted a large-scale genome-wide association study (GWAS) for objectively quantified skin color in 48,433 East Asians. We identified 23 loci associated with skin color, including 11 previously unreported loci, and showed the overall divergence of the identified genomic loci from the European population. Moreover, we quantified the interaction between genetic variants and sun exposure on skin color at the polygenic level. Our study provides further genetic evidence for the involvement of skin pigmentation genes in melanocytes and distinct polygenic adaptations under selection pressure worldwide.

Results

Study participants and quantification of skin color using image analysis

Of the 52,712 participants, 48,433 (91.9%) passed strict quality control procedures (see Methods). The skin color of the participants was quantified using the International Commission on Illumination (CIE) LAB values from photographs of sun-exposed skin: L*, a*, and b* values for skin luminance, red/green component, and yellow/blue component, respectively (Fig. 1). The distribution of the three skin color traits and other characteristics of the study participants was similar between camera resolution groups (skin color was measured using 18-megapixel images for 20,538 participants in group A and 24.2-megapixel images for 27,895 participants in group B) (Supplementary Data 2). L* and other skin color traits were negatively correlated (L* and a*, Pearson’s correlation coefficient [ρ] = −0.58, P < 2 × 10⁻¹⁶; L* and b*, ρ = −0.30, P < 2 × 10⁻¹⁶), whereas a* and b* were positively correlated (ρ = 0.32, P < 2 × 10⁻¹⁶) in 40,790 unrelated participants. Age, sun exposure time per day, and duration of outdoor activity were negatively correlated with L* and positively correlated with a* and b* (Supplementary Fig. S1). Sunblock usage was positively correlated with L* (β = 0.366, P < 2 × 10⁻¹⁶) and b* (β = 0.095, P = 3.02 × 10⁻⁷) but negatively correlated with a* (β = −0.203, P < 2 × 10⁻¹⁶). The distribution and effect of sun exposure variables varied across age groups (young [<37 years], middle [37–49 years], and old [>49 years] age groups) (Supplementary Fig. S2 and Supplementary Data 3). For example, the effects of sun exposure time per day and sunblock usage on L* were 1.50 (effect size of the interaction term between sun exposure variable and age group [β_sun×age] = 0.203, P = 1.22 × 10⁻³) and 1.48 (β_sun×age = 0.155, P = 0.024) times greater, respectively, in the old age group than in the young age group. Male participants had lower L* (β = −4.848, P < 2 × 10⁻¹⁶) and higher a* (β = 2.566, P < 2 × 10⁻¹⁶) and b* (β = 1.005, P < 2 × 10⁻¹⁶) values than female participants.

**Fig. 1: Skin color distribution of participants.**

GWAS of skin color

In the discovery phase, we performed a GWAS meta-analysis for each skin color trait (L*, a*, and b*) in 48,433 East Asian participants (Supplementary Figs. S3 and S4). A total of 5,066,750 autosomal variants were tested for associations with skin color adjusted for age, sex, sun exposure variables, measurement month, genotyping array, and the first 10 principal components (PCs) of genetic ancestry (Fig. 2a). By applying a Bayesian linear mixed model (BOLT-LMM)¹⁴ with PCs as covariates, no evidence of population stratification was observed in quantile-quantile plots of the GWAS results (Supplementary Fig. S5). The effect size estimates in the discovery GWAS were not correlated with the variant loadings of the first 10 principal components that represent the population structure (Supplementary Fig. S6). The genetic correlation across skin color traits exhibited the same trends as the phenotypic correlation (L* and a*, r_g = −0.72, P = 1.42 × 10⁻⁴³; L* and b*, r_g = −0.50, P = 9.48 × 10⁻¹³; a* and b*, r_g = 0.21, P = 0.016). A total of 138,839 variants on chromosome X were also tested for associations with skin color in 42,770 female participants. However, none of the tested variants exhibited statistical significance (Supplementary Fig. S7).

We identified 26 lead variants at 23 independent loci associated with skin color traits, including 15 variants at 13 loci for L*, 15 variants at 13 loci for a*, and 13 variants at 12 loci for b*, respectively (Table 1 and Supplementary Data 4). Two independent lead variants were identified at each locus near GLIS1, OCA2, and MC1R (Supplementary Fig. S8). Among the 23 independent loci, 12 were previously reported and 11 were previously unreported, including 6, 8, and 2 loci for L*, a*, and b*, respectively. Detailed information on previous reports on skin color-related traits is provided in Supplementary Data 1. Among the previously unreported loci, GLIS1, SEM1, and GAB2 have been identified as being associated with disease, particularly in inflammatory epidermal conditions and melanoma^15,16,17. This GWAS of objectively quantified skin color traits identified more significant loci than a GWAS based on the categorized skin color according to the individual typology angle (ITA°) value criteria (Fig. 1c and Supplementary Fig. S9). The GWAS of the categorical skin color using POLMM¹⁸ identified 11 of 26 lead variants, with no additional significant loci, including two and nine variants in previously unreported and reported loci, respectively.
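The individual typology angle used for the categorical comparison is derived from the same CIE LAB measurements. The sketch below uses the standard definition ITA° = arctan((L* − 50)/b*) × 180/π. The category cut-offs are the commonly cited ones and are an assumption here, since the exact criteria applied in the study are not given in this section; the function names are illustrative.

```python
import math

def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual typology angle (degrees) from CIE L* and b*."""
    # atan2 equals arctan((L* - 50) / b*) when b* > 0 and avoids division by zero.
    return math.degrees(math.atan2(l_star - 50.0, b_star))

# Commonly cited ITA lower bounds (assumed, not taken from the article).
ITA_CATEGORIES = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]

def ita_category(ita: float) -> str:
    """Map an ITA value to a skin color category using the assumed cut-offs."""
    for lower_bound, label in ITA_CATEGORIES:
        if ita > lower_bound:
            return label
    return "dark"
```

Collapsing a continuous angle into a handful of categories discards within-category variation, which is one plausible reason the categorical GWAS recovered fewer lead variants than the quantitative one.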

Table 1 Lead variants associated with the skin color identified through meta-analysis of GWAS (P < 5 × 10⁻⁸)

Including rs74653330 (p.Ala481Thr), a missense lead variant in OCA2, a total of 12 independent nonsynonymous variants associated with skin color traits were identified in the GWAS (Supplementary Data 5): 1 in a previously unreported locus, 9 in previously reported loci, and 2 in nominally significant loci (top variants of 2 loci did not pass the genome-wide significance level, P < 5.0 × 10⁻⁸). Notably, a nonsynonymous variant, rs2511188 (p.Ile677Val), was identified in a previously unreported locus USP35 (P = 7.42 × 10⁻⁷ for L*), a gene that has been reported as a potential immunosuppressive factor in melanoma¹⁹.

The explanatory power of the identified loci for skin color was measured by its incremental R² value, defined as the increase in adjusted R² from the linear regression model with covariates only to the model with covariates and lead variants (Fig. 2b). The incremental R² values of the lead variants in all significant loci, including previously unreported loci, were 2.69%, 1.11%, and 3.47% for L*, a*, and b*, respectively. Compared to those of the lead variants in reported loci only, the incremental R² values of all loci, including previously unreported loci, were increased by 17.16%, 115.34%, and 5.57% for L*, a*, and b*, respectively.
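The incremental R² defined above can be sketched as follows. This is a minimal illustration using ordinary least squares solved through the normal equations in plain Python; the function names and the column-list data layout are assumptions made for the example, not the study's analysis code.

```python
def ols_r2(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    # with partial pivoting.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, n_predictors):
    """Adjusted R^2 for a model with n observations and n_predictors predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def incremental_r2(covariates, variants, y):
    """Gain in adjusted R^2 when lead-variant columns are added to the covariates."""
    n = len(y)
    base = adjusted_r2(ols_r2(covariates, y), n, len(covariates))
    full = adjusted_r2(ols_r2(covariates + variants, y), n,
                       len(covariates) + len(variants))
    return full - base
```

Here `covariates` and `variants` are lists of equal-length numeric columns (for example, age and genotype dosages). The relative gains quoted in the text correspond to dividing the incremental R² of all loci by that of the reported loci alone.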

Replication of GWAS results

The association between the lead variants and skin color traits was examined in 4,992 individuals who were independent of the discovery cohort (a sample 10.3% the size of the discovery cohort) (Supplementary Data 6). In the replication analysis, the effect sizes of the lead variants were comparable to those in the discovery GWAS (Spearman’s correlation coefficients [r_s] between effect sizes: L*, 0.908; a*, 0.761; b*, 0.832), and 8, 5, and 7 loci (lead variants or their proxies [LD r² ≥ 0.8 and within 50 kb]) showed nominal associations for L*, a*, and b*, respectively (P < 0.05) (Supplementary Fig. S10 and Supplementary Data 4). The power-adjusted transferability (PAT) ratios of the discovery GWAS to the replication GWAS, calculated by dividing the observed number of nominally significant loci by the expected number (see Methods), were 0.843, 0.552, and 0.729 for L*, a*, and b*, respectively.
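As a rough illustration of the PAT ratio, the expected number of nominally significant loci can be taken as the sum of per-locus replication power. The power formula below (a two-sided z-test that treats the discovery effect size as the true effect and uses the replication standard error) is an assumption made for this sketch; the exact procedure is described in the Methods, which are not part of this section, and may differ, for instance by correcting for winner's curse.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def replication_power(beta_discovery, se_replication, alpha=0.05):
    """Two-sided power to reach P < alpha in replication (assumed model)."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Non-centrality: discovery effect measured in replication standard errors.
    ncp = abs(beta_discovery) / se_replication
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)

def pat_ratio(n_observed, betas_discovery, ses_replication, alpha=0.05):
    """Observed count of nominally significant loci divided by the expected count."""
    expected = sum(replication_power(b, s, alpha)
                   for b, s in zip(betas_discovery, ses_replication))
    return n_observed / expected
```

A ratio near 1 indicates that loci replicate about as often as the replication sample size allows, so a value below 1 cannot be explained by limited power alone under this model.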

To assess the replicability under comparable sample sizes, we conducted 10-fold cross-validation on the discovery set (4843–4846 individuals in each validation set) (Supplementary Data 7). The effect sizes of the lead variants from the validation set were comparable to those from the training set in each fold: r_s between effect sizes of the lead variants were 0.853–0.988, 0.659–0.882, 0.797–0.979 in GWAS for L*, a*, and b*, respectively (Supplementary Fig. S11 and Supplementary Data 8). The PAT ratio in each fold of cross-validation was similar to that in the replication analysis: PAT ratios were 0.677–1.182, 0.552–1.000, 0.688–1.020 in GWAS for L*, a*, and b*, respectively (Supplementary Fig. S12 and Supplementary Data 8). The limited replication of the lead variants, particularly for previously unreported variants, might be attributed to insufficient statistical power due to a small sample size. In the permutation meta-analysis of 10-fold groups (see Methods), the number of significant loci increased with larger sample sizes of GWAS, and previously unreported variants were not identified until the sample sizes reached 40–80% (60%, 40%, and 80% for L*, a*, b*, respectively, based on the median number of significant loci) of the discovery GWAS (Supplementary Figs. S13, S14).

To assess the replicability of GWAS results at the polygenic level, polygenic scores for skin color in 4411 unrelated participants from the replication set were calculated using the discovery GWAS results for L*, a*, and b*. Polygenic scores for L*, a*, and b* were significantly associated with the corresponding skin color traits (L*, β = 0.54, P = 7.79 × 10⁻²²; a*, β = 0.44, P = 1.11 × 10⁻²⁰; b*, β = 0.37, P = 1.89 × 10⁻¹²) in each linear model adjusted for GWAS covariates (Supplementary Fig. S15).
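In its simplest form, a polygenic score is a weighted sum of effect-allele dosages. The sketch below shows that additive form only; the scoring method actually used in the study (for example, any shrinkage or LD adjustment of the weights) is not described in this section, and the variant identifiers in the usage note are placeholders.

```python
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """Sum of effect-size weights times effect-allele dosages (0 to 2) for one person.

    dosages: dict mapping variant ID to dosage.
    weights: dict mapping variant ID to GWAS effect size.
    Variants without a weight are ignored.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)

def standardize(scores):
    """Convert raw scores to z-scores so that effects are per standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]
```

For example, `polygenic_score({"variant_1": 2, "variant_2": 1}, {"variant_1": 0.5, "variant_2": -0.2})` adds 2 × 0.5 and 1 × (−0.2). Standardizing the scores before regression is a common convention and is assumed here rather than taken from the article.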

Heritability estimation and functional enrichment

Using linkage disequilibrium (LD)- and minor allele frequency (MAF)-stratified multicomponent genomic restricted maximum likelihood (GREML-LDMS) analysis, we estimated single-nucleotide polymorphism (SNP)-based heritability of skin color traits (L*, h²_g = 0.240, s.e. = 0.010, P = 7.12 × 10⁻⁶; a*, h²_g = 0.238, s.e. = 0.010, P = 1.64 × 10⁻⁹; b*, h²_g = 0.232, s.e. = 0.010, P = 6.29 × 10⁻⁸). The highest LD score quartile (1st LD quartile) accounted for approximately half of the total SNP-based heritability (45–54%), and differences in SNP-based heritability across MAF quintiles were less than 0.02 (Supplementary Data 9). In consideration of the observation that the effect of the environment varied across age groups, we also estimated SNP-based heritability in the three female age groups (Fig. 2c and Supplementary Data 9). The SNP-based heritability of skin color traits was the highest in the “young age” group (L*, h²_g = 0.412, s.e. = 0.038, P = 1.77 × 10⁻³; a*, h²_g = 0.310, s.e. = 0.039, P = 0.030; b*, h²_g = 0.307, s.e. = 0.038, P = 0.012).

In the tissue enrichment analysis using DEPICT²⁰, exocrine glands, skin, and connective tissue cells were significantly enriched in the GWAS of L* (FDR = 0.089 for the three tissues), and epithelial cells were significantly enriched in the GWAS of a* (FDR = 0.001) (Supplementary Fig. S16a and Supplementary Data 10). In the epigenetic feature enrichment analysis using GARFIELD²¹, the “penis foreskin melanocyte primary cells” in skin tissue had the highest odds ratio (OR) in the GWAS of L* (OR = 4.75, 95% confidence interval = [2.29, 9.89], P = 3.00 × 10⁻⁵) (Supplementary Fig. S16b).

Colocalization with expression quantitative trait loci (eQTL) in skin tissues

To map potential causal genes, we performed a colocalization analysis between 23 genome-wide significant loci and eQTLs in 2 types of skin tissue (sun-exposed lower leg skin and non-sun-exposed suprapubic skin) from the Genotype-Tissue Expression project (GTEx) and skin tissue from the Multiple Tissue Human Expression Resource (MuTHER) using COLOC (Fig. 2a and Supplementary Data 11). The eQTL results of 18 genes from skin tissue were colocalized (posterior probability for colocalization [PP.H4] > 0.8) at genome-wide significant loci. SLC6A17, SLC45A3, PM20D1, KLHL2, DLX6, CTR9, USP35, CPNE7, and SPIRE2 were colocalized in sun-exposed lower leg tissue, whereas NUCKS1, LTBP1, and TENT5A were colocalized in non-sun-exposed suprapubic tissue. In the 16q24.3 locus, which contains two lead variants in SPIRE2 and MC1R, three genes were colocalized: SPIRE2, DEF8, and CPNE7. Although the function of MC1R in the skin is well known, the eQTLs of the colocalized genes, which showed a significant pattern corresponding to the GWAS results, were in LD with the lead variant in SPIRE2 (Supplementary Fig. S17).
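The posterior probability PP.H4 comes from comparing five hypotheses per locus: no association (H0), association with the GWAS trait only (H1), with expression only (H2), with both traits through distinct causal variants (H3), and with both through a shared causal variant (H4). The sketch below combines per-variant approximate Bayes factors in the way described for the COLOC method. The priors shown are widely used defaults and are an assumption here, since the values used in the study are not stated in this section. A production implementation would work on the log scale to avoid overflow.

```python
def coloc_posteriors(abf_gwas, abf_eqtl, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0 to H4 from per-variant approximate Bayes factors.

    abf_gwas, abf_eqtl: Bayes factors for the same variants, in the same order.
    p1, p2: prior probability that a variant is causal for one trait only.
    p12: prior probability that a variant is causal for both traits.
    """
    sum_gwas = sum(abf_gwas)
    sum_eqtl = sum(abf_eqtl)
    sum_both = sum(a * b for a, b in zip(abf_gwas, abf_eqtl))
    unnormalized = {
        "H0": 1.0,
        "H1": p1 * sum_gwas,
        "H2": p2 * sum_eqtl,
        # All pairs of distinct variants: one causal for each trait.
        "H3": p1 * p2 * (sum_gwas * sum_eqtl - sum_both),
        # The same variant is causal for both traits.
        "H4": p12 * sum_both,
    }
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

def is_colocalized(posteriors, threshold=0.8):
    """Apply the PP.H4 > 0.8 criterion used in the text."""
    return posteriors["H4"] > threshold
```

Under this model, a locus reaches PP.H4 > 0.8 only when the same variants carry strong evidence in both the GWAS and the eQTL data; strong evidence on different variants shifts the posterior mass to H3 instead.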

We also performed colocalization analysis with eQTLs in 47 other tissues from GTEx to investigate the possible pleiotropic effects of the identified loci (Supplementary Fig. S18). Among the genes colocalized in skin tissue, seven were also colocalized in other tissues, including RAB29, OPN4, USP35, and SPIRE2 in more than three other tissues. Of the 50 tissues analyzed, the tissue with the most colocalized genes was the sun-exposed lower leg skin tissue from GTEx (Supplementary Fig. S19). Notably, in non-sun-exposed suprapubic skin tissue, only TMTC3 near the KITLG locus was colocalized with the GWAS for L*, whereas the tissue had the second most colocalized genes for a* and b*.

Single-cell level gene expression patterns

To identify cell type-specific gene expression patterns of skin color-associated genes, we analyzed two single-cell RNA sequencing (scRNA-seq) datasets of healthy skin tissues. To control for batch effects, we used the Harmony tool (v.1.0) and found that the same cell types from the datasets were well integrated (Supplementary Fig. S20a). The integrated scRNA-seq data were clustered into nine cell types based on the expression patterns of well-known cell type markers²² (Fig. 3a and Supplementary Fig. S20b). In the integrated scRNA-seq data, 36 out of the 41 identified skin color-associated genes, including both the nearest and colocalized genes, were available.

Skin color-associated genes showed relatively high RNA expression levels in melanocytes (Fig. 3b). At the individual gene level, one-third of the tested skin color-associated genes (12 of 36 genes) exhibited the highest expression levels in melanocytes (Fig. 3c and Supplementary Data 12). Of these, 33% (4 of 12 genes; USP35, BCKDHB, GAB2, and MRPS22) were in previously unreported loci, whereas MFSD12, MC1R, OCA2, RAB32, SLC6A17, BNC2, DEF8, and SNORC were in previously reported loci. Of the remaining 24 genes, 42% (10 genes) were highly expressed in fibroblasts (NUCKS1, TENT5A, TRPS1, GLIS1, LTBP1, and BCO1) and keratinocytes (ATG1L2, UGT1A6, CTR9, and TMTC3). Three genes, NUCKS1, LTBP1, and TENT5A, which were colocalized only in non-sun-exposed suprapubic tissue, were the most highly expressed in fibroblasts.

Signatures of polygenic adaptation and association of genetic score with environmental factors

To identify evidence for the natural selection of the GWAS loci identified in the current study and compare the polygenic adaptation signal of those loci with the results of the skin color GWAS in UK Biobank (UKBB) Europeans, we calculated genetic scores for individual populations from the 1000 Genomes Project phase 3 and used the method proposed by Berg and Coop²³ to test for the overdispersion of genetic scores. Loci identified from the GWAS for L* and b* in the current study and the GWAS for light skin color in UKBB Europeans showed significant evidence for overdispersion of genetic scores globally (L*, Q_x = 59.82, P_Q = 3.90 × 10⁻³; a*, Q_x = 38.87, P_Q = 0.052; b*, Q_x = 47.88, P_Q = 0.020; light skin color in UKBB Europeans, Q_x = 357.62, P_Q = 1.53 × 10⁻³) and significant polygenic adaptation in the corresponding regional population (P-value for regional population < 0.01) (Fig. 4a and Supplementary Fig. S21a). The GWAS loci for L* and b* identified in this study showed the highest genetic score and significant adaptation signal in East Asian populations, whereas those from the UKBB European GWAS showed the highest score and adaptation signal in European populations, implying that the genetic factors influencing skin lightness might vary across populations.

**Fig. 4: Signals of polygenic adaptation for L* across the 1000 Genomes Project phase 3 populations.**

Genetic scores for individual populations showed correlation with geographic and environmental factors, albeit with P-values estimated from the Mantel test (P_Mantel) generally being underpowered (Fig. 4b and Supplementary Fig. S21b). The absolute latitude and mean annual solar radiation by geographic region (see Methods) for individual populations from the 1000 Genomes Project phase 3 and their allele frequencies of GWAS lead variants are provided in Supplementary Data 13. The genetic score for L* was positively and negatively correlated with absolute latitude (r_s = 0.513, P_permutation = 6.90 × 10⁻³, P_Mantel = 0.160) and mean annual solar radiation (r_s = −0.496, P_permutation = 9.30 × 10⁻³, P_Mantel = 0.130), respectively. Genetic scores for a* and b* were negatively and positively correlated with absolute latitude (a*, r_s = −0.581, P_permutation = 1.79 × 10⁻³, P_Mantel = 7.80 × 10⁻³; b*, r_s = −0.562, P_permutation = 2.70 × 10⁻³, P_Mantel = 0.072) and mean annual solar radiation (a*, r_s = 0.547, P_permutation = 3.63 × 10⁻³, P_Mantel = 0.121; b*, r_s = 0.526, P_permutation = 5.40 × 10⁻³, P_Mantel = 0.166), respectively. Notably, the linear relationship between genetic scores and environmental factors was stronger among individual populations of East Asian ancestry (L*, r_s with solar radiation (r_s-solar) = −0.657, r_s with absolute latitude (r_s-latitude) = 0.771; a*, r_s-solar = 0.771, r_s-latitude = −0.543; b*, r_s-solar = 0.600, r_s-latitude = −0.714) than among populations outside of East Asia (L*, r_s-solar = −0.313, r_s-latitude = 0.567; a*, r_s-solar = 0.418, r_s-latitude = −0.553; b*, r_s-solar = 0.357, r_s-latitude = −0.652).

Comparison of identified variants and polygenic score performance with the UK Biobank

We compared the genetic architecture of light skin color between East Asians and Europeans at the single variant and polygenic score levels using the GWAS results from the current study and the UKBB (Fig. 5). The skin color phenotype analyzed in the UKBB was limited to brightness, representing a categorical version of L*. In the UKBB East Asian sample (n = 2332), the effect sizes of the 15 identified L* lead variants were comparable to those in the current GWAS (r_s = 0.837, P = 2.77 × 10⁻⁴), despite the lack of power. Although the sample size of the UKBB European participants (n = 415,030) was substantially larger than the sample size of our study (n = 48,433), the effect sizes of 720 lead variants from the UKBB European GWAS for skin color were less consistent with those in the current study (r_s = 0.223, P = 7.34 × 10⁻⁴) or the UKBB East Asian sample (r_s = 0.111, P = 0.011).

**Fig. 5: Comparison of lead variants for L* and polygenic score performance with the UK Biobank.**

Polygenic scores for light skin color in the UKBB East Asian participants were calculated using GWAS results for L* and light skin color from the current study and the UKBB European study, respectively. The Spearman’s correlation coefficient (r_s) between the residual of the polygenic score (adjusted for age, sex, and the first 10 PCs of genetic ancestry) and light skin color was approximately twice as high when using the current GWAS results (r_s = 0.090, P = 1.28 × 10⁻³) as when using the UKBB European GWAS results (r_s = 0.049, P = 0.076). Polygenic scores derived using the current GWAS results differentiated between light and dark skin color better than those derived using the UKBB European GWAS results (Fig. 5b).

Interplay between polygenic score and sun exposure for skin color

The polygenic scores of 40,790 unrelated study participants were derived from leave-one-out GWAS, in which target samples for score calculation were excluded via 10-fold partitioning (Supplementary Data 14). Sensitivity analyses of the polygenic score were performed using only female participants to remove the potential confounding effects of sex. To estimate the relative effects of the polygenic score and environmental factors on L*, the study participants were partitioned based on their polygenic scores for L* and sun exposure variables (sun exposure hours per day and sunblock usage). The highest relative effect size for increasing L* was observed in the group with a high polygenic score, low amount of sun exposure (less than 1 hour of sun exposure per day), and frequent sunblock usage, with the largest effect size attributable to the polygenic score (Fig. 6a). The increase in effect size due to sunblock usage was greater in the group with high sun exposure (more than 3 hours of sun exposure per day) than in the group with low sun exposure. The increase in relative effect size explained by the polygenic score was 1.37–2.69 and 1.52–4.46 times higher than the increase explained by sun exposure hours per day and sunblock usage, respectively.

**Fig. 6: Interplay of polygenic score and sun exposure for L*.**

To examine the interplay between polygenic factors and sun exposure on skin color, we evaluated the interaction effect between the polygenic score and sunblock usage for each group of sun exposure hours per day using linear regression models adjusted for covariates. The polygenic score for L* and sunblock usage interacted negatively in the high sun exposure group (effect size of the interaction term [β_G×E] = −0.251, P_G×E = 7.33 × 10⁻³), but not in the low sun exposure group (β_G×E = 0.038, P_G×E = 0.452). In environments with substantial sun exposure (more than 3 hours per day), the difference in predicted L* by sunblock usage for study participants in the bottom 10th percentile of polygenic scores was more than twice as large as that for study participants in the top 10th percentile of polygenic scores (Fig. 6b). There was no significant interplay between the polygenic scores for a* or b* and sunblock usage, and no significant association between sunblock usage and b* (Supplementary Fig. S22).
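The negative interaction term can be read through the fitted model L* = β_0 + β_G·G + β_E·E + β_G×E·G·E, in which the change in L* associated with sunblock usage for a person with polygenic score G is β_E + β_G×E·G. The sketch below evaluates this at the 10th and 90th percentiles of a standardized score. Only β_G×E = −0.251 comes from the text; the main effect of sunblock usage, the standard normal score scale, and the coding of sunblock usage as a single numeric variable are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def sunblock_effect(beta_sunblock, beta_gxe, pgs_z):
    """Predicted change in L* with sunblock usage at standardized score pgs_z."""
    return beta_sunblock + beta_gxe * pgs_z

BETA_GXE_HIGH_SUN = -0.251  # interaction effect reported for the high sun exposure group
BETA_SUNBLOCK = 0.6         # hypothetical main effect, for illustration only

# 10th and 90th percentiles of a standard normal polygenic score (assumed scale).
z_bottom = NormalDist().inv_cdf(0.10)
z_top = NormalDist().inv_cdf(0.90)

effect_bottom = sunblock_effect(BETA_SUNBLOCK, BETA_GXE_HIGH_SUN, z_bottom)
effect_top = sunblock_effect(BETA_SUNBLOCK, BETA_GXE_HIGH_SUN, z_top)
```

Because the interaction term is negative, the predicted benefit of sunblock usage is larger at low polygenic scores than at high ones, which matches the direction of the pattern described for the high sun exposure group. The size of the ratio depends on the hypothetical main effect and should not be compared with the reported figures.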

Discussion

We aimed to objectively quantify skin color in 48,433 East Asians using image analysis and identify genetic variants and potential causal genes for skin color. This GWAS of objectively quantified skin color traits (CIE LAB values; L*, a*, and b*) produced more powerful results compared to GWAS based on ITA° value or questionnaire-based categorical skin color (Supplementary Data 1). We identified 23 skin color-associated loci, 11 of which were previously unreported, and the lead variants within the identified loci were examined in 4,992 individuals who were externally independent of the discovery cohort. The SNP-based heritability of skin color was estimated to be 24.0%, 23.8%, and 23.2% for L*, a*, and b*, respectively. The highest SNP-based heritability was estimated in the youngest age group (41.2%, 31.0%, and 30.7% for L*, a*, and b*, respectively, in females younger than 37 years), presumably due to their reduced environmental influence on skin color compared to other age groups. The explanatory power for skin color increased 1.06–2.15 times by identifying previously unreported loci. We identified twelve significant nonsynonymous variants associated with skin color. The genomic loci identified in this study were substantially divergent from those in the European population and a signature of polygenic adaptation was detected. The interaction between genetic variants and sun exposure on skin color was quantified at the polygenic level. Functional enrichment analyses showed significant enrichment of GWAS loci in primary skin and melanocyte cells. Potential causal genes for skin color were prioritized based on nonsynonymous variants, colocalization between GWAS and eQTL signals, and expression levels in melanocytes.

Skin color can be subdivided into brightness, redness, and yellowness, which can be represented by CIE LAB values, and is influenced by factors such as melanin, hemoglobin, oxyhemoglobin, and carotenoid levels²⁴. Melanin serves as a primary factor in protecting the skin from UV radiation and its regulation within cells involves intricate mechanisms mediated by various endocrine and biochemical signaling pathways^25,26. The skin color-associated loci identified in this study exhibited varying significance depending on the CIE LAB value, implying that these regions may be influenced by different factors and contribute to distinct biological pathways. For example, MFSD12 contributes to red-yellow pigmentation by maintaining cysteine levels within melanosomes^27,28, and SCARB1 influences skin yellowness by promoting the uptake of carotenoids^29,30. Among the genes identified in this study, several have been reported to be functionally associated with melanin synthesis-related pathways: (1) MC1R in melanin synthesis signal transmission from keratinocytes to melanocytes³¹; (2) TRPS1 and KITLG in the proliferation of epithelial cells and melanocytes^32,33; (3) OCA2 and MFSD12 in the regulation of amino acid intake into melanosomes^28,34; and (4) KITLG, RAB32, and SPIRE2 in melanosome transport or dispersion within melanocytes^35,36.

Although their detailed functions in melanogenesis have not been fully elucidated, some genes have been reported to influence pigmentation-related traits. UGT1A6, which has a genetic effect on b* (yellowness), is known to induce yellowish iris pigmentation³⁷ and is genetically associated with the clinical phenotype of Gilbert’s syndrome (a common syndrome characterized by yellowish discoloration of the skin)³⁸. In addition, the UGT1A gene family, including UGT1A6, is involved in the conjugation of vitamin D³⁹. It is well known that UV-B exposure is a key factor in vitamin D synthesis⁶. In a previous study, the membrane transporter gene SLC6A17 was associated with pigmentation-related traits (tanning ability)⁴⁰.

Genes that have not been investigated for their functional role in pigmentation have been reported to be genetically associated with melanin- or melanocyte-related traits; for example, TRPS1 is associated with tanning ability and skin cancer^41,42. Most of the genes in previously reported loci and unreported loci were colocalized with eQTLs in skin tissues and were highly expressed in melanocytes, keratinocytes, and fibroblasts, which are known to play a role in repairing and remodeling the skin during the skin aging process⁴³. A genetic variant of SLC45A3 is associated with survival rate in melanoma, and the SLC45 gene family mediates the transport of sugar molecules across the plasma membrane^44,45. Although skin color in East Asians is recognized as a phenotype with relatively low diversity compared to populations worldwide, our study identified several potential causal loci for skin color in East Asians. These may serve as candidates for future functional studies investigating the molecular mechanisms underlying skin color.

The effect sizes of light skin color-associated lead variants in Europeans in the UKBB were generally less consistent with those in East Asians in the UKBB, whereas the effect sizes of the lead variants in our study had a linear relationship with those in East Asians in the UKBB, even though the UKBB European sample size was more than eight times larger than that of our study. Consistently, when applied to UKBB East Asians, the polygenic score for light skin color derived from our study had approximately twice the explanatory power of the score derived from the UKBB European GWAS. These findings are consistent with convergent evolution and a genetic architecture that differs from other widely investigated traits^46,47. The polygenic adaptation signature of our GWAS loci was skewed towards East Asian populations. Among the 1000 Genomes Project phase 3 populations, the genetic score for L* was highest in East Asian populations, indicating that genetic factors influencing skin lightness might vary across populations, as also suggested previously⁴⁷. The genetic scores for skin color traits were linearly associated with solar radiation and absolute latitude. Despite the relatively low diversity of these environmental factors in a regional population, the linear relationship between genetic scores and environmental factors was stronger in individual populations within East Asia than in those outside East Asia. In addition to the linear association between genetic and environmental factors, there was a significant interplay between these two factors. In environments with substantial sun exposure, the difference in skin lightness by sun exposure was twice as large in participants with more alleles associated with darker skin color as in participants with more alleles associated with lighter skin color.

This study had certain limitations. First, most of the participants were women. The results of this study should be verified in the general population, and a study involving male participants is required to investigate sex-specific genetic factors. Second, although the phenotype had the advantage of being quantitative, it was measured in sun-exposed skin. To disentangle baseline skin color from tanning, the phenotype was adjusted for sun exposure-related covariates in the association test. Third, GWAS based on SNP array has limited ability to identify rare variants and population-specific signals. The utilization of population-specific reference panels for imputation might improve the discovery of additional loci⁴⁸. Fourth, the association of the lead variants derived from the GWAS was partially replicated in 10% of individuals of the discovery cohort, although the effect sizes of the discovery result were consistent with the replication result. The identification of previously unreported loci associated with skin color in the current study might be attributable to the larger sample size than previous research in non-European populations. Fifth, the lack of resources for post-GWAS analysis, such as eQTL and scRNA-seq data generated from East Asians, may have resulted in insufficient power to detect evidence for the prioritization of potential causal genes. Sixth, to identify population-specific or shared genetic factors affecting skin color, GWAS in diverse populations and trans-ancestry meta-analyses are required. Finally, functional analyses of the identified genes are required to elucidate their underlying mechanisms.

Despite these limitations, our study has several strengths. This study included a substantial sample size from a non-European population and performed device-measured objective quantification of skin color, resulting in increased statistical power compared with previous skin color GWASs and the identification of many GWAS loci. Another strength of our study is that the GWAS signals were interpreted through various post-GWAS analyses based on the latest statistical methods, along with transcriptome and single-cell resources. For example, well-known genes, including SLC6A17, MFSD12, OCA2, UGT1A6, SCARB1, and MC1R, as well as genes in previously unreported loci, including TENT5A, SLC45A3, and SEM1, have shown functional evidence in post-GWAS analyses. Our findings provide additional examples of population-specific skin color alleles that have accumulated independently in East Asian populations.

In summary, we conducted a large GWAS for objectively quantified skin color in an East Asian population and identified potential causal genes and polygenic adaptations based on diverse evidence. Our results provide fundamental information regarding the genetic architecture of skin color, which can further help elucidate the functional roles and molecular mechanisms of skin color-related genes in future studies.

Methods

Study participants

In the discovery phase of this study, 52,712 participants of East Asian ancestry who lived in South Korea and had no severe medical conditions at the time of recruitment were recruited through offline cosmetics shops in 2018. The following were obtained from all study participants: (1) facial skin color measurements; (2) saliva samples for microarray genotyping; and (3) responses to lifestyle questionnaires covering age, sex, height, weight, disease history, average amount of sunlight exposure per day, and sunscreen usage. The Institutional Review Board (IRB) of the LG Household & Healthcare Research Center approved this study (IRB Nos. 2017-PB-0001 and 2018-PB-001). All study participants were fully informed of the study contents and signed written consent forms.

Skin color measurement

To measure skin color, facial images were obtained using a Janus III system under normal light conditions (PIE Inc., Suwon, Korea). Participants were divided into two groups based on camera resolution: Group A, in which the skin color of 23,454 participants was measured using 18-megapixel images, and Group B, in which the skin color of 29,258 participants was measured using 24.2-megapixel images. Skin color image analysis was performed using an internal algorithm of the measuring instrument, which converted the images into numerical values. The following image analysis methods were applied to evaluate skin color: (1) RGB values of the analysis area for skin color, particularly for both cheeks (RGB scale range: 0–255); and (2) conversion of RGB values of each pixel into CIE LAB values: L* value for skin luminance (0 = dark, 100 = bright), a* value for the red/green component (positive value = red, negative value = green), and b* value for the yellow/blue component (positive value = yellow, negative value = blue).
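The RGB-to-CIE LAB step can be illustrated with a minimal sketch. The instrument's internal algorithm is not described here, so the code below assumes standard sRGB encoding and a D65 reference white; the function name `srgb_to_lab` is hypothetical and is not part of the Janus III software.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB (0-255) to CIE L*a*b*.

    Assumptions (not stated in the study): sRGB transfer curve, D65 white point.
    """
    def linearize(channel):
        # Undo the sRGB gamma encoding to obtain linear light in [0, 1].
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65 primaries).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white used for normalization.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Piecewise cube-root function from the CIE LAB definition.
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    l_star = 116.0 * fy - 16.0   # luminance, 0 = dark, 100 = bright
    a_star = 500.0 * (fx - fy)   # positive = red, negative = green
    b_star = 200.0 * (fy - fz)   # positive = yellow, negative = blue
    return l_star, a_star, b_star
```

Under these assumptions, pure white maps to L* close to 100 with a* and b* close to 0, and pure black maps to L* of 0, which matches the scale described above.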

Genotyping, quality control, and imputation

To obtain reliable results from this study, we excluded participants with (1) low-quality images, as identified by manual curation, or (2) self-reported items that were likely entered incorrectly (height below 1 m or above 2.5 m; weight below 30 kg or above 200 kg; age under 10 years). In total, 49,279 East Asians were genotyped.

Genotyping was conducted on DNA samples extracted from saliva using Illumina Global Screening Array MD BeadChips (Illumina, CA, USA). Sample- and variant-level quality control (QC) of the genotyped data was performed using an elaborate QC pipeline with PLINK (v.1.90)⁴⁹ (see Supplementary Notes for details). Thereafter, the genotype data were phased using Eagle (v.2.3)⁵⁰ and imputed based on the Haplotype Reference Consortium (r1.1, 2016) reference panel using Minimac (v.4)⁵¹. Genetic variants with low imputation quality scores (R² < 0.8) or MAF < 0.5% were excluded to reduce false-positive imputation results. In total, 48,433 East Asians were used for subsequent analysis in this study.

Genome-wide association analyses

To assess the normality of residuals for each skin color trait (CIE LAB values; L*, a*, and b*) in a null model (a linear model with only covariates), standardized residuals were compared to the standard normal distribution using the Kolmogorov-Smirnov test. None of the residuals were normally distributed except those for b* in Group A (Supplementary Fig. S23). Accordingly, inverse-normal transformed residuals of each skin color trait after adjusting for age, sex, three sun-exposure variables, and measurement month were used for the GWAS in groups A and B. The sun exposure variables were questionnaire-based and each was classified into three categories (average sun exposure hours per day, 1: more than 3 h, 2: 1–3 h, and 3: less than 1 h; sunblock usage, 1: never, 2: seldom, and 3: always; outdoor activities, 1: 3 or more times a week; 2: on the weekend; and 3: less than other categories). Associations between variants and skin color were tested using BOLT-LMM (v.2.3.4)¹⁴, a Bayesian linear mixed model, to adjust for sample relationships. Association tests were conducted separately for groups A and B and adjusted for genotyping batches and the first 10 PCs of genetic ancestry. An inverse variance-weighted fixed-effects meta-analysis of skin color was performed to combine summary statistics from groups A and B using METAL (released on 2011-03-25)⁵². Double genomic control was applied to the results of the meta-analysis. In the meta-analysis, variants that reached a genome-wide significance level (P < 5.0 × 10⁻⁸) were considered statistically significant.
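Two steps in this paragraph lend themselves to a short illustration: the inverse-normal transformation of residuals and the inverse variance-weighted fixed-effects combination of two groups. The sketch below is not the BOLT-LMM or METAL implementation. The Blom offset (c = 3/8) and the arbitrary ordering of tied values are assumptions, genomic control is omitted, and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse-normal transform.

    Assumption: Blom offset c = 3/8; ties keep their sorted order
    (a production pipeline would typically average tied ranks).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        quantile = (rank - c) / (n - 2.0 * c + 1.0)
        transformed[index] = NormalDist().inv_cdf(quantile)
    return transformed


def ivw_fixed_effects(betas, standard_errors):
    """Inverse variance-weighted fixed-effects meta-analysis of one variant.

    Returns the pooled effect, its standard error, and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # erfc keeps precision for the very small P-values typical of GWAS.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical per-group estimates for a single variant (groups A and B).
pooled_beta, pooled_se, pooled_p = ivw_fixed_effects([0.10, 0.10], [0.02, 0.02])
```

With two groups of equal precision, the pooled standard error shrinks by a factor of the square root of two, which is why combining groups A and B increases power over either group alone.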

To identify independent variants, a stepwise selection was conducted for each significant meta-analysis result using GCTA-COJO (v.1.91.2)⁵³. The LD r² of these independent variants from all skin color traits was calculated and considered when selecting lead variants that represented significant loci of skin color traits; the LD r² across lead variants was less than 0.1. Lead variants were selected according to the priority of functional consequences (nonsense, missense, regulatory, intronic, or intergenic) and a low association P-value. The functional consequence and nearest gene were annotated using the Ensembl Variant Effect Predictor (VEP, v.98)⁵⁴. Variants that passed Bonferroni’s correction for 23 independent loci (P < 2.17 × 10⁻³) were considered nominally significant and others were considered to have a null association with the skin color trait in the meta-analysis. To identify nonsynonymous variants, a Bonferroni’s correction threshold for a total of 9183 independent nonsynonymous variants (P < 5.44 × 10⁻⁶) was used.
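The two Bonferroni thresholds quoted in this paragraph can be reproduced with a one-line calculation. The family-wise alpha of 0.05 is an assumption here; it is not stated explicitly, but it is the value that yields the reported thresholds.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumed family-wise level."""
    return alpha / n_tests


print(f"{bonferroni_threshold(23):.2e}")    # 2.17e-03 (23 independent loci)
print(f"{bonferroni_threshold(9183):.2e}")  # 5.44e-06 (9183 nonsynonymous variants)
```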

Genome-wide associations between variants and categorical skin color were tested using POLMM (released on 2022-08-26)¹⁸, a proportional odds logistic mixed model. To generate the categorical skin color, the skin color of the study participants was categorized according to the ITA° criteria⁵⁵, which have been used globally for skin color classification. ITA° was calculated according to the following equation: \(\mathrm{ITA}^\circ = \arctan\left(\frac{L^{*}-50}{b^{*}}\right) \times \frac{180}{\pi}\).
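As a minimal sketch, the ITA° equation translates directly into code. The function name `ita_degrees` is hypothetical, and the category cut-offs used for classification are not given in this section, so they are not reproduced here.

```python
import math


def ita_degrees(l_star, b_star):
    """ITA in degrees: arctan((L* - 50) / b*) * 180 / pi.

    Note: b* = 0 raises ZeroDivisionError, exactly as the written formula is undefined there.
    """
    return math.degrees(math.atan((l_star - 50.0) / b_star))


# Hypothetical measurement: L* = 60, b* = 10 gives arctan(1), i.e. 45 degrees.
example_ita = ita_degrees(60.0, 10.0)
```

Higher L* (lighter skin) and lower b* (less yellow) both increase ITA°, so larger angles correspond to lighter skin categories.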

To measure the explanatory power of the identified loci for skin color, the incremental R² value was estimated. Similar to ref. ⁵⁶, the incremental R² value was defined as the increase in the adjusted R² from the linear regression model with covariates in the GWAS to the model with these covariates and lead variants.

Replication of GWAS results

A total of 4,992 individuals of East Asian ancestry who lived in South Korea and had no severe medical conditions at the time of recruitment were recruited through offline cosmetics shops from 2020 to 2023 and participated in the replication of GWAS results. Genotyping, quality control, imputation, and association tests were performed following the same protocols as those applied to the discovery phase samples, except for the imputation quality scores (R²): genetic variants with R² < 0.6 were excluded to maximize the number of replicated lead variants. Rs77310623, a lead variant in RAB32 for L* and b*, was excluded from the replication study due to its absence in the imputed replication data.

The PAT ratio⁵⁷, determined by dividing the observed number by the expected number of nominally significant (P < 0.05) loci, was calculated to assess the replicability of the discovery GWAS. GWAS lead variants and their proxies (LD r² ≥ 0.8) within a 50 kb window of each lead variant were selected to account for the observed number of loci. A locus was considered transferable if at least one of the tested variants was nominally significant and the direction of effect was consistent in both datasets. Power estimates were summed across the discovery GWAS loci for a given trait to provide an estimate of the number of loci expected to be significantly associated in the replication GWAS.
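The transferability rule and the PAT ratio described above can be sketched as follows. This is an illustration only: the per-locus power values are hypothetical inputs (the power calculation itself is not specified in this section), and both function names are made up for the example.

```python
def is_transferable(tested_variants, alpha=0.05):
    """A locus is transferable if any tested variant (lead or proxy) is
    nominally significant in replication with a consistent effect direction.

    tested_variants: iterable of (beta_discovery, beta_replication, p_replication).
    """
    return any(
        p_rep < alpha and beta_disc * beta_rep > 0
        for beta_disc, beta_rep, p_rep in tested_variants
    )


def pat_ratio(n_observed, per_locus_power):
    """Observed number of transferable loci divided by the expected number,
    where the expectation is the sum of per-locus power estimates."""
    expected = sum(per_locus_power)
    if expected <= 0:
        raise ValueError("expected number of loci must be positive")
    return n_observed / expected


# Hypothetical example: three loci, each with one or two tested variants.
loci = [
    [(0.05, 0.04, 0.01)],                      # significant, same direction
    [(0.03, -0.02, 0.02), (0.03, 0.01, 0.4)],  # wrong direction / not significant
    [(-0.04, -0.05, 0.001)],                   # significant, same direction
]
observed = sum(is_transferable(locus) for locus in loci)
ratio = pat_ratio(observed, [0.9, 0.6, 0.5])
```

A ratio near 1 indicates that about as many loci replicated as the replication sample size would lead one to expect.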

Participants in the discovery phase were randomly partitioned into ten groups to conduct cross-validation and permutation meta-analysis, maintaining the proportion of females and camera resolution groups. Meta-analyses of all possible permutations of the ten groups, termed permutation meta-analysis, were performed to estimate the number of significant loci (P < 5.0 × 10⁻⁸) based on the discovery sample size.

Heritability analysis

Genome-wide Complex Trait Analysis (GCTA, v.1.91.2beta)⁵⁸ was used to estimate SNP-based heritability of skin color traits, which is the proportion of variance explained by all SNPs. Notably, if causal variants have a different MAF spectrum than that used in the analysis or tend to be enriched in genomic regions with LD values higher or lower than the average, the estimated h² may be biased⁵⁹.

We used the GREML-LDMS⁵⁹ method in GCTA to estimate heritability with a region-specific LD heterogeneity correction. First, we calculated the LD scores of all SNPs using a sliding-window approach with a segment length of 200 kb (with a 100 kb overlap between two adjacent segments) and partitioned them into quartiles according to the LD scores. Second, each LD quartile was stratified into MAF quintiles with the same number of variants, resulting in 20 bins. Finally, we estimated the genetic relationship matrix (GRM) for each bin and jointly analyzed the 20 GRMs. We performed GREML-LDMS for skin color traits using age, sex, three sun exposure variables, measurement month, group, genotyping array, and the first 10 PCs as covariates.
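The 4 × 5 stratification can be illustrated with a small sketch that assigns variants to LD-score quartiles and then to equal-count MAF quintiles within each quartile. This is not GCTA code; the input layout (tuples of identifier, LD score, MAF), the equal-count definition of the LD quartiles, and the function names are assumptions made for illustration.

```python
def split_equal(sorted_items, k):
    """Split a sorted list into k contiguous chunks whose sizes differ by at most one."""
    n = len(sorted_items)
    bounds = [i * n // k for i in range(k + 1)]
    return [sorted_items[bounds[i]:bounds[i + 1]] for i in range(k)]


def ldms_bins(variants):
    """Assign variants to 20 bins: 4 LD-score quartiles x 5 MAF quintiles.

    variants: list of (variant_id, ld_score, maf).
    Returns {(ld_quartile, maf_quintile): [variant_id, ...]} with 1-based indices.
    """
    bins = {}
    by_ld = sorted(variants, key=lambda v: v[1])
    for ld_q, ld_chunk in enumerate(split_equal(by_ld, 4), start=1):
        # Within each LD quartile, stratify by MAF into equal-count quintiles.
        by_maf = sorted(ld_chunk, key=lambda v: v[2])
        for maf_q, maf_chunk in enumerate(split_equal(by_maf, 5), start=1):
            bins[(ld_q, maf_q)] = [v[0] for v in maf_chunk]
    return bins
```

In the analysis described above, one GRM would then be built per bin and the 20 GRMs fitted jointly, so that variance can be attributed separately to each LD and MAF stratum.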

To identify the age effect on the heritability of skin color traits in females, we divided the female participants into three subgroups: 11,369 participants younger than 37 years (“young age”), 17,011 participants aged between 37 and 49 years (“middle age”), and 14,390 participants older than 49 years (“old age”), and performed LD-stratified GREML analysis for each subgroup with the same covariates as the previous analysis. To evaluate the discrepancy of sun exposure variables and their effects on skin color across age groups, we conducted linear regressions in 36,246 independent female participants. The associations between age group and sun exposure variable were tested to assess the variation of sun exposure variable across age groups. The associations of the interplay between each sun exposure variable and age group on skin color were investigated to assess the discrepancy in the effects of sun exposure variable on skin color across age groups. These regression models included covariates such as measurement month, camera resolution, genotyping batches, and the first 10 PCs of genetic ancestry.

Functional enrichment analysis

We performed functional enrichment analysis for the GWAS results with 36 categorized MeSH terms using DEPICT (v.1.1)²⁰. Independent SNPs (LD r² < 0.2) that reached P < 5.0 × 10⁻⁴ were used for the DEPICT analysis. We considered significant enrichment after FDR correction for the 36 MeSH terms with FDR < 0.1.

We also performed tissue enrichment analysis using GARFIELD (v.2)²¹ because it uses different databases, including ENCODE and Roadmap Epigenomics Projects. This tool performs greedy pruning with an LD parameter r² > 0.1 and calculates the overlap between LD-pruned variants and regulatory or functional annotations from the databases. ORs were calculated to quantify tissue enrichment at nine different levels (T < 1 to T < 10⁻⁸) of GWAS P-value thresholds. Thresholds of enrichment P-values were applied as default settings.

Colocalization with eQTL database

Colocalization between the GWAS results and eQTLs was performed using the coloc.abf function in the coloc R package⁶⁰. The eQTL results of 49 tissues, including 2 types of skin tissue (sun-exposed lower leg and non-sun-exposed suprapubic area) from GTEx v8⁶¹, and skin tissue from MuTHER⁶² were used for colocalization analysis. Colocalization with eQTL data was conducted for variants within 100 kb of each GWAS lead variant. For reliability, a signal was considered colocalized only when the posterior probability for colocalization (PP.H4) exceeded 0.8 and the eQTL P-value of the lead variant or its proxies (LD r² ≥ 0.8) was below 1 × 10⁻⁴. PP.H4 is the posterior probability for hypothesis H4, defined as a colocalized signal between a significant GWAS association and significant eQTL association.
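The two-part colocalization criterion can be written as a simple predicate. This is a plain Python restatement of the thresholds above, not part of the coloc R package; the function name and argument layout are hypothetical.

```python
def is_colocalized(pp_h4, eqtl_p_values, pp_threshold=0.8, p_threshold=1e-4):
    """Apply the two criteria from the text.

    pp_h4: posterior probability of a shared causal signal (hypothesis H4).
    eqtl_p_values: eQTL P-values of the lead variant and its proxies (LD r2 >= 0.8).
    """
    has_strong_eqtl = any(p < p_threshold for p in eqtl_p_values)
    return pp_h4 > pp_threshold and has_strong_eqtl


# Hypothetical gene-tissue pairs.
print(is_colocalized(0.93, [2e-6, 0.03]))  # True: both criteria met
print(is_colocalized(0.93, [0.01, 0.03]))  # False: no eQTL P-value below 1e-4
print(is_colocalized(0.55, [2e-6]))        # False: PP.H4 too low
```

Requiring a minimum eQTL signal in addition to PP.H4 guards against calling colocalization where the expression association itself is weak.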

scRNA-seq analysis

We used two publicly available scRNA-seq datasets for scRNA-seq analysis. Both datasets were sequenced from healthy skin tissues using the 10× Genomics Chromium platform. One of the scRNA-seq datasets was downloaded from the Gene Expression Omnibus as a count matrix file (accession number GSE130973)²². The other scRNA-seq dataset was downloaded from https://dom.pitt.edu/wp-content/uploads/2018/10/Skin_6Control_rawUMI.zip as a raw UMI count table file⁶³.

Downstream analysis was performed using the “Seurat” R package (v.3.2.3)⁶⁴. In the initial filtering, cells with fewer than 200 features and features detected in fewer than 3 cells were removed. The two scRNA-seq datasets were combined using overlapping features that passed the initial filtering. For additional filtering after merging the datasets, we discarded cells with fewer than 500 features, more than 2000 features, or more than 5% mitochondrial content. A total of 24,203 cells and 19,321 features passed through all filters.
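The post-merge cell filter can be restated as a small predicate. The analysis itself was run in Seurat (R); the sketch below only mirrors the stated thresholds in plain Python, and the function and parameter names are hypothetical.

```python
def passes_cell_filters(n_features, pct_mito,
                        min_features=500, max_features=2000, max_pct_mito=5.0):
    """Keep a cell unless it has fewer than 500 features, more than 2000 features,
    or more than 5% mitochondrial content (thresholds taken from the text)."""
    return min_features <= n_features <= max_features and pct_mito <= max_pct_mito


# Hypothetical cells: (number of detected features, percent mitochondrial reads).
cells = [(450, 1.2), (1200, 3.0), (2500, 2.0), (900, 7.5)]
kept = [cell for cell in cells if passes_cell_filters(*cell)]
```

The upper bound on features is a common guard against doublets, and the mitochondrial cap removes likely damaged cells.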

Data processing, including log normalization, selection of 2000 variable features using the variance-stabilizing transformation method, and data scaling was performed for visualization. Clustering was performed using the original Louvain algorithm. To reduce batch effects, the processed data were integrated using Harmony (v.1.0)⁶⁵ in the Seurat workflow. The integrated data were projected using UMAP⁶⁶ using the first 20 dimensions of the Harmony-corrected space. Cell types were identified using scaled average gene expression patterns of each cell type marker as described by ref. ⁶⁵. Cell type markers are shown in Supplementary Fig. S16b.

We selected 49 skin color trait-associated genes, including colocalized genes, corresponding genes with nonsynonymous variants, and the nearest genes to the lead variants. Four of the selected genes were discarded during the scRNA-seq filtering process, and seven other genes were not included in the scRNA-seq datasets. The remaining 38 genes were included in the analysis.

GWAS from the UKBB

For comparison with the GWAS from the UKBB, summary statistics for skin color (data field 1717) were downloaded from the Pan-UK Biobank (https://pan.ukbb.broadinstitute.org/). Genome-wide significant variants (P < 5.0 × 10⁻⁸) with LD r² less than 0.1 with other variants were considered lead variants in the UKBB.

Selection analysis

For analysis of the polygenic adaptation of lead variants, including those with nominal significance, from the GWAS summary statistics, we calculated genetic scores for individual populations from the 1000 Genomes Project phase 3⁶⁷ and used the method of Berg and Coop (released on 2014-12-21)²³ to test for overdispersion of genetic scores. The background selection values estimated by ref. ⁶⁸ were used for this analysis. The genetic score of the \(k\)th individual population using \(m\) GWAS lead variants is defined as \(G_{k} = 2\sum_{j=1}^{m} \beta_{j} p_{kj}\), where \(\beta_{j}\) is an additive effect size estimate of the \(j\)th variant and \(p_{kj}\) is an observed allele frequency of the \(j\)th variant in the \(k\)th individual population. The significance of the Q statistic, a statistic for overdispersion across populations, and P-value of individual or regional populations, which represents the divergence of the genetic values between the two populations relative to the null expectation under drift, were derived using GWAS lead variants from the current study and the UKBB European data. Independent variants (LD r² < 0.1) in both the GWAS and 1000 Genomes Project data were used to generate empirical null distributions for this test.
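The population-level genetic score defined above is a frequency-weighted sum of effect sizes, which can be sketched in a few lines. The effect sizes and allele frequencies in the example are hypothetical, and the overdispersion test of Berg and Coop is not reproduced.

```python
def genetic_score(betas, allele_frequencies):
    """G_k = 2 * sum_j beta_j * p_kj for one population k.

    betas: additive effect size estimates of the m lead variants.
    allele_frequencies: effect-allele frequencies of the same variants in population k.
    """
    if len(betas) != len(allele_frequencies):
        raise ValueError("betas and allele_frequencies must have the same length")
    return 2.0 * sum(b * p for b, p in zip(betas, allele_frequencies))


# Hypothetical effect sizes and frequencies for two populations.
betas = [0.10, -0.20, 0.05]
score_pop1 = genetic_score(betas, [0.50, 0.25, 0.80])
score_pop2 = genetic_score(betas, [0.10, 0.60, 0.20])
```

The factor of 2 reflects the two allele copies per diploid individual, so the score is the expected additive genetic value of an individual drawn from that population.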

To evaluate the correlation between the genetic scores of lead variants and solar radiation, we used surface solar radiation data collected from January 1984 to December 2022 from the NASA POWER project (https://power.larc.nasa.gov/), and the mean annual solar radiation was calculated in kWh/m² per day units. Surface solar radiation values for each population were obtained by inserting the representative longitudes and latitudes (Supplementary Data 10). The longitude and latitude of each population were approximated based on the geographic region in which the population was investigated, similar to the study by ref. ¹⁰. The mean annual solar radiation and absolute latitudes were also used as parameters in the Berg and Coop analysis. Spearman’s correlation coefficient (r_s) was used as a measure of correlation. To assess the significance of correlation, the P-value of r_s was estimated under the null distribution of all possible permutations (P_permutation) and by Mantel test (P_Mantel).
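A minimal sketch of Spearman's correlation with an exhaustive permutation P-value is given below. It enumerates every permutation, which is only practical for a handful of populations, and it uses a two-sided comparison of absolute correlations; both choices are assumptions, and the Mantel test is not shown. All function names are hypothetical.

```python
import itertools
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def exact_permutation_p(x, y):
    """Two-sided P-value over all permutations of y (feasible only for small n)."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rank_x, rank_y))
    hits = total = 0
    for permuted in itertools.permutations(rank_y):
        total += 1
        if abs(pearson(rank_x, permuted)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

For example, four populations with perfectly monotone scores and radiation values give r_s = 1 and an exact two-sided P-value of 2/24, since only the identity and the fully reversed orderings are as extreme.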

Polygenic score for skin color

Polygenic scores for skin color were calculated using the PRS-CS (released on 2021-06-04)⁶⁹ auto model, which has been reported to outperform other polygenic scoring methods for polygenic traits⁷⁰. Polygenic scores of participants in the replication set, derived using the discovery GWAS, were used to assess the replicability of GWAS results at the polygenic level. For 40,790 unrelated study participants, polygenic scores were derived from leave-one-out GWAS, where target samples for score calculation were excluded via 10-fold partitioning (Supplementary Data 11). To remove the possible confounding effects of sex, sensitivity analyses of the polygenic score were conducted using only female participants. We also calculated polygenic scores for the UKBB East Asian participants using the current GWAS results and the UKBB European GWAS results and compared the associations of polygenic scores with light skin color using Spearman’s correlation coefficient (r_s).

Interplay between polygenic score and sun exposure for skin color

To quantify the relative effect size of genetic and environmental factors on L*, the study participants were divided into eight groups based on the top and bottom quintiles of polygenic score, sun exposure hours per day, and sunblock usage. The relative effect sizes of each group were calculated using multivariate linear regression adjusted for the covariates used in the GWAS (age, sex, measurement month, outdoor activities, genotyping batches, and the first 10 PCs of genetic ancestry).

The interaction effects between the polygenic score and sunblock usage were calculated for each sun exposure group using linear regression adjusted for the covariates used in the GWAS. The skin color values predicted by the average covariates were calculated by conditioning each polygenic score mean and covariates in each polygenic score percentile group.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The genotype and phenotype data of East Asian participants in the analysis were collected by Migenstory, a subsidiary of LG Household & Healthcare. This individual-level genotype and phenotype data are protected and are not available due to data privacy laws. The full summary statistics of GWAS for L* (luminance), a* (red/green component), and b* (yellow/blue component) in 48,433 East Asians are publicly available at the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads) with accession numbers GCST90320257, GCST90320258, and GCST90320259, respectively. The summary statistics of associations of variants in chromosome X with L* (luminance), a* (red/green component), and b* (yellow/blue component) in 42,770 East Asian females are publicly available at the NHGRI-EBI GWAS Catalog with accession numbers GCST90320260, GCST90320261, and GCST90320262, respectively. The UKBB genotype and epidemiologic data are available by requesting access on the UKBB homepage (https://www.ukbiobank.ac.uk/). The GTEx data are publicly available upon reasonable application (http://www.gtexportal.org/home/datasets). The MuTHER data are publicly available upon reasonable application (http://www.muther.ac.uk/Data.html). The scRNA-seq data were collected from the database in Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130973) and University of Pittsburgh (https://dom.pitt.edu/wp-content/uploads/2018/10/Skin_6Control_rawUMI.zip). The surface solar radiation data from January 1984 to December 2022 were collected from the NASA POWER project (https://power.larc.nasa.gov/data-access-viewer). The 1000 Genomes Project phase 3 data are publicly available (https://www.internationalgenome.org/data).

Code availability

Previously developed pipelines were used to produce the results for the current study. No custom code was developed. Please see the Supplementary Information for details on the software URLs and data used.

References

Brenner, M. & Hearing, V. J. The protective role of melanin against UV damage in human skin. Photochem. Photobio. 84, 539–549 (2008).
Quillen, E. E. et al. Shades of complexity: new perspectives on the evolution and genetic architecture of human skin. Am. J. Phys. Anthropol. 168(Suppl 67), 4–26 (2019).
Pickrell, J. K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837 (2009).
Deng, L. & Xu, S. Adaptation of human skin color in various populations. Hereditas 155, 1 (2018).
Parra, E. J. Human pigmentation variation: evolution, genetic basis, and implications for public health. Am. J. Phys. Anthropol. 134(Suppl 45), 85–105 (2007).
Holick, M. F. Vitamin D deficiency. N. Engl. J. Med. 357, 266–281 (2007).
Slominski, A. T., Zmijewski, M. A., Plonka, P. M., Szaflarski, J. P. & Paus, R. How UV light touches the brain and endocrine system through skin, and why. Endocrinology 159, 1992–2007 (2018).
McEvoy, B., Beleza, S. & Shriver, M. D. The genetic architecture of normal variation in human pigmentation: an evolutionary perspective and model. Hum. Mol. Genet. 15, R176–R181 (2006).
Del Bino, S., Duval, C. & Bernerd, F. Clinical and biological characterization of skin pigmentation diversity and its consequences on UV impact. Int J. Mol. Sci. 19, 2668 (2018).
Adhikari, K. et al. A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nat. Commun. 10, 358 (2019).
Crawford, N. G. et al. Loci associated with skin pigmentation identified in African populations. Science 358, eaan8433 (2017).
Edwards, M. et al. Association of the OCA2 polymorphism His615Arg with melanin content in East Asian populations: further evidence of convergent evolution of skin pigmentation. PLoS Genet. 6, e1000867 (2010).
Ju, D. & Mathieson, I. The evolution of skin pigmentation-associated variation in West Eurasia. Proc. Natl Acad. Sci. USA 118, e2009227118 (2021).
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Nakanishi, G., Kim, Y. S., Nakajima, T. & Jetten, A. M. Regulatory role for Kruppel-like zinc-finger protein Gli-similar 1 (Glis1) in PMA-treated and psoriatic epidermis. J. Investig. Dermatol. 126, 49–60 (2006).
Venza, M. et al. DSS1 promoter hypomethylation and overexpression predict poor prognosis in melanoma and squamous cell carcinoma patients. Hum. Pathol. 60, 137–146 (2017).
Yeh, I. et al. Targeted genomic profiling of acral melanoma. J. Natl Cancer Inst. 111, 1068–1077 (2019).
Bi, W. et al. Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. Am. J. Hum. Genet. 108, 825–839 (2021).
Zhang, Q. et al. USP35 is a potential immunosuppressive factor in skin cutaneous melanoma. J. Inflamm. Res. 15, 3065–3082 (2022).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Iotchkova, V. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet 51, 343–353 (2019).
Sole-Boldo, L. et al. Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming. Commun. Biol. 3, 188 (2020).
Berg, J. J. & Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).
Naik, P. P. & Farrukh, S. N. Influence of ethnicities and skin color variations in different populations: a review. Ski. Pharm. Physiol. 35, 65–76 (2022).
Slominski, A. T. et al. Neuroendocrine signaling in the skin with a special focus on the epidermal neuropeptides. Am. J. Physiol. Cell Physiol. 323, C1757–C1776 (2022).
Slominski, R. M. et al. Melanoma, melanin, and melanogenesis: the Yin and Yang relationship. Front. Oncol. 12, 842496 (2022).
Marks, M. S. & Pavan, W. J. How a membrane transporter keeps melanocytes in the red. Pigment Cell Melanoma Res. 34, 666–669 (2021).
Adelmann, C. H. et al. MFSD12 mediates the import of cysteine into melanosomes and lysosomes. Nature 588, 699–704 (2020).
Perrett, D. I., Talamas, S. N., Cairns, P. & Henderson, A. J. Skin color cues to human health: carotenoids, aerobic fitness, and body fat. Front. Psychol. 11, 392 (2020).
Reboul, E. Mechanisms of carotenoid intestinal absorption: where do we stand? Nutrients 11, 838 (2019).
Nasti, T. H. & Timares, L. MC 1R, Eumelanin and Pheomelanin: their role in determining the susceptibility to skin cancer. Photochem Photobio. 91, 188–200 (2015).
Fantauzzo, K. A., Kurban, M., Levy, B. & Christiano, A. M. Trps1 and its target gene Sox9 regulate epithelial proliferation in the developing hair follicle and are associated with hypertrichosis. PLoS Genet. 8, e1003002 (2012).
Picardo, M. & Cardinali, G. The genetic determination of skin pigmentation: KITLG and the KITLG/c-Kit pathway as key players in the onset of human familial pigmentary diseases. J. Investig. Dermatol. 131, 1182–1185 (2011).
Bellono, N. W., Escobar, I. E. & Oancea, E. A melanosomal two-pore sodium channel regulates pigmentation. Sci. Rep. 6, 26570 (2016).
Park, M., Serpinskaya, A. S., Papalopulu, N. & Gelfand, V. I. Rab32 regulates melanosome transport in Xenopus melanophores by protein kinase a recruitment. Curr. Biol. 17, 2030–2034 (2007).
Alzahofi, N. et al. Rab27a co-ordinates actin-dependent transport by controlling organelle-associated motors and track assembly proteins. Nat. Commun. 11, 3495 (2020).
Wollstein, A. et al. Novel quantitative pigmentation phenotyping enhances genetic association, epistasis, and prediction of human eye colour. Sci. Rep. 7, 43359 (2017).
Ehmer, U. et al. Gilbert syndrome redefined: a complex genetic haplotype influences the regulation of glucuronidation. Hepatology 55, 1912–1921 (2012).
Wang, Z. et al. Human UGT1A4 and UGT1A3 conjugate 25-hydroxyvitamin D3: metabolite structure, kinetics, inducibility, and interindividual variability. Endocrinology 155, 2052–2063 (2014).
Shido, K. et al. Susceptibility loci for tanning ability in the Japanese population identified by a genome-wide association study from the Tohoku medical megabank project cohort study. J. Investig. Dermatol. 139, 1605–1608 e13 (2019).
Visconti, A. et al. Genome-wide association study in 176,678 Europeans reveals genetic loci for tanning response to sun exposure. Nat. Commun. 9, 1684 (2018).
Chahal, H. S. et al. Genome-wide association study identifies novel susceptibility loci for cutaneous squamous cell carcinoma. Nat. Commun. 7, 12048 (2016).
de Araujo, R., Lobo, M., Trindade, K., Silva, D. F. & Pereira, N. Fibroblast growth factors: a controlling mechanism of skin aging. Ski. Pharm. Physiol. 32, 275–282 (2019).
Bohme, I., Schonherr, R., Eberle, J. & Bosserhoff, A. K. Membrane transporters and channels in melanoma. Rev. Physiol. Biochem Pharm. 181, 269–374 (2020).
Gaudet, P., Livstone, M. S., Lewis, S. E. & Thomas, P. D. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief. Bioinform. 12, 449–462 (2011).
Martin, A. R. et al. An unexpectedly complex architecture for skin pigmentation in Africans. Cell 171, 1340–1353 e14 (2017).
Norton, H. L. et al. Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol. Biol. Evol. 24, 710–722 (2007).
Choi, J. et al. A whole-genome reference panel of 14,393 individuals for East Asian populations accelerates discovery of rare functional variants. Sci. Adv. 9, eadg6319 (2023).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Chardon, A., Cretois, I. & Hourseau, C. Skin colour typology and suntanning pathways. Int. J. Cosmet. Sci. 13, 191–208 (1991).
Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).
Huang, Q. Q. et al. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals. Nat. Commun. 13, 4664 (2022).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Wallace, C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813 (2013).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
Tabib, T., Morse, C., Wang, T., Chen, W. & Lafyatis, R. SFRP2/DPP4 and FMO1/LSP1 define major fibroblast populations in human skin. J. Investig. Dermatol. 138, 802–810 (2018).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 e21 (2019).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. Preprint at arXiv:1802.03426v3 (2020).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
McVicker, G., Gordon, D., Davis, C. & Green, P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet 5, e1000471 (2009).
Ge, T., Chen, C. Y., Ni, Y., Feng, Y. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Wang, Y. et al. Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts. Cell Genom. 3, 100241 (2023).


Acknowledgements

This study was supported by LG Household & Healthcare (C17-03803 and C19-08213, N.G.K.). All data used in the analysis were collected by Migenstory, a subsidiary of LG Household & Healthcare. UKBB data were obtained under application no. 33002.

Author information

These authors contributed equally: Beomsu Kim, Dan Say Kim, Joong-Gon Shin, Sangseob Leem.
These authors jointly supervised this work: Nae Gyu Kang, Hong-Hee Won.

Authors and Affiliations

Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, 06351, Republic of Korea
Beomsu Kim, Dan Say Kim, Minyoung Cho & Hong-Hee Won
Research and Innovation Center, CTO, LG Household & Healthcare (LG H&H), Seoul, 07795, Republic of Korea
Joong-Gon Shin, Sangseob Leem, Hanji Kim, Ki-Nam Gu, Jung Yeon Seo, Seung Won You, Sun Gyoo Park, Yunkwan Kim & Nae Gyu Kang
Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA
Alicia R. Martin
Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, 02141, USA
Alicia R. Martin
School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
Choongwon Jeong

Contributions

Conceptualization: B.Kim, J.G.Shin, S.Leem, H.H.Won, N.G.Kang; Methodology: B.Kim, D.S.Kim, J.G.Shin, S.Leem, M.Cho; Formal analysis: B.Kim, D.S.Kim, J.G.Shin, S.Leem, M.Cho; Investigation: J.G.Shin, S.Leem, H.Kim, K.N.Gu, J.Y.Seo, S.W.You, A.R.Martin, Y.Kim; Resources: Y.Kim, S.G.Park, N.G.Kang, H.H.Won; Data curation: B.Kim, D.S.Kim, J.G.Shin, S.Leem, Y.Kim; Writing of original draft: B.Kim, D.S.Kim, J.G.Shin, S.Leem, M.Cho, H.H.Won; Writing, reviewing, and editing: B.Kim, D.S.Kim, J.G.Shin, S.Leem, H.Kim, K.N.Gu, J.Y.Seo, S.W.You, A.R.Martin, Y.Kim, C.Jeong, N.G.Kang, H.H.Won; Visualization: B.Kim, D.S.Kim, J.G.Shin, S.Leem, M.Cho; Supervision: N.G.Kang, H.H.Won; Funding acquisition: S.G.Park, N.G.Kang

Corresponding authors

Correspondence to Nae Gyu Kang or Hong-Hee Won.

Ethics declarations

Competing interests

Migenstory’s business is exclusively involved in providing Direct-to-Consumer (DTC) genetic testing services and generating data for research at LG H&H, without any engagement in the development of medicine or related technologies. J.G.S., S.L., H.K., K.N.G., S.W.Y., S.G.P., Y.K., and N.G.K. are employees of LG H&H. The other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Andrzej Slominski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1–14

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Kim, B., Kim, D.S., Shin, JG. et al. Mapping and annotating genomic loci to prioritize genes and implicate distinct polygenic adaptations for skin color. Nat Commun 15, 4874 (2024). https://doi.org/10.1038/s41467-024-49031-4


Received: 04 July 2023
Accepted: 21 May 2024
Published: 07 June 2024
DOI: https://doi.org/10.1038/s41467-024-49031-4
Springer Nature Limited
