Background

Differences in human skin pigmentation have been attributed to genetic variation in several different genes [1–3]. Among these, the melanocortin 1 receptor gene (MC1R, MIM#155555), a member of the G protein-coupled receptor superfamily, is the major contributor to normal pigment variation in humans. It is a small, highly polymorphic gene consisting of one exon with 951 coding nucleotides on chromosome 16q24.3.

Numerous studies have demonstrated associations between specific MC1R variants and red hair, light skin, poor tanning ability and heavy freckling [4–9]. A recent genome-wide association scan confirmed the role of MC1R SNPs in hair, eye, and skin pigmentation [3]. The functional role of many of these variants has been described [10–13]. Several MC1R variants are also associated with increased risk of malignant melanoma in a variety of populations [14–22]. In most of these investigations, the effect of MC1R polymorphisms on melanoma risk appears to extend beyond their effect on pigmentation, and to be linked to melanomas harboring mutations in the BRAF oncogene [23].

Several hypotheses have been proposed to explain the evolutionary history of skin pigmentation in humans. It has been suggested that as humans migrated out of Africa to climates with more limited exposure to sunlight, pigmentation genes, including MC1R, underwent either a relaxation of functional constraints or selection for functionally relevant variants that led to lighter skin pigmentation [24]. This could result in an improved ability to synthesize vitamin D in the presence of limited sunlight exposure [25–27]. It has also been suggested that darker skin is favored in regions closer to the equator for protection against ultraviolet radiation [24]. In addition, differences in skin pigmentation could protect against pathogens and cold injury, and may also have been important in sexual selection [28].

Genetic variation of MC1R, in the form of single nucleotide polymorphisms (SNPs), differs significantly across populations from different geographic regions [29, 30]. In most regions of the genome, there is a higher degree of genetic variation in individuals of African descent than in other populations, most likely due to evolutionary history [31, 32]. MC1R is an exception to this observation. It has been shown to be more polymorphic in individuals of European descent than in those from Africa [29, 30]. A comprehensive study of SNP allele frequencies in MC1R from populations around the world further quantified the large differences in the distribution of variants across populations, with a prominent difference between light- and dark-pigmented individuals [29]. The goal of the current study was to expand on that study of MC1R genetic variation by characterizing nucleotide diversity and population-specific differentiation (FST) and by evaluating measures of selection.

Results

Nucleotide Diversity

Allele frequency (AF) data were compiled from a total of 2306 individuals, who were grouped into seven populations based on geographic location (Table 1). The number of individuals in each group was Africa 117, India 53, Southern Europe 838, Northern Europe 650, Asia 343, Papua New Guinea 40, and the United States 265. The number of SNPs per population evaluated in the study ranged from three (India) to 36 (Southern Europe), with a total of 55 SNPs studied in MC1R in these analyses. AFs are shown in Table 2. Thirty-seven SNPs were nonsynonymous (NS) and 18 were synonymous (S). The greatest numbers of SNPs were present in individuals from Southern Europe (29 NS, 7 S) and Northern Europe (18 NS, 3 S). The fewest SNPs were noted in the subjects from India (2 NS, 1 S), Papua New Guinea (2 NS, 2 S) and Africa (3 NS, 8 S).

Table 1 Description of populations and studies used for analyses of MC1R allele frequencies
Table 2 Allele frequencies (%) of MC1R single nucleotide polymorphisms in seven populations.

To correct for the variable population sizes, which could contribute to the absolute number of SNPs identified, π and θ, measures of nucleotide diversity, were calculated. The overall nucleotide diversity for all SNPs in MC1R, as measured by π, was 10.1 × 10⁻⁴ for all populations (2306 total subjects); it ranged from a low of 3.6 × 10⁻⁴ in subjects from India to a high of 11.1 × 10⁻⁴ in US subjects (Table 3). In subgroup analyses of the Northern and Southern European populations, π was highest in subjects from Britain (18.1 × 10⁻⁴). θ, the population mutation parameter, was quite variable across populations, ranging from 6 × 10⁻⁴ in India to 47.3 × 10⁻⁴ in Southern Europe. Among the Southern European subgroups, θ was highest in Italy (43.6 × 10⁻⁴).
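As an illustration of how π and θ can be obtained from allele frequencies alone, the following is a minimal sketch in Python. It assumes biallelic SNPs, uses Watterson's estimator for θ, and omits the Jukes and Cantor correction described in the Methods, so it will not reproduce the values in Table 3 exactly. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
from typing import Sequence


def nucleotide_diversity(variant_freqs: Sequence[float], n_chrom: int, seq_len: int) -> float:
    """Per-site pi for biallelic SNPs, computed from variant allele frequencies.

    Assumption: each SNP has two alleles, so the expected number of pairwise
    differences at a site is 2p(1-p), scaled by n/(n-1) for finite samples.
    """
    if n_chrom < 2:
        raise ValueError("at least two chromosomes are required")
    correction = n_chrom / (n_chrom - 1)
    pairwise = sum(2.0 * p * (1.0 - p) for p in variant_freqs)
    return correction * pairwise / seq_len


def watterson_theta(n_segregating: int, n_chrom: int, seq_len: int) -> float:
    """Per-site Watterson's theta: S / (a1 * L), with a1 = sum_{i=1}^{n-1} 1/i."""
    if n_chrom < 2:
        raise ValueError("at least two chromosomes are required")
    a1 = sum(1.0 / i for i in range(1, n_chrom))
    return n_segregating / (a1 * seq_len)


if __name__ == "__main__":
    # Hypothetical example: 3 SNPs in 100 diploid subjects (200 chromosomes),
    # over the 951 coding bp of MC1R.
    freqs = [0.10, 0.02, 0.30]
    print(nucleotide_diversity(freqs, n_chrom=200, seq_len=951))
    print(watterson_theta(len(freqs), n_chrom=200, seq_len=951))
```

Because θ depends only on the count of segregating sites while π weights each site by its frequency, a population with many rare variants shows a large θ relative to π.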

Table 3 MC1R nucleotide diversity, Tajima's D, and Fu's FS statistic for seven populations. Statistically significant values are bolded.

Nucleotide diversity was also calculated for NS SNPs only, to further understand the contribution of these SNPs to MC1R genetic variation (Table 3). π in NS SNPs ranged from a low of 0.6 × 10⁻⁴ in subjects from Africa to a high of 9.0 × 10⁻⁴ in the US. Within the European populations, π for NS SNPs was greatest in Britain (10.9 × 10⁻⁴). As was seen for all SNPs, θ for NS SNPs was highest in Italy (36.6 × 10⁻⁴).

In addition to a high degree of inter-population variability, MC1R has a higher degree of nucleotide diversity in comparison to other groups of genes (Table 4). Studies of genetic variation in various gene groups, e.g., genes important in telomere biology [33], antigen processing and presenting genes [34], pharmaceutical response [35], and environmental response genes [36], showed π values ranging from 3.0 × 10⁻⁴ to 6.7 × 10⁻⁴, while all seven populations studied for MC1R had a combined π value of 10.1 × 10⁻⁴. θ showed similarly higher values in MC1R (64.2 × 10⁻⁴) than in other sets of genes (all <8.4 × 10⁻⁴).

Table 4 Comparison of MC1R nucleotide diversity and Tajima's D statistic to other sets of genes. These data include subjects from different ethnic groups calculated as a whole. The number of genes evaluated is shown in parentheses.

Population Differentiation

The FST statistic, a pairwise measure of population differentiation, has been used extensively to compare the degrees of heterozygosity across populations [2, 37]. Therefore, MC1R FST was calculated for each of the described populations (Table 5). Overall, a very high degree of differentiation was noted between Asia and each of the other groups; FST ranged from 0.459 between Asia and Africa to 0.356 between Asia and the United States.
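To make the pairwise comparison concrete, the sketch below computes a simple heterozygosity-based FST for two populations from per-SNP allele frequencies. This is an illustrative estimator (one minus the ratio of within-population to between-population expected differences, summed over sites) and is an assumption here; the estimator actually used for Table 5 is the one cited in the Methods and may differ, and sample-size corrections are ignored. The function name and inputs are hypothetical.

```python
from typing import Sequence


def pairwise_fst(freqs_pop1: Sequence[float], freqs_pop2: Sequence[float]) -> float:
    """Heterozygosity-based FST for two populations at biallelic SNPs.

    freqs_pop1 and freqs_pop2 hold the frequency of the same allele at the
    same SNPs, in the same order. Sample-size corrections are omitted.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations must list the same SNPs")
    within = 0.0
    between = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Mean chance that two alleles drawn from the same population differ.
        within += (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
        # Chance that one allele drawn from each population differ.
        between += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    if between == 0.0:
        # No variation at any site: treat as no differentiation.
        return 0.0
    return 1.0 - within / between


if __name__ == "__main__":
    # Hypothetical frequencies for one SNP that is common in one population
    # and rare in the other.
    print(pairwise_fst([0.75], [0.05]))
```

A single variant that is common in one population and rare in the others can dominate this statistic, because it contributes most of the between-population term.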

Table 5 FST statistic for MC1R in seven populations.

The degree of differentiation between subjects from Africa and the six other groups ranged from 0.101 (Papua New Guinea) to 0.232 (Southern Europe). Papua New Guinea had relatively modest degrees of population differentiation when compared to all populations except Asia, where it was very large. The least amount of population differentiation was found in comparisons between the United States, Northern Europe, and Southern Europe.

We also performed analyses on the subpopulations that comprised the Northern Europe (Britain, the Netherlands, and France) and the Southern Europe (Greece, Italy, and Spain) groups. FST values for all of these comparisons were between 0 and 0.03, suggestive of little differentiation (data not shown).

Selection

Several studies have identified signals of positive selection in pigmentation genes in subjects from East Asia and Europe but not in those from Africa [2, 38, 39]. Whether positive selection is present at the MC1R locus is an area of active investigation. Previous work suggested that the high degree of variation in MC1R is not due to selection but rather to a relaxation of functional constraint outside of Africa [25]. To further test this hypothesis, we first determined Tajima's D statistic, a measure of the relationship between the number of segregating sites (SNPs) and nucleotide diversity (Table 3). It was not statistically significant (p > 0.05) in the populations from India, Asia, Papua New Guinea, and the United States, in which Tajima's D values were -0.72, 1.05, -0.33, and -1.10, respectively. The African population studied had a Tajima's D value of -1.41 and a p-value of 0.048. A statistically significant, negative Tajima's D value was present in the Southern European population (-2.13, p < 0.001). Subgroup analyses of this population showed the same trend, with negative Tajima's D values and p-values <0.05 in Greece, Italy, and Spain. The combined Northern European group had a Tajima's D value of -1.53 (p = 0.026). However, within this group only the population from the Netherlands had a statistically significant Tajima's D value (-1.36, p = 0.047).
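For readers unfamiliar with the statistic, the following sketch implements the standard closed-form expression for Tajima's D (Tajima, 1989) from three summary quantities: the number of chromosomes sampled, the number of segregating sites, and the mean number of pairwise differences (unscaled π, i.e. per sequence rather than per site). It does not reproduce the simulation-based p-values reported here, and the function and parameter names are hypothetical.

```python
import math


def tajimas_d(n_chrom: int, n_segregating: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences k."""
    if n_chrom < 4:
        # For n = 2 or 3 the variance term below is zero, so D is undefined.
        raise ValueError("at least four chromosomes are required")
    if n_segregating < 1:
        raise ValueError("at least one segregating site is required")
    n, s = n_chrom, n_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Estimated variance of the difference between the two theta estimates.
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical inputs: many segregating sites but few pairwise differences
    # (an excess of rare variants) give a negative D.
    print(tajimas_d(n_chrom=200, n_segregating=20, mean_pairwise_diff=1.0))
```

The numerator is the difference between the two diversity estimates, so D is negative when rare variants inflate the number of segregating sites relative to π.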

The FS test of neutrality developed by Fu (1997), which is based on θ, is a powerful method for further evaluating polymorphic patterns under population growth and genetic hitchhiking. These values are shown in Table 3. Fu's FS was statistically significant in the grouped Southern European population (-27.92, p < 0.001) but, among its subpopulations, only in Spain (-8.44, p = 0.009). The Northern European population group also had a statistically significant FS value (-23.66, p < 0.001). The FS values were not statistically significant in the other populations. Fu and Li's D values were comparable in scope to Tajima's D in this study, but p-values were not obtainable due to software limitations (data not shown).

Discussion

MC1R is a small, highly variable gene. This study evaluated the nucleotide diversity and population-specific differentiation of MC1R and tested for positive selection, based on a compilation of previously published data. We used allele frequencies from studies that reported sequencing the entire gene and used the 951 coding bp of MC1R as the reference sequence. The genotype data used for these analyses were derived from the sample sizes and reported AFs; as a result, we were unable to assess the haplotype structure of MC1R. We also could not assess the flanking sequences of MC1R because the exact regions sequenced were not reported in the published data.

We observed that nucleotide diversity, as measured by π and θ, was greater in MC1R than in other groups of genes. It should be noted that these differences may be somewhat skewed because our study evaluated only one small gene, MC1R, while the other studies evaluated between 7 and 4950 different genes [33–36]. Several other studies have noted the highly polymorphic nature of MC1R (compiled by Gerstenblith et al. [29]). Overall, the largest degree of nucleotide diversity was seen between Asia and all other populations, most likely due to the presence of the R163Q (c.488 G>A) SNP, which was present in 75% of Asians studied, versus less than 5% of any of the other six populations. It has been suggested that this allele may be present due to a bottleneck in Asian demographic history [25]. This allele has been shown to have reduced cell surface expression, with corresponding impairment in cAMP coupling and effects on pigmentation [13].

The FST statistic was calculated to further assess the degree of population-specific differentiation in MC1R. Numerous population-specific SNPs and an overall high degree of population differentiation, as measured by FST between populations, are present in MC1R, particularly in Asians. Minimal differentiation was noted between Southern Europe, Northern Europe, and the United States. These individuals were identified in previous studies as Caucasian (i.e. of European descent), and likely share some degree of common ancestry. Although the frequencies of specific variants differ across populations of European descent (e.g. the frequency of the T allele of c.451C>T, p.R151C, was 1.9% in Greece but 10.2% in Britain/Ireland [29]), the subgroup analysis of the Northern and Southern European groups showed little evidence of population differentiation. This is consistent with other studies showing little among-population differentiation [25, 27, 30]. The African population in this study had moderate to large degrees of differentiation in comparison to other populations. This is consistent with prior MC1R SNP data showing fewer variants in individuals from Africa than in non-African populations [29].

Tajima's D statistic tests whether a gene or genomic region is evolving randomly (neutral evolution) or is under selection (non-neutral evolution). It is based on the spectrum of AFs at different sites, as well as on population size. Tajima's D statistic was used to test MC1R for the presence of selection. These data, which are based on the very large sample size described herein, suggest that positive selection may be present in the Southern European population as a whole, as well as in its three subgroups (Greece, Italy, and Spain), based on the presence of negative Tajima's D values and statistically significant p-values. The data also suggest the presence of some degree of positive selection in the Northern European population, but only the subgroup from the Netherlands had a statistically significant p-value. It should be noted that Tajima's D statistic assumes that all nucleotides are equally mutable and subject to the same population dynamics, and that it can be misleading if a significant population bottleneck occurred. We also used Fu's FS test to address the presence of positive selection versus neutral evolution of MC1R. This was statistically significant only in the Southern and Northern European groups and the subpopulation from Spain, and further suggests the presence of positive selection in the European populations. However, our data are also limited because we were only able to study the 951 bp of coding sequence in MC1R and were not able to assess the larger genomic region.

Several studies have evaluated the MC1R gene for evidence of positive selection, with conflicting results. Some studies suggested that purifying selection is present in Africa and that relaxation of functional constraint, rather than positive selection, is present in non-African populations [25, 27, 40]. On the other hand, most recent studies have found evidence of positive selection at other pigmentation genes. For example, Myles et al. [2] found evidence for positive selection in the DCT gene among individuals of Chinese ancestry. In their study, MC1R interpretations were limited because different SNPs were genotyped in the Perlegen and HapMap data sets studied. In a study of 118 putative skin pigmentation genes, data were consistent with positive selection in subjects from Europe (OCA2, TYRP1, and KITLG) and in Asians (DCT, EGFR, and DRD2) [38]. Unfortunately, MC1R could not be evaluated in that study due to ascertainment criteria. It was also suggested that at least weak, recent positive selection may be present in MC1R, based on the AF variability between CEPH Utah and East Asian HapMap samples [3]. Our data suggest that MC1R may be under positive selection in some populations, although additional studies are needed to further evaluate this finding.

Conclusion

This study further quantifies the degree of MC1R genetic variation, illustrates the complexity of this variation across numerous populations, and suggests that positive selection plays a role in European populations. Understanding population-specific genetic variation in MC1R and the role it plays not only in skin pigmentation but also in sun sensitivity and melanoma risk has the potential to impact clinical care and public health.

Methods

AF data for MC1R SNPs from populations around the world were compiled as described [29], from twenty-two skin cancer case-control and population studies that fully sequenced MC1R in distinct populations. From the studies included in Gerstenblith et al. [29], we restricted our analyses to those that included data from healthy control individuals [7, 8, 16, 18, 19, 21, 22, 25, 27, 40–47]. We excluded studies that noted only the presence of a SNP but not actual AFs [4, 48, 49], that measured AFs in family members [4, 9, 18], that were restricted to the extremes of hair and skin color phenotypes [6], or that studied ethnically diverse groups in which it was not possible to determine the AFs for each ethnic group [17]. In addition, the study of MC1R variants in individuals from Spain was included [22]. Populations from Europe and the United States were identified as Caucasian individuals in these studies. European populations were combined based on geographic location. Northern Europe consisted of subjects from Britain, France, and the Netherlands. Southern Europe consisted of subjects from Greece, Italy, and Spain. The populations and the studies from which they were derived are shown in Table 1.

Genotype data files were created from AFs for each population (Table 2) and were then analyzed in DnaSP version 4.0 [50]. The 951 base pairs (bp) of coding MC1R sequence (NM_002386) were used as the template. Since data files were created from AFs, we were unable to assess the haplotype structure of MC1R; therefore, only analyses that do not depend on haplotype structure were performed. Nucleotide diversity (π), the average number of nucleotide substitutions per site between two sequences, was calculated with the Jukes and Cantor correction [51]. Theta (θ), the population mutation parameter (two times the mutation rate per site per generation times the number of heritable units in the population), was calculated on a base pair basis from the total number of segregating sites (SNPs) under the no recombination model [52]. Genetic differentiation among populations was measured by FST [53]. Arlequin (v3.11) was used to determine Tajima's D statistic and Fu's FS test [54] under a neutral model with 1000 simulations [55, 56]. Other genes and corresponding estimates of population differentiation were selected for comparison with MC1R values [33–36].
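The exact procedure used to build the data files from AFs is not detailed above. One plausible approach, sketched here purely as an assumption, is to expand each SNP's allele frequency into the corresponding number of variant-carrying chromosomes. The function name is hypothetical, and variant alleles are simply stacked on the first chromosomes, which is an arbitrary choice.

```python
from typing import List, Sequence


def expand_allele_frequencies(variant_freqs: Sequence[float], n_chrom: int) -> List[List[int]]:
    """Build n_chrom pseudo-chromosomes (0 = reference, 1 = variant) from per-SNP AFs.

    Assumption: phase is unknown, so the assignment of variants to particular
    chromosomes is arbitrary and carries no haplotype information.
    """
    chromosomes = [[0] * len(variant_freqs) for _ in range(n_chrom)]
    for site, freq in enumerate(variant_freqs):
        # Nearest whole number of chromosomes carrying the variant allele.
        n_variant = round(freq * n_chrom)
        for index in range(n_variant):
            chromosomes[index][site] = 1
    return chromosomes


if __name__ == "__main__":
    # Hypothetical example: two SNPs in two diploid subjects (four chromosomes).
    for chromosome in expand_allele_frequencies([0.5, 0.25], n_chrom=4):
        print(chromosome)
```

Per-site allele counts are preserved by this expansion, which is all that π, θ, and frequency-based FST require. The arrangement of variants across chromosomes is arbitrary, which is why statistics that depend on haplotype structure cannot be recovered from data built this way.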