A global view of the OCA2-HERC2 region and pigmentation
Mutations in the gene OCA2 are responsible for oculocutaneous albinism type 2, but polymorphisms in and around OCA2 have also been associated with normal pigment variation. In Europeans, three haplotypes in the region have been shown to be associated with eye pigmentation and a missense SNP (rs1800407) has been associated with green/hazel eyes (Branicki et al. in Ann Hum Genet 73:160–170, 2009). In addition, a missense mutation (rs1800414) is a candidate for light skin pigmentation in East Asia (Yuasa et al. in Biochem Genet 45:535–542, 2007; Anno et al. in Int J Biol Sci 4, 2008). We have genotyped 3,432 individuals from 72 populations for 21 SNPs in the OCA2-HERC2 region including those previously associated with eye or skin pigmentation. We report that the blue-eye associated alleles at all three haplotypes were found at high frequencies in Europe; however, one is restricted to Europe and surrounding regions, while the other two are found at moderate to high frequencies throughout the world. We also observed that the derived allele of rs1800414 is essentially limited to East Asia where it is found at high frequencies. Long-range haplotype tests provide evidence of selection for the blue-eye allele at the three haplotyped systems but not for the green/hazel eye SNP allele. We also saw evidence of selection at the derived allele of rs1800414 in East Asia. Our data suggest that the haplotype restricted to Europe is the strongest marker for blue eyes globally and add further inferential evidence that the derived allele of rs1800414 is an East Asian skin pigmentation allele.
Many genes have been associated with normal variation in human pigmentation (Sturm 2009; Sturm and Larsson 2009). Of those, OCA2 [MIM 611409], named for an abnormal pigmentation phenotype, oculocutaneous albinism type II (OCA2 [MIM 203200]), is a large gene extending over 300 kb on chromosome 15. OCA2 encodes the protein P, a transmembrane protein, and has been shown to play a role in pigmentation in both humans and mice (Frudakis et al. 2003). In humans, it has been implicated in iris, skin, and hair pigmentation (Duffy et al. 2007; Sturm et al. 2008; Kayser et al. 2008; Sulem et al. 2007). The exact function of P is unknown, though it has been suggested to process and traffic tyrosinase, regulate melanosomal pH, or regulate glutathione metabolism (Toyofuku et al. 2002; Staleva et al. 2002; Sturm et al. 2001; Edwards et al. 2010).
The derived allele at a missense SNP (rs1800414, His615Arg) in exon 19 of OCA2 has been reported to be specific to East Asia (Yuasa et al. 2007; Anno et al. 2008). Edwards et al. (2010) showed an association between the derived allele of rs1800414 (C, 615Arg) and lighter skin pigmentation in a sample of individuals of East Asian ancestry from Canada and confirmed their results using an independent sample of Han Chinese.
Here we present our results on the global distributions of haplotypes and specific SNPs in the region of OCA2 and HERC2, genes that have been implicated in pigmentation variation in Europeans and East Asians. We also examine the LD between the SNPs and haplotypes of interest. Finally, we use long-range haplotype tests to show that OCA2 is or has been under selection in Europe and the derived allele of rs1800414 is, or has been, under selection in East Asia.
Materials and methods
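Table: population samples by geographic region. Recoverable entries: SW Asia (7), Central Asia (12), Pacific Islands (4), East Asia (21). Recoverable row labels: Chinese, San Francisco; Total Kidd lab samples; Total HGDP samples.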
DNA was extracted from lymphoblastoid cell lines for 57 of the population samples. The cell lines were established and/or maintained using common techniques described elsewhere (Anderson and Gusella 1984) in the lab of Kenneth K. and Judith R. Kidd at Yale University. Some cell lines were established by the Coriell Cell Repositories and by the National Laboratory for the Genetics of Israeli Populations at Tel Aviv University. The DNA for the 15 other population samples was obtained as DNA only from colleagues or the Coriell Cell Repositories (see Supplemental data). All samples were collected with informed consent by participants and with approval by all relevant institutional review boards.
Whole genome amplification
For the 15 DNA-only population samples, the DNAs were initially whole genome amplified using multiple displacement amplification (MDA), as described in Li et al. (2008a).
Table 2 The 21 SNPs studied. Recoverable column headings: position in NCBI build 36.1; ancestral allele frequency range. Recoverable row label: OCA2 East Asian.
In addition to the data we generated, we included data from the HapMap and the HGDP 650K panel, where available, for rs4778138, rs4778241, rs7495174, rs12913832, and rs1667394 (Li et al. 2008b; Jakobsson et al. 2008). We omitted the HGDP data for those individuals who are part of our laboratory’s cell line collection and were typed in our laboratory, because we have larger sample sizes. All haplotypes were estimated using fastPHASE (Scheet and Stephens 2006), and frequency maps were created using Surfer (ver 7). LD was calculated and LD figures were generated using HAPLOT with default parameters (Gu et al. 2005). For the selection studies we used relative extended haplotype homozygosity (REHH) and, where applicable, normalized haplosimilarity (nHS) (Sabeti et al. 2006; Hanchard et al. 2006). REHH and nHS are both based on the assumption that a variant under selection will rise to high frequency quickly, before recombination has time to break down the extended haplotype on which the variant initially arose. In contrast, a neutral variant will take longer to reach a high frequency, allowing the extended haplotype time to be degraded by recombination. For the REHH test, a core haplotype containing the variant of interest is selected, and an extended haplotype homozygosity score is then determined for each of the remaining SNPs moving outward from the core haplotype in each direction. Relative EHH scores weighted for allele frequency are then calculated for each of the non-core SNPs for each allele of the core haplotype, and the scores of the SNP(s) furthest from the core are tested for significance using 1,000 neutral simulations. nHS uses a moving window to determine a z-score for the least frequent allele of all SNPs in the dataset; again, each z-score is compared to 1,000 datasets simulated under neutral conditions to determine whether any show evidence of selection.
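The logic of the REHH test can be illustrated with a minimal sketch. It assumes the textbook definitions of EHH (the proportion of pairs of core-carrying chromosomes that are identical across the extended interval) and of relative EHH (EHH of one core allele divided by the pooled EHH of all other core alleles). It is not the pselect implementation, it omits the allele-frequency weighting mentioned above, and the function names (`ehh`, `rehh`, `empirical_p`) and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def _alleles(haplotype, sites):
    """Concatenate the alleles of one phased chromosome at the given SNP indices."""
    return "".join(haplotype[i] for i in sites)


def ehh(haplotypes, core_sites, core_allele, extent):
    """EHH of one core allele over the non-core SNP indices in `extent`.

    haplotypes: equal-length strings, one per phased chromosome.
    Returns the fraction of carrier pairs identical at every SNP in `extent`.
    """
    carriers = [h for h in haplotypes if _alleles(h, core_sites) == core_allele]
    if len(carriers) < 2:
        return float("nan")  # no pairs to compare
    extended = Counter(_alleles(h, extent) for h in carriers)
    identical_pairs = sum(comb(n, 2) for n in extended.values())
    return identical_pairs / comb(len(carriers), 2)


def rehh(haplotypes, core_sites, core_allele, extent):
    """EHH of `core_allele` relative to the pooled EHH of all other core alleles."""
    target = ehh(haplotypes, core_sites, core_allele, extent)
    others = {}
    for h in haplotypes:
        core = _alleles(h, core_sites)
        if core != core_allele:
            others.setdefault(core, Counter())[_alleles(h, extent)] += 1
    identical = sum(comb(n, 2) for c in others.values() for n in c.values())
    total = sum(comb(sum(c.values()), 2) for c in others.values())
    if total == 0:
        return float("nan")  # no comparison pairs among other core alleles
    if identical == 0:
        return float("inf")  # other cores show no extended homozygosity
    return target * total / identical


def empirical_p(observed, simulated_scores):
    """Fraction of neutral-simulation scores at least as large as the observed one."""
    return sum(s >= observed for s in simulated_scores) / len(simulated_scores)


# Toy data (hypothetical): SNPs 0-1 form the core, SNPs 2-3 the extension.
toy = ["TGAA", "TGAA", "TGAA", "TGAC", "CAAA", "CAAA", "CACA", "CACC"]
print(ehh(toy, [0, 1], "TG", [2, 3]))
print(rehh(toy, [0, 1], "TG", [2, 3]))
```

In practice the observed score would be compared against scores from the 1,000 neutral simulations, for example with `empirical_p(observed, simulated_scores)`.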
Since nHS can only calculate a z-score for the least frequent allele of a given variant, it was only used when the allele of interest had a frequency <0.5. REHH and nHS were calculated using pselect (Han et al. 2007). Simulated data were created using Hudson’s ms (Hudson 2002). Two demographic models were used: the first was a model of a constant population size, and the second was a model of a bottleneck followed by an exponential expansion (a population starting 4,000 generations ago, with a bottleneck occurring 1,600 generations ago that dropped the effective population size from 10,000 to 2,000, followed by an exponential expansion starting 400 generations ago and leading to a population size of 100,000).
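As a worked illustration of the second demographic model, the sketch below computes the effective population size at a given number of generations before the present. It assumes that the bottleneck is instantaneous and that the population stays at 2,000 until the expansion begins; neither detail is stated above. The 4,000-generation starting point is not modeled, and the constant and function names are hypothetical. Under these assumptions the implied growth rate is ln(100,000/2,000)/400, roughly 0.0098 per generation.

```python
from math import exp, log

N_ANCESTRAL = 10_000    # effective size before the bottleneck
N_BOTTLENECK = 2_000    # effective size after the bottleneck
N_PRESENT = 100_000     # effective size at the present
T_BOTTLENECK = 1_600    # generations ago at which the bottleneck occurs
T_EXPANSION = 400       # generations ago at which the expansion begins

# Per-generation exponential growth rate implied by the expansion phase.
GROWTH_RATE = log(N_PRESENT / N_BOTTLENECK) / T_EXPANSION


def effective_size(generations_ago):
    """Effective population size at a time measured in generations before present."""
    if generations_ago < 0:
        raise ValueError("generations_ago must be non-negative")
    if generations_ago >= T_BOTTLENECK:
        return float(N_ANCESTRAL)
    if generations_ago >= T_EXPANSION:
        # Assumption: size is constant between the bottleneck and the expansion.
        return float(N_BOTTLENECK)
    return N_BOTTLENECK * exp(GROWTH_RATE * (T_EXPANSION - generations_ago))


for t in (0, 200, 400, 1_000, 1_600, 3_000):
    print(t, round(effective_size(t)))
```

A simulator such as ms expects these quantities rescaled into its own time and size units, so the values here would need conversion before being passed to it.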
The allele frequencies for all 21 SNPs in all 72 population samples we genotyped are available in ALFRED (http://alfred.med.yale.edu) under the OCA2 and HERC2 loci or directly for each SNP by using the rs number in Table 2 as a keyword. As shown in Table 2, almost all of the SNPs had very large global allele frequency ranges, though for most SNPs the highest derived allele frequencies are found in Europeans. Other than rs1800407, whose ancestral allele frequency ranges only from 0.890 to 1.000, the global allele frequency ranges all span more than 0.7.
Blue-eye associated haplotypes
Table: Definition of “blue-eye” haplotypes (BEHs). Recoverable column heading: blue-eye associated allele. Recoverable SNP entry: rs4778138, rs4778241, rs7495174.
Geographic distributions of haplotypes
Geographic distribution of the derived allele of rs1800407
The T allele of rs1800407 has also been associated with blue-eye penetrance (Sturm et al. 2008). We estimated haplotype frequencies for haplotypes containing rs1800407 and the three BEHs (supplemental Fig. 1). The first observation is that the blue-eye associated alleles of the three BEHs are much more common than the derived allele of rs1800407. At BEH1, the T allele of rs1800407 most commonly occurs with the AAA allele and not the ACA allele that has been associated with blue eyes. The T allele with the ACA blue-eye associated allele is the second most common combination. Other combinations occur but they are rare. The T allele of rs1800407, when seen, is commonly paired with the blue-eye associated TG allele at BEH2 only in Northern and Eastern Europeans. This association may explain the increased blue-eye penetrance seen by Sturm et al. (2008) as a type of ascertainment effect. Elsewhere the T allele is more likely to be found paired with the CA allele. We see a similar pattern at BEH3 as we see at BEH2. The blue-eye associated CA allele of BEH3 commonly pairs with the T allele only in Northwestern and Eastern Europe and the TG allele is its most common partner elsewhere.
Geographic distribution of the derived allele of rs1800414
Haplotypes and LD
We saw no evidence for selection at any of the pigmentation regions in Africa or the Americas (supplemental Figs. 9 and 10).
Distribution of blue-eye associated alleles
The frequencies of the blue-eye associated alleles of the three haplotype systems in the OCA2 and HERC2 genes are very similar in Northwestern and Eastern Europe, where all three have their highest frequencies (Fig. 2). This also holds true for homozygotes of these alleles (Supplemental Fig. 11). All three blue-eye associated alleles and their homozygotes are also present in Southern Europe and Southwest Asia at lower frequencies than in Northwestern and Eastern Europe; however, the frequencies of the TG allele of BEH2 and its homozygotes are lower than those of the ACA allele of BEH1 and the CA allele of BEH3. Outside of Europe, the blue-eye associated alleles of BEH1 and BEH3 are still common and homozygotes of these alleles are still seen, but the blue-eye associated allele of BEH2 is much rarer and its homozygotes are virtually unseen.
Given the strong LD in Europe across all three haplotype systems, their association with the blue eye phenotype in Europe is understandable. However, these frequency data for other populations around the world, together with the essential restriction of blue eyes to Europe, show that the BEH1 and BEH3 haplotype systems and their component SNPs are not universal markers of blue eyes. The TG allele at BEH2 is the best marker for blue eyes and may even contain the causal allele, though the actual causative variant could be anywhere in the region of strong LD seen in European populations.
Global distribution of the light skin allele
We have shown that the C allele of the missense SNP rs1800414 is found almost exclusively in East Asia (Fig. 4). Within East Asia there is a general cline in the frequency of the C allele with the lowest frequencies in Western China, midrange frequencies in Southeast Asia, and high frequencies in Eastern East Asia. The major exception to this pattern is the Malaysians; in our small sample the derived allele is absent, but the Malays are an Austronesian group and they show similar frequencies to our other Austronesian populations (Micronesians and Samoans).
Selection in the OCA2-HERC2 region
We showed that the strongest signal of selection in Europe and Southwest Asia is at the TG allele of BEH2 and any signal seen at BEH1 and BEH3 is likely due to hitchhiking (Figs. 8, 9). Along with the distribution data, this strongly suggests that the TG allele of BEH2 is, contains, or is in strong LD with the blue eye causal mutation. It is possible that BEH2 is in the promoter region of OCA2 and the blue eye allele lowers the amount of OCA2 expressed either in the iris or globally.
This result also raises the question of why blue eyes would be under selection. Since there is no known biological advantage to having blue eyes, we think a likely answer is sexual selection: in Europe and Southwest Asia, individuals with blue eyes are, or were, preferred as mates. Another possible explanation is that the blue eye phenotype is not being selected for; rather, the TG allele of BEH2 has another phenotype, such as lighter skin pigmentation, which is under selection.
In East Asia, we show that the C allele of the missense SNP rs1800414 is also under selection (Fig. 10). Again this result is not completely unexpected since this allele has been associated with lighter skin pigmentation in East Asians, and variants affecting skin pigmentation have previously been shown to be targets of selection (Edwards et al. 2010; Izagirre et al. 2006; Lao et al. 2007; Norton et al. 2007).
We have shown that the TG allele of BEH2 has a much more restricted global distribution than the ACA allele of BEH1 and the CA allele of BEH3, the other two haplotypes published as associated with blue eyes (Duffy et al. 2007; Sturm et al. 2008; Kayser et al. 2008; Sulem et al. 2007; Mengel-From et al. 2010; Walsh et al. 2010). We also show that the TG allele of BEH2 has a strong signal of selection. Cook et al. (2009) showed that melanocytes homozygous for the blue-eye associated allele of rs12913832 of BEH2 produced significantly less melanin than heterozygotes or those that were homozygous for the ancestral allele, but did not control for other SNPs in the region. This evidence suggests that BEH2 may contain the causal allele for blue eyes or at minimum is the best marker for the region in LD that does contain the causal allele. We have also shown that the C allele of rs1800414 is both restricted to East Asia and under selection in that region. This research provides further evidence for lighter pigmentation evolving by means of selection at least partly independently in Europeans and East Asians but at some genes in common.
These results, taken together with those from several forensic studies predicting iris pigmentation in mixed populations (Mengel-From et al. 2010; Spichenok et al. 2010; Valenzuela et al. 2010; Walsh et al. 2010; Pospiech et al. 2011), suggest that the SNPs of BEH2 (rs1129038 and rs12913832) are the best markers for blue eyes for forensic purposes. A recent study by Liu et al. (2010) found that rs12913832 has the strongest effect when eye color is measured quantitatively and can explain most of the variance in eye color amongst Europeans. However, several questions need to be answered. Are the SNPs in BEH2 responsible for the blue eye phenotype seen in Europeans or simply in strong LD with the causative allele? Is BEH2 in a promoter region for OCA2? Are blue eyes under sexual selection or is the TG allele also responsible for an additional selected phenotype such as light skin pigmentation? Both Eiberg et al. (2008) and Sturm et al. (2008) suggest that BEH2 falls into a regulatory region of OCA2; however, Eiberg et al. believe the causal allele is a 166 kb haplotype that happens to contain the two SNPs of BEH2, whereas Sturm et al. suggest that rs12913832 is the causal allele. Eiberg et al. based their conclusion on lower activity when they used their blue-eye associated haplotype in a luciferase assay compared to other haplotypes. Sturm et al. based their conclusion on not finding a more strongly associated SNP among the known SNPs in the 5′ region of OCA2 or the 3′ end of HERC2, and on the low probability of there being an unknown SNP with a stronger association. Further research will be needed to answer these questions.
This research was funded in part by National Institutes of Health Grant GM57672 and National Institute of Justice, Office of Justice Programs, US Department of Justice Grants 2007-DN-BX-K197, 2010-DN-BX-K225 awarded to KKK. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of the US Department of Justice. We would like to acknowledge all our collaborators who helped collect the samples used in this research as well as the National Laboratory for the Genetics of Israeli Populations at Tel Aviv University and the Coriell Cell Repositories. Finally we would like to thank the thousands of individuals who donated samples without whom this research would not be possible.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.