First, we tested the genotyped SNPs for hair color association in our study sample. Although variation in MC1R is usually attributed to red hair color (Branicki et al. 2007; Grimes et al. 2001; Valverde et al. 1995), the compound variant MC1R-R in our study was significantly associated with all but one (auburn) hair color category, albeit its association was strongest with red hair (allelic OR: 12.6; 95% CI: [7.0–22.7]; P = 2.5×10−17; Table 1). The lack of association of the MC1R-R variant with auburn hair color may be caused by the small sample size of the auburn category and/or problems with correct classification of this hair color as reported elsewhere (Mengel-From et al. 2009). Furthermore, MC1R-R showed a clear recessive effect and a compound-heterozygote effect in that the R/R genotype carriers were much more likely to have red hair (genotypic OR: 262.2; 95% CI: [65.2–1,055.3]; P = 4.5 × 10−15) than the wt/R carriers (genotypic OR: 5.6; 95% CI: [2.5–12.6]; P = 4.0 × 10−5; Supplementary Table S2). The stronger association of MC1R SNPs with red hair than with non-red hair colors as observed here was also found previously (Han et al. 2008; Sulem et al. 2007). The SNP rs12913832 in the HERC2 gene was significantly associated with all hair color categories, most significantly with brown (allelic OR for T vs. C: 3.5; 95% CI: [2.0–6.1]; P = 1.3 × 10−5) and black (allelic OR: 3.3; 95% CI: [2.0–5.6]; P = 4.3 × 10−6; Table 1) hair. The T allele of rs12913832 showed a dominant effect on darker hair color in that the heterozygote carriers had a further increased OR of black hair (genotypic OR: 8.6; 95% CI: [3.9–18.9]; P = 7.2 × 10−8; Supplementary Table S2). This SNP was associated with total hair melanin in a recent study (Valenzuela et al. 2010). A previous study found HERC2 SNPs significantly associated with non-red, but not with red, hair colors (Sulem et al. 2007), and another one reported HERC2 association only with dark hair color (Mengel-From et al. 2009). 
However, an additional study found HERC2 association with all hair colors, albeit with stronger association for non-red hair colors than for red hair (Han et al. 2008), in agreement with our findings. Additional SNPs in MC1R and HERC2 were also significantly associated with several hair colors (Table 1). Except for the MC1R and HERC2 genes, no significant evidence of a dominant or recessive effect on hair color was found for any other gene studied (Supplementary Table S2). SNPs in SLC45A2 (rs28777 allelic OR for C vs. G: 7.05; 95% CI: [2.2–22.3]; P = 0.001), IRF4 (rs12203592 allelic OR for T vs. C: 7.05; 95% CI: [2.2–22.3]; P = 0.01), and EXOC2 (rs4959270 allelic OR for A vs. C: 0.56; 95% CI: [0.35–0.91]; P = 0.02) were most significantly associated with black hair color (Table 1), in line with previous reports (Han et al. 2008; Mengel-From et al. 2009). Further, an association of SLC45A2 with total hair melanin was reported (Valenzuela et al. 2010). SNPs in the ASIP gene were associated with red (rs2378249, P = 0.02), dark blond (rs2378249, P = 0.02), and blond-red (rs1015362, P = 0.04; Table 1) hair. Significant ASIP association with red hair was reported previously (Sulem et al. 2008), as well as with total hair melanin (Valenzuela et al. 2010). The OCA2 gene was most significantly associated with brown hair color (rs4778138, P = 0.03), confirming previous findings of OCA2 involvement in hair color variation (Han et al. 2008; Mengel-From et al. 2009; Valenzuela et al. 2010), although one previous GWAS did not find significant evidence (Sulem et al. 2007). The TYR gene was significantly associated with brown (rs1393350, P = 0.02) and the SLC24A4 gene with blond (rs4904868, P = 0.04) and dark blond (rs2402130, P = 0.03). These results are largely consistent with previous findings (Sulem et al. 2007; Han et al. 2008; Mengel-From et al. 2009).
Overall, at least one SNP in 9 out of the 12 genes studied showed significant association with certain hair color categories in our sample (Table 1). For three genes (TYRP1, TPCN2, and KITLG) the SNPs tested did not reveal statistically significant hair color association (but see below for the predictive effects of two of these genes), although these genes have been implicated in human hair color variation elsewhere (Sulem et al. 2007, 2008; Valenzuela et al. 2010; Mengel-From et al. 2009). This discrepancy may be influenced by the relatively small sample size in our study and the putatively smaller effect size of these three genes relative to the other genes studied.
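The allelic odds ratios and 95% confidence intervals reported above can be illustrated with the textbook 2×2 calculation. The following is a minimal sketch, assuming a simple Wald interval on the log odds ratio; this section does not state which estimator produced the reported values, and the function name and allele counts are hypothetical, not taken from Table 1.

```python
import math


def allelic_odds_ratio(case_a, case_b, ctrl_a, ctrl_b, z=1.96):
    """Odds ratio for allele A vs. allele B with a Wald confidence interval.

    case_a/case_b: allele counts among individuals with the hair color,
    ctrl_a/ctrl_b: allele counts among individuals without it.
    z=1.96 corresponds to a two-sided 95% interval.
    """
    if min(case_a, case_b, ctrl_a, ctrl_b) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
or_estimate, lower, upper = allelic_odds_ratio(30, 70, 10, 90)
print(f"OR = {or_estimate:.2f}, 95% CI: [{lower:.2f}-{upper:.2f}]")
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level, which is how the bracketed intervals above can be read.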
The main goal of this study, however, was to investigate the predictive value of hair color-associated SNPs established in previous studies and (mostly) confirmed in the present one. DNA-based prediction accuracies for hair color categories were evaluated by means of the area under the ROC curve (AUC), which ranges from 0.5 (random prediction) to 1 (perfect prediction). Our model revealed that 13 single or combined (MC1R-R and MC1R-r) genetic variants from all but one (TPCN2) of the 12 genes investigated contribute independently to the AUC value (Table 2) for 4 (Fig. 1a) and 7 hair color categories (Fig. 1b). As may be expected from the association results, MC1R-R has the most predictive power on red hair (AUC 0.86–0.88), and its predictive effect on non-red hair colors was considerably lower (AUC 0.63–0.68, Fig. 1). The HERC2 SNP rs12913832, when added to MC1R-R in the model, contributed most of all other genetic predictors to the accuracy for predicting all color categories (ΔAUC 0.08 for blond, 0.12 for brown, 0.03 for red, and 0.13 for black, Fig. 1). Adding the remaining 11 independent genetic predictors increased accuracy further, usually with diminishing gains as the number of markers grew (Fig. 1). Notably, some SNPs without statistically significant hair color association in our study (P > 0.05) did provide independent information toward hair color prediction (such as rs1042602 in TYR, rs683 in TYRP1, and rs12821256 in KITLG). Only the non-synonymous SNPs from the TPCN2 gene tested did not contribute to the prediction model and did not show a statistically significant association with any hair color category. The SNPs rs35264875 and rs3829241 in TPCN2 had been discovered recently as significantly associated with blond versus brown hair color in Icelanders and replicated in Icelandic and Dutch people (Sulem et al. 2008).
Predicting each color type separately using binary logistic regression yielded slightly lower accuracy than the multinomial model (Supplementary Table S3).
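The AUC used above has a simple rank interpretation: it is the probability that a randomly chosen individual with the hair color in question receives a higher predicted probability than a randomly chosen individual without it. The following is a minimal sketch under that definition, treating one color category against all others; the function name and the toy probabilities are hypothetical and are not results from this study.

```python
def rank_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted probabilities for one hair color category,
    labels: 1 if the individual has that color, 0 otherwise.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: hypothetical predicted probabilities of "red" for four people.
red_probability = [0.9, 0.8, 0.4, 0.3]
has_red_hair = [1, 0, 1, 0]
print(rank_auc(red_probability, has_red_hair))  # 3 of 4 pairs are ordered correctly
```

A value of 0.5 means the ordering is no better than chance, and a value of 1 means every individual with the color is ranked above every individual without it.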
Table 2 Parameters of the prediction model based on multinomial logistic regression in a Polish sample
Overall, hair color prediction with 13 DNA components from 11 genes showed very good accuracy without cross-validation, such as AUC for blond = 0.81, brown = 0.82, red = 0.93, black = 0.87 in the 4 category model (Table 3; Fig. 1a), and AUC for blond = 0.78, d-blond = 0.73, brown = 0.82, auburn = 0.82, b-red = 0.92, red = 0.94, black = 0.88 (Table 3; Fig. 1b) when considering 7 categories. The mean accuracies derived from 1,000 cross-validations are somewhat lower for all hair color categories (least so for red), likely because of sample size effects as the rare alleles with large effects are not well captured in the training sets (Table 3).
Table 3 Hair color prediction accuracy using 13 genetic markers in a Polish sample
In general, the sensitivities for predicting brown, red, and black colors were considerably lower than the respective specificities, except for blond in the 4 categories and dark blond in the 7 categories (Table 3). The very low sensitivities for brown may reflect uncertainties in distinguishing between the dark-blond and brown colors on one side, and between the auburn, red and blond-red colors on the other side during phenotyping, as well as an additional sample size effect for auburn representing the smallest hair color group in our study (N = 12). However, the final model showed a good power to discriminate highly similar hair color categories, such as red and blond-red, as well as between blond and dark-blond (Table 3), underlining the value of the genetic markers involved in our hair color prediction model.
The ROC curves from the final model (Fig. 2) provide practical guidance for choosing between desired false positive thresholds (1 − specificity) and expected true positive rates (sensitivity) for predicting all color categories. For example, if the desired false positive threshold is 0.2 (in other words, if we use the predicted probability of P > 0.8 as the threshold for prediction, so that we have at least an 80% chance of being correct), then the expected true positive rates (or sensitivities) are 0.61 for blond (meaning that a person with blond hair has a 61% chance of being predicted as blond by our model), 0.69 for brown, 0.78 for black, and 0.88 for red. Notably, incorrect predictions fall more frequently in a neighboring category than in a more distant one, so even an incorrect prediction can still provide useful information.
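Reading a sensitivity off a ROC curve at a chosen false positive rate can be sketched as follows. This is an illustrative sketch only: the function name, the scores, and the labels are hypothetical, and the code works on an empirical ROC curve built from predicted probabilities rather than on the curves in Fig. 2.

```python
def sensitivity_at_fpr(scores, labels, max_fpr):
    """Highest sensitivity reachable while keeping the false positive rate
    (1 - specificity) at or below max_fpr.

    An individual is called positive when the score is >= the threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    best = 0.0
    # Lowering the threshold can only increase both rates, so the scan
    # can stop as soon as the false positive budget is exceeded.
    for threshold in sorted(set(scores), reverse=True):
        fpr = sum(n >= threshold for n in negatives) / len(negatives)
        if fpr > max_fpr:
            break
        best = sum(p >= threshold for p in positives) / len(positives)
    return best


# Hypothetical predicted probabilities for one color category.
scores = [0.95, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(sensitivity_at_fpr(scores, labels, 0.2))
```

Repeating the call for several values of `max_fpr` traces out the trade-off between false positives and true positives that the ROC curve summarizes.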
We noticed that the prediction accuracies for the blond and brown colors were somewhat lower than those for black and red colors. One reason for this difference may lie in the environmental rather than genetic contribution to hair color variation. Hair color changes in some individuals during adolescence, and such change is most often from blond to brown (Rees 2003). Since we studied adult individuals, those volunteers who had experienced such a hair color change when younger were most likely grouped in the brown hair category, although they may have blond-associated genotypes. Consequently, these individuals would have lowered the prediction accuracy for brown relative to the brown-haired individuals who have not changed from blond. Our study design did not allow recording age-dependent hair color change, but this factor may be considered and tested in future studies. Although volunteers in the red hair color group of our study were significantly younger at the time of sampling than people in any other hair color category (P < 0.01), including age in the prediction modeling had only very little impact on the accuracy (AUC change <0.01). The age difference is most likely due to our targeted sampling procedure, in which the red hair color category was over-sampled in young individuals (see material section for further details). In this study, gender was not significantly associated with any hair color and had no significant impact on hair color prediction accuracy.
Model-based hair color prediction analysis was also performed in a previous study using SNPs from MC1R, OCA2/HERC2, SLC24A4, TYR, KITLG and a marker from the region 6p25.3 (Sulem et al. 2007) that is close to the IRF4 and EXOC2 hair color candidate genes (Han et al. 2008). However, the prediction approach used by Sulem et al. (2007) is not directly comparable with ours; they applied a two-step approach in which the steps differed not only in the predicted hair colors, but also in the genetic markers used. First, they predicted red hair by only using the two most important red hair associated polymorphisms in MC1R (rs1805007 and rs1805008) and found that among those Icelandic individuals (used for replication) who were predicted with >50% probability to have red hair, about 70% indeed had red hair. To make these previous findings more comparable with ours, we performed red hair prediction in our data by using only rs1805007 and rs1805008 as used by Sulem et al. (2007) and obtained an AUC of 0.83. Notably, this value is considerably lower than the one we obtained for red hair using all markers analyzed in the present study (0.93 or 0.94). Hence, we can conclude that the additional SNPs we used in our full model, in particular the additional MC1R SNPs, improved red hair color prediction accuracy in our study. In a second step, Sulem et al. (2007) used associated SNPs from all 6 loci to predict blond, dark blond/light brown, and brown/black hair color categories. They found in their Icelandic replication set that among the individuals for whom brown hair color was predicted with >50% probability, about 60% indeed had brown/black hair. However, their prediction results were much less convincing for blond since, among the individuals predicted to be blond with only >40% probability (the highest threshold reported for blond), less than 50% were indeed blond, but about 50% were dark blond/brown and a few percent were dark or red.
Performing AUC prediction in our samples only with the SNPs used by Sulem et al. (2007) resulted in AUC values of 0.69 for blond, 0.71 for brown, and 0.75 for black. Again, AUCs for all non-red hair color categories as achieved in the present study considerably exceed those estimated from the markers used by Sulem et al. (2007), which demonstrates the extra value of the additional markers we included in our model for accurate prediction also of non-red hair colors. A recently published candidate gene study employed linear regression modeling using SNPs from hair color candidate genes and found that three SNPs in HERC2, SLC45A2 and SLC24A5 together explain 76% of total hair melanin in the study population (Valenzuela et al. 2010).
It has been shown that the least absolute shrinkage and selection operator (LASSO) approach (Tibshirani 1996) can be used to estimate marker effects of thousands of SNPs in linkage disequilibrium (LD) (Usai et al. 2009). Because some of the SNPs included in our study were in LD, we additionally performed the multinomial LASSO regression and compared the prediction results with those from our multinomial logistic regression model. The AUC estimates from LASSO using all samples (AUC blond = 0.88, brown = 0.89, red = 0.96, black = 0.96) are slightly higher than the ones from the multinomial logistic regression (Table 3). However, the average AUC values from the 1,000 cross-validations of the LASSO approach (AUC blond = 0.66, brown = 0.62, red = 0.86, and black = 0.76) are considerably lower than the ones obtained from all samples with the same approach, and are also lower than the results from the multinomial logistic regression (Table 3). This may indicate that there is a potential over-fitting problem in the LASSO method and our data.
Because the sample size used in this study is relatively small (N = 385), we estimated the effect of the total sample size on the accuracy of pigmentation prediction using a bootstrap analysis of the eye color data published previously (Liu et al. 2009), in which an AUC value of 0.91 was obtained for predicting blue eye color based on a large population sample (N = 6,168). As evident from Supplementary Figure S1, if the total sample size is smaller than 300 individuals, the AUC value for blue eye color tends to be under-estimated with large confidence intervals. For example, with only 100 samples the mean AUC value from 1,000 bootstrap analyses was considerably lower (AUC = 0.85, 95% CI: [0.6–1.0]; Figure S1) than the value of 0.91 as achieved with thousands of samples (Liu et al. 2009). However, this effect quickly diminishes when the sample size increases, and with about 350 samples the mean AUC value was close (AUC = 0.90, 95% CI: [0.80–0.97]; Figure S1) to the value obtained from thousands of samples, and only increased marginally until 800 samples. From this example of blue eye color we may extrapolate that the AUCs for hair color obtained from the 385 samples included in the present study (which are similar to the AUC obtained for blue eye color) are unlikely to change drastically when more individuals are added to the hair color model.
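The dependence of the AUC confidence interval on sample size can be illustrated with a small resampling sketch. Everything here is an assumption made for illustration: the data are synthetic (Gaussian scores, not the eye color data of Liu et al. 2009), the function names are hypothetical, and the sketch resamples fixed prediction scores instead of refitting a model on each resample, so it shows how the interval narrows with sample size but not the downward bias reported above.

```python
import bisect
import random


def rank_auc(scores, labels):
    """AUC via ranks: share of (positive, negative) pairs ordered correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for p in positives:
        below = bisect.bisect_left(negatives, p)
        ties = bisect.bisect_right(negatives, p) - below
        correct += below + 0.5 * ties
    return correct / (len(positives) * len(negatives))


def bootstrap_auc(scores, labels, n, replicates, rng):
    """Mean AUC and a percentile 95% interval from samples of size n
    drawn with replacement from the full data set."""
    indices = list(range(len(scores)))
    aucs = []
    while len(aucs) < replicates:
        draw = rng.choices(indices, k=n)
        ys = [labels[i] for i in draw]
        if 0 < sum(ys) < n:  # redraw if one class is missing
            aucs.append(rank_auc([scores[i] for i in draw], ys))
    aucs.sort()
    mean = sum(aucs) / replicates
    lower = aucs[int(0.025 * replicates)]
    upper = aucs[int(0.975 * replicates) - 1]
    return mean, lower, upper


# Synthetic population (assumption): positives score higher on average.
rng = random.Random(0)
labels = [i % 2 for i in range(2000)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

small = bootstrap_auc(scores, labels, n=50, replicates=200, rng=rng)
large = bootstrap_auc(scores, labels, n=400, replicates=200, rng=rng)
print("n=50 :", small)
print("n=400:", large)
```

Capturing the under-estimation at small sample sizes would additionally require refitting the prediction model within each resample and evaluating it on individuals not used for fitting.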
Many genetic studies on hair color (as well as eye and skin color) use phenotypic information provided by self-assessment, i.e. questionnaires filled out by the individual participants (e.g. Sulem et al. 2007, 2008; Han et al. 2008), which may be expected not to be completely reliable. To avoid hair color phenotype uncertainties potentially generated by such a multiple-observer approach, we performed single-observer hair color grading in the present study. Some studies applied quantitative measures of hair color (Valenzuela et al. 2010; Mengel-From et al. 2009; Shekar et al. 2008). However, it is not clear how these methods, self-assessment, and single-observer hair color categorization compare to each other, or what their impact on DNA-based prediction accuracies is. On the one hand, Vaughn et al. (2008) in a phenotypic study found some differences between single-observer hair color grading and spectrophotometric measurement, but the sample size was low (about 100 individuals). On the other hand, Shekar et al. (2008) in a genetic study could not confirm the utility of spectrophotometric estimation in relation to hair color rating. The single-observer grading approach we applied in the present study was found to be more accurate than self-assessed hair color grading (Vaughn et al. 2008).
In conclusion, we demonstrated that human hair color categories can be accurately predicted from a relatively small number of DNA variants. The prediction accuracies achieved here for red and black hair color were in a similarly high range as those previously obtained for blue and brown eye color, for which practical applications have already been implemented (Walsh et al. 2010a, b). Slightly lower prediction accuracies obtained here for blond and brown hair color, which were still higher than previously observed for non-blue/non-brown eye color (Liu et al. 2009), may be influenced by age-dependent hair color change during adolescence, which shall be investigated in more detail in future studies. Although our example of using eye color to monitor the effect of sample size on the AUC-based prediction accuracy of pigmentation traits indicates that the sample size used here for hair color prediction is large enough to obtain a reasonably accurate prediction model, our results may be further replicated in a larger study. Furthermore, it shall be tested in future studies if and to what extent SNPs from other genes with recently reported hair color association not used here add to the hair color prediction accuracy as presented. Overall, we present hair color as the third externally visible characteristic that can be reliably predicted from DNA data after iris color (Liu et al. 2009; Walsh et al. 2010a, b; Valenzuela et al. 2010; Mengel-From et al. 2010), and human age, the latter demonstrated recently using quantification of T-cell DNA rearrangement (Zubakov et al. 2010). We therefore expect DNA-based hair color prediction, e.g. using the markers suggested here, to be used in future practical applications, such as in the forensic context.
Furthermore, our study demonstrates that markers not statistically significantly associated with a trait in a study population can still independently contribute to the trait prediction in the same population, a notion that shall be considered in the design of future genetic prediction studies, including those for disease risks.