The concept of personalized medicine assumes that prediction of phenotypes based on genome information can enable better prognosis, prevention and medical care which can be tailored individually (Brand et al. 2008; Janssens and van Duijn 2008). However, practical application of genome-based information to medicine requires the disease risk to be predicted with high accuracy, while knowledge on genetics of common complex diseases is still insufficient to allow their accurate prediction solely from DNA data (Alaerts and Del-Favero 2009; Chung et al. 2010; Ku et al. 2010; McCarthy and Zeggini 2009). Another potential application for prediction of phenotypes from genotypes is forensic science. Knowledge gained on externally visible characteristics (EVC) from genotype data obtained by examination of crime scene samples may be used for investigative intelligence purposes, especially in suspect-less cases (Kayser and Schneider 2009). The idea is based on using DNA-predicted EVC information to encircle a perpetrator in a larger population of unknown suspects. Such approach could also be useful in cases pertaining identification of human remains by extending anthropological findings on physical appearance of an identified individual. However, the genetic understanding of human appearance is still in its infancy. One notable exception is eye (iris) color, where previous candidate gene studies and especially recent genome-wide association studies (GWAS) revealed 15 genes involved (Eiberg et al. 2008; Frudakis et al. 2003; Graf et al. 2005; Han et al. 2008; Kanetsky et al. 2002; Kayser et al. 2008; Liu et al. 2010; Rebbeck et al. 2002; Sulem et al. 2007). One of them, HERC2, harbors genetic variation most strongly associated with human eye color variation (Eiberg et al. 2008; Kayser et al. 2008; Liu et al. 2010; Sturm et al. 2008). Moreover, a recent systematic study investigating the predictive value of eye color associated single nucleotide polymorphisms (SNPs) (Liu et al. 2009) found that a model with 15 SNPs from 8 genes predicts categorized blue and brown eye color with high accuracies with a subset of only 6 SNPs covering most of the predictive information. The IrisPlex system employing these six SNPs and a prediction model based on data from thousands of Europeans was recently developed and validated for DNA prediction of human eye color in forensic applications (Walsh et al. 2010a, b). Furthermore, a recent GWAS on quantitative eye color explained about 50% of continuous eye color variation by genetic factors (Liu et al. 2010).

The recent progress on DNA prediction of human eye color raises expectations for the DNA prediction of other human pigmentation traits, such as hair color. Inheritance of one particular hair color in humans i.e., red hair, has already been explained to a significant degree. Valverde et al. (1995) have found that red hair color is mainly associated with polymorphisms in the MC1R gene. This information has been confirmed since in many other studies performed on various population samples (Box et al. 1997; Han et al. 2008; Harding et al. 2000; Flanagan et al. 2000; Kanetsky et al. 2004; Pastorino et al. 2004; Rana et al. 1999; Sulem et al. 2007). MC1R SNPs are fairly indicative for red hair and thus have already been implemented in forensic science (Branicki et al. 2007; Grimes et al. 2001), but the practical application of red hair color prediction (without the ability to predict other hair colors) strongly depends on the population it is applied to, given the strong differences in red hair color frequency between populations. Additional data on red hair color inheritance came from a recent GWAS in Icelanders, which revealed two SNPs in the ASIP gene, representing a MC1R antagonist, as significantly associated with red hair color (Sulem et al. 2008). Moreover, a position in the 3′-UTR of the ASIP gene was previously associated with dark hair color in European populations (Kanetsky et al. 2002; Voisey et al. 2006) using a candidate gene approach. The candidate gene approach also delivered two non-synonymous SNPs in SLC45A2 (MATP) with association to dark hair color in another study with several confirmation studies (Branicki et al. 2008a; Fernandez et al. 2008; Graf et al. 2005). Via several large GWASs various SNPs in/nearby genes in addition to MC1R, ASIP and SLC45A2 have been found with association to human hair color variation, such as OCA2, HERC2, SLC24A4, KITLG, TYR, TPCN2, TYRP1, IRF4, EXOC2, KIF26A, and OBSCN (Han et al. 2008; Sulem et al. 2007, 2008). Furthermore, two recently published studies tested for hair color association of SNPs from a large number of candidate genes, not only confirmed some previously known hair color genes, such as KITLG, OCA2, MC1R, TYRP1, TYR, SLC45A2, HERC2, ASIP, but additionally reported association with quantitative measures of hair color of SNPs in additional genes, such as SLC24A5, MYO5A, MYO7A, MLPH, GPR143, DCT, HPS3, GNAS, PRKARIA, ERCC6, and DTNBP1 in one or both studies (Mengel-From et al. 2009; Valenzuela et al. 2010). In the present study, we tested in Polish Europeans with single-observer hair color phenotype data the predictive power of 45 SNPs from 12 genes previously implicated with replicated evidence in human hair color variation.

Materials and methods

Subjects and hair color phenotyping

Samples were collected in years 2005–2009 from unrelated Europeans living in southern Poland. The study was approved by the Ethics Committee of the Jagiellonian University, number KBET/17/B/2005 and the Commission on Bioethics of the Regional Board of Medical Doctors in Krakow number 48 KBL/OIL/2008. All participants gave informed consent. Samples were collected from patients attending dermatological consultations at the Department of Dermatology of the Jagiellonian University Hospital. Hair color phenotypes were collected by a combination of self-assessment and professional single observer grading. A single dermatologist interviewed and assessed all the subjects. The questionnaire included basic information, such as gender and age as well as data concerning pigmentation phenotype. The study included 385 individuals (39.0% male) after genetic and phenotypic quality control. Categorical hair color was assessed by examination of the scalp in majority of cases. In a rare number of elderly volunteers and volunteers with dyed hair at the time of inspection we used ID photographs in combination with self-assessment for establishing natural hair color phenotypes. Hair color was classified into 7 categories: blond (16.4%), dark-blond (37.7%), brown (9.4%), auburn (3.1%), blond-red (11.2%), red (10.6%), and black (11.7%). For some analyses, we grouped blond and dark-blond into one blond group (54.1%) and auburn, blond-red, and red into one red group (24.9%) resulting into 4 categories. Notably, the frequency of red hair color in our study population is higher than expected in the general Polish population because of an enrichment of red hair colored individuals in the sampling process. This was done to demonstrate prediction accuracy of red hair, similar to other hair color, as red hair normally is relatively rare in the Polish population. Hence, the color distribution of the selected samples in our study does not represent that of the general Polish population (a point not relevant for the purpose of our study).

SNP ascertainment and genotyping

This study was based on 45 SNPs from 12 genes (Table 1), including SLC45A2, IRF4, EXOC2, TYRP1, TPCN2, TYR, KITLG, SLC24A4, OCA2, HERC2, MC1R, and ASIP that were associated with human hair color variation in several previous studies (Duffy et al. 2007; Graf et al. 2005; Han et al. 2008; Kanetsky et al. 2002, 2004; Valverde et al. 1995; Sulem et al. 2007, 2008). A subset of 25 SNPs were genotyped via mass spectrometry using Sequenom multiplexing (see Supplementary Table S1 for additional details about markers and methods). Multiplex assay design was performed with the software MassARRAY Assay Design version (Sequenom Inc., San Diego, USA). The settings were iPLEX and high multiplexing. For the rest of the settings the default was used. The results were two 7-plexes and one 11-plex (see Supplementary Table 1). A 2 ng of dried genomic DNA in 384-well plates (Applied Biosystems) was amplified in a reaction volume of 5 μl containing 1× PCR Buffer, 1.625 mM MgCl2, 500 μM dNTPs, 100 nM each PCR primer, 0.5 U PCR enzyme (Sequenom). The reaction was incubated in a GeneAmp PCR System 9700 (Applied Biosystems) at 94°C for 4 min followed by 45 cycles of 94°C for 20 s, 56°C for 30 s, 72°C for 1 min, followed by 3 min at 72°C. To remove the excess dNTPs, 2 μl SAP mix containing 1× SAP Buffer and 0.5 U shrimp alkaline phosphatase (Sequenom) was added to the reaction. This was incubated in a GeneAmp PCR System 9700 (Applied Biosystems) at 37°C for 40 min followed by 5 min at 85°C for deactivation of the enzyme. Then 2 μl of Extension mix is added containing a concentration of adjusted extend primers varying between 3.5 and 7 μM for each primer, 1× iPLEX buffer (Sequenom), iPLEX termination mix (Sequenom) and iPLEX enzyme (Sequenom). The extension reaction was incubated in a GeneAmp PCR System 9700 (Applied Biosystems) at 94°C for 30 s followed by 40 cycles of 94°C for 5 s, 5 cycles of 52°C for 5 s, and 80°C for 5 s, then 72°C for 3 min. After the extension reaction it is desalted by the addition of 6 mg Clean Resin (Sequenom) and 16 μl water, rotating the plate for 15 min and centrifuging. The extension product was spotted onto a G384 + 10 SpectroCHIP (Sequenom) with the MassARRAY Nanodispenser model rs1000 (Sequenom). The chip is then transferred into the MassARRAY Compact System (Sequenom) where the data are collected, using TyperAnalyzer version (Sequenom), SpectroACQUIRE version (Sequenom), GenoFLEX version (Sequenom), and MassArrayCALLER version (Sequenom). The data were checked manually after the data collection. Vaguely positioned dots produced by TyperAnalyzer 3.4 (Sequenom) and all the wells that 50% or more failed SNPs were excluded from analysis. Blanks (2%), controls (2%), and duplicates (9%) were checked for any inconsistencies and false positives. Furthermore, SNPs in the MC1R gene were analyzed by amplification and cycle sequencing of the complete MC1R exon using the procedure described in Branicki et al. (2007). Selected polymorphisms within genes ASIP (rs6058017), OCA2 (rs1800407, rs1800401, rs7495174, rs4778241, rs4778138), SLC45A2 (rs16891982, rs26722), and HERC2 (rs916977) were analyzed using minisequencing procedure and SNaPshot multiplex kit. Detailed information concerning protocols and primers were described elsewhere (Branicki et al. 2007; 2008a, b, 2009; Brudnik et al. 2009). Briefly, the PCR reaction was composed of 5 μl of Qiagen multiplex PCR kit (Qiagen, Hilden, Germany), 1 μl of primer premix, 2 μl of Q solution, and 2 μl (approximately 5 ng) of template DNA. The temperature profile was as follows: {94° for 15 min (94° for 30 s, 58–60°C for 90 s, and 72° for 90 s) × 32, 72°/10 min} 4°C/∞. The PCR products were purified with a mixture of ExoI and SAP enzymes (Fermentas, Vilnius, Lithuania) and subjected to multiplex minisequencing reactions with a SNaPshot multiplex kit (Applied Biosystems, Foster City, CA, USA). A 2 μl of SNaPshot kit was combined with 1 μl of extension primers premix, 1 μl of the purified PCR product, and nuclease-free water up to 10 μl. The products of extension reactions were purified with SAP enzyme and analyzed on ABI 3100 Avant genetic analyzer (Applied Biosystems, Foster City, CA, USA).

Table 1 Single SNP hair color association in a Polish sample

Statistic analysis

The frequency of hair color was compared between male and females using cross tabulation. Mean age was compared between color categories using one-way analysis of variance. MC1R polymorphisms are largely recessive when considered individually, but also interact with each other through the genetic mechanism known as “compound heterozygosity”. The high-penetrance variants, traditionally coded as “R”, include Y152OCH, N29insA, D84E (rs1805006), R142H (rs11547464), R151C (rs1805007), R160W (rs1805008), D294H (rs1805009), and low-penetrance variants, as “r”, include V60L (rs1805005), V92M (rs2228479), I155T (rs1110400), R163Q (rs885479). We followed the tradition and used the R and r variables by combining the information of all known causal variants in the MC1R gene in the association and the prediction analyses. The R variant was defined by the total number of the high-penetrance variants in the MC1R gene so that each individual has three possible genotype states, homozygote wildtype for all variants (wt/wt), heterozygote for one high-penetrant variant (wt/R), and homozygote for at least one or compound heterozygote for at least two high-penetrant variants (R/R). The r variant was defined similarly by considering the low-penetrant variants in the MC1R gene (wt/wt, wt/r, r/r). All ascertained SNPs including the R and r variants were tested for association with each hair color category (binary coded 0, 1) using logistic regression adjusted for gender and age. We derived the allelic Odds Ratios (ORs), where the SNP genotypes were coded using 0, 1, or 2 number of the minor alleles (Table 1). We also derived the genotypic ORs, where the homozygote minor alleles and heterozygote genotypes were compared with homozygote wildtypes (Supplementary Table S2).

We used a multinomial logistic regression model for the prediction analysis, and the modeling details follow closely the previous study of eye color (Liu et al. 2009). Consider hair color, y, to be four categories blond, brown, red, and black, which are determined by the genotype, x, of k SNPs, where x represents the number of minor alleles per k SNP. Let π1, π2, π3, and π4 denote the probability of blond, brown, red, and black, respectively. The multinomial logistic regression can be written as

$$ {\text{logit}}(\Pr (y = {\text{blond}}|x_{1} \ldots x_{k} )) = \ln \left( {{\frac{{\pi_{1} }}{{\pi_{4} }}}} \right) = \alpha_{1} + \sum {\beta (\pi_{1} )_{k} x_{k} } $$
$${\text{logit}}(\Pr (y = {\text{brown}}|x_{1} \ldots x_{k} )) = \ln \left( {{\frac{{\pi_{2} }}{{\pi_{4} }}}} \right) = \alpha_{2} + \sum {\beta (\pi_{2} )_{k} x_{k} } $$
$${\text{logit}}(\Pr ({{y}} = {\text{red}}|x_{1} \ldots x_{k} )) = \ln \left( {{\frac{{\pi_{3} }}{{\pi_{4} }}}} \right) = \alpha_{3} + \sum {\beta (\pi_{3} )_{k} x_{k} } $$

where α and β can be derived in the training set.

Hair color of each individual in the testing set can be probabilistically predicted based on his or her genotypes and the derived α and β,

$$ \pi_{1} = {\frac{{\exp (\alpha_{1} + \sum {\beta (\pi_{1} )_{k} } x_{k} )}}{{1 + \exp (\alpha_{1} + \sum {\beta (\pi_{1} )_{k} } x_{k} ) + \exp (\alpha_{2} + \sum {\beta (\pi_{2} )_{k} } x_{k} ) + \exp (\alpha_{3} + \sum {\beta (\pi_{3} )_{k} } x_{k} )}}} $$
$$ \pi_{2} = {\frac{{\exp (\alpha_{2} + \sum {\beta (\pi_{2} )_{k} } x_{k} )}}{{1 + \exp (\alpha_{1} + \sum {\beta (\pi_{1} )_{k} } x_{k} ) + \exp (\alpha_{2} + \sum {\beta (\pi_{2} )_{k} } x_{k} ) + \exp (\alpha_{3} + \sum {\beta (\pi_{3} )_{k} } x_{k} )}}} $$
$$ \pi_{3} = {\frac{{\exp (\alpha_{3} + \sum {\beta (\pi_{3} )_{k} } x_{k} )}}{{1 + \exp (\alpha_{1} + \sum {\beta (\pi_{1} )_{k} } x_{k} ) + \exp (\alpha_{2} + \sum {\beta (\pi_{2} )_{k} } x_{k} ) + \exp (\alpha_{3} + \sum {\beta (\pi_{3} )_{k} } x_{k} )}}} $$
$$ \pi_{4} = 1 - \pi_{1} - \pi_{2} - \pi_{3} . $$

Categorically, the color category with the max(π1, π2, π3, π4) was considered as the predicted color.

We evaluated the performance of the prediction model in the testing set using the area under the receiver operating characteristic (ROC) curves, or AUC (Janssens et al. 2004). AUC is the integral of ROC curves which ranges from 0.5 representing total lack of prediction to 1.0 representing perfect prediction. Cross-validations were conducted 1,000 replicates; in each replicate 80% individuals were used as the training set and the remaining samples were used as the testing set. The average accuracy estimates of all replicates were reported. Because of a relatively small sample size and rare MC1R polymorphisms with large effects, the cross-validation may give conservative estimates of the prediction accuracy. Thus, we report both the results with and without cross-validations, i.e. using the whole sample for training and prediction.

The selection of SNPs in the final model was based on the contribution of each SNP to the predictive accuracy using a step-wise analysis by iteratively including the next largest contributor to the model. The contribution of each SNP was measured by the gain of total AUC of the models with and without that SNP. The MC1R, R and r, and the OCA2 SNP, rs1800407, were always included in the prediction model due to their known biological function. The HERC2 SNP, rs12193832, was also always included because of its known extraordinary large effect on all human pigmentation traits.

Because the sample size included in the current study is relatively small, we estimated the effect of sample size on the accuracy of the prediction analysis using the data from a previously published study of eye color (Liu et al. 2009), in which a larger sample was available (N = 6,168). A sample of n individuals was randomly bootstrapped 1,000 times from the 6,168 participants of the Rotterdam Study, for whom the eye color information and genotypes of the six most important eye color SNPs were available. For each bootstrap, a binary logistic regression model was built in a randomly selected subsample (80% of n individuals) using the six most eye color predictive SNPs from Liu et al. (2009) as the predictors and the blue eye color (yes, no) as the binary outcome. The logistic model was then used to predict the blue color in the remaining sample (20% of n individuals), based on which an AUC value was derived. The mean, the 95% upper, and the 95% lower AUC values of the 1,000 bootstraps were reported. The bootstrap analysis was conducted for various n ranging from 100 to 800 (Supplementary Figure S1).

Further, we conducted a prediction analysis using the multinomial LASSO regression model implemented in the R library glmnet v1.1-4 (Friedman et al. 2010). The cross-validations of LASSO analysis were also conducted 1,000 replicates based on the 80–20% split.

Results and discussion

First, we tested the genotyped SNPs for hair color association in our study sample. Although variation in MC1R is usually attributed to red hair color (Branicki et al. 2007; Grimes et al. 2001; Valverde et al. 1995), the compound variant MC1R-R in our study was significantly associated with all but one (auburn) hair color category, albeit its association was strongest with red hair (allelic OR: 12.6; 95% CI: [7.0–22.7]; P = 2.5×10−17; Table 1). The lack of association of the MC1R-R variant with auburn hair color may be caused by the small sample size of the auburn category and/or problems with correct classification of this hair color as reported elsewhere (Mengel-From et al. 2009). Furthermore, MC1R-R showed a clear recessive effect and a compound-heterozygote effect in that the R/R genotype carriers were much more likely to have red hair (genotypic OR: 262.2; 95% CI: [65.2–1,055.3]; P = 4.5 × 10−15) than the wt/R carriers (genotypic OR: 5.6; 95% CI: [2.5–12.6]; P = 4.0 × 10−5; Supplementary Table S2). The stronger association of MC1R SNPs with red hair than with non-red hair colors as observed here was also found previously (Han et al. 2008; Sulem et al. 2007). The SNP rs12913832 in the HERC2 gene was significantly associated with all hair color categories, most significantly with brown (allelic OR for T vs. C: 3.5; 95% CI: [2.0–6.1]; P = 1.3 × 10−5) and black (allelic OR: 3.3; 95% CI: [2.0–5.6]; P = 4.3 × 10−6; Table 1) hair. The T allele of rs12913832 showed a dominant effect on darker hair color in that the heterozygote carriers had a further increased OR of black hair (genotypic OR: 8.6; 95% CI: [3.9–18.9]; P = 7.2 × 10−8; Supplementary Table S2). This SNP was associated with total hair melanin in a recent study (Valenzuela et al. 2010). A previous study found HERC2 SNPs significantly associated with non-red, but not with red, hair colors (Sulem et al. 2007), and another one reported HERC2 association only with dark hair color (Mengel-From et al. 2009). However, an additional study found HERC2 association with all hair colors, albeit reported stronger association with non-red hair colors than with red hair (Han et al., 2008), in agreement with our findings. Additional SNPs in MC1R and HERC2 were also significantly associated with several hair colors (Table 1). Except for MC1R and HERC2 genes, no significant evidence of a dominant or a recessive effects on hair color was found for any other gene studied (Supplementary Table S2). SNPs in SLC45A2 (rs28777 allelic OR for C vs. G: 7.05; 95% CI: [2.2–22.3]; P = 0.001), IRF4 (rs12203592 allelic OR for T vs. C: 7.05; 95% CI: [2.2–22.3]; P = 0.01), and EXOC2 (rs4959270 allelic OR for A vs. C: 0.56; 95% CI: [0.35–0.91]; P = 0.02) were most significantly associated with black hair color (Table 1), in line with the previous reports (Han et al. 2008; Mengel-From et al. 2009). Further, an association of SLC45A2 with total hair melanin was reported (Valenzuela et al. 2010). SNPs in the ASIP gene were associated with red (rs2378249, P = 0.02), dark blond (rs2378249, P = 0.02), and blond-red (rs1015362, P = 0.04; Table 1). Significant ASIP association with red hair was reported previously (Sulem et al. 2008), as well as with total hair melanin (Valenzuela et al. 2010). The OCA2 gene was most significantly associated with brown hair color (rs4778138, P = 0.03), confirming previous findings of OCA2 involvement in hair color variation (Han et al. 2008; Mengel-From et al. 2009; Valenzuela et al. 2010), although one previous GWAS did not find significant evidence (Sulem et al. 2007). The TYR gene was significantly associated with brown (rs1393350, P = 0.02) and the SLC24A4 gene with blond (rs4904868, P = 0.04) and dark blond (rs2402130, P = 0.03). These results are largely consistent with previous findings (Sulem et al. 2007; Han et al. 2008; Mengel-From et al. 2009). Overall, at least one SNP in 9 out of the 12 genes studied showed significant association with certain hair color categories in our sample (Table 1). For three genes (TYRP1, TPCN2, and KITLG) the SNPs tested did not reveal statistically significant hair color association (but see below for the predictive effects of two of these genes), although these genes have been implicated in human hair color variation elsewhere (Sulem et al. 2007, 2008; Valenzuela et al. 2010; Mengel-From et al. 2009). This discrepancy may be influenced by the relatively small sample size in our study and the putatively smaller effect size of these three genes relative to the other genes studied.

The main goal of this study, however, was to investigate the predictive value of hair color associated SNPs as established in previous, and (mostly) confirmed in the present study. DNA-based prediction accuracies for hair color categories were evaluated by means of the area under the ROC curves (AUC), ranging from 0.5 (random) to 1 (perfect) prediction. Our model revealed that 13 single or combined (MC1R-R and MC1R-r) genetic variants from all, but one (TPCN2) of the 12 genes investigated contribute independently to the AUC value (Table 2) for 4 (Fig. 1a) and 7 hair color categories (Fig. 1b). As may be expected from the association results, MC1R_R has the most predictive power on red hair (AUC 0.86–0.88), and its predictive effect on non-red hair colors was considerably lower (AUC 0.63–0.68, Fig. 1). The HERC2 SNP rs12913832, when added to MC1R_R in the model, contributed most of all other genetic predictors to the accuracy for predicting all color categories (ΔAUC 0.08 for blond, 0.12 for brown, 0.03 for red, and 0.13 for black, Fig. 1). Adding the remaining 11 independent genetic predictors provides accuracy increase and usually with decreasing effects while increasing the number of markers (Fig. 1). Notably, some SNPs without statistically significant hair color association in our study (P > 0.05) did provide independent information toward hair color prediction (such as rs1042602 in TYR, rs683 in TYRP1, and rs12821256 in KITLG). Only the non-synonymous SNPs from the TPCN2 gene tested did not contribute to the prediction model and did not show a statistically significant association with any hair color category. Rs35264875 and rs3829241 in TPCN2 had been discovered recently as significantly associated with blond versus brown hair color in Icelandic and replicated in Icelandic and Dutch people (Sulem et al. 2008). Predicting each color type separately using binary logistic regression yield slightly lower accuracy compared to the multinomial model (Supplementary Table S3).

Table 2 Parameters of the prediction model based on multinomial logistic regression in a Polish sample
Fig. 1
figure 1

Accuracy of hair color prediction using DNA variants in a Polish sample. AUC was plotted against the number of SNPs included in the multinomial logistic model for predicting 4 (a) and 7 (b) hair color categories. SNP annotation and prediction ranks are provided in Table 2

Overall, hair color prediction with 13 DNA components from 11 genes showed very good accuracy without cross-validation, such as AUC for blond = 0.81, brown = 0.82, red = 0.93, black = 0.87 in the 4 category model (Table 3; Fig. 1a), and AUC for blond = 0.78, d-blond = 0.73, brown = 0.82, auburn = 0.82, b-red = 0.92, red = 0.94, black = 0.88 (Table 3; Fig. 1b) when considering 7 categories. The mean accuracies derived from 1,000 cross-validations are somewhat lower for all hair color categories (least so for red), likely because of sample size effects as the rare alleles with large effects are not well captured in the training sets (Table 3).

Table 3 Hair color prediction accuracy using 13 genetic markers in a Polish sample

In general, the sensitivities for predicting brown, red, and black colors were considerably lower than the respective specificities, except for blond in the 4 categories and dark blond in the 7 categories (Table 3). The very low sensitivities for brown may reflect uncertainties in distinguishing between the dark-blond and brown colors on one side, and between the auburn, red and blond-red colors on the other side during phenotyping, as well as an additional sample size effect for auburn representing the smallest hair color group in our study (N = 12). However, the final model showed a good power to discriminate highly similar hair color categories, such as red and blond-red, as well as between blond and dark-blond (Table 3), underlining the value of the genetic markers involved in our hair color prediction model.

The ROC curves from the final model (Fig. 2) provide practical guides for the choices between desired false positive thresholds (1-specificity) and expected true positive rates (sensitivity) for predicting all color categories. For example, if the desired false positive threshold is 0.2 (in other words, if we use the predicted probability of P > 0.8 as the threshold for prediction, thus we know that we have at least 80% chance to be correct), then the expected true positive rates (or sensitivities) are 0.61 for blond (meaning that if a person has blond hair, our model provides a 61% chance to predict him/her as blond), 0.69 for brown, 0.78 for black, and 0.88 for red. Notably, incorrect predictions fall more frequently in the neighboring category than in a more distant category, so the predictive information can still provide useful information.

Fig. 2
figure 2

ROC curves for the final model including 13 DNA predictors for 4 (a) and 7 (b) hair color categories in a Polish sample

We noticed that the prediction accuracies for the blond and brown colors were somewhat lower than those for black and red colors. One reason for this difference may be in the environmental rather than genetic contribution to hair color variation. Hair color changes in some individuals during adolescence and such change is most often from blond to brown (Rees 2003). Since in our study we used adult individuals, those volunteers who had experienced such specific hair color change when being younger were grouped most likely in the brown hair category, although they may have blond associated genotypes. Consequently, these individuals would have lowered the prediction accuracy for brown relative to the brown-haired individuals who have not changed from blond. Our study design did not allow recording age-dependent hair color change, but this factor may be considered and tested in future studies. Although, volunteers in the red hair color group of our study was significantly younger at time of sampling than people in any other hair color category groups (P < 0.01), including age in the prediction modeling had only very little impact on the accuracy (AUC change <0.01). The age difference is most likely due to our targeted sampling procedure in which the red hair color category was over-sampled in young individuals (see material section for further details). In this study, gender was not significantly associated with any hair color and had no significant impact on hair color prediction accuracy.

Model-based hair color prediction analysis was also performed in a previous study using SNPs from MC1R, OCA2/HERC2, SLC24A4, TYR, KITLG and a marker from the region 6p25.3 (Sulem et al. 2007) that is close to the IRF4 and EXOC2 hair color candidate genes (Han et al. 2008). However, the prediction approach used by Sulem et al. (2007) is not directly comparable with ours; they applied a two-step approach and the steps did not only differ in the predicted hair colors, but also in the genetic markers used. First, they predicted red hair by only using the two most important red hair associated polymorphisms in MC1R (rs1805007 and rs1805008) and found that from those Icelandic individuals (used for replication) who were predicted with >50% probability to have red hair, about 70% indeed had red hair. To make these previous findings more comparable with ours, we performed red hair prediction in our data by using only rs1805007 and rs1805008 as used by Sulem et al. (2007) and received an AUC of 0.83. Notably, this value is considerably lower than the one we received for red hair using all markers analyzed in the present study (0.93 or 0.94). Hence, we can conclude that the additional SNPs we used in our full model, in particular the additional MC1R SNPs, improved red hair color prediction accuracy in our study. In a second step, Sulem et al. (2007) used associated SNPs from all 6 loci to predict blond, dark blond/light brown, and brown/black hair color categories. They found in their Icelandic replication set that among the individuals for whom brown hair color was predicted with >50% probability, about 60% indeed had brown/black hair. However, their prediction results were much less convincing for blond since, from the individuals predicted to be blond with only >40% probability (the highest threshold reported for blond), less than 50% were indeed blond, but about 50% were dark blond/brown and a few percentage were dark or red. Performing AUC prediction in our samples only with the SNPs used by Sulem et al. (2007) resulted in AUC values of 0.69 for blond, 0.71 for brown, and 0.75 for black. Again, AUCs for all non-red hair color categories as achieved in the present study considerably exceed those estimated from the markers used by Sulem et al. (2007), which demonstrates the extra value of the additional markers we included in our model for accurate prediction also of non-red hair colors. A recently published candidate gene study employed linear regression modeling using SNPs from hair color candidate genes and found that three SNPs in HERC2, SLC45A2 and SLC24A5 together explain 76% of total hair melanin in the study population (Valenzuela et al. 2010).

It has been shown that the least absolute shrinkage and selection operator (LASSO) approach (Tibshirani 1996) can be used to estimate marker effects of thousands of SNPs in linkage disequilibrium (LD) (Usai et al. 2009). Because some of the SNPs included in our study were in LD, we additionally performed the multinomial LASSO regression and compared the prediction results with those from our multinomial logistic regression model. The AUC estimates from LASSO using all samples (AUC blond = 0.88, brown = 0.89, red = 0.96, black = 0.96) are slightly higher than the ones from the multinomial logistic regression (Table 3). However, the average AUC values from the 1,000 cross-validations of the LASSO approach (AUC blond = 0.66, brown = 0.62, red = 0.86, and black = 0.76) are considerably lower than the ones obtained from all samples with the same approach, and are also lower than the results from the multinomial logistic regression (Table 3). This may indicate that there is a potential over-fitting problem in the LASSO method and our data.

Because the sample size used in this study is relatively small (N = 385), we estimated the effect of the total sample size on the accuracy of pigmentation prediction using a bootstrap analysis of the eye color data published previously (Liu et al. 2009), in which a AUC value of 0.91 was obtained for predicting blue eye color based on a large population sample (N = 6,168). As evident from Supplementary Figure S1, if the total sample size is smaller than 300 individuals, the AUC value for blue eye color tends to be under-estimated with large confidence intervals. For example, with only 100 samples the mean AUC value from 1,000 bootstrap analyses was considerably lower (AUC = 0.85, 95% CI: [0.6–1.0]; Figure S1) than the value of 0.91 as achieved with thousands of samples (Liu et al. 2009). However, this effect quickly diminishes when the sample size increases, and with about 350 samples the mean AUC value was close (AUC = 0.90, 95% CI: [0.80–0.97]; Figure S1) to the value obtained from thousands of samples, and only increased marginally until 800 samples. From this example of blue eye color we may extrapolate that the AUCs for hair color obtained from the 385 samples included in the present study (which are similar to the AUC obtained for blue eye color) are unlikely to change drastically when more individuals are added to the hair color model.

Many genetic studies on hair color (as well as eye and skin color) use phenotypic information provided by self-assessment, i.e. questionnaires filled out by the individual participants (e.g. Sulem et al. 2007, 2008; Han et al. 2008), which may be expected not to be completely reliable. To avoid hair color phenotype uncertainties potentially generated by such multiple-observer approach, we performed single-observer hair color grading in the present study. Some studies applied quantitative measures of hair color (Valenzuela et al. 2010; Mengel-From et al. 2009; Shekar et al. 2008). However, it is not clear how these methods as well as self-assessment and single-observer hair color categorization compare to each other and what the impact on DNA-based prediction accuracies is. On the one hand Vaughn et al. (2008) in a phenotypic study found some differences between single-observer hair color grading and spectrophotometric measurement, but the sample size was low (with about 100 individuals). On the other hand Shekar et al. (2008) in a genetic study could not confirm the utility of spectrophotometric estimation in relation to hair color rating. The single-observer grading approach we applied in the present study was found to be more accurate than using self-assessed hair color grading (Vaughn et al. 2008).

In conclusion, we demonstrated that human hair color categories can be accurately predicted from a relatively small number of DNA variants. The prediction accuracies achieved here for red and black hair color were in the similarly high precision range as previously obtained for blue and brown eye color, for which practical applications has already been implemented (Walsh et al. 2010a, b). Slightly lower prediction accuracies obtained here for blond and brown hair color, which were still higher than previously observed for non-blue/non-brown eye color (Liu et al. 2009), may be influenced by age-dependent hair color change during adolescence, which shall be investigated in more detail in future studies. Although our example of using eye color to monitor the effect of sample size to the AUC-based prediction accuracy of pigmentation traits indicate that the sample size used here for hair color prediction is large enough to obtain a reasonably accurate prediction model, our results may be further replicated in a larger study. Furthermore, it shall be tested in future studies if and to what extent SNPs from other genes with recently reported hair color association not used here add to the hair color prediction accuracy as presented. Overall, we evidently present hair color as the third externally visible characteristic that can be reliably predicted from DNA data after iris color (Liu et al. 2009; Walsh et al. 2010a, b; Valenzuela et al. 2010; Mengel-From et al. 2010), and human age, the latter demonstrated recently using quantification of T-cell DNA rearrangement (Zubakov et al. 2010). We therefore expect DNA-based hair color prediction, e.g. using the markers suggested here, to be used in future practical applications, such as in the forensic context. Furthermore, our study demonstrates that markers not statistically significantly associated with a trait in a study population can still independently contribute to the trait prediction in the same population, a notion that shall be considered in the design of future genetic prediction studies, including for diseases risks.