Genome-wide association analysis reveals QTL and candidate mutations involved in white spotting in cattle
White spotting of the coat is a characteristic trait of various domestic species including cattle and other mammals. It is a hallmark of Holstein–Friesian cattle, and several previous studies have detected genetic loci with major effects for white spotting in animals with Holstein–Friesian ancestry. Here, our aim was to better understand the underlying genetic and molecular mechanisms of white spotting, by conducting the largest mapping study for this trait in cattle, to date.
Using imputed whole-genome sequence data, we conducted a genome-wide association analysis in 2973 mixed-breed cows and bulls. Highly significant quantitative trait loci (QTL) were found on chromosomes 6 and 22, highlighting the well-established coat color genes KIT and MITF as likely responsible for these effects. These results are in broad agreement with previous studies, although we also report a third significant QTL on chromosome 2 that appears to be novel. This signal maps immediately adjacent to the PAX3 gene, which encodes a known transcription factor that controls MITF expression and is the causal locus for white spotting in horses. More detailed examination of these loci revealed a candidate causal mutation in PAX3 (p.Thr424Met), and another candidate mutation (rs209784468) within a conserved element in intron 2 of MITF transcripts expressed in the skin. These analyses also revealed a mechanistic ambiguity at the chromosome 6 locus, where highly dispersed association signals suggested multiple or multiallelic QTL involving KIT and/or other genes in this region.
Our findings extend those of previous studies that reported KIT as a likely causal gene for white spotting, and report novel associations between candidate causal mutations in both the MITF and PAX3 genes. The sizes of the effects of these QTL are substantial, and could be used to select animals with darker, or conversely whiter, coats depending on the desired characteristics.
Coat patterning traits provide visual characteristics that allow differentiation between domesticated animal breeds and between strains within breeds. White spotting is one of these phenotypes, and is a feature of a variety of mammals including cattle, horses, dogs, cats and mice. White spotting is a complex quantitative trait, for which several genes with major effects have been described and are relevant across species, as well as many other loci with small effects that account for the remaining genetic variance. This oligogenic architecture derives from the multifaceted biology that contributes to white spotting of the coat, which is hypothesised to arise from abnormal melanocyte precursor migration and/or development. Mouse models have demonstrated that pigment cells originate from the neural crest cells via the SOX10-positive glial bipotent progenitor cells during embryogenesis, and migrate dorsally via the neural tube. These cells proceed to differentiate into melanoblasts by acquiring expression of the genes microphthalmia-associated transcription factor (MITF), proto-oncogene receptor tyrosine kinase (KIT) and dopachrome tautomerase (DCT), and migrate down the ventral axis of the body. When the cells reach their destination, they migrate into the epidermis where some melanoblasts localise to the hair follicle and differentiate into melanocytes. A subset of melanoblasts dedifferentiate, losing MITF and KIT gene expression, and colonise the hair follicle bulge where they act as melanocyte stem cells and replenish differentiated melanocytes during subsequent hair cycles. Disruption of any of the above processes is expected to result in parts of the body lacking mature melanocytes, and thus regions of abnormal pigmentation in the hair coat.
Quantitative trait loci (QTL) and mutations that cause white spotting have been described for a variety of species. Genetic studies in the horse revealed an inversion in the KIT gene associated with the Tobiano white-spotting pattern, and a mutation in the PAX3 gene associated with a splashed white pattern [4, 5]. Several mutations in the KIT gene have also been associated with complete white or roan coat phenotypes. Studies on white spotting in dogs have revealed associations with the MITF gene, and in mice more than 10 genes have been reported to be associated with white spotting traits, including the KIT and MITF genes. Comparatively few studies have investigated the genetics of white spotting in cattle. Liu et al. found significant QTL on chromosomes 6, 18 and 22 using linkage analysis within Holstein–Friesian (HF) × Jersey (J) crossbred cows. It has been suggested that the QTL on chromosomes 6 and 22 might be underpinned by the KIT and MITF genes, respectively. Fontanesi et al. compared the sequences of the MITF gene in white spotted Italian Holstein and Simmental cattle, and solid coloured Italian Brown and Reggiana cattle, and found a haplotype (carrying allele g.31831615T) that is associated with white spotting. This haplotype accounts for some, but not all, of the variation observed in the white spotting phenotype. More recently, Hofstetter et al. investigated atypical white spotting in Brown Swiss cattle. They identified two completely linked single nucleotide variants within the 5′ regulatory region of the MITF gene associated with white spotting, and although these variants largely account for the manifestation of white spotting, they do not account for the variability between individuals, which provides further evidence for a polygenic trait. Hayes et al. detected the MITF and KIT genes in a genome-wide association study (GWAS) that investigated the proportion of black in black and white Holstein cows, and reported an additional signal on chromosome 8, which carries PAX5, another potential candidate gene for this trait. Together these studies converge on the involvement of KIT and MITF gene expression in white spotting in dairy cattle; however, the causal variants that drive these effects have yet to be definitively identified and may be breed-specific.
Here, our aim was to investigate white spotting in New Zealand dairy cattle, by using whole-genome sequence genotype data to conduct the largest GWAS of white spotting to date. We report three genome-wide significant QTL for white spotting. Effects on chromosomes 6 and 22 extend previous associations at these loci, and further implicate the KIT and MITF genes as responsible for these effects. For the first time, we also report a QTL on chromosome 2 that implicates the PAX3 gene in white spotting of dairy cattle and highlight an amino acid substitution that may underlie this effect.
White spotting data were derived from several cohorts of animals that included: 885 outbred dairy bulls (223 J, 327 HF, and 335 HF × J), 1389 outbred dairy cows (51 J, 265 HF, and 1073 HF × J), and 699 HF × J F2 cross cows from an experimental pedigree. Breed definitions, in these cases, define animals from a 4-generation pedigree that were 16/16 J or HF as purebreds, with animals that were 15/16 or less of either breed defined as crossbreeds. The F2 animals were ½ HF × ½ J, representing a study population that was previously described in several publications [10, 13, 14, 15]. Genotyping data were available for 2973 animals, with genotype and phenotype information derived as described in the following sections.
Measurements of white spotting in our study population
For animals in the F2 population, proportion of white spotting values that had been derived for a previous study were used directly in the current study. Video footage was recorded of 1389 cows walking single file either into or out of the milking shed using a GoPro HERO4 camera, at a 4000 pixel horizontal resolution. Still images that provide a clear side-on view of each animal were captured from the video footage using VideoPad Video Editor (v5.3). Additional side-on images representing either the right or left profile of 885 bulls were made available by LIC and incorporated into the dataset. First, cows and bulls were scored for the presence or absence of white on their coat and, then, the proportion of white spotting was quantified. Quantification was carried out manually using the image processing software, GNU Image Manipulation Program (GIMP, v2.9.8), to generate an objective measurement of the proportion of white color. The freehand tool was used to trace each animal and remove the background. The pixel count from the remaining image, and the pixel count after manually subtracting the white regions on the coat, were used to calculate the proportion of white spotting on the coat.
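The pixel-count calculation described above can be illustrated with a minimal sketch. The function name, argument names, and example counts below are hypothetical and are not taken from the study; the sketch only assumes that the proportion of white is the share of the traced animal's pixels that were removed when the white regions were subtracted.

```python
def proportion_white(total_pixels: int, nonwhite_pixels: int) -> float:
    """Return the fraction of the traced animal that is white.

    total_pixels: pixel count of the animal after background removal.
    nonwhite_pixels: pixel count left after the white regions are subtracted.
    """
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    if not 0 <= nonwhite_pixels <= total_pixels:
        raise ValueError("nonwhite_pixels must lie between 0 and total_pixels")
    # White pixels are the difference between the two counts.
    return (total_pixels - nonwhite_pixels) / total_pixels


# Hypothetical counts for one animal (not data from the study).
print(proportion_white(total_pixels=200_000, nonwhite_pixels=150_000))  # 0.25
```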
Genotypes, whole-genome sequencing, and sequence imputation
For 760 of the outbred cows included in the study, tissue samples were obtained from ear tissue biopsies and DNA extraction and genotyping were performed by GeneSeek (Lincoln, NE, USA) using the GeneSeek GGP50k SNP chip. For all the remaining individuals, we used available single nucleotide polymorphism (SNP) genotypes that were previously obtained by genotyping at GeneSeek on a variety of platforms including the GeneSeek GGPv1, GGPv2, GGPv3, GGP50k, Illumina BovineSNP50 or BovineHD 777k SNP chips. A full list of the genotyping platforms, the number of SNPs per panel and the number of animals genotyped per panel is given in Additional file 1: Table S1. Subsets of the reference and target populations that are described in this paper have been published by Lopdell et al., and Littlejohn et al. [14, 17].
Whole-genome sequencing, read mapping, and variant calling were performed on a population of 116 HF, 95 J and 354 crossbred cattle as previously described [16, 17]. Briefly, DNA samples were sequenced based on 100-bp paired-end reads on the Illumina Hiseq platform, read mapping was performed using the UMD3.1 genome build and the BWA MEM 0.7.8 software  and resulted in mean and median mapped read depths of 15× and 8×, respectively. Variants were called using the GATK HaplotypeCaller (v3.2) software , which incorporates base quality score recalibration. Then, phasing of the variants was performed using Beagle 4 , and variants with phasing allelic R2 metrics lower than 0.95 were excluded for quality filtering purposes. These criteria yielded the ~ 19.5 M whole-genome sequence variants that constituted the reference set for imputation into the 2973 SNP-chip genotyped samples used for GWAS.
A step-wise imputation was performed using the Beagle 4 software. Note that these procedures were conducted to create an imputed sequence resource that is much larger than that used in the current study and represented ~ 150,000 animals, which have been accumulated over time and imputed in three different batches. The overall pipeline was as follows: first, the animals that were typed on the GGP panels were imputed to a reference panel representing the BovineSNP50 SNP-chip. Then, BovineSNP50 data (now consisting of both imputed and physically genotyped data) were used to impute all the animals to the BovineHD platform. We also conducted a parallel step to impute all the samples to the GGPv3 platform, to recover non-overlapping content between that platform and the BovineSNP50 SNP-chip. These steps yielded two datasets that comprised an ‘all animals imputed to BovineHD’ set, and an ‘all animals imputed to GGPv3’ set. These datasets were then merged, creating a scaffold for genome sequence imputation that contained all the animals imputed to all content from all SNP-chips. Following sequence imputation (by using Beagle 4), data were then filtered to remove variants with extreme Hardy–Weinberg statistics (HW exact test; removal of 47,660 variants based on p < 1 × 10−30), and near-monomorphic positions (minor allele frequency (MAF) < 0.0001; removal of 911,633 variants). These criteria yielded 18,641,995 variants, which were extracted for the subset of 2973 animals with color phenotypes from the larger ~ 150,000 animal dataset. In terms of genetic representativeness between the sequence reference animals and the 2973 GWAS animals, 1282 cattle were directly represented by both a sequenced sire and maternal grandsire in the reference dataset, of which 1122 were represented by a sire or maternal grandsire in this population.
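The post-imputation filters can be sketched as follows. This is an illustration only, not the pipeline used in the study: the function names are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 copies of the alternate allele, and the Hardy–Weinberg exact-test p value is assumed to have been computed elsewhere and passed in. The two thresholds are those stated above.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of the alternate allele.

    Missing calls (None) are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes, hwe_p, maf_min=0.0001, hwe_p_min=1e-30):
    """Keep a variant unless it is near-monomorphic (MAF < maf_min)
    or shows an extreme Hardy-Weinberg deviation (p < hwe_p_min)."""
    return minor_allele_frequency(genotypes) >= maf_min and hwe_p >= hwe_p_min


# Toy genotypes for four animals (hypothetical, not study data).
print(minor_allele_frequency([0, 0, 1, 2]))     # 0.375
print(passes_filters([0, 0, 1, 2], hwe_p=0.5))  # True
```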
Population structure adjustments, covariates, and GWAS
To address population stratification in the association models due to breed and relatedness, genomic relationship matrices (GRM) were generated using GCTA (v1.91.1 beta). These calculations involved the creation of 29 GRM, one for each bovine autosome, to enable a ‘leave one chromosome out’ GWAS approach where each GRM differs by the absence of a single autosome—thus avoiding double fitting when testing the effect of candidate variants. These GRM were calculated using a curated subset of variants from the Illumina BovineSNP50 platform, which comprised 34,963 variants that had been quality-filtered based on Mendelian concordance parameters, minor allele frequency (those with a MAF < 0.02 were removed), LD pruning (those with a R2 > 0.9 were removed), and deviation from Hardy–Weinberg equilibrium (those with a p < 0.15 were removed). The GCTA (v1.91.1 beta) software was used to conduct the mixed linear model-based association analysis (MLMA), which incorporates the GRM as outlined above, in addition to fixed effects for farm of origin and cohort (the latter relevant to the F2 animals with the first cohort born in spring 2000 and the second cohort born in spring 2001 [13, 14, 15, 17, 21]). Whole-genome sequence variants were filtered to remove the variants with a MAF lower than 0.005 prior to MLMA, this filter being different to that applied previously based on the frequencies present in the subpopulation of 2973 animals. To account for multiple hypothesis testing, a p value threshold of 5 × 10−8 was deemed to be significant for variant associations.
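The ‘leave one chromosome out’ idea can be sketched in a few lines. The code below is a simplified illustration and does not reproduce the GCTA calculation: it assumes genotypes coded as 0, 1 or 2 allele counts, applies the standardized cross-product (x − 2p)(y − 2p) / (2p(1 − p)), averaged over SNPs, to every entry of the matrix including the diagonal, and uses hypothetical function and variable names throughout.

```python
def grm(genotypes_by_snp):
    """Genomic relationship matrix from a list of SNPs.

    Each SNP is a list of 0/1/2 allele counts, one entry per animal.
    """
    n_animals = len(genotypes_by_snp[0])
    matrix = [[0.0] * n_animals for _ in range(n_animals)]
    used = 0
    for snp in genotypes_by_snp:
        p = sum(snp) / (2 * n_animals)   # allele frequency
        var = 2 * p * (1 - p)            # expected genotype variance
        if var == 0:
            continue                     # monomorphic SNPs are skipped
        used += 1
        centred = [x - 2 * p for x in snp]
        for j in range(n_animals):
            for k in range(n_animals):
                matrix[j][k] += centred[j] * centred[k] / var
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    return [[value / used for value in row] for row in matrix]


def loco_grms(snps_by_chromosome):
    """Return {chromosome: GRM built from the SNPs on all other chromosomes}."""
    result = {}
    for left_out in snps_by_chromosome:
        kept = [snp
                for chrom, snps in snps_by_chromosome.items()
                if chrom != left_out
                for snp in snps]
        result[left_out] = grm(kept)
    return result


# Toy data: two chromosomes, three animals (hypothetical).
toy = {1: [[0, 1, 2]], 2: [[2, 1, 0], [0, 0, 2]]}
grms = loco_grms(toy)
print(grms[2][0])  # relationships of animal 0, using chromosome 1 SNPs only
```

When variants on a given chromosome are tested, the matrix keyed by that chromosome is the one fitted, so that a candidate variant does not also contribute to the relationship matrix.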
Visualization and interpretation of association results and candidate variants
To assess candidacy of the associated variants, RNA-seq data representing black and white bovine skin were sourced from a data submission accompanying the Koufariotis et al. paper, and uploaded into the Integrative Genomics Viewer (IGV) for visualization. Sequence variants in intervals of interest were functionally annotated by using SnpEff (v4.3) and the Ensembl UMD3.1 gene annotation set, with custom scripts to visualize these effects in Manhattan plots. To assess conservation metrics for candidate causal variants, genome evolutionary rate profiling (GERP) scores were obtained for the 32-way amniota vertebrate alignments (v92.31) from the Ensembl portal, with both element and site-wise scores reported in the text [25, 26]. For multiple protein alignments that were used to investigate the conservation of the PAX3 p.Thr424Met mutation, PAX3 homologues were retrieved for other species using BLAST, and aligned using the Geneious software.
Structural variant analysis
Sequence alignments representing the three major QTL regions were manually inspected in animals that displayed segregating tag-SNP genotypes to detect gene-disrupting structural mutations that might explain these QTL. However, given the ambiguity of the association signals at the chromosome 6 locus, a more formal analysis was conducted. Here, CNVnator (v0.3.3)  was used to predict the presence of structural variants based on sequence read depth, using the same whole-genome sequence dataset as described in the ‘Genotypes, whole-genome sequencing, and sequence imputation’ section. This analysis used a sliding window size of 1000-bp with a 500-bp overlap and focused on a 20-Mb region on chromosome 6 (60 to 80 Mb). Then, predicted structural variants were ranked based on their genotype correlation with the top two QTL tag variants at the chromosome 6 locus (Chr6 g.64210286A>G rs451683615 and Chr6 g.71722665C>T rs463810013). Sequence alignments of relevant variants were visually inspected in IGV  to assess evidence of a legitimate structural variant at each of these sites, weighted in the context of read mapping quality, gaps and/or other issues with the reference genome assembly, and whether the variant was polymorphic between samples. CNVnator-assigned genotypes were assessed in the same way for multimodality by visual inspection of copy number histograms.
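To make the window scheme concrete, the sketch below enumerates 1000-bp windows that overlap by 500 bp across the 60 to 80 Mb interval. It illustrates the coordinates only and says nothing about how CNVnator processes read depth internally; the function name and the half-open interval convention are assumptions made for this example.

```python
def sliding_windows(start, end, size=1000, step=500):
    """Yield half-open (window_start, window_end) intervals that lie
    fully inside [start, end). A step of 500 with a size of 1000
    gives a 500-bp overlap between consecutive windows."""
    pos = start
    while pos + size <= end:
        yield pos, pos + size
        pos += step


windows = list(sliding_windows(60_000_000, 80_000_000))
print(len(windows))   # 39999
print(windows[:2])    # [(60000000, 60001000), (60000500, 60001500)]
```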
Since white spotting might be influenced by genes that operate via different mechanisms, we conducted two separate GWAS that differed in the definition of the phenotype. First, white spotting was scored as the presence or absence of white on the coat and encoded as a binary phenotype (N = 2973 animals). Second, white spotting was coded as a quantitative variable, where animals were scored based on the overall proportion of white (N = 2232 animals). Solid color animals were not included in the latter population, for which proportion of white was also log-transformed prior to association analysis to render data in a form approximating a normal distribution. All phenotypic measures were based on manual analysis of photographs (see Methods section), that included images representing 699 Holstein–Friesian × Jersey (HF × J) F2 cows scored as part of a previous QTL study . The breed composition and sexes of the remaining animals are described in the Methods section, which include a mixture of HF, J, and HF × J cows and bulls.
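The two phenotype definitions can be sketched as a single encoding step. The function name is hypothetical, and the use of the natural logarithm is an assumption, since the base of the log-transformation is not stated here.

```python
import math


def encode_phenotypes(proportion_white):
    """Return (binary, log_proportion) for one animal.

    binary: 1 if any white is present on the coat, otherwise 0.
    log_proportion: natural log of the proportion of white, or None for
    solid color animals, which are excluded from the quantitative GWAS.
    """
    if not 0.0 <= proportion_white <= 1.0:
        raise ValueError("proportion_white must lie between 0 and 1")
    binary = 1 if proportion_white > 0 else 0
    log_proportion = math.log(proportion_white) if proportion_white > 0 else None
    return binary, log_proportion


# Hypothetical animals: solid color, quarter white, fully white.
for value in (0.0, 0.25, 1.0):
    print(value, encode_phenotypes(value))
```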
Table: Top 10 variants for each significant quantitative trait locus detected in the genome-wide association analysis for proportion of white spotting. Columns include Variant reference ID and Effect size (%); only the p values for each locus are available here.
Chromosome 22: 1.83 × 10−79; 8.67 × 10−79; 1.21 × 10−78; 1.38 × 10−77; 2.47 × 10−77; 6.11 × 10−77; 1.59 × 10−76; 1.59 × 10−76; 2.29 × 10−76; 2.57 × 10−76
Chromosome 6: 1.10 × 10−64; 6.37 × 10−61; 8.08 × 10−61; 8.08 × 10−61; 8.99 × 10−61; 7.05 × 10−59; 7.47 × 10−49; 1.34 × 10−48; 9.51 × 10−48; 1.90 × 10−47
Chromosome 2: 1.27 × 10−13; 1.40 × 10−13; 1.40 × 10−13; 1.40 × 10−13; 1.40 × 10−13; 1.41 × 10−13; 1.41 × 10−13; 1.55 × 10−13; 1.55 × 10−13; 1.58 × 10−13
Analysis of the significant loci on each detected chromosome
A novel, polymorphic MITF pseudogene as a candidate for the white spotting QTL
Notably, we observed a predicted missense mutation that affects MITF at Chr22 g.31769331C>T (rs110881545; Fig. 2a). Although it could be a candidate mutation for the QTL, this variant was not significant, and was called at a very low frequency in the genome sequence reference population used for imputation (MAF < 0.01). Manual inspection of sequence alignments from animals heterozygous for this variant showed read depth anomalies around annotated intron–exon boundaries, which led us to analyze these features in more detail. Although we used DNA-based sequence data, at these boundaries we observed an increased sequencing depth for the exons, which is reminiscent of RNA-sequence alignments (see Additional file 2: Figure S1). Analysis of soft-clipped reads from the exons showed that the mismatches corresponded to neighboring exon structures, which suggests that they were derived from a mis-mapped, processed MITF pseudogene. Non-exonic read pairs from the apparent MITF pseudogene mapped to a single location on chromosome 12 at 58.7-Mb, indicating that this locus is the likely site of integration of the pseudogene. Notably, this pseudogene was polymorphic across animals, which raised the possibility that the QTL might be caused by this structural variant. String match searching for spliced MITF sequence reads from the whole-genome sequence alignments allowed us to genotype the 565 whole-genome-sequenced animals in our reference population for the pseudogene, giving a MAF of 0.026 for the integrated allele.
This MAF value contrasted markedly with that of the top tag variant from GWAS (MAF = 0.304); and when pairwise linkage disequilibrium statistics were examined between the pseudogene ‘genotype’ and variants from the broader chromosome 22 and chromosome 12 regions, the most highly correlated markers were also non-significant in the GWAS (chromosome 12, maximum R2 = 0.72 for rs461882713 Chr12 g.6060748C>G, p = 0.72; chromosome 22, maximum R2 = 0.69 for rs384283283 Chr22 g.31734120C>T, p = 0.67). Although the processed MITF pseudogene was a good biological candidate for the modulation of coat color or pattern, these observations led us to assume that it was not responsible for the white spotting QTL in our study.
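The string-match approach to detecting spliced reads can be illustrated with a toy sketch. It is not the procedure used in the study: the exon sequences, reads, flank length, and function names are all hypothetical. The idea is that a read spanning an exon–exon junction in genomic DNA can only come from a processed (intron-less) copy of the gene, so reads containing the junction sequence, on either strand, are counted as evidence for the pseudogene.

```python
def junction_sequence(upstream_exon, downstream_exon, flank=15):
    """Sequence spanning a splice junction: the last `flank` bases of one
    exon joined directly to the first `flank` bases of the next exon."""
    return upstream_exon[-flank:] + downstream_exon[:flank]


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_spliced_reads(reads, junctions):
    """Count reads that contain any junction sequence on either strand."""
    count = 0
    for read in reads:
        if any(j in read or reverse_complement(j) in read for j in junctions):
            count += 1
    return count


# Toy exons and reads (hypothetical sequences, not MITF).
junction = junction_sequence("ACGTACGTAA", "TTGGCCAAGG", flank=5)
reads = [
    "TTCGTAATTGGCAA",  # spans the junction on the forward strand
    "AAGCCAATTACGTT",  # spans the junction on the reverse strand
    "ACGTACGTAAGT",    # exon followed by intron: unspliced genomic read
]
print(junction)                                 # CGTAATTGGC
print(count_spliced_reads(reads, [junction]))   # 2
```

An animal could then be scored as a carrier when its count of junction-spanning reads exceeds a chosen minimum; any such cut-off would need to be set with regard to sequencing depth.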
Table: Top variants mapping within introns 1, 2, 3 and up to 100-kb upstream of the annotated MITF TSS, with conservation (GERP) score for 32 amniota vertebrates (Ensembl Bos taurus v92.31, UMD3.1). Columns include Variant reference ID; only the p values are available here.
p values: 1.83 × 10−79; 8.67 × 10−79; 6.11 × 10−77; 1.59 × 10−76; 1.59 × 10−76; 2.29 × 10−76; 6.69 × 10−76; 7.97 × 10−76; 8.11 × 10−73; 1.28 × 10−72; 2.82 × 10−72; 4.06 × 10−72; 5.25 × 10−72; 6.63 × 10−72; 6.63 × 10−72; 8.75 × 10−71; 1.54 × 10−70; 2.59 × 10−70
The top variant at the chromosome 6 locus (Chr6 g.64210286A>G rs451683615, p = 1.1 × 10−64), maps to an intergenic region approximately 280-kb downstream of the KCTD8 gene, which represents quite a considerable distance from the KIT gene (~ 7.5-Mb). However, the third and fourth most strongly associated variants map within the fourth intron of KIT (Chr6 g.71873479T>C rs109512689, p = 8.08 × 10−61 and Chr6 g.71873455A>C rs385773341, p = 8.08 × 10−61).
Multiple segregating QTL at the KIT locus
One explanation for the dispersed nature of the chromosome 6 QTL is that this locus comprises multiple, overlapping effects. Linkage disequilibrium (LD) analysis between the top variant (Chr6 g.64210286A>G rs451683615) and the next three most strongly associated variants (Chr6 g.71722665C>T rs463810013, Chr6 g.71873479T>C rs109512689 and Chr6 g.71873455A>C rs385773341) supports this hypothesis, with rs451683615 being in relatively low LD with the other variants (maximum R2 = 0.35). Furthermore, when rs451683615 was fitted as a fixed effect, the signal on chromosome 6 still exceeded the genome-wide significance threshold (p = 5 × 10−8), with the two strongly correlated KIT variants (R2 = 0.91) rs208251862 (Chr6 g.71692344C>A; p = 7.1 × 10−19) and rs463810013 (p = 1.5 × 10−18) now being the top variants (Fig. 4b). When the rs463810013 variant was fitted as a fixed effect to represent these effects, rs451683615 once again became the most significant variant (p = 3.054 × 10−25; Fig. 4c), and when both rs451683615 and rs463810013 were fitted as fixed effects, a small signal was still detected near KIT (smallest p = 3.31 × 10−11 for Chr6 g.72007252A>T rs109258078; Fig. 4d). These results suggest that the signal observed on chromosome 6 is likely the result of two or more QTL, and/or alternatively, the consequence of one or more structural variants that are not well tagged, and therefore cannot be easily accounted for by fitting biallelic SNPs in the association models.
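Pairwise LD of the kind reported above is commonly summarized as the squared Pearson correlation between two genotype vectors. The sketch below assumes that definition and genotypes coded as 0, 1 or 2 allele counts (imputed dosages would work equally well); the software and exact estimator used in the study are not specified here, and the function name is hypothetical.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined for a monomorphic variant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Toy genotypes for four animals (hypothetical).
print(ld_r2([0, 1, 2, 1], [0, 1, 2, 1]))  # 1.0 (identical genotypes)
print(ld_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # 0.0 (uncorrelated genotypes)
```

A low R2 between two strongly associated variants, as seen for rs451683615 and the variants near KIT, is consistent with the variants tagging different underlying effects.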
Table: Description and LD summary statistics for the candidate structural variants that are most highly correlated with tag SNPs rs451683615 (Chr6 g.64210286A>G) and rs463810013 (Chr6 g.71722665C>T). Columns include Region spanning CNV, rs451683615 correlation (R2) and rs463810013 correlation (R2).
Values remaining from the table body: 3.24 × 10−22; 3.74 × 10−5; 2.79 × 10−11; 5.89 × 10−34; 4.78 × 10−12; 8.08 × 10−61
Breed, frequency, and effect size characteristics of the three major QTL
Table: Q allele frequencies for the top variant at each QTL for 589 purebred Holstein–Friesians and 274 purebred Jerseys. Columns: Variant reference ID; HF Q frequency; J Q frequency.
We present the first association analysis for white spotting in dairy cattle using imputed whole-genome sequence data. This study comprises the largest GWAS for this phenotype, to date, providing details of the genetic effects on white spotting in a population of approximately 3000 HF, J, and their crosses. We provide evidence for the implication of the KIT, MITF and PAX3 genes in white spotting of the coat, and further suggest regulatory and missense variants that potentially explain the effects of the MITF and PAX3 genes.
MITF, which encodes a transcription factor that has been shown to impact pigmentation in cattle [12, 36], mice, horses [4, 5], dogs [38, 39], humans, and most recently ducks, is the only plausible candidate for the QTL on chromosome 22. It is also the only gene located near the top associated variant (Chr22 g.31769747A>G rs209784468), which is situated in intron 2 of MITF transcripts based on the analysis of skin RNA-seq data. The rs209784468 variant falls within a conserved genomic region, which, in conjunction with its status as the lead associated variant, makes rs209784468 a candidate causal variant for this QTL. Given that this SNP and other lead variants are non-coding, and given the lack of other candidate variants that map to protein-coding sequences, we hypothesize that the mechanism underlying the QTL on chromosome 22 is a modulation of the expression of MITF. However, how this effect manifests itself during development is unknown. MITF is required during embryonic development to stimulate the transition of neural crest cells into melanocyte precursors. If the MITF gene is not expressed within the small window during which transition is meant to take place, future expression of MITF cannot rescue melanocyte development. Impaired functionality or expression of the MITF gene during development will result in a reduced number of melanocytes, and manifest itself as white spotting on the coat. However, impaired functionality of the MITF gene within the mature hair follicle may also impair melanocyte survival and differentiation, thus decreasing the number of pigment-producing melanocytes. In humans and mice, loss-of-function mutations in MITF cause severe symptoms including coloboma, osteopetrosis, microphthalmia, albinism and deafness [43, 44].
Disruptive mutations in MITF also cause Tietz syndrome, which is characterized by depigmentation of the skin, hair, iris and severe hearing loss, and Waardenburg syndrome type 2A, which is characterized by patchy depigmentation of the skin and bi- or unilateral deafness in humans and mice [37, 40, 45]. Interestingly, mutations with a strong effect have also been observed in cattle [36, 46]. The white spotting MITF variant that we describe in this study represents a common allele (or nearly fixed in the case of HF animals), with no known effects on hearing or other undesirable phenotypes. The fact that this variant causes a less severe phenotype than the variants with a strong effect fits with an expression-based mechanism for this QTL, however it would still be interesting to compare the phenotype of the segregating individuals for the QTL identified in the current analysis with the phenotypes of individuals with more severe MITF syndromes (e.g. hearing loss). In terms of functional analyses, to unambiguously test the role of the rs209784468 SNP and other linked candidates, experiments analogous to those performed in an investigation of human hair color loci  could be performed. Cell-culture-based analyses or studies on model organisms could be conducted to perturb the candidate loci that have an effect on gene expression or pigment formation/melanocyte function.
The most significant variant for the QTL on chromosome 6 mapped to a region 7.5-Mb upstream of the KIT gene. Although seemingly too far away to cause this signal, the KIT gene is perhaps the single most famous and well-characterized pigmentation gene. There are 19 reported mutations within or near the equine KIT gene that cause either complete depigmentation, or white spotting [3, 5, 6, 48], and there are approximately 76 known KIT alleles in mice that cause dominant or semi-dominant white spotting [9, 49]. A KIT translocation mutation has also been identified as the causative mutation for ‘color sidedness’ and the white coat phenotype in Belgian Blue and White Galloway cattle [31, 50]. Although it is possible that the white spotting QTL in the current study is underpinned by contributions from other genes, these facts make KIT worthy of consideration as the likely causal agent underlying the chromosome 6 signals. Thus, the inconsistency of the mapping data may instead represent an amalgamation of multiple signals at the locus, and/or some other complexity that is not well represented by our imputed genome sequence dataset. Indeed, when the lead variants were consecutively fitted in our association analyses, no single variant could account for the signal. Given the precedent regarding the KIT structural mutations that influence coat phenotypes, we also conducted a sequence-based structural analysis of a broad, 20-Mb region encompassing KIT and the top tag variants from the GWAS. This analysis did not reveal any obvious candidate, but it is possible that these efforts were confounded by errors in the genome assembly around KIT, an observation highlighted through analyses by Whitacre et al. If such confounders exist, breed-specific de novo assemblies and sequence information based on long-read sequencing technologies, such as single-molecule sequencing, may be helpful in future investigations of the locus.
Additional future work could also attempt to fine map the effects in alternative breeds in which fewer QTL could be segregating, or alternatively conduct functional analyses as mentioned in the previous section for the associated variants that map to intron 4 of KIT itself.
To our knowledge, the observation of a likely role for PAX3 in white spotting of the coat in cattle is a novel finding. The top variant for this QTL on chromosome 2 mapped to a region 0.3-Mb upstream of the PAX3 gene, although bioinformatic prediction of variant effects revealed a highly associated p.Thr424Met missense mutation that could underlie this QTL. Previous studies have reported variants in PAX3 that cause pigmentation phenotypes in humans, mice and horses [4, 5] and variation in ambilateral circumocular pigmentation in the Fleckvieh breed of cattle. The latter phenotype describes pigmentation of the area that encircles the animals’ eyes in breeds that otherwise have a white head, which raises the possibility that white spotting in HF is influenced by the same QTL that is involved in ambilateral circumocular pigmentation in Fleckvieh cattle. In humans, as for some mutations in MITF, protein-changing variants in PAX3 have been shown to cause a similar form of Waardenburg syndrome, which is characterized by wide-set eyes, hearing loss and regions of depigmentation in the iris, hair and skin [52, 55]. Studies in humans and mice have demonstrated that the PAX3 gene encodes a transcription factor that binds directly to the proximal M promoter of the MITF gene, thus facilitating expression of MITF [29, 55, 56, 57]. Studies of different spontaneous and radiation-induced PAX3 mutations in Splotch mice have suggested that PAX3 is required for proper development of neural crest cells, expansion of melanoblast populations, and prevention of melanoblast terminal differentiation. Thus, if the function of the PAX3 protein is altered, MITF transcription and activity may be impaired, which in turn may have an impact on regional melanocyte populations and melanogenesis, resulting in an increased proportion of white spotting on the animal’s coat. It is also interesting that Hayes et al. observed an association between variants that are located next to the bovine PAX5 gene and the proportion of black on the coat. We did not observe a genome-wide significant signal on chromosome 8, although this association was demonstrated in Australian Holsteins; the highlighted tag SNP in their study was not tested for association here because it was nearly fixed in our population (MAF < 0.001) and was excluded from the dataset. Unlike PAX3, the associations of PAX5 and MITF with melanogenesis are unclear, but the implication of these two structurally related transcription factors in independent GWAS should be analyzed in future work. Regarding the other major QTL identified, functional studies are required to confirm a causative effect of the PAX3 p.Thr424Met mutation, and confirm the molecular mechanism through which this QTL acts.
Our results strengthen previous analyses that suggest the involvement of the KIT and MITF genes in white spotting of the coat in cattle, and reveal a new QTL for this trait at the PAX3 locus. The genes identified highlight the commonality of the mechanisms that underlie the modulation of skin and hair pigmentation in animals, as all three genes are key regulators of melanocyte development, migration, and differentiation. Moreover, these three genes have already been implicated in the modulation of pigment phenotypes in diverse species. In addition, because the effects of the major QTL are substantial, there is potential for selecting whiter or darker animals, depending on farmers’ preferences.
The authors would like to acknowledge all farm owners and managers who took part in our study, and in particular Joyce Voogt for her valuable insights into farmer opinions. We would like to acknowledge Fiona Brown, Nicolas Lopez-Villalobos, Danny Donaghy and Martin Correa Luna from Massey University and Sandeep Seernam from AgResearch for their help during the data collection process. Lastly, we would like to acknowledge Stella Sim, Esther Donkersloot and Neil Macdonald from LIC for providing photographs used in this research.
SJ performed most of the bioinformatic and statistical analyses with help from ER, KT, TJL and TJJJ; SJ, AY, CC, GW, LM and ML were involved in data collection; KT conducted sequence imputation; SJ, ML, and SRD conceived the study and experiments; BLH, DG, ML, RGS, RJS and SRD were involved in the supervision of the project; SJ and ML wrote the manuscript. All authors read and approved the final manuscript.
This work was supported by the Ministry for Primary Industries (Wellington, New Zealand), which co-funded the work through the Primary Growth Partnership. External funders had no role in the design of the experiment, the collection, analysis or interpretation of the data, or writing the manuscript.
Ethics approval and consent to participate
All animal experiments were conducted in strict accordance with the rules and guidelines outlined in the New Zealand Animal Welfare Act 1999. Most data were generated as part of routine commercial activities that are outside the scope of those requiring formal committee assessment and ethical approval (as defined by the above guidelines). Approval was sought for coat scoring procedures that were not based on pre-existing photographs, and was subsequently granted by the AgResearch Animal Ethics Committee, Hamilton, New Zealand (approval AEC 14090).
Consent for publication
Competing interests
AY, CC, GW, KT, LM, TJJJ, TJL, SRD, BH, RS, ML are employees of Livestock Improvement Corporation, a commercial provider of bovine germplasm. The remaining authors declare that they have no competing interests.
- 21. Berry SD, Lopez-Villalobos N, Beattie EM, Davis SR, Adams LF, Thomas NL, et al. Mapping a quantitative trait locus for the concentration of β-lactoglobulin in milk, and the effect of β-lactoglobulin genetic variants on the composition of milk from Holstein-Friesian x Jersey crossbred cows. N Z Vet J. 2010;58:1–5.
- 24. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6:80–92.
- 30. Whitacre L. Structural variation at the KIT locus is responsible for the piebald phenotype in Hereford and Simmental cattle. PhD thesis, University of Missouri. 2014.
- 40. Léger S, Balguerie X, Goldenberg A, Drouin-Garraud V, Cabot A, Amstutz-Montadert I, et al. Novel and recurrent non-truncating mutations of the MITF basic domain: genotypic and phenotypic variations in Waardenburg and Tietz syndromes. Eur J Hum Genet. 2012;20:584–7.
- 58. Jivanji S, Worth G, Lopdell TJ, Yeates A, Couldrey C, Reynolds E, et al. Genome-wide association analysis reveals QTL and candidate mutations involved in white spotting in cattle. Dryad Digital Repository. 2019. https://doi.org/10.5061/dryad.tqjq2bvtf.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.