Background

The biosynthesis of bovine milk fat is a complicated process regulated by many genes belonging to several pathways [1]. Genetic analyses of bovine milk fatty acids have shown heritable variation. Short and medium chain fatty acids (C4:0 up to and including C16:0), which are synthesized de novo in the mammary gland, have moderate to high heritability [2, 3]. Long chain fatty acids (i.e. C16:0 and higher) are derived from blood lipids that originate mainly from the diet and endogenous lipids, nevertheless, they have low to moderate heritability [24]. Identification of genomic regions, and preferably individual genes, responsible for genetic variation in milk fat composition will enhance the understanding of biological pathways involved in fatty acid synthesis and may point towards opportunities for changing milk fat composition via selective breeding. Candidate gene studies have shown that polymorphisms in diacylglycerol O-acyltransferase 1 (DGAT1 K232A) [5] and stearoyl-CoA desaturase 1 (SCD1 A293V) [6] have important effects on milk fat composition [712]. Many genes are involved in the biosynthesis of milk fat, and analyzing these candidate genes one by one in a candidate gene approach is not an option; therefore, quantitative trait loci (QTL) studies try to identify regions associated with milk fat composition to identify candidate genes that are worth considering.

In order to identify genomic regions involved in the biosynthesis of milk fat, Schennink et al. [13] and Stoop et al. [14] performed genome-wide linkage analyses of milk fatty acids and detected genome-wise significant QTL and several suggestive QTL. Other linkage studies have been performed for single chromosomes [15] or the fat composition of adipose tissue in beef cattle [1518].

Recent developments in molecular genetics have made it possible to perform genome-wide association studies using thousands of single nucleotide polymorphism (SNP) markers to detect QTL. A genome-wide association study has higher power to detect QTL and provides more precise estimates of QTL locations compared to a linkage study. Some genome-wide associations for routinely evaluated traits in dairy cattle, such as milk production and fertility, have been published [1922]. To the best of our knowledge, no genome-wide association study of milk fatty acids has been reported.

The aim of this study was to perform a genome-wide association analysis using 50,000 SNP markers to identify QTL for individual fatty acids in bovine milk. Associations were studied for even-chain saturated fatty acids (C4:0-C18:0), even-chain monounsaturated fatty acids (C10:1-C18:1), and the polyunsaturated fatty acid C18:2cis9,trans11 (CLA).

Methods

Phenotypes

The fat composition of winter milk samples from 1,905 first-lactation Dutch Holstein Friesian cows was available for this study. The cows were housed on 398 commercial farms throughout the Netherlands. At least three cows were sampled per farm. The cows were between 63 and 282 days in milk. The period of negative energy balance in early lactation was avoided by choosing cows over 63 days in lactation. The population consisted of five large paternal half-sib families from proven sires (200, 199, 195, 176, 101 daughters per sire) and 50 small paternal half-sib families from test-sires (10-24 daughters per sire), as well as 190 cows descending from 45 other proven sires (1-30 daughters per sire). The pedigree of the cows was supplied by CRV (Cooperative cattle improvement organization, Arnhem, the Netherlands) and consisted of 26,300 animals.

Milk fat composition was measured by gas chromatography. Many fatty acids were measured, but only the major fatty acids are reported here: even-chain saturated fatty acids C4:0 to C18:0, even-chain (cis9) monounsaturated fatty acids C10:1 to C18:1, and the polyunsaturated fatty acid CLA. The fatty acids were expressed in terms of weight-proportion of total fat weight (w/w%). In total, these fatty acids made up 89% of the total fat. Table 1 presents the mean, phenotypic standard deviation, and intra-herd heritability for the fatty acids included in this study. More detailed information about the population and phenotypes can be found in Stoop et al. [3].

Table 1 Descriptive statistics of milk fatty acids

Genotypes

Blood samples were collected from the cows for DNA isolation. The cows were genotyped using a custom Infinium Array (Illumina, San Diego, CA, USA) designed by CRV. In total, 1,810 cows were successfully (call rate > 90%) genotyped. The cows were genotyped for 50,855 technically successful SNPs. The assumed map positions of the SNPs were based on the bovine genome assembly BTAU 4.0 [23]. From these 50,855 SNPs, a total of 776 SNPs could not be mapped to any of the Bos taurus (BTA) chromosomes and were assigned to BTA 0. In addition, 591 of the SNPs were located on the X chromosome. The SNPs on BTA 0 and the X chromosome were included in the study. The average distance between SNPs was 52,452 bp. Monomorphic SNPs (n = 245), SNPs with a genotyping rate < 80% (n = 383), and SNPs with a genotype frequency < 0.006 (1-9 observations for one of the genotype classes, SNPs with two genotype classes instead of three were included in the final marker set; n = 5,494) were discarded from the original SNP set of 50,855 SNPs, resulting in the final marker set of 44,733 SNPs used for the association analysis. Table 2 provides an overview of the number of SNPs available for the association study per chromosome.

Table 2 SNP information per Bos Taurus chromosome

Statistical analysis

For the association study, both phenotype and genotype information was available for 1,706 individuals. A two-step single SNP association analysis was performed. In the first step, the genome was screened for interesting regions using a general linear model. In the second step, the interesting regions were verified using an animal model.

In the first step, a genome-wide association study was performed with a general linear model using the R package 'SNPassoc' [24]. In this step, the analyzed phenotypes were pre-adjusted for systematic environmental effects, and the general linear model accounted for the SNP effect and the effect of sire. The general linear model used in the first step was:

(1)

where y* was the phenotype adjusted for the systematic environmental effects; sire was the fixed effect of sire; SNP was the fixed effect of SNP genotype; and e was the random residual. Sire effect was included in the SNPassoc model to account for paternal half-sib relations. Phenotypes were adjusted for days in milk, age at first calving, calving season, and herd. Adjusted phenotypes were obtained from the phenotypes of 1,905 cows using an animal model in ASReml [25]:

(2)

where y was the (unadjusted) phenotype; μ was the overall mean; dim was the covariate describing the effect of days in milk; afc was the covariate describing the effect of age at first calving; season was the fixed effect of the class of calving season (June-Aug 2004, Sept-Nov 2004, or Dec 2004-Jan 2005); scode was the fixed effect accounting for differences in genetic level between groups of proven bull daughters, young bull daughters, and other bull daughters; herd was the random effect of herd, distributed as N(0, I ), with identity matrix I and herd variance ; animal was the random additive genetic effect of the individual, distributed as N(0, A ), with the additive genetic relationship matrix A and the additive genetic variance ; and e was the random residual, distributed as N(0, I ), with identity matrix I and residual variance .

The genome-wide false discovery rate (FDR) was controlled according to the method described by Storey and Tibshirani [26], by separately calculating the genome-wide FDR based on the P-values from the general linear model for each trait using the R package 'qvalue'. Associations with a genome-wide FDR < 0.05 for the general linear model were considered significant.

The first step was performed to identify interesting regions, which were then further analyzed with an animal model to account for all relationships among individuals. Including a polygenic effect and accounting for genetic relationships would be more appropriate [27]. The model including a polygenic effect is computationally demanding when analyzing many traits, SNPs, and animals; therefore, in the second step we only analyzed the regions that contained multiple SNPs that were significant in the first step.

A region started at the first significant SNP on a chromosome that was followed by an additional significant SNP within 10 Mbp; the region was extended as long as another significant SNP occurred within 10 Mbp from the previous one and ended at the last significant SNP that was not followed by another significant SNP within the next 10 Mbp. Thus, a region contained at least two significant SNPs. More than one region could be present on the same chromosome when there were groups of significant SNPs located within 10 Mbp from each other but further than 10 Mbp from the other region(s) on the chromosome. The 10 Mbp distance between significant SNPs is rather large, but it was chosen to prevent having many small regions on one chromosome, each containing a small number of significant SNPs.

In the second step, all SNPs in regions with significant effects were analyzed using animal model (2) extended with an SNP effect in ASReml [25]. In this model the phenotypes were simultaneously adjusted for systematic environmental effects, for all genetic relationships among individuals, and for the SNP genotype. Associations with a -log10(P-value) ≥ 3 were considered significant.

The genetic variance explained by an SNP was calculated from the estimated genotype effects from animal model (2) extended with an SNP effect and the observed genotype frequencies. The result was expressed as a percentage of the total additive genetic variance. These percentages can be overestimated, especially when the effect of an SNP is small, this is due to the so called Beavis effect [28]. The percentage of the total additive genetic variance explained by the most significant SNP per trait per region is reported. The most significant SNP can differ per trait for a region associated with multiple traits.

Results

In the first step of the single SNP association study, all SNPs were analyzed using a general linear model. The analysis resulted in many significant (FDR < 0.05) associations between SNPs and the studied fatty acids. Figure 1 shows the genome-wide plots of -log10(P-values) for all of the studied fatty acids. All analyzed fatty acids showed significant associations with at least one genomic region.

Figure 1
figure 1

Genome-wide association plots for milk fatty acids. Genome-wide plots of -log10 (P-values) (y-axis) for association of loci with saturated fatty acids (A) and unsaturated fatty acids (B) analyzed with a general linear model. The genomic position is represented along the x-axis and chromosome numbers are given on the x-axis. The dashed horizontal lines represent the 0.05 false discovery rate thresholds. The y-axis are cut off at -log10(P-value) of 9.

From the results of the general linear model, a total of 64 significant regions were defined, affecting one or more (up to 10) traits. Significant associations with SNPs were detected on all chromosomes; however, on BTA 29 we identified only a single SNP associated with a trait instead of multiple SNPs in a region. Therefore, no regions were defined on BTA 29. These 64 regions were analyzed with an animal model and 54 remained significant at -log10(P-value) ≥ 3. Thus, most of the regions identified with the general linear model were confirmed with the animal model. The correlation between P-values in both analyses was 0.90. All results mentioned hereafter originate from the animal model and concern the 54 regions that remained significant at -log10(P-value) ≥ 3 with the animal model.

Significant associations were detected with several regions for most of the fatty acids, especially C14:0 (19 regions) and C16:1 (18 regions), whereas C18:0 and C12:0 were associated with only one region (Table 3). The X chromosome showed significant association with C14:0. Table 3 shows that several regions affect more than one trait. In particular, the medium chain unsaturated fatty acids were often associated with the same region.

Table 3 Regions significantly (-log10(P-value ≥ 3)) associated with fatty acids and the percentage of the total additive genetic variance explained

Sixteen of the unmapped SNPs showed significant associations with one or more of the studied fatty acids. Comparing the sequence of these significant unmapped SNPs against genome assembly UMD 3.0 [29] identified 14 of them as mapping to regions 14a, 19b, and 26, which were already identified as being associated with the traits. The other two unmapped SNPs (ULGR_BTA-38166 and ULGR_BTA-38169) were significantly associated with C10:0 and located at 24.2 Mbp on BTA 16 according to the UMD 3.0 genome assembly [29].

For each region, we estimated the variance explained by the most significant SNP for each trait. Within a region, this most significant SNP can be a different SNP per trait. The percentage of total additive genetic variance explained by the most significant SNP per region ranged from 0.9 for C10:0 on BTA 1 to 61.9 for C18:1 on BTA 14 (Table 3). These percentages can be overestimated, especially when the effect of an SNP is small, this is due to the so called Beavis effect [28]. The QTL with large effects, such as region 14a and 26, are less likely overestimated.

Three chromosomes showed highly significant regions, which were associated with multiple fatty acids: region 14a showed significant associations with 10 traits, region 19b with 8 traits, and region 26 with 7 traits. The region 14a harbors DGAT1 and region 26 harbors SCD1; these genes are known to affect fat composition [712]. These regions explained a large portion of the total additive genetic variation. The results also showed that additional regions had an effect on single fatty acids or a certain group of fatty acids.

In this section the SNP showing the strongest association of the whole genome for each trait is reported. Strongest associations for C14:0 (-log10(P-value) = 36.6), C16:0 (-log10(P-value) = 51.3), and C16:1 (-log10(P-value) = 54.1) were found with two SNPs (ULGR_SNP_AJ318490_1b and ULGR_SNP_AJ318490_1c), located at 0.4 Mbp on BTA 14, that are responsible for the DGAT1 K232A polymorphism. For C6:0 (-log10(P-value) = 8.4), C18:1 (-log10(P-value) = 48.6), and CLA (-log10(P-value) = 15.8) the DGAT1 K232A SNPs were the most significant SNPs immediately after an unmapped SNP (ULGN_SNP_AJ318490_2). After comparing this unmapped SNP sequence against the sequence of DGAT1, we identified it as an SNP located in DGAT1 that is in high LD (r2 = 0.99) with the DGAT1 K232A SNPs. The strongest associations for C10:1 (-log10(P-value) = 41.1), C12:1 (-log10(P-value) = 22.8), and C14:1 (-log10(P-value) = 80.0) were found with an unmapped SNP (ULGR_SNP_SCD). Also, C16:1 showed strong association (-log10(P-value) = 34.8) with this unmapped SNP on BTA 26, but C16:1 showed the strongest association with the DGAT1 K232A SNPs on BTA 14 as mentioned above. After comparing this unmapped SNP sequence against the sequence of SCD1, we identified it as the SNP underlying the SCD1 A293V polymorphism. Although SCD1 was not mapped on genome assembly BTAU 4.0, it was mapped on genome assembly UMD 3.0 at 21.1 Mbp on BTA 26 [29]. The strongest associations for C10:0 (-log10(P-value) = 11.0) and C12:0 (-log10(P-value) = 9.7) were found with an SNP (ULGR_MARC_10099_486) located at 52.5 Mbp on BTA 19. The strongest association for C4:0 (-log10(P-value) = 6.6) was found with an SNP (ULGR_BTA-45866) located at 54.6 Mbp on BTA 19, and the strongest association for C8:0 (-log10(P-value) = 6.2) was found with an SNP (ULGR_BTA-45847) located at 55.1 Mbp on BTA 19. Also, C14:0 showed strong association (-log10(P-value) = 22.0) with an SNP (ULGR_BTA-45758) on BTA 19 located at 52.1 Mbp, but C14:0 showed the strongest association with the DGAT1 K232A SNPs on BTA 14 as mentioned above. The strongest association for C18:0 (-log10(P-value) = 5.8) was found with an SNP (ULGR_BTA-28678) located at 64.2 Mbp on BTA 7.

Discussion

This study is the first to report a genome-wide association study of bovine milk fatty acids. A two-step single SNP association analysis was performed. In the first step, the genome was screened for interesting regions using a general linear model. In the second step, the interesting regions were verified using an animal model. The animal model accounted for all relationships between individuals in the pedigree, whereas the general linear model accounted only for the paternal half-sib structure of the phenotyped cows. Ignoring relationships between individuals can cause false positive associations [30]; therefore, the animal model was applied to verify the results from the general linear model.

The results showed that DGAT1 and SCD1 are highly associated with several of the fatty acids. The results also showed that, for some traits, other regions were more significantly associated, such as BTA 19 for some short and medium chain saturated fatty acids. In addition, many other regions were associated with fatty acids, but with smaller effects.

Schennink et al. [13] and Stoop et al. [14] reported a linkage study of milk fat composition using some of the same data used in the present study. In the linkage study, 1,341 SNPs were genotyped in 849 cows and their seven sires. Schennink et al. [13] and Stoop et al. [14] detected genome-wide significant QTL on BTA 6, 14, 19, and 26 for fatty acids included in the present study. The QTL on BTA 14, 19, and 26 were confirmed in our study. Our results suggest a QTL on BTA 6 for C6:0 and C8:0 (Figure 1A), but for C6:0 only one SNP exceeded the FDR threshold, and for C8:0 none of the SNPs exceeded the FDR threshold. Therefore, this region was not included in the animal model analysis in our second step. Given the quite stringent threshold, this region is still likely to harbor a QTL for C6:0 and C8:0.

The suggestive QTL found by Schennink et al. [13] and Stoop et al. [14] on BTA 2, 13, 14, 17, 19, and 26 was also confirmed in our study. Other suggestive QTL reported in the studies could not be confirmed, though in some cases a QTL was indicated, but this did not pass the threshold. In addition, several novel regions significantly associated with fatty acids were detected in our study but not reported by Schennink et al. [13] and Stoop et al. [14].

Some studies of the fat composition of adipose tissue in beef cattle were confirmed by our findings regarding milk fat composition. This result indicates that these regions are not unique for milk fat composition. Our findings confirmed the QTL detected by Morris et al. [18] in subcutaneous fat from beef cattle: C16:1 on BTA 1; C14:0, C16:1, and C18:1 on BTA 19; C14:0 and C14:1 on BTA 26; and C14:0 on BTA 27. Our findings confirmed the QTL detected by Abe et al. [17] in back fat, intermuscular fat, or intramuscular fat from beef cattle: C14:0 and C18:1 on BTA 19. Our findings confirmed the QTL detected by Alexander et al. [16] in the longissimus dorsi of beef cattle: CLA on BTA 7. Our findings confirmed the QTL detected by Uemoto et al. [31] in intramuscular adipose tissue from beef cattle: C18:1 on BTA 19.

Major regions

Three major regions were detected in this genome-wide association study (regions 14a, 19b, and 26), with significant effects on milk fat composition. These regions showed highly significant associations with several fatty acids. The regions on BTA 14 and 26 are regions that have been studied previously, and our results confirm the earlier findings [512]. The region on BTA 19 has not been studied extensively in relation to milk fat composition, but it harbors a number of candidate genes involved in fatty acid synthesis.

The region on BTA 14 showing an association with C6:0, C8:0, C14:0, C16:0, C10:1, C12:1, C14:1, C16:1, C18:1, and CLA is the region harboring DGAT1, which is known to influence milk production traits [5] and milk fat composition [9, 10, 12]. For all of these traits, except C10:1, C12:1, and C14:1, the two most significant SNPs on BTA 14 (located at 0.4 Mbp) were the two SNPs corresponding to the dinucleotide substitution of DGAT1 resulting in a K to A amino acid substitution (DGAT1 K232A). The K allele is associated with larger fractions of C6:0, C8:0, C16:0, and C16:1, and with smaller fractions of C14:0, C18:1, and CLA.

For C10:1 and C14:1, the most significant SNP (ULGR_BTC-068225) was located at 3.0 Mbp, and for C12:1 the most significant SNP (ULGR_BTC-067423) was located at 3.7 Mbp. However, for C10:1 and C12:1, the DGAT1 K232A SNPs were also significant. After correcting the phenotypes for the effect of the DGAT1 K232A polymorphism, the -log10(P-values) of these most significant SNPs decreased from 6.50 to 2.30 for C10:1, from 5.44 to 2.52 for C12:1, and from 4.63 to 2.40 for C14:1. The LD between these most significant SNPs and the DGAT1 K232A SNPs was moderate (r2 = 0.26 and 0.31). Although the associations of C10:1, C12:1, and C14:1 were just below the significance threshold after correcting for the DGAT1 K232A genotype, the findings still suggest that an additional QTL may be present for the medium chain unsaturated fatty acids on BTA 14.

The region on BTA 26 that showed an association with C10:0, C14:0, C16:0, C10:1, C12:1, C14:1, and C16:1 is the region harboring SCD1, which is known to be associated with the desaturation of fatty acids [7, 8, 10, 11]. For all of these traits, except C16:0, the most significant SNP on BTA 26 corresponded to the nucleotide substitution in SCD1 that causes an A to V amino acid substitution (SCD1 A293V) at 21.1 Mbp. The A allele was associated with larger fractions of C10:1, C12:1, and C14:1, and with smaller fractions of C10:0, C14:0, and C16:1. Thus, the A allele resulted in more C10:1 and C14:1 at the cost of C10:0 and C14:0. A similar effect was found for C12:0 and C12:1, though C12:0 was not significantly associated with the SNP. The opposite effect was found for C16:0 and C16:1; the A allele resulted in less C16:1 and more C16:0, though C16:0 was not significantly associated with the SNP. The SCD1 gene codes for the SCD enzyme, which desaturates saturated fatty acids to Δ9 unsaturated fatty acids in the mammary gland [32]. The association of this SNP with, and its effects on, the medium chain unsaturated fatty acids and their equivalent saturated fatty acids is, therefore, in agreement with the function of the enzyme. The associations we identified for the medium chain unsaturated fatty acids confirm previous studies on the effect of the SCD1 A293V polymorphism on milk fatty acids [8, 10, 11]. The associations we identified for the medium chain saturated fatty acids confirm only the results of Schennink et al. [10], who used the same population as the present study.

The SCD1 A293V SNP was not significant for C16:0 (-log10(P-value) = 1.09), which also confirms previous studies [7, 10, 11]. The most significant SNP for C16:0 on BTA 26 was located at 28.8 Mbp. This SNP was not in LD with the SCD1 A293V SNP (r2 = 0.08), and correcting for SCD1 A293V had little effect on the significance of the SNP associated with C16:0 (-log10(P-value) decreased from 5.52 to 4.46). Also, one allele of this SNP is associated with larger fractions of C10:1, C12:1, C14:1, and C16:0, and with smaller fractions of C10:0, C12:0, C14:0, and C16:1, suggesting that it has something to do with desaturation, but it was only significantly associated with C16:0. We did not identify obvious candidate genes in this region.

The region on BTA 19, at 32.7-61.8 Mbp, showed associations with C4:0, C8:0, C10:0, C12:0, C14:0, C16:1, C18:1, and CLA, i.e. mainly with the short and medium chain saturated fatty acids and with the long chain unsaturated fatty acids. No significant effects were found for C6:0, but a QTL was indicated below the threshold in the region on BTA 19 (Figure 1A). Morris et al. [15] performed a linkage analysis of milk fatty acids on BTA 19, detecting QTL for C8:0, C10:0, C12:0, C14:0, C18:1, and C18:2, which was confirmed by our findings and suggested fatty acid synthase (FASN, at 52.2 Mbp) as a candidate gene responsible for the observed effect. In addition to FASN, several other genes located within the region on BTA 19 are involved in the biosynthesis of milk fat, including sterol regulatory element binding transcription factor 1 (SREBF1, at 35.7 Mbp), ATP citrate lyase (ACLY, at 43.4 Mbp), signal transducer and activator of transcription 5A (STAT5A, at 43.7 Mbp), and growth hormone (GH, at 49.7 Mbp). These genes are all candidate genes because SNPs in the whole region showed an association with the traits, perhaps in LD with mutations in genes not captured by our marker set. The strongest association was found near FASN, but also near some other candidate genes as discussed below.

The region on BTA 19 was strongly associated with C14:0 and explained a large portion of the total additive genetic variation of C14:0. The SNP most significant for C14:0 (-log10(P-value) = 22.04) was also the most significant SNP for C18:1 (-log10(P-value) = 4.52) and located at 52.1 Mbp on BTA 19 (ULGR_BTA-45758 in LOC518878), which is 71,862 bp from FASN. This SNP also showed significant effects on C4:0 (-log10(P-value) = 3.57), C10:0 (-log10(P-value) = 7.01), C12:0 (-log10(P-value) = 5.96), and CLA (-log10(P-value) = 3.67), whereas the association with C8:0 (-log10(P-value) = 2.96) was just below the threshold. The effects of SNPs on C8:0, C10:0, C12:0, and C14:0 were in opposite direction of the effects on C4:0, C16:1, C18:1, and CLA. Fatty acid synthase (encoded by FASN) is a multi-enzyme system involved in de novo fatty acid synthesis. The SNP effects suggest that less C4:0 and more C8:0, C10:0, C12:0, and C14:0 are produced by fatty acid synthesis, or vice versa, which is in line with the function of FASN. Three SNPs in FASN were included in the marker set used in our study; however, two of them (FASN 16009 and FASN 763 ) were monomorphic for our population. The third SNP, FASN 17924 , showed association with C14:0 (-log10(P-value) = 3.05). Schennink et al. [33] also studied FASN 16024 , finding a significant association with C14:0 for the same population as in our study. Morris et al. [15] did not find significant associations between FASN 17924 and C14:0, but did find a significant association between C14:0 and FASN 15531 and FASN 15603 .

A haplotype of five FASN SNPs has been shown to be significantly associated with C6:0, C8:0, C10:0, C12:0, C14:0, and C18:1 in Friesian-sired cows [15], which are almost the same traits for which we found an association in the region, though not specifically with SNPs in FASN. This finding suggests a QTL in this region with an effect on short and medium chain fatty acids and long chain unsaturated fatty acids, but whether it is actually FASN that causes the association remains unclear. Our genome-wide scan showed that an SNP outside of FASN is the most significant SNP for C14:0 and was also associated with some of the other traits. This SNP showed very strong association with C14:0, whereas the association of C14:0 with the SNP in FASN was just barely significant. Candidate gene studies have shown that FASN is mainly associated with C14:0, but we found associations with additional traits, similar to the haplotype findings of Morris et al. [15]. Perhaps the causal mutation is located outside of FASN and is mainly the effect on C14:0 strong enough to be detected by SNPs in LD with this mutation.

This region on BTA 19 also harbors GH, and SNPs in this gene have been associated with milk production traits, including fat yield [34]. One SNP in our marker set was located in exon 5 of GH (GH-D30713-299) and showed significant association with C18:1 (-log10(P-value) = 3.81). The neighboring SNP showed even more associations: with C8:0 (-log10(P-value) = 3.48), C10:0 (-log10(P-value) = 5.13), C12:0 (-log10(P-value) = 3.86), and C14:0 (-log10(P-value) = 7.41).

No SNPs were located in the other candidate genes on BTA 19. The SNP showing the strongest association with C16:1 was located at 43.8 Mbp on BTA 19 (BFGL-NGS-111365), which is 13,873 bp from STAT5A. The SNPs neighboring ACLY and STAT5A showed significant associations with several of these traits, especially C14:0. All SNPs in the regions seem to be detecting the same effect, which is strongest with C14:0. Although previous studies suggested FASN as the candidate gene for association, which of the candidate genes causes the effect remains debatable. The causal mutation might even be in a gene not considered here.

Additional regions

In addition to the three major regions mentioned above, many additional regions showed associations with fatty acids (see Figure 1). We will not discuss all of the regions in this paper, but what follows are some select regions that showed association with three or more traits.

On BTA 6, region 6a was associated with C12:1, C14:1, and C16:1 (Table 3). The SNP effects were in the same direction for all three fatty acids. This region harbors the genes ATP binding cassette, subfamily G, member 2 (ABCG2, 37.4 Mbp) and peroxysome proliferator-activated receptor-gamma coactivator-1alpha (PPARGC1A, 44.8 Mbp). The SNPs most significantly associated with C12:1 (-log10(P-value) = 5.03, at 44.3 Mbp), C14:1 (-log10(P-value) = 4.39, at 41.2 Mbp), and C16:1 (-log10(P-value) = 4.63, at 40.2 Mbp) were located between these candidate genes. ABCG2 has been associated with milk fat yield and percentage [35, 36]. One SNP in our marker set was located in ABCG2 and showed no significant associations with the studied fatty acids. PPARGC1A has been associated with milk fat yield in German Holsteins [37], but this was not confirmed by Khatib et al. [38] in two larger American Holstein populations. Up to 10 SNPs in our marker set were located in PPARGC1A, but none of these were significantly associated with C12:1, C14:1, or C16:1. However, one of the 10 SNPs in PPARGC1A showed an association with C16:1, which was just below the significance threshold (-log10(P-value) = 2.86). In a candidate gene study, Schennink et al. [33] found a significant association in the same population as in the present study for two SNPs in PPARGC1A that were not included in our marker set: PPARGC1A 1790+514 with C16:1 and PPARGC1A 1892+19 with C14:1. Our genome-wide scan indicates that not PPARGC1A, but another region has the strongest effect on the unsaturated medium chain fatty acids.

On BTA 7, two regions showed an association with several fatty acids: region 7a associated with C10:1, C12:1, C14:1, and C18:1; and region 7b associated with C14:0, C12:1, C14:1, and C18:0 (Table 3).

Region 7a showed an association with almost all unsaturated fatty acids. In general, the SNP effects on C18:1 were in the opposite direction of the SNP effects on C10:1, C12:1, and C14:1. SNPs in this region were either associated with C10:1, C12:1, and C14:1, or with C18:1, suggesting one QTL for the medium chain unsaturated fatty acids and another for C18:1. The SNP effects on C10:0, C12:0, and C14:0 were in the same direction as the effects on C10:1, C12:1, and C14:1, but they were not significant. The SNP effects on C18:0 were in the same direction as the effects on C18:1, but they were not significant. Although only unsaturated fatty acids were significantly associated with this region, the SNP effects suggested that this QTL has nothing to do with desaturation because the SNP effects on unsaturated fatty acids were in the same direction as the effects on their saturated equivalents.

Region 7b showed associations with C14:0, C18:0, C12:1, and C14:1. The most significant SNP for each trait was different, but they were located in the same neighborhood, around 64.0-64.1 Mbp, and were in high LD with one another (r2 = 0.53-0.97). This finding indicates the likelihood of one QTL in this region with an effect on these four traits. The effects of these most significant SNPs were in the opposite direction for C18:0 compared to C14:0, C12:1, and C14:1. The SNP effects on C10:0 and C12:0 were in the same direction as the effects on C14:0, C12:1, and C14:1, but they were not significant. The SNP effects on C16:0, C18:1, and CLA were in the same direction as the effects on C18:0, but they were not significant. These SNP effects suggest that this QTL has something to do with a trade-off between long chain fatty acids and de novo synthesis of medium chain fatty acids. No candidate genes were located in this region.

On BTA 13, region 13 was associated with C6:0, C8:0, C10:0, C14:1, and C16:1 (Table 3). This region confirms the QTL for C6:0, C14:1, and C16:1 detected by Stoop et al. [14], who also found that C8:0 and C10:0 showed a QTL around the same position as C6:0, but these QTL were just below the threshold and, therefore, not reported. For C6:0, C8:0, and C10:0, the same SNP, located at 64.8 Mbp (ULGR_SNP_BES11_Contig346_1209), was the most significant SNP in the region, and the SNP effect was in the same direction for all three traits. These traits have a high genetic correlation [3], which supports one QTL affecting these three short chain fatty acids. This SNP is located in acyl-CoA synthetase short-chain family member 2 (ACSS2), which activates acetate for de novo fatty acid synthesis [1]; thus, ACSS2 is a good candidate gene for a QTL with an effect on C6:0, C8:0, and C10:0. Given that this particular significant SNP is located in an intron (between exon 16 and 17) of ACSS2, this SNP is not likely the causal mutation, but it can be in high LD with the causal mutation. In addition, the region also had an effect on C14:1 and C16:1. The SNP effects for C14:1 and C16:1 were in the same direction, but these effects were in opposite direction of SNP effects on C6:0-C10:0. The SNP in ACSS2 had no significant association with C14:1 or C16:1, which indicates an additional QTL with an effect on these unsaturated fatty acids.

On BTA 27, region 27 was associated with C14:0, C16:0, C10:1, C12:1, C14:1, and C16:1 (Table 3). The region on BTA 27 includes 1-acylglycerol-3-phosphate O-acyltransferase 6 (AGPAT6), which is the most abundant isoform of all AGPAT mRNA (~60%) in the mammary gland and involved in positioning fatty acids on the second position of the triglyceride backbone [1]. Given that 62.2% of C14:0 and 43.1% of C16:0 is located at the second position of the triglyceride backbone [39], this gene might be a candidate for this association. The SNP effects on C14:0 and C16:0 were in opposite directions, which suggests competition between C14:0 and C16:0 for the second position of the triglyceride backbone as an explanation for this association. In AGPAT6 knock-out mice, the composition of the triacylglycerol is altered and contains proportionally more polyunsaturated fatty acids at the expense of monounsaturated fatty acids [40], which might explain the effect of this region on the monounsaturated fatty acids. In general, the SNP effects on C10:1, C12:1, C14:1, and C16:1 were in the same direction.

Variance explained

Table 3 shows that regions 14a and 26 explained a large portion of the total additive genetic variation of C10:1, C12:1, C14:0, C14:1, C16:0, C16:1, C18:1, and CLA. This variation is caused by DGAT1 and SCD1. For other traits, however, no regions had such large effects. The sum of the total additive genetic variance explained by the most significant SNP per region for C4:0-C12:0 and C18:0 was less than 20% of the total additive genetic variance. This finding suggests that these traits are influenced by many genes with small effects. Regions explaining roughly 1% of the total additive genetic variation or more were detected by the studied design (e.g., C10:0 on BTA1), but additional regions with either undetectable small effects or that were not dense enough in our marker set to detect the effect is likely.

Even though short chain fatty acids are produced by de novo synthesis, less than 20% of the total additive genetic variance was explained by the analyzed regions. FASN plays an important role in de novo synthesis, but less than 6.5% of the total additive genetic variance of short chain fatty acids is explained by the region harboring FASN. De novo synthesis of fatty acids requires several compounds in addition to FASN enzymes to elongate fatty acids. One of the compounds is acetate, which is activated by ACSS2 for de novo synthesis, ACSS2 was indicated as a candidate gene associated with C6:0-C10:0 in this study. More genes like this that assist FASN in de novo synthesis and, therefore, explain a portion of the total additive genetic variation is likely. Also, genes involved in transport and triacylglyceride production might explain some of the total additive genetic variation.

Conclusions

A genome-wide association study of 50,000 SNPs was performed for milk fatty acids, resulting in many QTL. All over the genome regions were associated with milk fatty acids, some regions with just one fatty acid and other regions with multiple fatty acids. Milk fat composition is strongly influenced by polymorphisms in DGAT1 and SCD1, genes that have large effects on medium chain fatty acids and unsaturated fatty acids. Several regions showed associations with these milk fatty acids, but with smaller effects. The short chain fatty acids, C12:0 and C18:0, are not strongly affected by genes with large effects, but are influenced by regions with small effects. Some regions included candidate genes involved in milk fat synthesis pathways. On BTA 19, there were several genes involved in fat synthesis underlying the region associated with multiple fatty acids. Only in a few cases was an SNP associated with fatty acids actually located in a candidate gene. Regions identified in this study can be fine mapped to find causal mutations. The results also create opportunities for changing milk fat composition through breeding by selecting individuals based on their genetic merit for milk fat composition, which can be retrieved from the estimated SNP effects and the individual's genotype.