Background

The increase in milk production as a result of intense genetic selection in dairy cattle has been accompanied by a higher incidence of reproductive health issues and production-related diseases, including metritis, ketosis and fatty liver [1, 2]. This can be due to the metabolic changes and challenges in high-producing dairy cows early in lactation and the failure of the cows in maintaining their internal homeostatic and homeorhetic regulations [3]. It has been reported that approximately 50% of the metabolic and infectious diseases occur during the postpartum and transition period in dairy cattle [4, 5]. Reports of eleven studies carried out in several countries have shown that the median incidence of ketosis (per cow and year) was 3.2% for the Holstein breed [6,7,8,9], which can cause a high morbidity in the farm and substantial financial losses to the dairy farmers [10, 11]. Ketosis leads to hypoglycemia and hyperketonemia, and has the highest rate of prevalence after calving that extends to the lactation peak [12, 13]. This metabolic disease can be diagnosed through clinical signs in dairy cattle, including decrease in appetite, weight loss and decrease in milk production [14]. However, these symptoms are usually difficult to detect; therefore, the amount of ketone bodies (acetone, β-hydroxybutyric acid (BHB), and acetoacetate) present in blood, milk, urine and lymph is used as good indicators of subclinical cases in cows [11, 15]. Subclinical ketosis has been defined as when the cow shows a blood BHB level of > 1.2 mmol/L [16, 17] or a non-esterified fatty acid (NEFA) concentration of > 0.4 mM pre-partum or > 1.0 mM postpartum [18]. The gold standard diagnosis of clinical and subclinical ketosis is generally BHB in blood; however, routine blood sampling is not easy to implement, expensive for the farmers [6] and stressful on the cow side [19]. Several studies showed that there may be value in measuring BHB and acetone in milk, which are closely associated with ketosis in cattle [20,21,22]. Mid-infrared spectroscopy (MIR) has recently become a fast, cheap and high-throughput method for analyzing chemicals in livestock and food sectors [23]. The MIR technique uses the absorption of electromagnetic radiation by a sample to determine its chemical composition and has been used to predict milk quality traits since 1960 [24]. It has been shown that the genetic correlation between ketosis and MIR predicted milk BHB in dairy cattle is strong, about 0.75 [8, 25]. Additionally, the prediction accuracy of milk BHB concentrations using fourier transform mid-infrared (FTMIR) was reported to be 71% [26].

Many studies have investigated the metabolic pathways, candidate genes and gene-networks that affect metabolic diseases (including ketosis) or metabolic energy balance in dairy cows pre and post-partum [1, 15, 27, 28]. Additionally, several genes were found to be associated with ketosis biomarkers in dairy cows [15]. However, significant genome-wide regions and genes harboring putative causative mutations have not been reported in previous studies, using the predicted milk BHB concentrations. The objective of this study was 1) to identify genome-wide regions associated with MIR predicted BHB concentrations in milk, as an indicator of sub-clinical ketosis in North American Holstein dairy cattle, and 2) to perform enrichment analysis to identify biologically significant genes and pathways associated with MIR predicted BHB concentrations in milk, as an indicator of subclinical ketosis, and their possible associations with other metabolic correlated traits.

Results

Association analysis

Association analyses using a single SNP regression mixed linear model identified strong associations for the MIR predicted milk BHB from the first (SCK1) and later lactations (SCK2). The –log10 (P-value) of the tested SNPs from the GWAS analyses are shown as Manhattan plots for SCK1 (Fig. 1a) and SCK2 (Fig. 1b). The number of significant SNPs identified at a 5% genome-wise FDR varied from 71 for SCK1 to 369 SNPs for SCK2 (Additional file 1). The Quantile-Quantile (Q-Q) plots and genomic inflation statistics (lambda) of the two GWAS analyses are shown in Fig. 2a and Fig. 2b for SCK1 and SCK2, respectively. The plots showed no evidence of inflation of model-based statistics with a λmedian = 0.986 for SCK1 and λmedian = 0.964 for SCK2 traits.

Fig. 1
figure 1

Genome-wide association analysis using single SNP regression mixed linear model for subclinical ketosis. The log10 of the P-value for association with SNPs is plotted for A. subclinical ketosis in first lactation (SCK1) and B. subclinical ketosis in later lactations (SCK2). Chromosome number is shown on the horizontal axis. The red line is the threshold for significant SNPs at 1% FDR. The green line is the threshold for significant SNPs at 5% FDR

Fig. 2
figure 2

Quantile-quantile (Q-Q) of P-values of SNPs from single SNP regression mixed linear model for subclinical ketosis. The blue dots represent the –log10(P-values) of the observed test statistics for no association for a. subclinical ketosis in first lactation (SCK1) and b. subclinical ketosis in later lactations (SCK2). The red line denotes the expected pattern under the null hypothesis. Deviations between the red line and blue dots indicate how the observed probability of the test statistics deviate from the expected

Annotations and network-based analysis

Annotation of the genes to the corresponding significant SNPs from the GWAS resulted in identification of nine unique genes for SCK1 (Table 1) and 86 unique genes for SCK2 (Table 2). Gene-network analysis was performed using the IPA tool. Using two gene lists, the IPA analysis produced two putative networks for SCK1 and seven networks for the SCK2. The top two putative gene-networks for the SCK1 were associated with disease and functions involved in hematological diseases, organismal injuries and abnormalities, and infectious diseases. For the SCK2, several of the networks resulting from IPA were related to general molecular and cellular processes, including cell death and survival, molecular transport and cellular development. Most of these networks were also involved in more specific pathways such as developmental disorders, embryonic development, endocrine disorders, and metabolic diseases. The most informative networks and molecular interactions, in the context of this study, were the networks, interacting molecules and genes grouped around interferon gamma (IFNG), and tumor necrosis factor (TNF) with IPA score of 20 for SCK1 (Fig. 3), and leptin (LEP) gene with IPA score 12 for SCK2 (Fig. 4).

Table 1 List of candidate genes and corresponding significant identified SNPs within or very close (±100 kb) to the SNPs for subclinical ketosis in first lactation (SCK1)
Table 2 List of candidate genes and corresponding significant identified SNPs within or very close (±100 kb) to the SNPs for subclinical ketosis in second and later lactations (SCK2)
Fig. 3
figure 3

Network of interactions between GWAS candidate genes for subclinical ketosis in first lactation (SCK1) using ingenuity pathway analysis (IPA). The network shows direct and indirect interactions between the candidate genes enriched for significantly associated SNPs (FDR < 5%) with two gene hubs tumor necrosis factor (TNF) and interferon gamma (IFNG) as two important immune system regulators. The red label indicates the genes that were enriched for significantly associated SNPs. All the relationships were obtained using information contained in the IPA repository

Fig. 4
figure 4

Network of interactions between GWAS candidate genes for subclinical ketosis in later lactations (SCK2) using ingenuity pathway analysis (IPA). The network shows direct and indirect interactions between the candidate genes enriched for significantly associated SNPs (FDR < 5%) with two gene hubs tumor necrosis factor (TNF) and leptin (LEP) as two important genes involved in immune system and lipid metabolism regulators. The red label indicates the genes that were enriched for significantly associated SNPs. All the relationships were obtained using information contained in the IPA repository

The candidate SNP enrichment analysis resulted in enrichment of seven biologically significant GO terms for SCK1 and 91 significant GO terms for SCK2 (Additional file 2), potentially involved in pathways affecting subclinical ketosis. The semantic similarities among the enriched GO terms (FDR < 1%) using REVIGO is shown in Fig. 5.

Fig. 5
figure 5

Gene Ontology scatterplot constructed using REVIGO in R [37] for all the GO terms and genes within the identified significant regions for subclinical ketosis. Settings used for REVIGO program were as follow: database: Bos taurus, semantic similarity: 0.5 (small), semantic similarity measure: SimRel. Colors indicate the P-value of the enriched GO terms provided by the authors. The size of each bubble shows the GO terms with more significant P-values. The size of the bubbles indicates the frequency of the GO terms

Discussion

Subclinical ketosis in first lactation (SCK1)

Most of the identified significant SNPs (FDR < 1%) for SCK1 in this study were located on Bos taurus autosome (BTA) 6 (Fig. 1a, Additional file 1). These SNPs were located at around 88.4~94.8 Mb and this region has not been reported to be significantly associated with ketosis or subclinical ketosis in previous investigations. The long length of regions identified here might be associated with the imputation process. As a result of imputing, missing SNPs and more SNPs and haplotypes (in possible linkage-disequilibrium (LD) with the casual mutations) are added to the dataset. In addition, a Holstein cattle sample commonly has close relatives, who share long haplotypes. Having long haplotypes in LD with the causal mutation might be a reason for the long QTL region identified by the GWAS.

One single significant SNP (FDR < 5%) on BTA6 (BOVINEHD0600015345: rs109930261) was located at 56 Mb (Additional file 1). The position of this SNP is close to the position of several SNPs reported in a recent study on ketosis susceptibility in US Jersey cattle [29]. Parker et al. (2018) reported that the largest proportion of the variance for the ketosis was associated with a region on chromosome 6 at 56.1 Mb [29]. This region was shown to be associated with the genes and functions that affect basal glucose uptake and regulation of glucose transporter GLUT1 and lipid droplet formation [29, 30]. Association of this region on chromosome 6 with milk metabolites, glucose and glutamate, and left-sided displaced abomasum has been also reported in Danish [31] and German [32] Holstein cattle, respectively. Another highly significant (FDR < 1%) SNP identified on BTA6 for SCK1 (BovineHD0600024791: rs109793149) at around 90.5 Mb is located in the gene C-X-C motif chemokine ligand 8 (CXCL8). The protein encoded by this gene is a major mediator of the inflammatory response, secreted primary by neutrophils and acts at the site of infection (Gene ID: 3576). A study of the effect of the ketone body BHB on innate defense capability of the bovine mammary epithelial cells showed that expression of this gene (CXCL8) was significantly induced 30 h post infection with E. coli bacteria [28]. The result of this study is in agreement with previous researches reporting important interactions between metabolic disease and immune response in dairy cow [33, 34]. The association of several genes with key functions in immune response has also been reported in a previous study [29]. Several other candidate genes, including SLC4A4, GC, NPFFR2, ADAMTS3, RASSF6 and MTHFD2L, were identified in this study on BTA6, located within the region 88~90 Mb. The SLC4A4 gene (Solute carrier Family 4, Member 4) is a protein coding gene that encodes a sodium bicarbonate cotransporter (Na+/HCO3 cotransporter) involved in the regulation of bicarbonate secretion, absorption and intracellular pH (ID: 8671). The importance of this gene along with other ketogenic genes in various metabolic pathways as well as intracellular pH control in the rumen epithelium tissue was previously reported in neonatal Holstein calves [35]. Another region for milk BHB concentration harbors a functional candidate gene named GC gene. A fine mapping study of a QTL on bovine chromosome 6 using imputed full sequence data suggested that the GC gene has a key role in clinical mastitis and milk production in dairy cattle [36]. The GC gene encodes vitamin D-binding protein (DBP) which is the main carrier of vitamin D3. In both human and bovine, during infection, monocytes and macrophages express and produce and enzyme that converts vitamin D to its biologically active form (25(OH)D3), which has an important role in the immune system and host defense of the animal [36,37,38]. Moreover, concentrations of 25(OH)D3 are reported to be associated with pregnancy cycle and lactation stage [39]. The NPFFR2 gene (neuropeptide FF receptor 2) along with GC and ADAMTS3 (ADAM metallopeptidase with thrombospondin type 1 motif 3) genes were reported previously to be located within a QTL region affecting clinical mastitis in Norwegian Red dairy cattle [36]. Several regions and genes identified in this study associated with BHB in milk were previously reported to be associated with inflammatory response. In this regard, Drackley [40] and Contreras and Sordillo [41] have proposed that inflammation is the missing link in the pathology of metabolic disorders in transition cows. Intense lipid metabolism, during the transition period, results in significant release of non-esterified fatty acids (NEFA) into the blood stream which can directly activate NF-ĸB signaling pathways [41, 42]. The NF-ĸB can induce the expression of several cytokines, chemokines and their receptors and subsequently enhance the cellular inflammatory response [42]. The metabolic effects of the acute systematic inflammation in transition cow results in mobilization of more adipose tissue, liver glycogen breakdown [43] and accumulation of the triglyceride in the liver [42]. All of these events are associated with ketosis and fatty liver in dairy cattle [44]. Taking all these results together, the authors of this study speculate that this region on BTA6 and the associated candidate genes might regulate pathways and events related to energy metabolism, inflammatory response function and cellular crosstalk in subclinical ketosis.

The GWAS results also showed three significant SNPs (BOVINEHD1000005959: rs110237995; BOVINEHD1000005943: rs110139536 and BOVINEHD1000005952: rs135236614, FDR < 5%) located on BTA10 at around 17 Mb (Additional file 1). Parker et al. (2018) reported a region on chromosome 10 associated with ketosis susceptibility; however, the position of this region was different from what was found in the current study. Three significant SNPs (FDR < 5%, Additional file 1) on chromosome 20 spanning around 55.3–57.4 Mb were also identified significant in this study. Two of these SNPs in this region (BOVINEHD2000015856: rs133951346; BOVINEHD2000015861: rs135815835) were located within the intronic region of gene F-box and leucine rich repeat protein 7 (FBXL7). The FBXL7 gene includes one of the four subunits of the E3 ubiquitin protein ligases, which play a role in phosphorylation-dependent ubiquitination of proteins (Gene ID: 23194). The ubiquitin system plays a pivotal role in the regulation of immune response and controls the basic aspect of the immune system, including lymphocyte development, differentiation and activation [45, 46]. The FBXL7 gene has also been reported to be located within a region that explains the highest genetic variance associated with clinical mastitis in first lactating US Holstein dairy cattle [47]. Therefore, this gene might be associated with inflammation factors that affect both clinical mastitis and clinical ketosis during first lactation in dairy cattle.

The IPA network enrichment analysis of the genes identified for SCK1 and their direct and indirect interactions with other molecules and genes are shown in Fig. 3. This network shows the association of most of the identified genes for SCK1 with two hub genes, IFNG and TNF. The IFNG gene encodes a soluble cytokine that is a member of type II interferon class and has important immune-regulatory functions in both innate and adaptive immune systems (Gene ID: 3458). In addition, cows with metritis or clinical endometritis show higher serum concentrations of leptin, adiponectin, IL-1β, IL-6and TNF-α [48]. In transition dairy cows, once mobilized NEFA reach the liver, the TNF-α decreases liver glucose production [43] and increase the triglyceride accumulation which can promote metabolic disorders [42]. Kasimanickam et al. [48] also reported that an increase in anti- and pro-inflammatory cytokines early in lactation is associated with uterine postpartum pro-inflammatory conditions, increase in adipose tissue metabolism and loss of body condition score; these last two factors are key biological features that affect ketosis and subclinical ketosis in dairy cattle [28, 29, 48].

Candidate SNP enrichment analysis for SCK1 has resulted in enrichment of seven significant GO terms (FDR ≤ 1%), introducing 71 SNPs as candidate SNPs (Additional file 2). From this significant GO terms, several terms including GO:0051180 (vitamin transport), GO:0051183 (vitamin transmembrane transport activity) and GO:0042359 (vitamin D metabolic process) were related to vitamin transport within and between cells and chemical reactions and pathways involving vitamin D (Additional file 2). Vitamin D has been reported to have regulatory effects on adipose tissue, lipid storage and calcium metabolism in humans and it has been shown that vitamin D receptor (VDR) is expressed in adipose tissue [49]. It has also been reported that plasma 25-hydroxy-vitamin D3(25OHD3) is inversely related to metabolic events and obesity complications, including hyperglycemia, hypertension, insulin resistance and dyslipidemia in humans [50].

Subclinical ketosis in second and later lactations (SCK2)

A total number of 369 SNPs (FDR < 5%) were identified significant for SCK2 in this study. These SNPs were mainly located on BTA6, 14 and 20 (Fig. 1b). The list of all significant SNPs and chromosomes are given in the Additional file 1.

The presence of these significant regions on chromosomes (6, 14 and 20) for SCK2 was reported in a previous short paper by Nayeri et al. [51] where the largest peak was located on BTA14 (Fig. 1b). In the following, the position of significant SNPs, their functional associations and other identified significant regions will be further discussed. One highly significant SNP on BTA14 (ARS-BFGL-NGS-4939: rs109421300, FDR < 1%) was shown to be an intronic SNP within DGAT1 gene (Additional file 1). Association of the DGAT1 gene with milk fat content [52, 53] and several production traits has been reported in several previous studies [53,54,55]. However, there were no previous indications of association of the identified SNP in this region with ketosis in previous studies. The enzyme produced by the DGAT1 gene (acyl-CoA: diacylglycerol acyltransferase) catalyzes the last step of triacyglyceride (TAG) synthesis in the liver and can significantly influence the energy metabolism in dairy cows [56]. It has been shown that excessive accumulation of TAG in liver and oxidative stress as a result of altered lipid metabolism early in lactation can increase the risk of metabolic disease including fatty liver and ketosis in dairy cattle [56]. Estimates of genetic correlations between 305-d milk yield and ketosis were found to be mostly unfavorable, with estimates ranging from 0.00 to 0.77 [57,58,59]. Therefore, one would expected QTL/genes with pleiotropic effects on both ketosis and milk yield would exist. This seems to be observed in this study, but the mechanism of the pleiotropic effects might involve intricate indirect pathways. These findings may also support the IPA enrichment analysis results, in which the DGAT1 enzyme was shown to be interacting indirectly with LEP and TNF genes (Fig. 4). Leptin is a hormone produced in the adipose tissue and its function is mainly to maintain glucose homeostasis and regulate appetite and energy metabolism in dairy cattle [60]. The TNF gene encodes a pro-inflammatory cytokine that is mainly secreted by macrophages; this cytokine, as explained previously, is involved in regulation of different biological processes including cell differentiation, proliferation, apoptosis and lipid metabolism and has been implicated in different autoimmune and metabolic-associated diseases including insulin resistance (Gene ID: 7124). Investigations on human and various animal models suggest that fatty acids can affect the host inflammatory responses in several ways [61]. One way is through the altered lipid metabolism, which often results in the increased levels of NEFA in the blood. Fatty acids are key resources of energy metabolism and are oxidized to produce Acyl-CoA and ATP. When there is a nutrient surplus, adipose tissue stores a large quantity of fatty acids in the form of TAGs [3]. This stored fatty acids can be then mobilized by lipolysis when there is an energy deficit, including early in lactation in dairy cattle [62, 63], which leads to increased NEFA in the blood for uptake by various tissues [64]. However, other tissues and cells may not be able to handle a sustained nutrient overload which might result in adverse effects [63]. Increased levels of circulating NEFAs are associated with increased systemic inflammatory conditions in humans [63]. Excessive fat mobilization due to a severe energy deficit impairs the cow’s immune function and fertility and leads to metabolic stress [15]. Therefore, altered lipid metabolism, increased concentrations of non-esterified fatty acids, excessive adipose stores and oxidative stress during the onset of lactation in dairy cattle, may negatively affect the inflammatory response of the individual to several pro-inflammatory diseases including metritis, mastitis and ketosis. This might explain the underlying biological pathways and interactions of the genes DGAT1, LEP and TNF and their associations with subclinical ketosis.

Interestingly, in this study the GWAS identified two highly significant SNPs (BovineHD4100010548: rs110651119; BovineHD1400000400: rs109221516, FDR < 1%) at around 25 Mb on BTA14 located in the gene lymphocyte antigen 6 family member H (LY6H), and two significant SNPs (BovineHD1400000443: rs133518905; BovineHD1400000446: rs136122630 at FDR < 1% and < 5%, respectively) at around 27 Mb located in the gene lymphocyte antigen 6 family member K (LY6K) (Table 2). The putative role of these two genes (LY6H and LY6K) in immune response has been reported in several previous investigations [65,66,67] and LY6K gene was reported to be a significant candidate gene for mastitis susceptibility in US and Chinese Holstein cattle [47, 68]. Therefore, these regions, underlying genes and biological pathway might be also associated with subclinical ketosis in later lactations.

Two other highly significant peaks for SCK2 were located on BTA6 (spanning around 78.7–98.6 Mb) and BTA20 (55–63 Mb) (Fig. 1b, Additional file 1). Three SNPs on BTA20 (at around 53 and 57 Mb) and a region on BTA6 (spanning around 88~93 Mb) were found to overlap between SCK1 and SCK2. One significant intronic SNP (BovineHD4100005290: rs110899052, FDR < 5%) on BTA6 at 87 Mb in this study was located in the gene casein alpha s1 (CSN1S1). Associations of two significant SNPs within the promotor region of the CSN1S1 gene with milk yield traits in German Angeln dairy cattle were reported in a quantitative trait loci (QTL) mapping study performed by Sanders et al. (2006) [69]. The presence of the significant genomic regions near the casein gene cluster (at 87 Mb) affecting milk production and milk component traits has been reported in several breeds including North American, Dutch, and US Holstein cattle for milk fat, milk protein yield and protein deviations [53, 70], in German Holstein [71] and Brazilian Holstein cattle [72] for milk yield and fat yield; however, this region was not reported to be associated with ketosis in previous studies. This region and its related gene might be associated with the high fat:protein ratio and decrease in milk protein percentage in early lactation [73].

Four SNPs in the current study (BovineHD1100004827: rs110571223; BovineHD1100004831: rs43672311) within gene BIRC6 and (BovineHD1100004868: rs110243275; BovineHD1100004913: rs42050566) within the gene TTC27 at 15~15.3 Mb were significant (FDR < 5%) (Additional file 1) on BTA11. This result is in agreement with the result of a GWAS study on ketosis susceptibility in Jersey dairy cows, in which significant regions were reported on BTA11 between 14.9 to 16.9 [29]. This region was reported to be associated with genes and pathways that regulate several functions, including metabolism, mastitis, steroid hormone metabolism and innate immune system receptors [74, 75]. Several SNPs on BTA25, spanning around 25.1–27.9 Mb for SCK2 (Additional file 1) were also identified. This result is supported by a gene-based mapping analysis of metabolic adaptation and metabolic disease in dairy cattle, in which two genes (ENSBTAG00000031551: PRSS53; ENSBTAG00000000781: HIP1) at around 27 Mb on BTA25 were shown to be associated with blood BHB concentrations and metabolic disorders [1]. Moreover, one highly significant SNP identified in the current study (BovineHD2500007432: rs42072596, FDR < 1%) on BTA25 was previously reported to be associated with carnitine and glycerophosphocholine in milk of Danish Holstein cattle [76]. This SNP is an intron variant located within the gene CLN3 which encodes a protein involved in lysosomal function (Gene ID: 1201). Mutations in this gene are associated with a neurodegenerative disease in humans which leads to reduced levels of carnitine in plasma [77]. Carnitine has been an important component of mitochondria involved in fatty acid β-oxidation and lipid metabolism [78]. Additionally, Buitenhuis et al. [31] reported a high correlation between milk carnitine and choline (0.86), and choline and BHB in milk (0.97). Moreover, Klein et al. (2011) showed that high values of glycerophosphocholine in milk throughout the lactation period are connected with a low ketosis incidence in dairy cattle [79]. Therefore, this significant SNP and its assigned gene (CLN3) might have an important biological function associated with subclinical ketosis and other correlated metabolic traits in dairy cattle. Another highly significant SNP (BovineHD2500007435: rs42071217, FDR < 1%) at around 26 Mb on BTA25 within the candidate gene APOBR was also identified in the current study, which was previously reported to carry a causative variant highly associated with milk levels of glycerophosphocholine in German Holstein dairy cattle [15]. Other putative functionally important candidate genes for SCK2 on chromosome 25 found in this study were IL4R (BovineHD2500007120: rs42062457), and PSPH (BovineHD2500007818: rs109700166) genes.

The significant peak found on BTA20 for SCK2 was located at 55.3~63.9 Mb. Several significant SNPs (FDR < 1%) on this region (32 SNPs) were located within or very close to the gene trio Rho guanine nucleotide exchange factor (TRIO) (Table 2). This gene encodes a large protein that functions as a GDP to GTP exchange factor (Gene ID: 7204) and was reported to be a significant candidate gene for maternal effect on weaning weight and milk yield traits in Blonde d’Aquitaine beef cattle [80]. It has also been shown that the TRIO gene controls leukocyte trans-endothelial migration during inflammatory conditions and other diseases, such as rheumatoid arthritis [81]. Thus, this significant region on chromosome 20 and its associated gene might be also related to the inflammatory response in transition dairy cow and the occurrence of metabolic disorders. Other positional candidate genes that were identified in this study on BTA20 (55–63 Mb) were ANKH, MYO10 and DNAH5. This region (at 55–63 Mb) on BTA20, was not reported to be associated with ketosis or subclinical ketosis in previous studies, therefore this region seems to be a novel region for subclinical ketosis.

Of the 369 SNPs that were used as candidate SNPs (FDR < 5%) in SNP2GO for enrichment analysis, a total of 114 SNPs were overrepresented in 91 enriched GO terms (Additional file 2). Some of these terms showed a clear association with SCK2 including GO:0030073 (insulin secretion); GO:0001678 (cellular glucose homeostasis) and GO:0046883 (regulation of hormone secretion). In order to have a better insight of the key biological pathways and their association with subclinical ketosis, GO terms were summarized and visualized using REVIGO web-server (Fig. 5). Blue and green bubbles show the GO terms with more significant P-values. The size of the bubbles indicates the frequency of the GO terms. One important GO term in the scatterplot is a term associated with regulation of Rho protein signal transduction; many researches over the last decade have uncovered the molecular links between the RhoGTPase and the NFKB pathways, and its association with inflammation and immune response [82].

Overlapping regions among SCK1 and SCK2

The only overlapping peaks between the SCK1 and SCK2 traits were in the regions identified on chromosome 6 at 88–93 Mb and chromosome 20 at 57 Mb. These regions affect milk BHB concentrations (subclinical ketosis) both in first and later lactations in cows. In dairy cattle milk production increases in later parities [83]. Production of more milk might be associated with increased milk fat content due to adipose mobilization and a higher fat:protein ratio [84].

Conclusions

The genome-wide association analysis on MIR predicted BHB milk in this study identified several significant regions associated with subclinical ketosis in first (SCK1) and later (SCK2) lactations in North American Holstein dairy cattle. The significant regions were mostly located on BTA6, 14 and 20. Regions on BTA6 and BTA20 were found to overlap between SCK1 and SCK2. Several of the identified regions were reported in previous investigations as associated with ketosis biomarkers, milk production, clinical mastitis and immunity in dairy cattle. However, a novel region on BTA20 at 55–63 Mb was also identified, which was not previously reported. Functional analysis identified pathways and biological terms and processes involved in lipid metabolism and immune function. The results of this study can be used for further genetic analysis to identify genes and causal variants that affect ketosis and other correlated metabolic diseases.

Methods

Animals and data

The Canadian Dairy Network (CDN, Guelph, Ontario, Canada) provided pedigree, genotypes and estimated breeding values (EBV) of cows and proven bulls for MIR predicted milk BHB in first lactation (SCK1) and later lactations (SCK2). Milk BHB measurements have been collected as a part of routine phenotyping via MIR spectroscopy by Canadian DHI since January 2013 using an existing FOSS calibration model and MilkoScan FT+ (FOSS Analytical A/S, Hillerød, Denmark) [85]. The correlation between chemical methods for measuring milk BHB and predictions using MIR spectroscopy has been shown to be approximately 0.80 [27].

Genotypes of 24,657 Canadian Holstein bulls (9,856) and cows (14,801) with the BovineSNP50K (50 K, 41,097 SNPs) BeadChip (Illumina, San Diego, CA) were provided by CDN. The SNPs in the 50 K panel passed standard quality control measures used by CDN. These genotypes were then imputed to the high density (HD, 311,725 SNPs, after editing 777 K panel for redundant SNPs [86]) genotypes (with a reference population of 2,507 animals) using FImpute V2.2 software [87]. Quality control was performed on the imputed HD genotyping data using the snp1101 software [88] and SNPs with low call rate < 0.9, low minor allele frequency (MAF) < 0.01 and excess of heterozygosity > 0.15 were excluded. After quality control, a total of 298,210 SNPs remained which were used for further genome-wide association analysis.

Estimation of breeding values for milk BHB

Estimated breeding values were calculated by CDN with a multiple-trait (9 traits) linear animal model [89] including metabolic disease traits, milk recording indicators, and body condition score. Milk BHB, or sub-clinical ketosis, for first and later (up to fifth) lactations is recorded at the first test-day from 5 to 45 days in milk (DIM), expressed as milk BHB in mmol/L and log transformed. Data for lactations greater than 2 are treated as repeated observations. The model equation for milk BHB at first lactation included the fixed effects of herd, year-season, DIM, age-season-parity, and random herd-year and animal additive genetic effects. For later lactations, a random permanent environment effect was added to account for repeated measures. Specific model details and assumptions can be found in Jamrozik et al. (2016).

Calculating de-regressed EBV

The EBV for MIR predicted milk BHB were de-regressed as explained in Garrick et al. [90] to be used in the genome-wide association study (GWAS). The Garrick et al. (2009) regression procedure adjusts for ancestral information and eliminates shrinkage contained in the EBV; therefore, the de-regressed EBV only contains their own and the descendant’s information on each animal [90].

Genome wide association study

The association analysis was performed using a single SNP regression mixed linear model implemented in the snp1101 software [91]. The mixed linear model used in this study was as below: Yi = μ + βgi + ai + ei.

where Yi is pseudo phenotype of the ith bull (bull’s de-regressed EBV); μ is the overall mean; β is the linear regression coefficient (allele substitution effect) of the SNP; gi is the SNP genotype of the ith bull (coded as 0 for BB, 1 for AB and 2 for AA SNP genotypes); ai is the random additive polygenic effect of the ith bull and ei is the random residual effect.

Assumptions of the model include a~N(0,  Gσ a2 ) in which G is the genomic relationship matrix [91] and σ a2 is the polygenic additive genetic variance; e ~N(0, Rσ e2) in which σ e2 is the residual variance, where a and e are vectors of additive polygenic and residual effects, respectively. R is a diagonal matrix containing weights for the residual variance based on the reliabilities of the de-regressed bull EBV [91].

In order to account for multiple testing, genome-wise false discovery rate (FDR) of 5 and 1% were used to identify significant and highly significant associations, respectively. The inflation factor λ [91] and quantile-quantile (Q-Q) plots were also calculated to compare observed distributions of –log(P-value) to the expected distribution under the no association model for each trait.

Enrichment analysis

Significant SNPs at the level of 5% FDR were mapped to the corresponding genes in Ensemble database (Ensembl 90, Bos taurus UMD3.1, http://useast.ensembl.org/Bos_taurus/Info/Index) using the getBM() function in R-biomaRt package (https://www.bioconductor.org/) [92]. Genes within ±100 kb from the identified significant SNPs were used for further functional analysis.

Functional analysis was performed to identify the biological pathways and gene-networks associated with the list of positional candidate genes using the Ingenuity Pathway Analysis software (IPA; Ingenuity System Inc., USA). Additionally, a candidate SNP enrichment analysis was performed to identify genomic annotations and associated Gene Ontology terms (GO terms) for each trait using SNP2GO R package, as explained by Szkiba et al. (2014). For the GO term enrichment analysis, the Ensembl version 90 genomic annotation file (for Bos taurus UMD 3.1 assembly) in conjugation with Ensembl gene ID file (from Ensembl version 91 with gene ID and GO term accession) was used. The SNP2GO “extension” and the “runs” parameters were set to 50 nucleotides and 100,000, respectively and a false discovery rate (FDR) of 1% was used to correct for multiple testing [93]. The collection of enriched GO terms (FDR < 0.01) resulted from the SNP2GO analysis were then summarized and visualized using REVIGO web server (http://revigo.irb.hr) [94, 95]. This analysis assists the interpretation of the result by reducing the number of redundant enriched GO terms, using a simple clustering algorithm, and produces a scatterplot, relying on semantic similarities [95]. For this analysis, the GO terms enriched for both traits and the P-values from SNP2GO were uploaded to REVIGO. Settings used for REVIGO program were: database: Bos taurus, semantic similarity: 0.5 (small), semantic similarity measure: SimRel.