Background

Dystocia, or calving difficulty [1], has an important negative economic impact in the dairy industry. Births that require assistance result in increased veterinary costs, reduced longevity of cows, and increased mortality of calves [2, 3]. Difficult calving reduce the length of a cow’s productive life by 10% [2], mainly due to an increase in culling risk. Meyer et al. [4] and Bicalho et al. [5] found that dystocia had a major influence on stillbirth incidence of Holsteins, with the odds of a stillbirth being significantly greater than the odds of a live calf when the calf needed assistance. Holstein and Holstein crosses that experience difficult calvings also have increased median calving-conception intervals [5], number of services to conception, and times to first service [6]. All of these factors directly affect the profitability of a herd, mostly by reducing the lifespan of cows in the herd and by increasing the number of replacements needed.

Several contributing risk factors have been associated with dystocia [1, 7]. A disproportion in calf birth weight and maternal pelvic size is a major contributor to dystocia in domestic dairy cattle [1]. Calf birth weight is influenced by gestation length, which in turn, is influenced by paternal and maternal breed [1]. Both gestational length and dystocia have been shown to have a genetic component, and pedigree records have enabled the estimation of direct and maternal effects [8, 9]. In addition, genetic correlations have been identified between these functional traits and different conformation traits. Eaglen et al. [8] reported for example significant genetic correlations of gestation length with rump width, and of maternal calving difficulty with chest width, and with body depth. Furthermore, even assuming a similar heritability across breeds, there is an intrinsic variability for calving difficulty within breed. Different authors reported dystocia was highest for cows calving to Holstein bulls while Jersey calves caused the least dystocia [10,11,12]. As a result a search for quantitative trait loci (QTL) associated with dystocia in dairy cattle produced results only in Holsteins [9]. Indeed, at least 27 QTL were reported across the genome in Holstein-Friesian cattle, but to the best of our knowledge, none in Brown Swiss or Jersey.

Genome-wide association studies (GWAS) have been a valuable tool to detect single nucleotide polymorphisms (SNP) associated with specific phenotypes. Key functional mutations may be revealed by GWAS, assuming a sufficiently large number of individuals and dense SNP panels were available. However, for complex traits where a large number of genes are expected to influence the phenotype each with a small absolute effect, GWAS studies showed to be limited [13]. Genomic information has been officially incorporated into genetic evaluations of US Holstein, Brown Swiss and Jersey cattle since 2009 [14], and Ayrshires since 2013 [15]. The success of genomic selection relies on the accuracy of the SNP effects calculated and the linkage disequilibrium (LD) between the SNP and the QTL for the trait [16]. Patterns of LD vary among breeds, especially when these breeds have undergone divergent selection for many generations and the effective population size is small [17]. Genomic selection becomes important especially for complex traits of low heritability, such as fertility and health traits, where the accuracy of conventional selection is lower than for production traits [18, 19]. In addition, using multi-breed reference populations [20, 21] has been proposed as a way to ensure robust predictions when the reference population is small in size.

An appealing approach to enhance genomic predictions is to improve prediction using insight of the underlying molecular mechanisms of complex traits [22]. Biological networks in principle should better capture the genetic architecture of complex traits, being less dependent on LD patterns characteristics of a population. In this regard, correlation networks have been widely used both with gene expression [23, 24] or genotype data to integrate information from different levels. Fortes et al. [25, 26] used genotype data in cattle to unravel biological networks related with fertility traits and puberty. Constructing relevant within and across-breed gene networks for these three breeds related to dystocia may provide insight into the biology of the trait and produce robust predictions for dystocia in different cattle breeds. The objective of this study is therefore to identify and characterize breed specific and across-breed gene networks influencing dystocia in Holstein, Brown Swiss and Jersey cattle and verify if their inclusion in genomic prediction increase accuracy of prediction.

Methods

A schematic representation of the overall set of analyses is provided in Additional file 1: Figure S1.

Animals and phenotypes

Pseudo-phenotypes used in this project were derived from national genetic evaluations of calving difficulty, gestation length and conformational traits for three common breeds of US dairy cattle: Holstein (HO), Brown Swiss (BS) and Jersey (JE). Six traits previously associated with dystocia were employed, namely: calving difficulty (direct sire -DCD- and maternal grand sire -MCD-), gestation length (GL), and conformation traits (stature -STAT-, strength -STR- and rump width -RW-). The genetic merit of selected bulls was reported as predicted transmitted ability (PTA) and was provided by the Animal Genomics and Improvement Laboratory (AGIL), USDA. A detailed description of the different traits in selected bulls can be found at https://www.uscdcb.com/what-we-do/genetic-evaluations/. Briefly, calving difficulty was recorded on a scale of 1 to 5 (the larger the score the more difficult the calving) and a sire-maternal grand sire threshold model was fitted to the data to obtain the PTA [27]. The PTA was reported as the percentage of births that are difficult (calving difficulty score ≥ 4) for first-calf heifers. The model implemented allowed the estimation of both, sire and a maternal grand sire effect. This resulted in the availability of two pseudo-phenotypes for calving difficulty, DCD and MCD, respectively for the direct and maternal component. Gestation length (GL) is the direct effect on the interval from conception to parturition and it was derived from breeding and calving data [28]. Conformational (type) traits considered were stature (STAT), strength (STR) and rump width (RW). For type traits, the PTA was calculated with a multi-trait animal model [29, 30]. The PTA was a function of each bull’s daughters’ deviations from the base mean and of his parent’s average.

De-regression

The PTAs for all traits were transformed into de-regressed PTA (dPTA), removing the parent average effect from contribution to the PTA [31] as well as the PTA’s variance shrinkage. The de-regression procedure produced weights (w) obtained using a c parameter of 0.5 [31]. The c parameter reflects the predictive ability of the genetic covariates, a large c results in overemphasis of less accurate information whereas a small c results in too little emphasis on less accurate results [31, 32]. When the parent average (PA) was not available, it was estimated as \( \widehat{PA} \) = (0.5*sire PTA) + (0.25*maternal grandsire -MGS- PTA) + (0.25*average year of birth -AYB- PTA) [18], \( \widehat{PA} \) = (0.5*sire PTA) + (0.5*AYB PTA), or \( \widehat{PA} \) = (0.25*MGS PTA) + (0.75*AYB PTA) depending on data availability. Only records with a reliability of the de-regressed PTA larger than 0.2 and only animals with pseudo-phenotypes in all traits (to avoid an animal-trait confounding effect) were kept for the analyses. After editing, there were 8780 HO bulls, 505 BS bulls, and 1818 JE bulls with pseudo-phenotypes available. Pseudo-phenotypic data summary statistics are shown in Table 1. Correlation among traits is shown in Additional file 2: Figure S2.

Table 1 Mean and standard deviation of de-regressed PTA and de-regressed reliabilities for the phenotypes used (\( \overline{x} \) ± sd)

Genotypes and genome-wide association analysis

Single nucleotide polymorphism (SNP) genotypes from different chips [14] were provided by the USDA’s AGIL. A total of 35,565, 34,665 and 41,580 quality-controlled SNP genotypes were available for BS, JE and HO bulls, respectively. Quality control included removing SNPs with minor allele frequency lower than 5% and call rate lower than 90%.

The genome-wide association was done on a single-trait basis within each of the three breeds. We used a Bayes-B implementation in GenSel software [33] such that,

$$ \mathrm{y}={\mathbf{1}}_{\mathbf{n}}\upmu +\sum \limits_{\mathrm{i}=1}^{45188}{\mathrm{Z}}_{\mathrm{i}}{\mathrm{g}}_{\mathrm{i}}+\mathbf{e} $$

where y is a vector of dPTAs, n is the number of bulls (8,780,505, or 1818), μ is the overall mean, g is a vector with random SNP effects with their variance conditional on π = 0.9, Z the incidence matrix of genotypes and e is a vector of residuals, considered to be normally distributed N(0, Rσe2) with Rii = 1/w i and σe2 ~ X− 2(v = 10, S). The conditional marker variance at each iteration was

$$ {\sigma}_g^2\pi \sim \left(\begin{array}{cc}0& p\left(\pi \right)\\ {}{vS}^2{X}_v^{-2}& p\left(1-\pi \right)\end{array}\right) $$

in which X− 2 is an inverted chi-squared distribution with v = 4 and S = [σ2 *(v – 2)]/v with σ2 = 5 for σg2 and σ2 = 3 for σe2.

A total of 40,000 samples were obtained from the Gibbs sampler, 5000 of which were discarded as burn-in. We did not expect the results to be very sensitive to the chain length, due to the relatively high reliability of the pseudo-phenotypes. However, a preliminary analysis was done with 50,000 iterations and 10,000 as burn-in for 3 traits per breed and results did not change (results not shown). A proxy for the proportion of genetic variance (PVg) accounted for by each SNP was calculated as:

$$ {\mathrm{PV}}_{\mathrm{g}}=\frac{\left[2\mathrm{p}\left(1-\mathrm{p}\right)\right]{\overline{\mathrm{a}}}^2}{\sum_1^{45188}\left[2\mathrm{p}\left(1-\mathrm{p}\right)\right]{\overline{\mathrm{a}}}^2} $$

where \( \overline{a} \) = posterior mean of the effect for that locus averaged across the post burn-in samples and p is the major allele frequency.

For each breed/trait a lenient significance threshold was arbitrarily set at the 75th percentile of the sample, that is, we retained the top 25% of SNP (corresponding to a count of 11,297 SNP) for further analyses.

Mapping of SNP to genes

Each SNP in the chip was mapped to its closest annotated gene using the Bos taurus UMD3.1 assembly [34] (http://www.ensembl.org/index.html). Criterion for mapping consisted in a 5′ or 3′ maximum distance to the nearest gene smaller or equal to 2500 bp. A total of 15,634 SNP were mapped to a total of 8599 genes. In the case of SNP mapped to more than one gene, all genes selected were retained. In the case of genes mapped by more than one SNP, a SNP filter was applied to preserve the “one SNP to one gene” rule (see below).

Association weight matrix and gene networks

In general, the construction of an association weight matrix (AWM) followed the procedures previously implemented by Fortes et al. [26]. For each of the three breeds, the construction of the AWM started with the selection of relevant SNP from those identified as significant in the association analyses. Relevant SNP selected were the ones which met at least one of the three following criteria: a) were significant for all traits evaluated, b) were significant for all of the functional traits evaluated (DCD, MCD, and GL), c) were significant for all the type traits evaluated. Each column in the AWM corresponded to a trait. Each row corresponded to a SNP. Each cell in the matrix corresponded to the z-score normalized effect size for the particular SNP. The effect size was defined as the posterior mean of the SNP effect averaged across the post burn-in chains. Then, each row in the AWM was indexed with its gene. When different relevant SNP were mapped to the same gene, the most significant relevant SNP was assigned to that gene (“one SNP to one gene” rule). The most significant relevant SNP of a particular gene was defined as the one with the largest PVg on average across all traits. When relevant SNP were not mapped to a gene but they met criterion a), they were kept in the AWM. Therefore, the AWMmxt had as many rows as [indexed genes + SNP meeting criterion a)] and as many columns as traits (6). Subsequently, row-wise partial correlations were computed on the AWM matrix using the PCIT algorithm [35] in R [36]. This algorithm allowed identification of significant partial correlations and produced an m symmetric adjacency matrix. Each cell in the adjacency matrix corresponded to a partial correlation between gene i and gene j. When partial correlations were not significant the value in the cell was set to 0. The significant correlations identified could be interpreted as significant gene-gene interactions (connections). These interactions put together resulted in a gene network.

Finally and since with the PCIT algorithm better sensitivity is achieved with larger number of traits [35], to avoid spurious connections we trimmed the network to a sub-network consisting only of connections with a partial correlation (rxy|z) equal or larger than 0.98. We computed 4 networks, 1 per breed (breed-specific networks) and 1 for the intersection of the 3 breeds, that is, genes shared among the 3 breed-specific networks (across-breed network). To ensure that genes in the across-breed network were not a product of chance we tested the intersection size using a variant of the Binomial distribution [37]. This statistic models the intersection sizes when sampling from different sets without replacement. For visualization of results we used R-igraph [38].

Gene ontology (GO) enrichment analyses

For each breed, we used the genes in the adjacency matrix to search for enriched functional annotation. Human homologs of these genes were retrieved, and used to evaluate enrichment of the GO biological process branch using GOstats [39]. To account for the false positives resulting from the multiple testing, we used a false discovery rate (FDR) [40] of ≤35% in Holstein and Jersey. We used FDR ≤ 60% in Brown Swiss due to the smaller power of the analysis resulting from a smaller n when compared to the other breeds. Human homologs were used due to the richer annotation as compared to cattle. The human homologs were contrasted to a background list containing all genes from the Homo sapiens GRCh38 assembly representing the 8599 Bos Taurus genes. To reduce dimensionality and redundancy in the lists of enriched GO terms we used REVIGO [41] with a C value of 0.5 (C is a user specified proportion of terms remaining in the list after the algorithm has finished the cluster representatives. A lower (more stringent) value of C will result in a shorter, but also a more semantically diverse list [41]) and the simRel measure of semantic similarity [42]. Semantic similarity measures exploit GO structure to compare concepts within an ontology based on common ancestors, to reduce functional redundancies. The terms are assigned to clusters based on a semantic similarity measure. At C = 0.53 there is 99% chance an above-background similarity exist between each pair of terms in a cluster [41]. The semantic similarity threshold at which the terms were removed from a list and assign to a cluster corresponds to the dispensability. Cluster representatives always have dispensability less than C [41]. Therefore, the smaller the dispensability the more representatives were the terms.

Mapping of genes to dystocia QTL

For each breed, we retrieved the genes for enriched GO terms and evaluated their location in the genome. Specifically, we searched for genes located within dystocia QTL for the UMD3.1 assembly available at the CattleQTLdb [9] (http://www.animalgenome.org).

Pathway analysis

For all 4 networks we evaluated biological pathway enrichment (FDR ≤ 0.35). We used the genes in each breed-specific network that were within dystocia QTL and all other genes interacting with them. In addition, we evaluated pathway enrichment by the 80 genes in the across-breed network. We used a combined measure that evaluated KEGG pathway overrepresentation as well as KEGG pathway perturbation to assess significance [43]. In the breed-specific networks, the measure of change (ΔEgi) used to compute the perturbation in gene i was the average allelic effect across traits. In the across-breed network, ΔEgi corresponded to the average allelic effect across breeds and traits. We evaluated enrichment in two species with rich annotation, human and mouse.

Gene networks in genomic predictions

Genomic prediction of bulls’ genetic potential was performed with the inclusion of network information in order to investigate any potential increase in accuracy deriving from including network information through a GBLUP model.

The models were tested using a 4-fold cross-validation scheme. For each breed, bulls were separated into four medoids (families) by k-means based on the pedigree-based relationship matrix [44], thus creating four training-validation sets within each breed. Number of bulls masked in each validation set was 122, 177, 88, 118 for BS, 317, 633, 496, 371 for JE and 2426, 2083, 1986, 2285 for HO.

The genome-wide association analysis and association weight matrix construction was repeated for each training set, and served to construct models for the prediction of PTA in the validation set. Different models were tested. A general genomic relationship matrix G [45] was used as reference method and built on the whole set of markers (BASE). Markers were then gathered if ranking among the top 25% for PVg for at least three of the six traits considered, a genomic relationship matrix was built on these markers (TOP25) as well as on the remaining markers (BOT75) [32]. Network information was first included by restricting the set of markers to those mapped a gene and retained after the construction of the AWM. The genomic relationship matrix was constructed by including only these markers (NET) or all the remaining markers (FREE). Matrices (BASE, TOP25, BOT75, NET) were constructed using the first method reported by VanRaden [45]. The second approach (CONN) included AWM correlations among markers (genes) in building the G matrix, in order to represent genes that were connected in the network and increase their contribution to the overall genomic variation. This matrix was built following:

$$ \boldsymbol{G}={\boldsymbol{ZDZ}}^{\hbox{'}} $$

being Z a matrix of centered and scaled genotypes [45] for the set of markers in the network and D the AWM, with ‘1’ values on diagonal and pairwise gene correlations on the off-diagonals.

In order to provide same scale as the other matrices, CONN was scaled using the allele frequency-free method described by VanRaden [45] using the BASE matrix as reference.

Models were built including different combinations of effects (matrices) as reported in Table 2. Briefly, each model included one (models 1, 2, 4 and 6) or two (models 3, 5 and 7) animal additive genetic effects, that differed in the genomic relationship matrix used. Variance components for each effect within model-trait-training set combination were estimated using Gibbs Sampler as implemented in R package BGLR (Pérez and de los Campos, 2014). Model predictor was calculated as the sum of the estimates of the intercept and the genetic effect(s) as included in each model. Prediction was assessed by masking dPTA in training set in a 4-fold cross-validation, accuracy of prediction was measured as the correlation between the predicted and masked real dPTA.

Table 2 Genomic relationship matrix (G) calculated and employed in the prediction of calving difficulty

Results

Genome-wide association analysis

Significant SNP showed an average PVg across traits larger or equal than 4.54 × 10− 6 in HO, 1.79 × 10− 5 in BS, and 1.18 × 10− 5 in JE. The distribution of PVg averaged across all traits for each SNP is shown in Additional file 2: Figure S2. In each breed, few SNP showed an average PVg across traits comparatively larger than the rest (Table 3). Manhattan plots for all traits in all breeds are reported in Additional file 3: Figure S3, Additional file 4: Figure S4, Additional file 5: Figure S5, Additional file 6: Figure S6, Additional file 7: Figure S7, Additional file 8: Figure S8, Additional file 9: Figure S9 and Additional file 10: Figure S10. In HO, the SNP that explained the greatest average PVg across traits was on BTA18 at 57.5 Mbp. This SNP is located within three previously described QTL strongly associated with calving difficulty [46,47,48,49]; and close to QTL for gestation length [50], birth index [47, 48], birth weight [51] and conformation [46]. Furthermore, when looking within a genome window of size 8 Mbp centered on 57Mbp (53–61 Mbp), our results were consistent with another QTL described to be affecting calving difficulty in European Holstein cattle [52] and more specifically to QTL related to dystocia [46, 53, 54]. Likewise, SNP in BTA5 explained a large amount of PVg in HO and JE. In both breeds the significant SNP were located around 106 Mbp. A QTL affecting calving difficulty and related to traits affecting body size was reported to be located ~ 70.8 Mbp on BTA5 for Holstein Friesian [55]. In addition, a genomic region was described at ~ 140 Mbp to be associated with calving interval in a HO-JE GWAS [56]. In BS the 4 most significant SNPs were located on BTA17 at 68.1and 68.2 Mbp, and in BTA25 at 1.4and 1.6 Mbp. Genomic regions were described in Holstein-Friesian associated with dystocia in BTA17 at 2.7 and 37 Mbp. A QTL associated with dystocia was also described in Holstein BTA25 at 6.3 Mbp [53]. No QTL associated with reproductive performance or conformation were reported in BS in BTA17 in the neighborhood of our significant SNPs. However, a strong signal has been found on BTA25 of BS, for QTL associated with production and conformation traits [57]. Specifically, a QTL associated with body depth spans the region between 1.1 and 2.9 Mbp where our most significant SNP were located. In JE, no related QTL were described in BTA15, BTA9, or BTA4. A QTL associated with gestation length in BTA5 spanning 26.4–26.6 Mbp was described in a Jersey-Limousin cross [58].

Table 3 Top 5 SNP explaining the larger proportion of genetic variance averaged across traits in Holstein (HO), Brown Swiss (BS), and Jersey (JE)

Gene networks

For complex traits like dystocia, a large number of genes are expected to influence the phenotype, each with a small absolute effect. Therefore, GWA studies would only partially unravel molecular features affecting the trait [13]. Biological networks may better capture the genetic architecture of complex traits. Breed-specific dystocia gene networks included 1272 genes in HO, 1454 in BS and 1455 in JE. Their number of connections (the degree of the vertices induced by the PCIT algorithm) ranged between 1 and 80 in HO, 1 and 17 in BS, 1 and 13 in JE. Biological networks usually follow a Power-law distribution [59]. Although it is in general hard to see that a distribution is a Power-law when networks are not big enough, we used a Kolmogorov-Smirnov test to validate this assumption, and the null hypothesis of the networks being drawn from a Power-law distribution was not rejected [38]. Of this larger number of genes total of 256 genes in HO, 275 in BS, and 253 in JE were associated with reported dystocia QTL (regardless of breed). Furthermore, 9 of these genes in HO, 2 in BS, and 3 in JE were among the most-connected ones in the network -top 5% of the sample for HO and JE, top 10% for BS- (Table 4). None of the top 5 SNP explaining the larger average PVg in the GWAS were among the most connected genes/SNP in the networks. The HO network had 2 well-defined clusters of highly connected genes, whereas clustering in the BS and JE networks seemed less defined. Clusters included many genes that were enriching significant GO terms as well as genes located within dystocia QTL (data not shown). However, we should point out that the different number of bulls from each breed used in these analyses may have impacted the structure of the networks, such that in the HO with a significant larger sample size, more interactions could be detected. In addition to constructing breed-specific networks, we explored an across-breed dystocia network consisting of genes shared among the breed-specific networks. There were 80 genes at the intersection of the three networks. The probability of an intersection of size 80, given the size of the breed-specific networks, was highly significant (p ≤ 7.9e-11) making it a very unlikely event to occur by chance [37]. For these 80 genes, the gene-gene interactions differed among breeds. That is, interactions among genes inside the intersection and interactions among genes within the intersection with other genes outside of it varied in the three breeds (Fig. 1). A total of 9 genes in the intersection were within reported dystocia QTL (Table 5). Additionally, a more stringent across-breed network would include only the sub-network of these genes that are connected in the same way in the 3 breeds. In our study no such sub-network was identified. However, when looking at 2 breeds at a time, we found that small sub-networks of between 7 and 17 vertices existed for all three possible pair-wise combinations (Fig. 2).

Table 4 Most-connected (top 5% of the sample for HO and JE, top 10% for BS) genes in the breed-specific networks, located within dystocia QTL
Fig. 1
figure 1

Circular plots of the cattle genome, showing length of chromosomes (grey rectangles), location of genes on the across-breed network (blue lines), and interactions among genes in Holstein (HO), Brown Swiss (BS), and JE (Jersey)

Table 5 Genes in the cross-breed network located within dystocia QTL
Fig. 2
figure 2

Genes and gene-gene interactions shared by all possible breed pair combinations

GO and pathway enrichment analyses

We used the genes within the breed-specific and across-breed networks to search for enriched GO terms (FDR ≤ 0.35 HO and JE; 0.6 BS). In this analyses we used a relatively high FDR to account for sample size disparities among breeds and still allow the interpretation of the data in the context of biological processes. A total of 162 GO biological process branch terms were enriched in HO, 22 in BS, and 33 in JE. Enriched GO terms comprised 856 in the HO network, 133 in the BS network, and 490 in the JE network. A total of 185 of these genes were located within 24 reported dystocia QTL in HO, 31 were within 14 reported dystocia QTL in BS, and 86 were within 18 reported dystocia QTL in JE. In the HO breed 2 terms stood out as significant: regulation of ARF GTPase activity (FDR = 0.0098) and regulation of ARF protein signal transduction (FDR = 0.0584). Similarly, in the BS regulation of ARF GTPase activity was among the most significant for the BS dystocia network. ARF GTPases are within the Ras superfamily of GTP-binding proteins. ARF proteins localize to membranes in the cells and regulate the actin cytoskeleton and vesicle trafficking [60,61,62]. The actin cytoskeleton plays an important role in the formation of the immunological synapsis [63]. The immunological synapsis is key in the recognition of antigens and the articulation of an effective immune response, ARF GTPase regulators have been previously reported to be involved in vesicular secretion regulation of T cells [63]. In the JE, no terms were significant. The genes in the JE network were mainly enriched in terms related to muscle and nervous system development, such as striated muscle cell development, regulation of calcium ion-dependent exocytosis, neuron projection morphogenesis, and microtubule bundle formation. The source of this signal was most likely the skeletal muscle. Differences in the prenatal fiber development among breeds of dairy and beef cattle have been described [64]. In addition, differences in fibrogenesis were also described in Wagyu versus Angus cattle [65]. Most research focusing on skeletal muscle has mostly involved beef cattle, but differences in skeletal muscle development of Jerseys compared to other dairy breeds may be possible. Furthermore, this signal may be targeting differences between HO-BS and JE during the active stage of labor when voluntary abdominal straining starts. Genes in the across-breed network enriched 370 GO terms (FDR ≤ 0.35). The most significantly enriched terms were related to neuron morphogenesis and differentiation (FDR ≤ 0.0539). We then reduced these lists of GO terms to 46 non-redundant terms in HO, 7 in BS, 10 in JE and 93 in the across-breed. The most significant non-redundant terms (cluster representatives) in each breed are presented in Table 6.

Table 6 Top non-redundant terms for the GO biological process branch. For Holstein, Jersey and the cross-breed networks, the terms correspond to a FDR ≤ 35% and a dispensability = 0. For Brown Swiss, the terms correspond to a FDR ≤ 60% and a dispensability = 0

Regarding pathway enrichment, in the HO and JE networks several pathways were overrepresented, whereas in the BS network none was overrepresented at the significance threshold. Most pathways overrepresented were shared in mouse and human (Table 7). In HO, the most significant pathway was cell cycle (FDR ≤ 0.002). In the across-breed network, overrepresented pathway was Shigellosis. Although other pathways in this network did not make the significance threshold, top pathways showed a consistent trend towards resistance to infection as being involved in molecular differences underlying dystocia in dairy cattle. Top pathways in the across-breed network included: Bacterial invasion of epithelial cells (hsa05100), Regulation of autophagy (hsa04140), Gap junction (hsa04540) and Regulation of actin cytoskeleton (hsa04810).

Table 7 Top KEGG pathways (FDR ≤ 0.35). When pathways were enriched in both species, mouse and human, we listed the one with the smaller FDR

Our GWAS results showed that SNP explaining the largest proportion of genetic variance were also within or in the proximity of reported QTL directly or indirectly associated with size. That is QTL for calving difficulty, birth index, body size, and body depth. All these QTL would potentially add to a feto-pelvic disproportion. And yet, it has been shown that there are some Holstein bulls that could produce the same (lower) incidences of dystocia in their progeny as some Jersey bulls [12]. Results from the biological networks analysis pointed to a link among dystocia, bacterial infection and alteration of muscle functionality. That is, sires might not only influence the feto-pelvic proportion but may also be influencing the response of their female progeny to certain environmental factors.

Variance absorbed by gene networks

Variance absorbed by the different genomic relationship matrices in the cross-validation scheme is reported in Additional file 11: Table S1, Additional file 12: Table S2 and Additional file 13: Tables S3 for HO, JE and BS, respectively. The value reported in the average value (SD) of the posterior means as repeated by excluding each of the 4 validation sets subsequently.

Variance explained by the genomic effect was generally higher for type traits (from .57 to .87 across trait/breeds/model when employing the BASE G), intermediate for calving difficulty traits (from .08 to .43 across trait/breeds/model when employing the BASE G) and low for GL (from .06 to .14 across trait/breeds/model when employing the BASE G). Estimates on HO showed smaller variation across the replicates, whereas this variation was intermediate for JE and higher for BS, probably due to reduced sample size. Trend in variance absorbed by the different effects was moderately consistent across breeds. Effects TOP25, NET and CONN were the effects absorbing the highest proportion of variance, BASE was intermediate and BOT25 and FREE were absorbing the lowest. Some models included two different effects to assess complementarity between the 2 sets of markers. In that case, effects TOP25, NET and CONN generally absorbed more variance than BOT75 and FREE.

Gene networks in genomic prediction

Prediction accuracy is reported in Table 8. Pattern across models and traits were consistent, with HO showing the highest accuracies among the three breeds. Prediction accuracy was higher for type traits, lower for GL and intermediate for calving difficulty traits. Prediction accuracy did not increase or decrease significantly across models. A slight decrease was observed when including network information (NET and CONN), which was recovered when complementary effect was included (FREE). An attempt to include the across breed network in genomic prediction resulted in no increase in accuracy and a slight decrease for the Holstein breed (results not shown). While there is relatively sizable body of evidence in the use of AWM to identify potential candidate genes and pathways [66, 67], their use in genomic prediction is somewhat more limited. Snelling and colleagues [68] found that including AWM and gene networks resulted in an increase in prediction accuracies of approximately 10–15% according to trait for beef tenderness traits. Related but simplified approaches based on marker effect SNP-weighting for production traits in US Holstein have often obtained marginal increases in accuracies [32].

Table 8 Prediction accuracy (SE) for the different combination of traits network genomic relationship matrix for Holstein, Jersey and Brown Swiss

Discussion

The genome-wide association analysis produced a description of regions of the genome affecting calving difficulty in Holstein, Brown Swiss and Jersey. The regions identified were in the proximity of previously described QTL that would most likely affect calving ease by altering the feto-pelvic proportion. Yet the network analysis pointed to a richer and more complex biology underlying dystocia in the different breeds. While most of the research regarding calving difficulty in dairy cattle has focused on the Holstein breed or its crosses, this paper provides and insight into calving difficulty in Brown Swiss and Jersey cattle. In our current analysis inclusion of network information did not increase accuracy of genomic prediction. This could be due to sample size and the use of mid density SNP arrays. Future investigations should focus on increasing the size of the populations and the density of the SNP available through the inclusion of female genotypes and imputed sequence information. While genotypes may be sufficient to predict performance in well-recorded and highly heritable traits, genomic selection on other traits may benefit from the incorporation of functional information and network structure into the predictions.

Conclusions

In the current paper we have employed a gene network analysis to identify genomic regions associated with calving difficulty in dairy cattle. As expected, these analyses identified individual SNP associated with calving difficulty, and enriched pathways that included terms related to dystocia. Yet we did not identify a clear across breeds network and network inclusion did not increased predictive ability in genomic selection.