Background

Fertility, reproduction, calving, and fertility disorders represent a group of traits that directly impact the economic efficiency and animal welfare in the dairy industry [1, 2]. However, the long-term genetic selection for production traits (especially milk yield) in Holstein cattle has negatively impacted fertility and reproductive performance due to their unfavorable genetic correlations [3,4,5]. Fertility and reproduction traits were only added to worldwide official genetic evaluations over the past two decades [6, 7]. In this context, the more recent implementation of genomic selection (GS) in North American Holstein cattle has greatly contributed to improving genetic gains for fertility and reproduction traits [8]. The usually low heritability estimates [9] and great complexity of biological mechanisms affecting fertility and reproductive performance impact GS accuracy, which relies, among other factors, on population-specific linkage disequilibrium (LD) between genotype markers and causal variants. Therefore, a promising approach is to genotype and use the causal or tightly linked variants within known quantitative trait loci (QTL) related to fertility and reproduction traits in GS schemes, which is expected to improve the prediction accuracy for such traits [10]. Despite the great efforts that have been made to identify QTL, functional genes, and putative causal variants related to fertility and reproduction traits in dairy cattle, it is expected that many potential candidate variants still remain to be uncovered, especially with the increase in detection power achieved by using whole-genome sequence (WGS) variants.

Genome-wide association studies (GWAS) have been popularly performed as a standard method for QTL mapping and candidate gene discovery in both humans and other species [11]. For instance, Fortes et al. [12] and Ma et al. [3], based on a systematic review, reported that a considerable number of QTL and candidate genes located in the Bos taurus autosomes (BTA) and in the X chromosome are associated with fertility and reproduction traits. However, most of these studies used the low- or medium-density single-nucleotide polymorphism (SNP) panels, which might hinder the discovery of candidate genes and causal variants. Due to the reduced cost of high-throughput sequencing approaches, several large-scale genome resequencing initiatives have been undertaken, such as the 1,000 Bull Genomes Project [13] and the Canada’s Ten Thousand Cows Genome Project [14]. Based on these large reference populations, WGS variants can be obtained cost-effectively through computational imputation from SNP panels to WGS [15, 16]. The use of WGS variants for GWAS can enhance the discovery of variant-trait associations especially when there is only short-range LD surrounding causal variants [17]. Previous studies have successfully used real or imputed WGS variants for performing GWAS of fertility and reproduction traits in other dairy cattle populations (e.g., [18,19,20]).

There is a plethora of traditional and novel trait definitions for measuring fertility and reproductive performance in females and males, which are further complicated by alternative definitions across breeding programs [3, 21, 22]. Furthermore, fertility and reproduction traits recorded in heifers and lactating cows have always been treated as different traits, i.e., they have been analyzed separately due to their usually low genetic correlations [9, 23]. Therefore, the detection of pleiotropic variants affecting multiple fertility and reproduction traits is paramount for unraveling their biological background. In this context, Bolormaa et al. [24] proposed an efficient multiple-trait analysis for directly combining the estimated effects of SNPs from single-trait GWAS based on a chi-square statistic, which facilitates the discovery of potential pleiotropic variants. In dairy cattle, this method has been applied to multiple fertility and reproduction traits, which were mainly based on SNP array data (e.g., [25,26,27]).

In this study, we first performed single-trait GWAS analyses for 18 fertility and reproduction traits in North American Holstein cattle, using imputed WGS genotypes. Following the method proposed by Bolormaa et al. [24], the multiple-trait chi-square statistics were subsequently calculated regarding different trait categories, including heifer traits, cows traits, sire traits, and their combinations. Our main objective was to identify candidate pleiotropic SNPs and genes associated with various fertility and reproduction traits in heifers, cows, and sires, which will contribute to a better understanding of the underlying biological mechanisms of fertility and reproduction traits in North American Holstein cattle.

Results

Deregressed breeding values and imputed SNP variants

The analyzed traits and their phenotypic summaries (i.e., deregressed estimated breeding values, dEBV) are shown in Table 1. The number of included animals that have phenotypes ranged from 3,803 for SCSc (sire calf survival – cow) to 5,986 for CA (calving ability). The average accuracy of dEBV (± standard deviation, SD) was 0.55 ± 0.02. After performing the quality control (QC), about 5.6 million SNPs (ranging from 5,396,362 for CA to 5,880,012 for SCSc) were retained for single-trait GWAS analyses. These SNPs were distributed across all autosomes with a mean (± SD) pairwise distance of 445 ± 1,478 bp and a mean (± SD) minor allele frequency (MAF) of 18.7 ± 15.6% (Table S1).

Table 1 Traits, summary statistics, and trait categories of multiple-trait analysis

Summaries of single-trait and multiple-trait GWAS analyses

The single-trait GWAS revealed a total of 1,484 SNPs that were significantly associated with the 18 fertility and reproduction traits with no overlapping SNPs between traits. The highest and lowest numbers of significant SNPs were observed for DO (days open) and NRRh (non-return rate heifer), respectively (Fig. 1A). These significant SNPs were broadly distributed across all autosomes (Fig. 1B), with the greatest number of significant SNPs located on BTA3, BTA22, and BTA23. For each trait, the top three chromosomes with the greatest numbers of significant SNPs are shown in Fig. 1C. BTA4, BTA20, and BTA22 were observed in common for four traits [i.e., CSc (calf survival cow), DCA (daughter calving ability), DO, and SCEc (sire calving ease cow) for BTA4, CSh (calf survival heifer), DF (daughter fertility), NRRh, and SCSh (sire calf survival heifer) for BTA20, and CA, CEh (calving ease heifer), FSTCh (first service to conception heifer), and SCEh (sire calving ease heifer) for BTA22]. The detailed distribution of significant SNPs identified across traits and chromosomes is shown in Table S2. The number of QTL identified in the single-trait GWAS ranged from 38 for NRRh to 73 for SCEc. There were three QTL overlapping between CA and CTFS (calving to first service), and three overlapping between CSh and SCEc (Table S3). The highest estimated genome inflation factor was equal to 1.090 (95% CI of 1.075–1.104) for DCA, and its mean ± SD across all traits was equal to 1.052 ± 0.024 (Fig. S1).

Fig. 1
figure 1

Significant SNPs revealed by single-trait GWAS. Beside the numbers of SNPs identified among 18 fertility and reproduction traits (A) and 29 autosomes (B), the top three chromosomes with the largest numbers of significant SNPs were shown for each trait (C). AFS = Age at first service; FSTCh = First service to conception heifer; NRRh = Non-return rate heifer; CEh = Calving ease heifer; CSh = Calf survival heifer; CA = Calving ability; CTFS = Calving to first service; DO = Days open; FSTCc = First service to conception cow; NRRc = Non-return rate cow; CEc = Calving ease cow; CSc = Calf survival cow; DCA = Daughter calving ability; DF = Daughter fertility; SCEh = Sire calving ease heifer; SCEc = Sire calving ease cow; SCSh = Sire calf survival heifer; SCSc = Sire calf survival cow

For the multiple-trait analyses, the defined trait categories of heifer, cow, and sire traits, and their combinations are shown in Table 1. There were 425 significant SNPs revealed by multiple-trait analysis, including 87 for heifers, 72 for cows, 84 for sires, 96 for heifers and cows, and 86 for all animals. Of these, only one SNP (BTA9:21,350,462) was shared between heifers and cows, whereas no SNP for sire traits was shared with heifers, cows, or heifers and cows (Fig. S2). When both heifer and cow traits were analyzed together, there were 15 and 25 significant SNPs overlapping with the independent analyses of heifer and cow traits, respectively. In summary, the multiple-trait analyses revealed a total of 328 unique significant SNPs, which are distributed across all autosomes.

Multiple-trait analysis for heifer traits

The multiple-trait analysis of six heifer traits revealed 87 significant SNPs broadly distributed across 22 chromosomes (Fig. 2 and Table 2). Of these, the highest numbers of significant SNPs were observed on BTA20, BTA22, BTA23, and BTA18. The three most significant SNPs were located on BTA18, BTA20, and BTA20. A total of 176 protein-encoding and eight long non-coding RNAs (lncRNA) genes were found within ± 100 Kb regions around the significant SNPs, and 41 SNPs were located in the exonic, intronic, or upstream/downstream regions of 36 genes. All 87 significant SNPs were clustered into 69 QTL and three QTL (BTA18:39.58–40.01 Mb, BTA20:15.57–15.94 Mb, and BTA20:29.77–30.06 Mb) were identified based on three or more SNPs. Furthermore, 21 of these QTL (30.4%) overlapped with 138 previously reported reproduction-related QTL in cattle (Table S4). Among them, one QTL on BTA3:89.78–89.98 Mb overlapped with 98 previously reported QTL that are associated with luteal activity (LA) and conception rate (CR).

Fig. 2
figure 2

Manhattan plots for multiple-trait analysis for different trait categories. Statistically significant SNP are denoted by triangles

Table 2 Significant SNPs and candidate genes from multiple-trait analysis for six heifer fertility and reproduction traits

The functional analyses revealed that the identified 184 candidate genes were significantly enriched into 21 Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Fig. S3). Among them, the immunity-associated biological functions were predominant, such as the antigen processing and presentation that was associated with multiple genes within the regions of BTA23:25.72–25.92 Mb and BTA23:28.53–25.73 Mb. These genes included the major histocompatibility complex (MHC), class II, DR alpha (BOLA-DRA), MHC class I antigen (BOLA-NC1), MHC class I heavy chain (BOLA), and MHC Class I JSP.1 (JSP.1). Another significantly enriched GO term, the “prolactin receptor binding”, was found related to three genes, named growth hormone E4 (GHE4), prolactin-related protein 14 (PRP14), and prolactin-related protein IX (PRP9), within the region of BTA23:34.80–35.03 Mb.

Multiple-trait analysis for cow traits

There were 72 SNPs significantly associated with cow traits (Fig. 2 and Table 3). These significant SNPs were distributed across 25 chromosomes, with the highest number of SNPs observed on BTA3, BTA8, and BTA10. The three most significant SNPs were located on BTA17, BTA13, and BTA27, respectively. There were 37 significant SNPs located within the exonic, intronic, or upstream/downstream regions of 23 genes, while 101 candidate genes were found within ± 100 Kb of 35 intergenic SNPs. Fifty-six QTL were found, and two of them (BTA3:67.59–67.79 Mb, BTA8:49.18–49.38 Mb) were supported by three or more significant SNPs. By searching the Cattle QTL Database, 17 QTL (30.4%) overlapped with 31 previously reported QTL that are associated with calving interval (CI), CR, first service conception (FSC), and number of inseminations per conception (IPC) (Table S5).

Table 3 Significant SNPs and candidate genes from multiple-trait analysis for nine cow fertility and reproduction traits

Eight GO terms and one KEGG pathway were significantly enriched for the 124 identified candidate genes (Fig. S3). Acylglycerol O-acyltransferase activity and diacylglycerol metabolism were the most significantly associated biological functions, which were linked to four genes (Monoacylglycerol O-acyltransferase 2, MOGAT2; 2-acylglycerol O-acyltransferase 2, LOC785379; ENSBTAG00000017443; ENSBTAG00000015091) within the region of BTA15:54.88–55.08 Mb and to one gene (Membrane bound O-acyltransferase domain containing 7, MBOAT7) within the region of BTA18:63.08–63.28 Mb. Three GO terms of “integrin complex”, “protein complex involved in cell adhesion”, and “plasma membrane signaling receptor complex” were linked to six genes on BTA25 (Glutamate ionotropic receptor NMDA type subunit 2A, GRIN2A; Integrin subunit alpha M, ITGAM; Integrin subunit alpha X, ITGAX; Integrin subunit alpha D, ITGAD), BTA18 (Shisa family member 7, SHISA7), and BTA10 (Integrin subunit alpha 11, ITGA11). The KEGG pathway of “Olfactory transduction” was also associated with 12 genes on BTA5 and BTA10.

Multiple-trait analysis for sire traits

Eighty-four significant SNPs were associated with sire traits, which were broadly distributed across 24 chromosomes (Fig. 2 and Table 4). The three chromosomes that had the highest numbers of significant SNPs were BTA7, BTA17, and BTA22. The three most significant SNPs were located on BTA4, BTA5, and BTA26. There were one, 32, and one significant SNP located within the exonic, intronic, and upstream/downstream regions of 30 protein-encoding genes, respectively. Other 104 protein-encoding and 13 lncRNA genes were found within ± 100 Kb regions of 50 intergenic SNPs. Seventy-three QTL regions were determined from the 84 significant SNPs, and 31 of them (42.5%) overlapped with 72 previously reported QTL associated with reproduction traits in sires such as scrotal circumference, daughter pregnancy rate (DPR), CR, IPC, and interval to first estrus after calving (IFEC); Table S6. Among all the 147 candidate genes found for sire reproduction traits, only one GO term of “N-acetyl-beta-D-galactosaminidase activity” was shown to be significantly enriched (Fig. S3). Two genes located on BTA20, including HEXB (Hexosaminidase subunit beta) and LOC786974 (Beta-hexosaminidase subunit beta) were also involved in this biological function.

Table 4 Significant SNPs and candidate genes from multiple-trait analysis for four sire fertility traits

Multiple-trait analysis for the combined trait categories

When the 14 female fertility and reproduction traits were combined (heifers and cows), 96 significant SNPs were identified across 28 chromosomes (Fig. 2 and Table S7). The three most significant SNPs were located on BTA23, BTA22, and BTA4, respectively. Ninety-six QTL were detected and 25 of them overlapped with 60 previously reported reproduction-associated QTL in cattle (Table S8). There were 171 candidate genes found, in which, nine GO terms and one KEGG, such as Phagosome (KEGG:04,145) and rough endoplasmic reticulum (GO:0,005,791), were significantly enriched for them (Table S9).

The multiple-trait chi-square statistics was also applied to all 18 traits of heifers, cows, and sires combined. Eighty-six significant SNPs and 178 candidate genes were identified across 26 chromosomes (Fig. 2 and Table S10). All significant SNPs were clustered into 77 QTL, and 27 of them (35%) overlapped with previously reported reproduction-associated QTL in cattle for traits, such as CR, calving ease (CE), DPR, and IPC (Table S11). The functional enrichment analyses revealed four GO terms, including “2-acylglycerol O-acyltransferase activity”, “Diacylglycerol biosynthetic process”, “Acylglycerol O-acyltransferase activity”, and “O-acyltransferase activity” (Table S9). For these biologically relevant genes identified for heifer, cow, and sire traits in this study, we also performed a protein–protein interaction analysis, but no direct functional interactions among them were observed as shown in Fig. S4.

Discussion

Refined studies on fertility and reproduction traits in dairy cattle

Due to the great economic importance of fertility and reproduction traits in cattle industry, GWAS have been frequently carried out especially since genotyping become affordable for large sample sizes. Besides the fact that SNP panels have been used in former GWAS of cattle fertility and reproduction traits [27,28,29,30,31,32,33,34,35,36,37,38,39], whole-genome sequence variants recently began to be used to refine QTL boundaries and identify causal genes [18,19,20]. Therefore, the imputed or real WGS variants is being increasingly used for GWAS, especially for traits with low heritability and that are highly polygenic. Because of intrinsic correlations among various indicator traits of fertility and reproduction, it is preferable to perform multiple-trait GWAS to detect pleiotropic variants and increase the detection power of important variants. The classical methods of multiple-trait GWAS [40, 41] are difficult to implement in practice, especially on large-scale datasets due to computing requirements. Alternatively, an approximate method was proposed by Bolormaa et al. [24], which efficiently computes a multiple-trait chi-square statistic from estimated SNP effects from single-trait GWAS. In this study, we used imputed WGS variants and this approximate method for multiple-trait GWAS of fertility and reproduction traits in Holstein cattle, aiming to refine the associated pleiotropic variants, candidate genes, and QTLs.

Polygenic nature of fertility and reproduction traits

Most fertility and reproduction traits are lowly heritable in cattle. Berry et al. [21] reported a summary of heritability estimates below 0.05 for most female traits and 0.05–0.22 for most male traits. Similarly, the reported QTL and candidate genes in literature have been found to be extensively distributed across genome, and few of them could be repeated across studies [3, 12]. In this study, a similar genetic landscape was observed, as all autosomes harbored significant SNPs for one or more traits analyzed, being consistent with the polygenic nature of these traits. Numerous fertility and reproduction indicator traits have been defined for measuring reproductive performance at different reproductive stages in female cattle [3, 21]. However, some of these traits are by definition related with each other, such as CI is greatly determined by the interval from calving to conception. Therefore, pleiotropic variants are expected for these traits [10, 42]. Overall, we found that the number of significant SNPs from multiple-trait analysis was less than the cumulative sum of separate single-trait analysis; and similar results were found in previous GWAS studies on fertility and reproduction [e.g., 27].

Differences between heifer and cow traits

It is well known that the genetic basis and physiological processes underlying reproduction differ between heifers and lactating cows. In general, heifer traits have higher heritability, but lower genetic correlations with each other compared to cow traits [43, 44]. There are also only moderate genetic correlations between heifer and cow traits, which are lower than between cow traits in different parities [23, 44,45,46]. Likewise, both ovarian structures and circulating steroids (such as serum concentrations of progesterone and estradiol) were observed to act differently between heifers and cows. In addition, lactating cows have lower circulating steroid concentrations, but larger ovulatory follicles and luteal structures than heifers [47, 48].

In previous GWAS for fertility and reproduction traits, different variants, candidate genes, and QTL were found significantly associated with heifers and cows [20, 26, 27, 37]. For instance, Fang et al. [26] found a significant QTL on BTA17 associated with interval from first to last insemination (IFLI) and non-return rate (NRR) in heifers, but not in cows. For NRR, distinct significant QTL regions on BTA17 were also suggested for heifers and cows [20]. Liu et al. [27] analyzed four fertility traits (IFLI, FSC, IPC, and NRR) measured on both heifers and cows, and found that none of the significant QTL overlapped and most of these QTL were located on distinct chromosomes. In this study, our multiple-trait analysis found that there were only two candidate QTL (around at BTA9:21.35 Mb and BTA23:28.63 Mb) overlapping between heifers and cows. Furthermore, our functional analyses of candidate genes also suggested differences between heifer and cow traits. The candidate genes found for heifers were significantly involved in immunity-associated biological functions, whereas candidate genes found for cows were associated with acylglycerol O-acyltransferase activity and diacylglycerol metabolism.

QTL and candidate genes found for heifer traits

The mean CR of Holstein heifers was estimated to be 56.3% in the United States, which is influenced by the age at breeding, month of breeding, age of service sire, and other factors [45]. Accordingly, our multiple-trait analysis for heifer traits suggested dozens of associated QTL that are broadly distributed across three quarters of all included chromosomes. Notably, one-third of these QTL overlapped with previously identified reproduction-related QTL in cattle. For example, the QTL located on BTA3:89.78–89.98 Mb was supported by a QTL previously reported to be associated with the commencement of LA and proportion of cows in LA between 25 and 60 days in milk [19], and with CR [49]. There were five QTL previously reported to be associated with DPR, CR, and IFEC [50,51,52], overlapping with one QTL on BTA9:95.88–96.08 Mb found in this study. The QTL located on BTA12:77.58–77.78 Mb overlapped with four previously reported QTL associated with CE, DPR, and stillbirth [31, 53]. Furthermore, we identified many novel QTL in this study, such as BTA13:41.92–42.15 Mb, BTA18:39.57–40.01 Mb, BTA19:53.68–53.88 Mb, and BTA22:11.29–11.84 Mb, which showed strong association with heifer traits.

For the CR at first service and the number of times bred in Holstein heifers, Galliou et al. [54] found that three out of the five most significant pathways were involved in immune system regulation, which was consistent with our results. More importantly, the non-classical MHC I gene of BoLA-NC1, found in this study and involved in the KEGG pathway of allograft rejection, was suggested to increase maternal immunity against the fetus [55]. Another non-classical MHC I gene (JSP.1) was found to play a crucial role during early pregnancy in heifers in a previous study [56]. Similarly, Melo et al. [57] found that the MHC class II genes were significantly associated with pregnancy success in Nellore cows. As the link between autophagy and reproduction has been acknowledged [58], another autophagy-related gene of phosphatidylinositol 3-kinase catalytic subunit type 3 (PIK3C3) was identified in this study. These findings suggest the importance of immunological tolerance of dam to fetus and other immune responses in establishing successful pregnancy.

Notably, a QTL, that was located on BTA9:95.88–96.08 Mb and supported by five previously reported QTL, contained four candidate genes having biological implications in reproduction, including the superoxide dismutase 2 (SOD2) for meiotic defects in mouse oocytes [59], WT1 associated protein (WTAP) for spermatogenesis in mice [60], acetyl-CoA acetyltransferase 2 (ACAT2) for DPR in Holstein [52], and t-complex 1 (TCP1) for mediating sperm-oocyte interaction in mice [61]. On BTA22, there were five suggestive candidate genes, including the transforming growth factor beta receptor 2 (TGFBR2) for female fertility in pigs and mice [62, 63], DLEC1 cilia and flagella associated protein (DLEC1) and solute carrier family 22 member 14 (SLC22A14) regulating male fertility in mice [64, 65], activin A receptor type 2B (ACVR2B) associated with premature ovarian failure in human [66], and 6-phosphofructo-2-kinase fructose-2,6-biphosphatase 4 (PFKFB4) involved in spermatogenesis in mice [67]. Two genes, the tripartite motif containing 27 (TRIM27) and PRP14 on BTA23, were suggested to be associated with male and female fertility, respectively [68, 69]. Other positional candidate genes included the transmembrane protein with EGF like and two follistatin like domains 2 (TMEFF2) on BTA2 involved in early oocyte development in human [70] and associated with bull sperm morphometry [71], HECT, C2 and WW domain containing E3 ubiquitin protein ligase 1 (HECW1) on BTA4 required for estrogen-induced degradation of Scribble [72], cystatin 8 (CST8) on BTA13 involved in sperm maturation [73], dynein axonemal heavy chain 17 (DNAH17) on BTA19 associated with human sperm fertility [74, 75], ADAM metallopeptidase domain 3A (ADAM3A) on BTA27 regulating sperm migration in mice [76], and zona pellucida glycoprotein 4 (ZP4) on BTA28 regulating embryo development in rabbits [77]. Taken together of the experimental evidence in literature, these genes have been supported to hold biological roles in reproduction of humans and other species.

QTL and candidate genes found for cow traits

Our multiple-trait analysis for cow traits revealed that all significant QTL were broadly distributed across more than 85% of the chromosomes and one-third of them were supported by previously reported QTL in cattle. For instance, one QTL located on BTA18:63.08–63.28 Mb was previously reported significantly associated with CI and stillbirth [36, 78]; one QTL located on BTA20:56.86–57.06 Mb was supported by two previously reported associations with CR, FSC, and IPC [49, 54]; and one QTL located on BTA5:61.82–62.02 associated with FSC and CR [79]. In addition, many novel QTL identified in this study harbored candidate genes that have biological functions impacting reproduction according to the literature.

Our analyses of candidate genes for cow traits revealed functional implications that are in contrast to those for heifers. Instead of immune system, the most significant functions were associated with fatty acid metabolism, such as the acylglycerol O-acyltransferase activity and diacylglycerol metabolism. Mattos et al. [80] comprehensively reviewed effects of dietary fatty acids on reproduction in ruminants, which include the influences on ovarian follicle and corpus luteum function via improved energy status, and the synthesis of reproductive hormones, such as steroids and prostaglandins. In addition, the maternal lipid metabolism during pregnancy positively influence fetal growth [81, 82]. Many genes involved in fatty acid metabolism were similarly found to be associated with reproductive performance in Nellore cattle [57]. The seasonal changes in Holstein fertility was related to fatty acid composition of follicles [83]. Furthermore, the relationship between dietary fatty acids and ovarian function was experimentally found in Holstein cows [84].

Interestingly, numerous candidate genes identified in this study also have the known biological functions associated with reproduction in literature. Two genes (GRIN2A and ITGAX) located on BTA25 were associated with CTFS in Iranian Holstein cattle [85] and with altered expression in polycystic ovary syndrome in women [86], respectively. There were four candidate genes located on BTA8, including the kinesin family member 13B (KIF13B) involved in oocyte meiosis [87], annexin A1 (ANXA1) with regulatory functions during early pregnancy in mice [88] and differentially expressed between less fertile and normally fertile Holstein [89], proprotein convertase subtilisin/kexin type 5 (PCSK5) that contributed to ovarian follicle development in rats [90], and zinc finger protein 483 (ZNF483), which is associated with age of puberty in Brahman heifers [91]. Two genes, the fem-1 homolog B (FEM1B) and sperm equatorial segment protein 1 (SPESP1) on BTA10, were reported to be involved in polycystic ovary syndrome in humans [92] and required for fully fertile sperm in mice [93], respectively. Four genes were associated with different reproduction traits, including cell adhesion molecule 2 (CADM2) on BTA1 with the number of piglets born dead in pigs [94], calcium voltage-gated channel auxiliary subunit beta 4 (CACNB4) on BTA2 with CTFS in Iranian Holstein Cattle [85], ST6 N-acetylgalactosaminide alpha-2,6-sialyltransferase 5 (ST6GALNAC5) on BTA3 with fertility index in Holstein cattle [95], and ATP/GTP binding protein like 1 (AGBL1) on BTA21 with out of season lambing in sheep [96]. Other suggestive candidate genes included the protein tyrosine phosphatase receptor type Z1 (PTPRZ1) on BTA4 differentially expressed between high- and low-fertility Holstein cows [97], LPS responsive beige-like anchor protein (LRBA) on BTA17 differentially methylated between high- and low-fertility bulls [98], peroxisome proliferator activated receptor gamma (PPARG) on BTA22 involved in the release of oocytes each estrous cycle [99], and jumonji domain containing 1C (JMJD1C) on BTA28 required for long-term maintenance of male germ cells in mice [100].

QTL and candidate genes found for sire traits

In dairy cattle, the genetic evaluation of male reproduction has received much less attention in comparison with female traits [101, 102]. Sire CR in Holstein was included in two previous GWAS studies [103, 104], which revealed significant SNPs located on multiple chromosomes. In this study, we included four sire traits and found that the number and genomic distribution of QTL are comparable with or higher than that in females. Interestingly, more than 40% of these QTL identified in this study overlapped with previously reported reproduction-related QTL in cattle. For instance, all four QTL located on BTA17 (BTA17:18.23–18.56 Mb, BTA17:33.87–34.07 Mb, BTA17:48.40–48.60 Mb, and BTA17:72.94–73.14 Mb) overlapped with previous reports for DPR, IPC, CE, stillbirth, IFEC, and NRR [31, 34, 49, 50, 54] and three out of four QTL on BTA25 (BTA25:26.02–26.22 Mb, BTA25:27.45–27.65 Mb, and BTA25:39.05–39.25 Mb) overlapped with previous reports for age at first calving (AFC) and CR [105, 106].

Only one GO term (N-acetyl-beta-D-galactosaminidase activity) was significantly enriched among all candidate genes of sire traits. The two involved genes (HEXB and LOC786974) on BTA20 have no functional evidence in literature with respective to reproduction. However, we found many novel candidate genes that have direct functional implications in reproduction. For example, the engulfment and cell motility 1 (ELMO1) gene on BTA4 was found to play crucial roles in clearance of apoptotic germ cells and spermatogenesis in mice [107]. Two genes, JMJD1C and cilia and flagella associated protein 70 (CFAP70) on BTA28, are required for long-term maintenance of germ cells in mice [100] and are associated with multiple morphological abnormalities of sperm flagella in human [108]. Notably, both ELMO1 and JMJD1C genes were suggested by the significant SNPs located in intronic regions. One well-known candidate gene on BTA15 (follicle stimulating hormone subunit beta, FSHB), associated with semen quality and fertility in bulls [109], was also identified in this study.

There were four suggestive candidate genes on BTA7, including the PRELI domain containing 2 (PRELID2) associated with fertility traits in pigs [110], ADAM metallopeptidase domain 19 (ADAM19) differentially expressed between different quality of bovine blastocysts [111], SRY-box transcription factor 30 (SOX30) required for male fertility in mice [112, 113], and versican (VCAN) involved in embryo implantation in rabbits [114]. Three candidate genes were found on BTA17, including RAB33B, member RAS oncogene family (RAB33B), associated with freezability in boar spermatozoa [115], transport and golgi organization 2 homolog (TANGO2) influencing bull fertility [116], and DGCR8 microprocessor complex subunit (DGCR8) involved in human spermatogenesis [117]. On BTA25, the mitogen-activated protein kinase 3 (MAPK3), septin 14 (SEPTIN14), and poly(A) polymerase beta (PAPOLB) were found to regulate ovulation in mice [118], spermatogenesis in humans [119], and spermatogenesis in mice [120], respectively. Other suggestive candidate genes included the synaptophysin like 1 (SYPL1) on BTA4 contributing to sperm maturation in sheep [121], multiple EGF like domains 11 (MEGF11) on BTA10 associated with lifetime productivity in pigs [122], fucosyltransferase 8 (FUT8) on BTA10 differentially expressed between matured oocytes from older and younger women [123], myelin transcription factor 1 (MYT1) on BTA13 involved in reduced fertility of anovular dairy cows [124], Rho guanine nucleotide exchange factor 3 (ARHGEF3) on BTA22 associated with number of piglets weaned [125], monoglyceride lipase (MGLL) on BTA22 influencing metabolism of endocannabinoids in bovine endometrium [126], NEDD4 like E3 ubiquitin protein ligase (NEDD4L) on BTA24 associated with AFC in buffaloes [127], and BCL2 apoptosis regulator (BCL2) on BTA24 influencing oocyte and early embryo survival in humans [128].

Combining multiple trait categories

We combined multiple trait categories for multiple-trait analysis for exploiting potential pleiotropic effects for fertility and reproduction traits. When the 14 female traits were analyzed together, 10 out of the 33 suggestive candidate genes in heifer and cow traits were identified. In addition, 15 out of the 54 suggestive candidate genes in heifers, cows, and sires were identified when also combining the four male traits together. There were five additional identified candidate genes in the combined analysis that have biological implications in reproduction in the literature, including Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 2 (CITED2) on BTA9, cadherin 1 (CDH1) on BTA18, fanconi anemia complementation group M (FANCM) on BTA23, DPY30 domain containing 1 (DYDC1) on BTA28, and neural EGFL like 2 (NELL2) on BTA5 [129,130,131,132,133]. Among them, NELL2 was recently found to be involved in lumicrine system essential for testis-epididymis-spermatozoa signaling and male fertility [133]. Furthermore, three chromosomes (BTA3, BTA11, and BTA24) no longer contained significant regions when all 18 traits were analyzed together. A possible explanation for these findings is that the increased amount of information might have removed spurious associations.

Implications of the study

The results of this study have three main implications. First, we enhanced compelling evidence that traits measured in heifers, cows, and sires hold relatively distinct polygenic nature, therefore having specific evaluation is needed in breeding programs. Second, the QTL and candidate genes found in this study can be specifically incorporated into genomic prediction models by giving greater weights to the more important markers. Furthermore, SNPs located in the relevant genomic regions identified can also be included in commercial genotyping platforms to increase the accuracy of genomic prediction. Third, we suggested several novel candidate genes associated with fertility and reproduction traits in cattle, which should be further investigated in sequel studies. Additionally, the use of closer to biology phenotypes that better capture the biological mechanisms underlying the fertility and reproduction traits and the use of multi-omic data (e.g., transcriptomics, metabolomics, epigenomics) can further facilitate the mapping of QTL and variants associated with fertility and reproduction. Future studies could further investigate the most promising candidate genes using gene editing or gene knock-out experiments and real WGS data (instead of imputed WGS), which would enable the investigation of rare alleles (usually removed due to low imputation accuracy) and structural variation in the genome (e.g., copy number of variants, insertions, deletions). For the significant intergenic SNPs found in this study, their possible functional roles will be explored when a comprehensive annotation of genomic regulatory elements are available.

Conclusions

The multiple-trait GWAS of 18 fertility and reproduction traits in North American Holstein cattle revealed several QTL and candidate genes associated with heifer, cow, and sire traits. These QTL were broadly distributed across the entire genome, which are consistent with the polygenic nature of fertility and reproduction traits. The biological functions of immune response and fatty acid metabolism were significantly enriched in heifer and cow traits, respectively, whereas no known functional enrichment was found for sire traits. The most important chromosomes, which had three or more significant QTL, were BTA22 and BTA23 for heifer traits, BTA8 and BTA17 for cow traits, and BTA4, BTA7, BTA17, BTA22, BTA25, and BTA28 for sire traits. Several candidate genes that have not been previously reported in cattle and other livestock were strongly suggested for heifer (SOD2, WTAP, DLEC1, PFKFB4, TRIM27, HECW1, DNAH17, and ADAM3A), cow (ANXA1, PCSK5, SPESP1, and JMJD1C), and sire (ELMO1, CFAP70, SOX30, DGCR8, SEPTIN14, PAPOLB, JMJD1C, and NELL2) traits. More than one-third of the QTL identified in this study overlapped with previously reported reproduction-related QTL in cattle, and more than 50 candidate genes were supported by functional implications in reproduction, as found in the literature. In addition, many important candidate genes were identified via significant SNPs located in their intronic regions. These observations indicate high detection power and mapping resolution of this study. Therefore, a comprehensive investigation on underlying genetic basis of fertility and reproduction in heifers, cows, and sires are provided in this study, whose findings can be used for improving genomic evaluation for fertility and reproduction traits in Holstein cattle.

Methods

Animals, phenotypes and genotypes

The phenotypic and genotypic datasets used in this study were provided by the Canadian Dairy Network (CDN), a member of Lactanet Canada (Guelph, ON, Canada). A total of 18 fertility and reproduction traits were analyzed in this study (Table 1). For all traits, dEBV were used as pseudo-phenotypes, which were computed following VanRaden et al. [134] and only dEBV with accuracy greater than 0.50 were kept for further analyses. The details of trait definitions and the effects included in the statistical model used to predict the original estimated breeding values (EBV) for each trait are reported in Oliveira Junior et al. [9].

Imputed WGS datasets containing 29,548,077 SNPs were available for 9,131 animals. The detailed genotype imputation process using the FImpute software [135], including the number of animals per SNP panel, methods, and QC, was described by Chen et al. [136]. In brief, genotype imputation was performed in two steps: (1) imputation from medium density panels (9,131 cows and 56,955 or 60,914 SNP; Illumina, San Diego, CA, USA) to the BovineHD panel [HD, 311,725 SNP after a preliminary quality control; Illumina, San Diego, CA, USA]; and (2) imputation from HD to WGS. The reference population for step 1 had 2,397 animals (from the same herds and Holstein population), whereas for step 2, there were 1,147 animals with WGS data (from the 1,000 Bull Genomes Project, which also included North American Holstein animals) [136]. An additional QC step was applied after imputation, which required the individual and genotype missing rates to be lower than 0.1 (this step was done as some markers were still missing after the imputation process), MAF to be higher than 0.01, and no extreme deviation from Hardy–Weinberg equilibrium (only retained SNPs with P > 1.0 × 10–8). The SNPs that were poorly imputed were also removed from the analyses. This was done in a previous step where accuracy of imputation was calculated based on genotype concordance rate and allelic R2 on a per-SNP basis, where the SNPs retained had a concordance rate and allelic R2 greater than 0.95 and 0.85, respectively. All QC were conducted using the PLINK software [137], after which an average of 5,576,878 SNPs were retained for single-trait GWAS using 3,803 to 5,986 animals (depending on the trait, as showed in Table 1).

Single-trait association analyses

As all known fixed effects were fitted when predicting the EBVs used to compute the dEBVs, only SNP and polygenic effects were included in the mixed linear model used for the association analyses, i.e.:

$$\mathbf{y}=1\upmu +\mathbf{X}\mathrm{b}+\mathbf{Z}\mathbf{u}+\mathbf{e},$$

where \(\mathbf{y}\) is the vector of dEBV for each analyzed trait; \(1\) is a vector of ones; \(\upmu\) is the overall mean; \(\mathrm{b}\) is the fixed effect of the SNP tested for association, X is a vector containing the genotype score for the tested SNP; \(\mathbf{u}\) is the random vector of polygenic effect with \(\mathbf{u} \sim N(0, \mathbf{G}{\sigma }_{u}^{2})\), where \(\mathbf{G}\) is the genomic-based relationship matrix (GRM), \({\sigma }_{u}^{2}\) is the additive genomic variance of polygenic effects; \(\mathbf{Z}\) is the incidence matrix for \(\mathbf{u}\); and \(\mathbf{e}\) is a random vector of residual effects with \(\mathbf{e} \sim N(0, \mathbf{I}{\sigma }_{e}^{2})\), where \(\mathbf{I}\) is an identity matrix and \({\sigma }_{e}^{2}\) is the residual variance. The genetic relationship between individuals \(j\) and \(k\) in the GRM was computed as [138]:

$${G}_{jk}=\frac{1}{N}\sum_{i=1}^{N}\frac{({x}_{ij}-2{p}_{i})({x}_{ik}-2{p}_{i})}{2{p}_{i}(1-{p}_{i})},$$

where \({p}_{i}\) is the frequency of the reference allele for the ith SNP; \({x}_{ij}\) and \({x}_{ik}\) are the numbers of copies of the reference allele for individuals \(j\) and \(k\), respectively; and \(N\) is the total number of SNPs used. Instead of using all included SNPs, a total of 29 GRMs were constructed by randomly sampling 50,000 SNPs from the remaining 28 chromosomes (i.e., iterative process including all chromosomes except the one in which the analyzed SNP was located). These analyses were implemented using the GCTA software [138].

The strongly-linked SNPs were clumped out if they had high LD (r2 > 0.9) with another SNP that have a lower P in the GWAS, using the PLINK software [137]. This clumping process avoids overestimating the number of significant SNPs and was suggested to be preferable compared to the pruning method without considering the P values from GWAS [139]. To check if there was potential stratification due to the population structure, the genomic inflation factor (λ) [140] was evaluated, along with its 95% confidence interval.

Multiple-trait chi-square statistics

Following Bolormaa et al. [24], the multiple-trait analysis was performed using five different trait categories (i.e., heifers, cows, sires, heifers and cows, and all traits), as shown in Table 1. The CA trait was simultaneously included into both heifer and cow categories as it is a sub-index incorporating other traits. For each category of \(n\) traits, the multiple-trait chi-square statistics of a SNP was obtained as follows [24]:

$${\upchi }^{2}={{\varvec{t}}}_{i}^{^{\prime}}{{\varvec{V}}}^{-1}{{\varvec{t}}}_{i}$$

where \({{\varvec{t}}}_{i}\) is a \(n\times 1\) vector of signed t-values for \({i}^{th}\) SNP (that is equal to the allele effect divided by its standard error) across the \(n\) analyzed traits; \({{\varvec{t}}}_{i}^{^{\prime}}\) is the transpose of vector \({{\varvec{t}}}_{i}\) (\(1\times n\)); and \({{\varvec{V}}}^{-1}\) is the inverse of \(n\times n\) correlation matrix, in which the pairwise correlations of traits were calculated over the estimated SNP effects. The null hypothesis that the SNP has no significant effect on any of the tested traits was tested based on the \({\upchi }^{2}\) distribution with \(n\) degrees of freedom.

Adjustment for multiple-hypotheses testing

Due to the polygenic architecture of fertility and reproduction traits, a large number of QTL are expect, therefore a Bonferroni correction at 5% chromosome-wise significance level [141] was carried, by dividing 0.05 by the number of independent chromosome segments (\({M}_{e}\)) to account for dependency among tests. The \({M}_{e}\) was calculated as follow [142]:

$${M}_{e}=\frac{2{N}_{e}L}{\mathrm{ln}\left({N}_{e}L\right)},$$

where \({N}_{e}\) is the effective population size, which was set to 66 according to a recent report in North American Holstein [143]; \(L\) is chromosome length expressed in centi-Morgans (cM; one cM was considered to be equivalent to 1 Mb). Thus, SNPs were considered as statistically significant if their \({-\mathrm{log}}_{10}(P)\) was higher than the chromosome-wide threshold [\(-{\mathrm{log}}_{10}(0.05/{M}_{e})\)], which ranged from 4.15 to 4.65 depending on the chromosome (average ± SD = 4.39 ± 0.12).

QTL, positional candidate genes, and functional analyses

The QTL were defined as chromosomal regions of ± 100 Kb around the significant SNPs, therefore, considering a flanking distance where, on average, high LD is expected in North American Holsteins [144]. The identified QTL were compared to previously reported QTL in the Cattle QTL Database – Release 43 [145]. To most effectively avoid missing the potential causal genes, all positional candidate genes within QTL region, including protein-encoding and lncRNA, were retrieved using the biomaRt R package [146]. The ARS-UCD1.2 assembly (https://ensembl.org/Bos_taurus/Info/Index) was used as the reference genome. Functional enrichment analyses of the candidate genes were conducted using the g:GOSt function of the g:Profiler web server [147], including the target datasets of the GO terms [148] and KEGG pathways [149]. The default parameters and methods for adjusting for multiple hypotheses testing were used, targeting an adjusted 5% level of significance. For the biologically suggested candidate genes, we also performed a protein–protein interaction analysis using the STRING software [150].