Background

There is a negative genetic correlation between milk yield and fertility in dairy cattle [13]. Partly as a result, the large improvement in milk yield over the last 40 years was accompanied by a decline in fertility [46]. Genetic selection for fertility is hampered by low heritability. For example, the heritability for daughter pregnancy rate (DPR), the fertility trait most widely measured in the United States, has been estimated at 0.04% [2]. Genetic estimates of fertility can be improved by genome-wide single nucleotide polymorphism (SNP) arrays. Utilization of the BovineSNP50 chip from Illumina (San Diego, CA, USA) improved reliability for DPR [7, 8] but the low heritability and polygenic nature of the trait has meant that improvements in reliabilities achieved by incorporation of genomic information was less than for other traits. Thus, while the incorporation of information from the SNP50 chip increased reliability of DPR by 17% in Holsteins, this improvement was one of the lowest of the 12 traits examined [8].

One possible way to improve the accuracy of genomic estimates of fertility is to incorporate SNPs for specific genes involved in reproduction into SNP panels. The bovine genome contains over 20,000 genes, and over 14,000 of those do not contain a single SNP on the BovineSNP50 chip [9]. Incorporation of candidate gene SNPs into genomic tests for reproduction would allow selection of causative SNPs or SNPs physically more close to causative SNPs. Such an approach has been successful for improving ability to detect genomic associations with disease [10].

Many genes have been associated with reproduction in the dairy cow. Among these are SNPs related to in vitro fertilization or development, such as STAT5A[11], FGF2[12, 13] and PGR[14]), DPR (CAST[15]), sire conception rate including STAT5A[16], FGF2[16], and ITGB5[17], calving interval (GHR[18]), superovulation response (FSHR[19]), twinning rate (IGF1[20]) and incidence of still birth (NLRP9[21] and LEP[22]). In beef cattle, SNPs related to reproductive function include those in HSPA1A, associated with calving rate [23], and PAPPA2, associated with calving interval [24].

The previously mentioned SNPs only represent a small portion of the genes involved in reproductive processes. Recent studies have revealed genes whose expression in tissues or cells of importance to reproduction vary with reproductive status; these genes are candidates for containing SNPs that impact fertility. For example, genes were identified that were differentially regulated in the brain of cows displaying strong estrus compared to those displaying weak estrus [25], in the endometrium of heifers which produced viable embryos compared to those which produced non-viable embryos [26], and in biopsies from embryos that resulted in live calves as compared to embryos that died following embryo transfer [27]. Genetic variants in the genes differentially expressed in the aforementioned studies and others may be responsible for differences in fertility among animals.

The goal of the current study was to identify SNPs in candidate genes affecting reproductive processes. The approach was to evaluate effectiveness of SNPs in candidate genes for explaining genetic variation in DPR. Three types of SNPs were evaluated: SNPs previously reported to be associated with reproductive traits of dairy or beef cattle or physically close to genetic markers for reproduction, SNPs in genes that are well known to be involved in reproductive processes, and SNPs in genes reported to be differentially expressed between physiological conditions in a variety of tissues associated in reproductive function. As an additional goal, SNPs were also evaluated for their relationship to other traits. Given the negative genetic correlation between milk yield and reproduction [13], it was hypothesized that some SNPs associated with DPR would have an antagonistic relationship with production traits.

Methods

Selection of bulls

Straws and ampules of semen were obtained from 550 Holstein bulls born between 1962 and 2010. Bulls were chosen based on their predicted transmitting ability (PTA) and reliability for DPR. In particular, bulls were chosen to have either a high PTA for DPR (≥ 1.7) or low PTA for DPR (≤ −2) with reliability as high as possible. The PTA for the low DPR group (n = 262) ranged from −5.9 to −2 (average = −3.5), and the PTA for the high DPR group (n = 288) ranged from 1.7 to 5.3 (average = 2.87). Reliabilities ranged from 0.46 to 0.99 (3 bulls < 50%, 17 between 50 and 60%, 150 between 60 and 70%, 213 between 70 and 80%, 47 between 80 and 90%, and 120 greater than 90%). The distribution of reliabilities was similar between the low (average = 79%) and high (average = 77%) DPR groups. Predicted transmitting abilities for a variety of traits of the bulls are presented in Additional file 1: Table S1. Semen was obtained from the Cooperative Dairy DNA Repository [CDDR (Beltsville, MD, USA; 445 bulls)], Alta Genetics (Watertown, WI, USA; 38 bulls), Genex Cooperative Inc. (Shawano, WI, USA; 31 bulls), Taurus-service Inc. (Mehoopany, PA, USA; 26 bulls), Foundation Sires Inc. (Listowel, ON, CAN; 5 bulls), Accelerated Genetics (Baraboo, WI, USA, 2 bulls), Interglobe Genetics (Pontiac, IL, USA, 2 bulls), and Nebraska Bull Service (McCook, NE, USA, 1 bull). Five bulls were born in the 1960s, 15 in the 1970s, 54 in the 1980s, 154 in the 1990s, and 322 in the 2000s.

SNP discovery

The choice of 434 SNPs to be used for genotyping was made using a three-step process: candidate gene selection, SNP identification, and SNP selection. A list of candidate genes affecting reproduction was compiled using two methods. The first was to include genes commonly known to affect reproductive processes such as steroidogenesis (STAR, HSD17B3, etc.), follicular development (LHB, FSHB, etc.), oocyte maturation (BMP15, GDF9, etc.), and early embryonic development (CSF2, IGF1, etc.), as well as nutritional genes including orexins (NPY, HCRT, etc.) and anorexins (CCK, LEP, etc.). Furthermore, genes that were in physical proximity to SNPs related genetically to interval to insemination (IGFBP7, IRF9, etc. [28]) and 56 d non-return rate (BAIAP2, SCRN1, etc. [29]) were included. In addition, genes reported to be differentially expressed between physiological conditions in a variety of tissues associated with reproductive function were incorporated. This list included genes differentially regulated in the following conditions: the brain of cows displaying strong vs. weak estrus (CALCR, POMC, etc. [25]), embryos after cryopreservation (BAX, DSC2, etc. [29]), superovulated embryos compared to embryos from unstimulated dams (GOLGA4, KIT, etc. [30]), embryos which survived to term compared to embryos that died in vivo after embryo transfer (ATP5A1, OCLN, etc. [27, 31]), embryos treated with CSF2 (CACNA1G, MADD, etc. [32]) or IGF1 (COQ9, CREG1, etc. [33]) compared to control embryos, embryos cultured in vitro in the well-of-the-well system compared to embryos cultured in groups (CSNK1E, ZP4, etc. [34]), oocytes compared to 8-cell embryos (CLIC4, PDGFR, etc. [35]) and blastocysts (GJA1, TAF9, etc. [36]), oocytes at different stages of oocyte maturation (CPS1F, ZP2, etc. [37]), endometrium related to embryo survival (DGKA, BSP3, etc. [26]), endometrium in lactating cows compared to non-lactating cows (APBB1, ST13, etc. [38]) or pregnant cows compared to non-pregnant cows (ASL, GPLD1, etc. [38]), cumulus cells regulated by the LH surge (DHCR24, HAS2, etc. [39]), at different stages of oocyte maturation (AP3B1, CLU, etc. [40]), or from embryos produced in vivo embryos compared to embryos produced in vitro [LPL, MAGED1, etc. [41]), dominant follicles compared to subordinate follicles (CYP19A1, FST, etc. [4245]), liver during the transition period (ACLY, PCCB, etc. [46]), mammary tissue during lactation (ABCA1, INSR, etc. [47]), and oviduct at diestrus compared to estrus (C3, OVGP1, etc. [48]).

Using the procedures described above, a total of 1532 candidate genes were identified. The SNPs in each of these genes were identified by querying the SNP database maintained by the National Center for Biotechnology Information (dbSNP; http://www.ncbi.nlm.nih.gov/snp). Then, SNPs were screened to only include those in the coding region of the gene which resulted in a nonsense, frameshift, or missense mutation. Of the 1532 genes screened, 553 genes containing a total of 1644 SNPs fit those criteria. In addition to these markers, SNPs previously linked to fertility were considered for inclusion. That list of candidate SNPs included CAST[15], FGF2[16], FSHR[19], GHR[18], HSPA1A[23], ITGB5[17], LEP[22], NLRP9[21], PAPPA2[24], PGR[14], SERPINA14[49], and STAT5A[11, 16].

In order to determine the final list of SNPs to be used in the assay, each SNP was graded based on primer designability and predicted change in protein function. Each SNP causing an amino acid change was evaluated for the likelihood that the SNP would change the structure of the encoded protein using an exchangeability matrix [50]. The average exchangeability value was calculated for each substitution of pairs of amino acids, and SNPs were ranked in order of exchangeability. For final selection of 434 SNPs, a maximum of one SNP per gene was selected. Nonsense mutations were selected first, then frameshifts, followed by SNPs with the lowest score in the exchangeability matrix (those most likely to cause a change in protein function). The selection criteria were also applied to SNPs already linked genetically to reproduction. Of the final selected SNPs, 5 were the exact SNPs used in the literature: STAT5A[11], FGF2[16], PGR[14], HSPA1A[23], and PAPPA2[24], and 7 SNPs were replaced with the best option using the criteria mentioned above (ITGB5, GHR, FSHR, NLRP9, LEP, CAST, and SERPINA14). The final list of genes used in the assay is shown in Additional file 1: Table S2 and the SNPs that were chosen from those genes are shown in Additional file 1: Table S3. The SNP panel included 10 nonsense, 22 frameshift, 397 missense, 1 synonymous, 3 intron region, and 1 promoter region SNPs.

SNP genotyping

Total DNA was extracted from each straw of semen using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. Amount of double-stranded DNA was assessed using the Quant-itTM Picogreen® dsDNA kit (Invitrogen, Grand Island, NY, USA), and DNA was resuspended to a concentration of 50 ng/μL. Genotyping was performed by GeneSeek Inc. (Lincoln, NE, USA) using the Sequenom MassARRAY® system (iPLEX GOLD; Sequenom, San Diego, CA, USA) according to the manufacturer’s instructions. The technique is based on the analysis of DNA products using matrix-assisted laser desorption ionization time-of-flight mass spectrometry [51]. The region of DNA containing the SNP was amplified by PCR, a primer extension reaction was performed to generate allele-specific DNA products, and the size and amount of each allele-specific product was determined using chip-based mass spectrometry.

Quality control

Samples with call rates < 70% were removed from all analyses. The average call rate prior to removing those samples was 88.2%. After removing the failed samples, the average call rate was 91.2%. Reliability was assessed by duplicating 18 SNPs for every DNA sample, and by assaying 63 DNA samples twice. Of the duplicated SNPs, 16 were selected based on interest (CAST, CSF2, CYP19A1, FGF2, GHR, HSPA1A, IFNT, ITGB5, LEP, LHCGR, NALP9, PAPPA2, PGR, POU5F1, STAT5A, and UTMP) and the other two were selected based on poor primer designability (ETF1 and POMC). The primers for the duplicated SNPs were designed based on the sequence of the opposite DNA strand of where the original primer was designed. The duplicated DNA samples were randomly selected. There was 99.2% identity between SNPs duplicated within an assay and 98.6% identity between duplicated samples. After quality control was assessed, duplicated samples were merged. If any genotype at a given SNP did not match between samples, both genotypes were deleted and treated as a no call. Duplicated SNPs were merged in the same manner. The call rate after merging samples and SNPs was 91.5%.

Statistical analysis

Minor allele frequency (MAF) was determined using the FREQ procedure of SAS (V9.3; SAS Institute Inc., Cary, NC). Distributions of genotypes were tested for deviation from Hardy-Weinberg equilibrium (HWE) using a chi-square test. In addition, chi-square was used to determine whether MAF differed between high and low DPR bulls.

The association of genetic variants with each trait was evaluated using the MIXED procedure of SAS. The full model included:

Y i = byr j + βSNP k + POLY i + ϵ i

where Y i is the deregressed PTA of the trait of interest for the ith bull (i = 1, 2, …, 550) , byrj is the fixed effect of the jth birth year (j = 1, 2, …, 5; where birth year is grouped by decade: 1960 to 2010) of the ith bull, β is the linear regression coefficient for the kth SNP, SNP k is the number of copies (k = 0, 1, or 2) of the major allele, POLY l is the random polygenic effect (including all available pedigree information) of the ith bull, and ϵ i is the random residual effect. The POLY l  ~ Aσa2 and ϵ i  ~ Iσe2, where A is the numerator relationship matrix, I is an identity matrix, σa2 is the additive genetic variance of the trait of interest, and σe2 is the residual error variance. All of the available pedigree information for each bull was used when modeling the covariance among the polygenic effects [52].

SNP effects were estimated using two analyses. In the first, genotype was considered a continuous variable to determine the allele substitution effect (the linear effect of the number of copies of the major allele; least-squares means represent values for 0,1 and 2 copies of the major allele). In the second, genotype was considered a categorical variable, and an orthogonal contrast was used to estimate dominance effects [(AA + aa)/2 vs. Aa]. SNPs in which the linear or dominance effect was P < 0.05 were noted. To control for multiple testing, false discovery rate was controlled for by calculating the Q value using the Q-value package in R [53]. The acceptable false discovery rate for the Q value analysis was chosen as 0.05.

Pathway analysis

The list of genes significantly related to DPR was subjected to pathway analysis using Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, http://www.ingenuity.com). The reference set was the Ingenuity Knowledge Base (genes only) and both direct and indirect relationships that were experimentally observed were included. Three analyses were conducted. The first was to identify canonical pathways in which 2 or more genes were overrepresented. The program was also used to build customized networks of genes based on direct and indirect relationships. Finally, upstream regulators in which genes related to DPR were overrepresented were identified. A P value of 0.05 or less was considered significant for all analyses.

Results

Genetic characteristics of bulls used for genotyping

The range of PTAs for bulls are shown in Additional file 1: Table S1, while the effect of DPR class (high or low) on PTAs are shown in Table 1. Daughter pregnancy rate class had a significant effect on all other traits examined. In particular, bulls in the high DPR class had higher PTAs for heifer conception rate (HCR), cow conception rate (CCR), productive life (PL), net merit (NM), fat percent (FPC), and protein percent (PPC) and lower PTAs for milk yield (MY), fat yield (FY), protein yield (PY), and somatic cell score (SCS) than bulls in the low DPR class (Table 1). Correlations among PTAs are shown in Additional file 2: Table S4. Daughter pregnancy rate was significantly and positively correlated with HCR (0.61), CCR (0.91), PL (0.81), NM (0.49), FPC (0.16), and PPC (0.31) and was significantly and negatively correlated with MY (−0.45), FY (−0.35), PY (−0.34), SCS (−0.55), and birth year (BY; -0.15). These results are consistent with correlations reported earlier [54] for traits included in the lifetime net merit selection index. Since the bulls were selected from the two extremes of DPR, correlations within DPR class (DPRC) were also examined (Additional file 2: Table S5). Within the high DPRC, DPR was positively correlated with HCR and CCR and negatively correlated with NM, MY, FY, PY, and BY. Within the low DPRC, DPR was positively correlated with CCR, PL, and NM and was negatively correlated with SCS and BY.

Table 1 Effect of daughter pregnancy rate class (high or low) on predicted transmitting ability for selected traits of bulls used for genotyping

Minor allele frequencies

Of the 434 SNPs, only 107 had MAF ≥ 5% and only 98 of those that had MAF ≥ 5% and had a call rate > 70%. Nine SNPs had MAF ≥ 5% but failed the genotyping process (call rate < 70%; AHCYL2, APBB1IP, FXC1, HSF1, PHGDH, POMC, SLC1A5, ST13, and TTF1) and were removed from all further analyses. The probability that the MAF was ≥ 5% was dependent upon the type of SNP. Four of the 5 genes in which the SNP was in the non-coding regions or was synonymous had a MAF ≥ 5% (80%) whereas only 20% (2/10) of the nonsense, 25% of the missense (99/397), and 9% (2/22) of the frameshift mutations had ≥ 5% MAF (χ2 for non-coding/synonymous vs others, 8.34, P < 0.01).

Hardy-Weinberg equilibrium

Characteristics of the 98 SNPs in which MAF ≥ 5% and call rate was > 70% are shown in Additional file 2: Table S6. A total of 26 SNPs were not in equilibrium (AVP, BOLA-DMB, C17H22orf25, CCDC88B, CCT8, CD2, CFDP2, COQ9, DEPDC7, DTX2, FUT1, HSD17B6, IBSP, IFNT2, MARVELD1, NEU3, RALGPS1, SEC14L1, SREBF1, STAT5A, SYTL2, TAF9, TSPYL1, UHRF1, WBP1, and ZP2). All but one of these SNPs caused a missense mutation. The exception was for UHRF1, which was a frameshift mutation where the mutation causing the frameshift had a frequency of 91.7%. The genes most out of equilibrium were CCT8, MARVELD1 and SYTL2, in which the number of minor allele homozygotes was lower than expected, CD2, DTX2, NEU3, and RALGPS1, in which the number of heterozygotes was lower than expected, and TAF9 and TSPYL1, in which the number of heterozygotes was greater than expected.

SNP effects on daughter pregnancy rate

Each of the 98 SNPs with MAF ≥ 5% and a call rate > 70% were analyzed for effects on DPR and other genetic traits. Two types of analyses were performed: a regression analysis to determine the allele substitution effect of each SNP (0, 1 or 2 copies of the major allele) and use of an orthogonal contrast to determine the dominance effect (heterozygote vs. the average of the two homozygotes). Both P values and Q values corrected for multiple testing were determined. Since the Q value correction for multiple testing is highly conservative in cases where few tests are significant, both the P value and the Q value were used to identify SNPs associated with genetic traits.

Results for DPR are shown in Table 2. Allele substitution effects were different from 0 for 40 genes [ACAT2, AP3B1, APBB1, BSP3, C17H22orf25 (interim symbol TANGO2), C1QB, C7H19orf60, CACNA1D, CAST, CCDC86, CD14, CD40, CFDP2, COQ9, CPSF1, CSNK1E, CSPP1, DEPDC7, DSC2, DYRK3, FUT1, GPLD1, HSD17B12, HSD17B7, LDB3, MARVELD1, MON1B, MRGPRF, MS4A8B, NEU3, NFKBIL1, NLRP9, OCLN, PARM1, PCCB, PMM2, RABEP2, TBC1D24, TDRKH, and ZP2]. These effects were significant based on both P and Q values. In addition, there were 4 genes exhibiting dominance based on P values, including two in which the allele substitution effect was significant (CD14 and FUT1) and two in which the allele substitution was not significant (ARL6IP1 and TSHB). After correcting for multiple testing, none of the dominance effects achieved significance.

Table 2 SNPs associated with daughter pregnancy rate a

SNP effects on other fertility traits

For HCR, there were allele substitution effects for 19 SNPs (AP3B1, APBB1, C1QB, CACNA1D, CD14, CPSF1, CSNK1E, DEPDC7, DSC2, FSHR, FYB, GPLD1, HSD17B7, LDB3, MS4A8B, NFKBIL1, PARM1, TDRKH, and ZP2) and dominance effects for 5 SNPs (ARPL6IP1,CACNA1D, CD14, DZIP3, and GOLGA4; Table 3). None of the dominant effects remained significant after correcting for multiple testing. The only SNPs significant after correcting for multiple testing were allele substitution effects for DEPDC7, LDB3, MS4A8B, PARM1, and TDRKH.

Table 3 SNPs associated with heifer conception rate a

For CCR, there were allele substitution effects for 29 SNPs (ACAT2, AP3B1, APBB1, BCAS1, C1QB, CAST, CCDC86, CCT8, CFDP2, COQ9, CPSF1, CSNK1E, CSPP1, FUT1, GPLD1, HSD17B7, LDB3, MARVELD1, MON1B, NEU3, NFKBIL1, OCLN, PARM1, PMM2, RABEP2, TBC1D24, TDRKH, WBP1, and ZP2) and dominance effects for 4 SNPs (ARL6IP1, SEC14L1, SERPINE2, and SLC18A2; Table 4). All but one of the allele substitution effects were significant after correction for multiple testing, the exception being for ARL6IP1, but none of the dominance effects were significant based on Q values.

Table 4 SNPs associated with cow conception rate a

SNP effects on productive life and net merit

For PL, there were allele substitution effects for 33 SNPs (ACAT2, AP3B1, ASL, CCDC86, CD40, CFDP2, COQ9, CSPP1, DEPDC7, DSC2, FSHR, FUT1, GPLD1, HSD17B12, HSD17B6, HSD17B7, HSPA1A, LDB3, LHCGR, MARVELD1, MON1B, MS4A8B, NEU3, OCLN, PARM1, PCCB, PMM2, RABEP2, SYTL2, TBC1D24, TDRKH, WBP1, and ZP2) and dominance effects for 5 SNPs (ARL6IP1, AVP, CSPP1, DEPDC7, and IBSP; Table 5). After correcting for multiple testing, none of the dominant effects were significant.

Table 5 SNPs associated with productive life a

For NM, there were allele substitution effects for 30 SNPs (ACAT2, AP3B1, ASL, C17H22orf25, CCT8, CD2, CD40, COQ9, CSNK1E, DEPDC7, EPAS1, FST, FUT1, HSD17B12, HSD17B6, HSPA1A, IBSP, LDB3, LHCGR, MON1B, MRPL48, MS4A8B, NEU3, OCLN, PARM1, PCCB, PMM2, RABEP2, TBC1D24, and TDRKH) and dominance effects for 6 SNPs (SNPs in ARL6IP1, CD14, DEPDC7, FGF2, IBSP, and SLC18A2; Table 6). Except for HSPA1A, the allele substitution effects were significant after correcting for multiple testing, but dominance effects were not significant.

Table 6 SNPs associated with net merit a

SNP effects on production traits

There were fewer effects on production traits compared to fertility traits, which is consistent with the conclusion of Cole et al. [55] that yield traits generally are consistent with an infinitesimal model, in which the trait is controlled by many alleles of small effect. For MY, there were allele substitution effects for 18 SNPs and dominance effects for 6 SNPs (Table 7). Only linear effects of CD14, CPSF1, FAM5C, and PARM1 were significant after correcting for multiple testing. For FY, there were allele substitution effects for 13 SNPs and dominance effects for 7 SNPs (Table 8). Only the linear effects of CPSF1 and PARM1 were significant after correcting for multiple testing. For FPC, there were allele substitution effects for 10 SNPs and dominance effects for 4 SNPs (Table 9). After correcting for multiple testing, only linear effects of CPSF1, DEPDC7, FAM5C, MS4A8B, and SREBF1 were significant.

Table 7 SNPs associated with milk yield a
Table 8 SNPs associated with fat yield a
Table 9 SNPs associated with fat percent a

For PY, there were allele substitution effects for 17 SNPs and dominance effects for 4 SNPs (Table 10). None of the effects were significant after correcting for multiple testing. For PPC, there were linear effects of 21 SNPs and 1 SNP with a dominance effect (Table 11). After correcting for multiple testing, only the linear effects of BSP3, CPSF1, FAM5C, FCER1G, FUT1, HSPA1A, MS4A8B, PARM1, and TDRKH were significant.

Table 10 SNPs associated with protein yield a
Table 11 SNPs associated with protein percent a

Results for SCS are shown in Table 12. There were allele substitution effects of 8 SNPs and dominance effects for 6 SNPs. After correcting for multiple testing, the linear effects of CFDP2, CPSF1, DSC2, FST, PMM2, SEC14L1, TXN2 and the dominance effect of NFKBIL1 were significant.

Table 12 SNPs associated with somatic cell score a

Relationship between allele substitution effects for SNPs related to DPR with effects on other traits

It was determined whether SNPs affecting DPR had association with other traits and, if so, whether the allele substitution effect was in the same or opposite direction as for DPR. Results are shown in Figure 1. As expected, many SNPs associated with DPR were also associated with HCR and CCR and in the same direction as for DPR. Of 40 SNPs in which there was a linear effect on DPR (Q < 0.05), 13 also were associated with HCR and 25 were associated with CCR. In all cases, the allele substitution effect was in the same direction for DPR and either HCR or CCR. Similar results were observed for PL and NM. Of the 40 SNPs associated with DPR, 26 were also associated with PL and 20 with NM and the allele substitution effect was in the same direction for DPR and either PL and NM.

Figure 1
figure 1

Allele substitution effects of SNPs on fertility and production traits. Only effects with P values and/or Q values < 0.05 are included. Numbers represent the magnitude of the allele substitution effect. Dark blue rectangles are linear effects in the same direction as DPR (Q < 0.05), light blue rectangles are linear effects in the same direction as DPR (P < 0.05 but Q value ≥ 0.05), red rectangles are linear effects in the opposite direction as DPR (P and Q value < 0.05), and pink rectangles are linear effects in the opposite direction as DPR (P < 0.05 but Q ≥ 0.05). Empty cells indicate that there was no significant effect of the SNP on the trait based on P or Q values. Abbreviations are as follows: CCR, cow conception rate; DPR, daughter pregnancy rate; FPC, fat percent; FY, fat yield; HCR, heifer conception rate; MY, milk yield; NM, net merit; PL, productive life; PPC, protein percent; PY, protein yield; SCS, somatic cell score.

Fewer SNPs associated with DPR were also associated with production traits. Furthermore, when occurring, the direction of the effect was often in the opposite direction as for DPR, especially for yield traits. There were 10 SNPs associated with MY and all but one (CPSF1) were in the opposite direction as for DPR (i.e., genotypes favoring DPR were unfavorable for milk yield). There were 7 SNPs associated with FY and 5 of these were in the opposite direction as for DPR. There were 7 SNPs associated with PY and 5 of these were in the opposite direction as for DPR. For other production traits, however, there were fewer negative relationships between allele substitution effects on DPR. For FPC, there were 5 SNPs but only 1 was in the opposite direction as DPR. For PPC, there were 13 SNPs but only 1 was in the opposite direction as DPR. For SCS, there were 5 SNPs with three in the same direction as DPR and two in the opposite direction.

Of the 40 SNPs related to DPR, there were 29 that were not negatively associated with yield traits (milk, fat and protein). Thus, it should be possible to select for specific SNPs affecting DPR without compromising yield traits.

Relationship between SNPs associated with DPR and SNPs reported previously to be related to fertility

Of the 434 SNPs analyzed, 17 were chosen because they had previously been reported to be associated with reproduction or to be close to SNPs related to interval to insemination or 56-d non-return rate (Additional file 1: Table S2). Of these, only 8 had a MAF ≥ 5% (CAST, FGF2, FSHR, HSPA1A, IRF9, NLRP9, PGR and STAT5A) and only 2 (CAST and NLRP9) were significantly associated with DPR (Table 2).

The physical location of each SNP associated with DPR in the current study was compared to markers from the BovineSNP50 chip previously associated with DPR [7]. Figure 2 shows the relative location of the SNPs and the SNP50 marker effects. The SNP effects from the current custom array have a much greater effect on DPR than those found on the BovineSNP50 chip. The largest genetic standard deviation on the BovineSNP50 chip for DPR was 0.07 genetic standard deviations [7]; however, in the current study, the marker effect ranged from 0.44 to 1.78 (Additional file 3: Table S7).

Figure 2
figure 2

Manhattan plots comparing SNP effects on daughter pregnancy rate from the current study (panel A; UF SNP) to marker effects from the BovineSNP50 chip in a previous study (panel B; USDA SNP) [[7]]. Each chromosome is represented in a different color along the x-axis. The y-axis is the marker effect on daughter pregnancy rate (genetic standard deviations). The markers are color coordinated according to their chromosome location.

A literature search was conducted to determine if any SNPs previously related to fertility were within 100,000 bases of any of the SNPs related to DPR in the current study. The literature provided evidence for 3 other SNPs located close to SNPs from the current study. A SNP in DGAT1, which is about 65,000 bp from the SNP in CPSF1, was associated with 28 and 56 day nonreturn rate to first service, age at puberty, number of inseminations per conception, and conception rate [5658]. A SNP in TNF, which is about 25,000 bp from the SNP in NFKBIL1, was associated with early first ovulation in postpartum cows [59]. Also, a SNP in HSD14B14, which is about 60,000 bp from the SNP in FUT1, was associated with DPR [7]. Since these SNPs are close in distance, there could be linkage disequilibrium between them. Therefore, it is possible that either gene in each of the previous locations could contain the causative SNP.

Effect of tissue type used for SNP discovery on probability of identifying SNPs associated with DPR

An analysis was performed to determine whether the tissue type used to identify genes for SNP discovery affected the probability that a gene was related to DPR (Additional file 3: Table S8). Using chi-square analysis, fewer SNPs identified in genes identified as expressed in the brain or pituitary were significantly associated with DPR (18%) than for embryo genes (51%), endometrium or oviduct genes (50%) or ovary genes (43%) (χ2 for brain/pituitary vs others, 3.74, P = 0.05).

Pathway analysis of genes with SNPs associated with DPR

There were a total of 5 canonical pathways in which 2 or more genes were overrepresented (P < 0.05). These were Estrogen Biosynthesis (HSD17B7 and HSD17B12), Estrogen-Dependent Breast Cancer Signaling (HSD17B7 and HSD17B12), Hepatic Fibrosis/Hepatic Stellate Activation (CD14 and CD40), Tight Junction Signaling (CDSF1 and OCLN), and Dopamine-DARPP32 Feedback in cAMP Signaling (CACNAID and CSNK1E). The IPA software also built 4 networks of genes related to DPR. The most revealing was one that included 16 genes (ACAT2, AP3B1, COQ9, CPSF1, CSPP1, DEPDC7, DSC2, GPLD1, HSD17B12, MARVELD1, MON1B, NFKBIL1, PMM2, RABEP2, TBC1D24, and TDRKH) which interacted directly or indirectly with UBC (Additional file 3: Figure S1).

The list of genes related to DPR was also examined for upstream regulators in which regulated genes were significantly (P < 0.05) overrepresented. A total of 5 transcription factors were identified (Additional file 3: Figure S2) including HNF4A, which regulates 8 genes associated with DPR, TCF3, which regulates 3 DPR genes, and CTBP2, FOSB, and SP100, which each regulate one gene. Additional regulators of genes associated with DPR were two hormones (estradiol and prostaglandin E1) and one growth factor (TGFB1). Estradiol regulates 10 DPR genes, TGFB1 regulates 6 genes, and prostaglandin E1 regulates 2 genes (Figure 3).

Figure 3
figure 3

Growth factors and steroids which regulate daughter pregnancy rate genes. Only significant pathways are shown (P < 0.05). Red symbols are genes in which SNPs were associated with daughter pregnancy rate and arrows represent regulation.

Discussion

The results of this study verified that the candidate gene approach could be a successful method of determining markers for DPR. It was anticipated that, since the SNPs used for genotyping were specifically chosen for their function in reproductive processes, a larger proportion of them would be associated with reproductive traits than for production traits. Such a result was obtained. Of the 98 genes that met the criteria for analysis (MAF ≥ 5% and call rate ≥ 70%) and where effects were P < 0.05, there were 42 genes associated with DPR (Table 2) but only 23 associated with MY (Table 7). Moreover, all of the significant SNP effects for DPR in this study were between 5 and 25 times greater than the largest marker effect from the BovineSNP50 chip [7] (Figure 2 and Additional file 3: Table S7). This result is probably due to the differences in SNP selection between the two methods. The majority of SNPs on the BovineSNP50 chip are between genes (63%) and over 14,000 genes are not represented by a SNP on the Bovine SNP50 chip [9]. In the current study, almost all of the SNPs examined were located within the coding region of the gene and the remainder were close physically to the coding region. Moreover, SNPs were chosen to maximize the probability that there would be a change in the characteristic of the protein encoded for the gene. Thus, it is likely that many of the SNPs that have large effects on DPR do so because they are causative SNPs resulting in changes in protein function. The remainder may represent linkages to causative SNPs. The SNPs identified in this study may be closer to the causative SNPs than the SNPs on the BovineSNP50 chip. Allele substitution effects were estimated individually with a linear mixed model, rather than simultaneously as described in Cole et al. [7], which also could explain some of the differences.

Polymorphisms in the current study were chosen for having the greatest probability of changing protein function. In order to maximize the possibility of finding causative SNPs, we prioritized the selection of SNPs within a gene to favor those causing the greatest change in protein function. This decision may have been one reason why there was a high rate (75%) of SNPs with MAF < 5% because the SNP would be subjected to purifying selection. Only 20% of the nonsense, 25% of the missense and 9% of the frameshift mutations had MAF ≥ 5% whereas this frequency was 80% of the 5 SNPs that were in a non-coding region or did not result in an amino acid substitution. Many of the SNPs were not in Hardy-Weinberg equilibrium and this, too, may reflect the effect of the SNPs on protein function.

Of the 9 SNPs most out of equilibrium, only 3 (CCT8, MARVELD1 and SYTL2) had less than expected frequencies of minor allele homozygotes. The interpretation is that few of the mutations in which MAF was ≥ 5% were lethal. Interestingly, for six genes, the heterozygote was more or less frequent than expected. Some of the decrease in heterozygosity could be due to inbreeding, which is high in Holstein cattle [60]. Other changes in heterozygosity could be due to either an advantage or disadvantage of the heterozygote. Heterozygote advantage could be due to the ability of receptors to recognize more forms of the peptides they bind (e.g. MHC class I [61]), heterozygotes having the optimal level of gene expression [62], or in theory, the optimal allele being different for different cell types. A reason for heterozygote disadvantage is not clear.

The antagonistic genetic relationship between fertility traits and milk production [13] was verified here. There was a negative correlation between DPR and MY across DPR classes (Additional file 2: Table S4) and within cows in the high DPR class (Additional file 2: Table S5). Nonetheless, there were many SNP related to DPR (and often, other reproductive and health traits) that were not antagonistic for MY. Accordingly, it should be possible to select for DPR without reducing MY. Of the 40 SNPs linearly related to DPR, only 11 were negatively associated with MY, FY, or PY (Figure 1).

SNPs that affected DPR were also positively related with other fertility traits (HCR, CCR, and NM). Other studies have also shown a positive genetic correlation among fertility traits [3, 28, 63]. It is not surprising that these traits are affected by the SNPs associated with DPR. One determinant of DPR is CCR. In addition, PL depends in part on the probability of culling for reproduction. The equation to calculate NM includes DPR and PL. The fact that SNPs associated with DPR are also associated with HCR, CCR, PL and NM means that selection of genes that improve DPR are likely to improve other reproductive traits and traits that depend upon reproduction.

SNPs linked to traits in the current study that were previously linked to other traits are summarized in Table 13. Of the 17 genes with SNPs previously linked to fertility or close to SNPs related to fertility traits, 9 SNPs had MAF < 5% (BAIAP2, GHR, LEP, IGF1, IGFBP7, ITGB5, PAPPA2, SCRN1, and SERPINA14) and were not analyzed. Of the other 8, 2 were significantly associated with DPR (CAST and NLRP9) and one tended to be (FGF2). The exact SNP in CAST analyzed here was previously associated with DPR, PL, and NM [15]. A different SNP in NLRP9 than the one studied here was associated with incidence of still birth [21]. Another gene, FGF2, tended to have an association with DPR (P = 0.08), with the AA genotype being superior to the GG genotype. Previously, the AA genotype of FGF2 was associated with higher estimated relative conception rate in bulls [16] although, surprisingly, associated with lower in vitro embryo development [12]. Another SNP, in PGR, was previously associated with in vitro fertilization rate and development [14] and in vivo fertilization [64] and pregnancy rates [65], and while not significant (P = 0.16), the GG genotype was superior to the CC genotype for DPR in agreement with the superior genotype seen earlier [14, 64, 65]. A SNP in FSHR was previously associated with superovulation response [19, 66] and, while not significantly associated with DPR in the current study, was associated with HCR and PL. There was no significant effect of genotype for four other SNPs in genes previously associated with reproductive traits, including HSPA1A, associated with calving rate in beef cattle [23], IRF9, which was physically close to a SNP for interval to insemination [28], and STAT5A, associated with in vitro embryo development [11] and sire conception rate [16]. Note that HSPA1A was significantly associated with PL and NM (Table 13) and both of these traits depend upon reproductive function.

Table 13 SNPs associated with at least one trait in the current study that were previously linked to one or more other traits a

The genes in the current study with SNPs that were associated with DPR participate in a wide range of physiological functions associated with reproductive processes. Many function in the endocrine system, either in synthesis of hormones or in cell signaling. The estrogen biosynthesis pathway was one of the pathways in which genes associated with DPR were significantly overrepresented. The gene ACAT2 is involved in cholesterol metabolism [67], and expression of ACAT2 in cumulus cells is increased for infertile women as compared to fertile women [68]. The gene HSD17B12 encodes for an enzyme that converts estrone to estradiol [69]. It is also involved in the synthesis of arachidonic acid and is essential for embryo survival in mice [70]. Another gene related to DPR, HSD17B7, also converts estrone to estradiol [71] and is essential for de novo cholesterol synthesis in the fetus [7274]. In addition to genes involved in steroid synthesis, TSHB, a gene which codes for the β strand of the pituitary hormone, TSH, was associated with DPR. Thyroid function, which is under the control of TSH [75], can impact reproductive function in cattle [76, 77]. Some genes related to DPR may also affect release of neurotransmitters controlling hypothalamic-pituitary function. One, AP3B1, is involved in formation of synaptic vesicles [78], and APBB1 controls GnRH-1 neurogenesis [79]. Another, TBC1D24, stimulates primary axonal arborization [80, 81]. Polymorphisms in TBC1D24 have been associated with shortened axons and epileptic seizures [80, 81].

Among the DPR genes involved in cell signaling are the G protein-coupled receptors MRGPRF and MS4A8B[82], GPLD1, which cleaves cell surface proteins anchored by phosphatidylinositol glycans [83], the sialidase NEU3, which is important for insulin signaling [84], CACNA1D, a component of calcium channels [85], and DSC2, an important component of membrane rafts and cell-cell junctions [86] and which is involved in blastocoel formation [87]. Similarly, OCLN is a major component of tight junctions and is involved in barrier stability [88]. Another gene involved in cell-cell binding related to DPR is PMM2, which isomerizes mannose 6-phosphate into mannose 1-phosphate [89], which eventually is converted to GDP-fucose and used to make fucosylated glycans [90]. Fucosylated glycans serve several functions, including leukocyte-endothelial adhesion, host-microbe interactions, embryo compaction, and signal transduction [90].

One gene associated with DPR, CSNK1E, is involved in paracrine regulation of cell function as a positive regulator of the canonical WNT/β-catenin pathway [91, 92]. The WNT pathway plays important roles in cell differentiation [93, 94], preimplantation development [95], formation of the epiblast [96] and implantation [97]. Moreover, CSNK1E regulates circadian rhythm by controlling nuclear entry of PER1, a regulator of CLOCK[98]. Expression of PER1 was associated with depth of anestrus at the start of the breeding season in beef cattle [99].

Three genes related to DPR are involved with the function of spermatozoa in the female tract. The gene BSP3 aids in maintaining sperm motility during storage in the oviduct [100]. Protein concentrations are associated with bull fertility [101] and the mRNA is down-regulated in the endometrium of heifers which carried a pregnancy to term compared to those in which the embryo died after transfer [26]. Another gene, CAST, plays an important role in sperm capacitation and the acrosome reaction [102104] and may play a role in oocyte calcium-mediated processes that occur during oocyte activation [105]. The same SNP in CAST found to be associated with DPR in this study was earlier associated with DPR, PL, NM and SCS [15]. The embryonic gene ZP2 encodes for a protein that makes up part of the zona pellucida and is the location that sperm bind on the zona pellucida [106, 107]. One of the genes related to DPR, NLRP9, is likely to play an important function in the oocyte. The gene is expressed in the oocyte, and steady-state amounts of NLRP9 mRNA decline after fertilization and become undetectable after the maternal to zygote transition [108110].

There is much evidence to implicate immune function in the establishment of pregnancy [111]. Seven of the genes with SNPs associated with DPR are involved in immune function. The gene C1QB is involved in complement activation [112], CD14 is a co-receptor for recognition of bacteria [113], CD40 regulates cell surface receptor signaling [114], and NFKBIL1 regulates dendritic cell function [115]. Additionally, MON1B and RABEP2 help regulate phagocytosis and endocytosis [116, 117] and mutations in FUT1 have been associated with disease resistance [118120]. Polymorphisms in FUT1 have also been associated with total number of piglets born [121, 122] and number of piglets alive at weaning [119]. It is possible that allelic variants in these genes that are positively associated with DPR improve immune function and decrease incidence of diseases such as endometritis, metritis, and mastitis that disrupt reproduction [123125].

Three genes related to DPR are anti-apoptotic: ARL6IP1, DYRK3 and PARM1I[126128]. Induction of apoptosis in the oocyte and associated cumulus cells is associated with reduced fertilization rate [129131]. Two molecules that improve embryo competence for establishment of pregnancy after transfer into recipients, CSF2 and IGF1 [132, 133], are anti-apoptotic in embryos [32, 134].

A variety of other roles are also represented by the genes with SNPs associated with DPR. Two genes are involved in energy pathways (COQ9 and PCCB). The COQ9 protein is necessary for the synthesis of CoQ10 [135], which is needed for generating ATP [136]. PCCB is an enzyme that converts proponyl CoA to methylmalonyl CoA during gluconeogenesis [137]. The CSPP1 gene plays a role in spindle formation and cytokinesis [138], MARVELD1 inhibits cell cycle progression and migration [139], and LDB3 helps organize actin and α-actinin binding in sarcomeres [140]. Finally, CPSF1 is involved in 3′ end-processing of pre-messenger RNAs into messenger RNAs [140].

Several gene networks were significant among the genes related to DPR but most contained only two genes. The exceptions were estrogen biosynthesis, discussed earlier, and a network of genes associated with ubiquitin C (UBC). It is not surprising that the proteins encoded for by so many genes bind to UBC because ubiquitin is involved in a large number of intracellular functions [141144]. Five transcription factors (HNF4A, TCF3, CTBP2, FOSB, and SP100), two hormones (estradiol and prostaglandin E1), and one growth factor (TGFB1) were determined by the IPA software to be significantly overrepresented as regulators of DPR genes. Each of these upstream regulators could be studied further for the potential to improve fertility by regulating activation of pathways controlled by these molecules.

Conclusions

In conclusion, SNPs in a total of 40 genes associated with DPR were identified as well as SNPs for other traits. It might be feasible to include these SNPs into genomic tests of reproduction and other traits. The genes associated with DPR are likely to be important for understanding the physiology of reproduction and manipulating reproduction function in cattle. Given the large number of SNPs associated with DPR that were not negatively associated with production traits, it should be possible to select for DPR without compromising production.