Background

Heifers are young female cows that have not yet given birth to a calf and are the future of the dairy herd. Heifer health and fertility are crucial for the success of a dairy farm, as it directly affects milk production and the sustainability of a farm [1]. Healthy heifers are more likely to produce more milk once they begin lactating. In addition, heifers need to be healthy and well-cared for to conceive and carry a calf to term successfully. Unhealthy heifers may have difficulty getting pregnant, leading to lower milk production and fewer replacement cows for the farmer [2]. Diseased heifers may also require costly veterinary treatments that can add up quickly and cut into profits. Early first calving in heifers is also important for dairy farming, particularly for economic and environmental considerations, because it can reduce unproductive periods and increase lifetime production, faster generation turnover and selection progress, and improve reproductive efficiency [3]. In summary, heifer health and fertility are critical for a profitable and sustainable dairy operation [1].

The health and fertility of cows are complex issues that involve various factors, including nutrition, environment, physiology, and genetics [4,5,6]. Compared to cows, heifers generally encounter more challenges as heifers have not yet reached sexual maturity and are not yet capable of smooth reproduction and production. Moreover, heifers typically have additional nutritional requirements than cows, as they are still growing and developing. Despite the complexity of cattle health and fertility, many GWAS studies have been conducted to identify genomic regions and genes associated with health and fertility-related traits in cattle [7,8,9,10,11,12]. For instance, the bovine MHC region has been associated with cow livability and immune system-related diseases [7]. The ABCC9 and GC genes have been associated with pregnancy rate [8], while ARRDC3 was associated with growth and calving traits [4]. Heifer fertility and health traits are less studied than cows, mainly due to limited data availability.

Although the heritability of fertility and health traits tends to be relatively low, CDCB and the USDA Animal Genomics and Improvement Lab have been evaluating fertility and health-related traits using the large volume of data collected from the dairy industry (https://uscdcb.com/). Recently, they added heifer livability and early first calving to the evaluation system [13, 14]. Heifer livability represents the expected survival percentage of an animal’s female offspring from 2 days after birth up to 18 months of age in a herd with average management conditions. Larger, positive values of heifer livability are more favorable. It measures a heifer’s overall resistance to causes leading to mortality. Since the most common reasons for heifer death are digestive and respiratory diseases [15], heifer livability is primarily related to the resistance to these diseases and other causes of death. The heritability of heifer mortality has been estimated to be less than 1% in many studies [13, 16, 17]. Early first calving (EFC) is defined as the age at first calving. As a heifer fertility trait, the heritability of EFC is only 2–3% [14, 18, 19]. As part of the genetic evaluation process, traits have been corrected for management effect by CDCB, resulting in a PTA (Predicted Transmission Ability) that can be used directly for genetic studies. In this research, we aim to identify genes and genomic regions associated with these heifer traits using the large amount of genotype and phenotype data from the US dairy genomic database. To further boost power, we also included transcriptome and other functional genomics data for fine-mapping and validation.

Results

Large-sample GWAS of heifer livability and early calving

Our large-sample GWAS started with a discovery population of 3,649,734 genotyped Holstein cattle (336,386 bulls and 3,313,348 cows). After calculating deregressed predicted transmitting ability (PTA) as phenotype and editing, we included 510,318 and 768,645 animals for the GWAS of heifer livability and early first calving, respectively. All of the animals were imputed to 79 K SNPs, and we retained 73,554 SNPs after QC editing. We applied SLEMM [20] to perform the GWAS analyses that can efficiently run large-scale mixed models and incorporate variational residual variances for differential reliabilities of degressed PTAs. As a result, we found only one QTL region for each of the two traits (Fig. 1A and B C, and 1D). Nonetheless, for both traits, the P values for the majority of SNPs showed no inflation of test statistics and good quality of the results (Fig. 1A C). After removing SNPs with low minor allele frequency (MAF), only one SNP (ARS-BFGL-NGS-105563, P = 1.28e-07) passed the Bonferroni-corrected threshold for early calving (Table 1). Interestingly, this SNP is located near the Bovine MHC region on BTA 23 that encodes many fundamental molecules for regulating the immune response [21].

Fig. 1
figure 1

Large-scale GWAS results of heifer livability and early calving based on the 79 K SNPs. (A) Quantile–quantile (QQ) plot for heifer livability. (B) Manhattan plot for heifer livability. (C) QQ plot for early calving. (D) Manhattan plot for early calving. The red horizontal lines correspond to the genome-wide significance threshold

Table 1 GWAS results for two heifer traits based on 79 K SNPs

Despite the few QTL regions detected in the initial GWAS, we evaluated all SNPs passing the suggestive significance levels for functional annotation analyses. For heifer livability, we obtained 118 genes located within or overlapping the vicinity of leading SNPs (< 1 Mb) using BioMart in the Ensembl database (Ensembl Genes 106; Table S1). Several genes close to the top SNPs exhibited biological relevance for cow livability, including CHCHD7 and PLAG1, which are related to growth and development [22] and the LYN gene related to the regulation of innate and adaptive immune responses [23]. We also performed GO analysis by KOBAS [24] to determine the potential biological functions of these genes. As a result, 139 significant GO terms (P < 0.05) were found, with the top relevant terms being mineral absorption, homeostasis, metabolic process, and development (Table S2). According to existing studies and the cattle QTL database [25], in the upstream and downstream 1 Mb range of the top SNPs, many QTLs were previously associated with milk production, body type, and disease related traits in dairy cattle (Table S3).

For early first calving, we identified 596 genes within or near the associated SNPs (Table S4). Notably, the top associated SNPs were located within or near the bovine MHC region on BTA 23, indicating potential connections between the immune system and early first calving [26]. Many nearby genes were involved with immune functions and relevant biology for early calving, including ABCF1, ABHD16A, AGER, BOLA-NC1, BTN1A1, LTA, LTB, etc. We also performed the GO analysis based on these genes and detected 91 significant GO terms (P < 0.05) with a large proportion associated with immune processes (Table S5). Finally, previously reported QTL within 1 Mb of associated SNPs were associated with milk production, reproduction, body type, and disease-related traits in cattle (Table S6).

Sequence-level GWAS and fine mapping of heifer livability and early calving in 27,235 bulls

To refine the GWAS results, we conducted additional GWAS analyses with imputed sequence data for heifer livability and early first calving in 27,235 bulls that have highly accurate phenotypes. We used 3,148,506 imputed sequence SNPs as genotype and de-regressed PTAs as phenotype. After editing and filtering on reliability, we included 11,562 and 10,700 bulls for heifer livability and early calving, respectively. The QTL regions discovered in the large-sample GWAS were validated for both traits at the nominal significance level (Fig. 2A and B C, and 2D). Interestingly, sequence-level GWAS found some additional associations compared to low-density SNP data (Fig. 2B and D). As shown in Tables 2 and 16 SNPs passed the genome-wide threshold for heifer livability, and two SNPs passed the threshold for early calving. By checking the 1-Mb regions surrounding these associated SNPs, we identified many genes that were also detected in the large-sample GWAS, namely MOG, OR12D2E, OR12D3, OR2H1, OR5V1, OR5V1C, OR5V2, TRIM10, TRIM15 (Table S7).

Fig. 2
figure 2

GWAS results of heifer livability (n = 11,562) and early calving (n = 10,700) based on bulls with imputed sequence variants. (A) Quantile–quantile (QQ) plot for heifer livability. (B) Manhattan plot for heifer livability. (C) QQ plot for early calving. (D) Manhattan plot for early calving. The red horizontal lines correspond to the genome-wide significance threshold

Table 2 GWAS results based on imputed sequence variants for heifer livability (n = 11,562) and early calving (n = 10,700)

Functional enrichment analysis

We analyzed the enrichment of GWAS signals across SNPs in different functional genomic regions based on the 27 K bulls and imputed sequence data. We first categorized sequence variants into 14 groups based on the locations of 14 chromatin states reported previously [27], i.e., CTCF/Active_TSS, Active_TSS, CTCF/Promoter, Active_Promoter, Flanking_TSS, Promoter, Poised_Promoter, Active_Enhancer, CTCF/Enhancer, Primed_Enhancer, Active_Element, Insulator, Polycomb_Repressed, and Low_Signal. For heifer livability, we observed significant enrichment of variants in Active_Promoter, Promoter, CTCF/Enhancer, Primed_Enhancer, and Active_Element (Fig. 3). For early calving, we observed significant enrichment of associated variants in Active_TSS, Active_Element, and Insulator (Fig. 3).

We further investigated the enrichment of variants concerning their genomic locations (conserved) and genic annotations (CDS, intron, and UTR) inferred by SnpEff [28]. As a result, we observed significant enrichment of CDS and conserved variants in the GWAS results of both traits. For heifer livability, we observed enrichment of intron variants (Fig. 3). And for early calving, we observed significant enrichment of variants in the UTR regions (Fig. 3).

Fig. 3
figure 3

Enrichment of fine-mapping variants across functional annotations

Transcriptome-wide association study (TWAS)

TWAS seeks to identify trait-associated genes by testing for the association between a phenotype and the genetic components of gene expression levels [29]. By linking our GWAS results and existing transcriptome data from the CattleGTEx project [30] via a TWAS analysis, we detected four and 23 significant gene-trait association pairs for heifer livability and early calving, respectively (Fig. 4). Interestingly, we discovered six genes overlapped with 27 K bulls GWAS results for early calving in the Bovine MHC region, including two genes in lymph node tissue and one gene in blood, adipose, hypothalamus, and leukocyte (Table 3). In addition, the expression of OR12D2 in adipose was significantly associated with early calving (Table 3), consistent with previous findings that OR12D2 is linked with MHC [31].

Fig. 4
figure 4

Manhattan plots of transcriptome-wide association study (TWAS) for heifer livability and early calving

Table 3 Gene-trait association pairs detected by TWAS based on CattleGTEx database

Discussion

In this study, we identified genomic regions and candidate genes associated with heifer livability and early first calving using large-scale GWAS, functional enrichment analysis, and TWAS. We reported a major QTL in the bovine MHC region to be related to early first calving, suggesting a potential connection between the immune system and reproduction. TWAS using the CattleGTEx data confirmed the association and revealed several candidate genes in the bovine MHC locus.

Generally, health and fertility traits are expected to be complex traits with low heritability due to their multifactorial nature [32]. The two traits analyzed in this study, heifer livability and early first calving, also have low estimated heritability, 0.0223 and 0.0328, respectively. These two heifer traits can be more complex due to the unique characteristics of heifers. For instance, heifers are not yet fully developed and may not be ready for reproduction. With the large sample sizes used in this study, we only found one major QTL for early first calving but none for heifer livability. Still, when we explored the genome-wide enrichment of GWAS signals with functional genomic regions, we reported significant enrichment of the association signals in promoter and enhancer regions, indicating an exciting connection between the regulation of gene expression and these two complex traits.

The immune system plays a central role in protecting the body from pathogens and infections. Moreover, it has a role in fertility and reproduction. In females, the immune system is involved in the whole reproductive process, from the development and maturation of the egg to implantation and maintenance of the pregnancy [33]. The immune system must tolerate the developing fetus, which is genetically different from the mother, while still protecting against infections. If the immune system is overactive, it can cause infertility or miscarriage, while an underactive immune system can lead to increased susceptibility to infections and complications during pregnancy. These potential connections between the immune system and reproduction further support the MHC QTL and candidate genes with early first calving in heifers.

This study showcased the usefulness of functional genomics data in post-GWAS and fine-mapping studies in cattle. When the associated variants are located in non-coding or intergenic regions, functional genomics data like those from the FAANG [27] and CattleGTEx projects would be useful to provide information about the biological mechanisms underlying the associations. Integration of functional genomics data with GWAS may also boost the power of detection when the power of the original GWAS is limited. For instance, the TWAS results in this research provided additional evidence for the MHC QTL with early first calving.

Conclusion

Due to the complex genetic architecture of health and fertility traits, our large-scale GWAS analyses only detected a few major QTL for heifer livability and early first calving. Interestingly, the major QTL for early first calving is located in the bovine MHC region. This association was further supported by post-GWAS analyses and TWAS, indicating a connection between the immune system and early reproduction. Despite the low power for major QTL, we evaluated the distribution of GWAS signals across different functional genomic regions. We found significant enrichment in promoter and enhancer-related regions, which supports the contribution of gene regulation to the genetics of complex traits.

Methods

Data description

In this study, we conducted GWAS analyses with two datasets, a discovery dataset including 3,649,734 Holstein cattle (336,386 bulls and 3,313,348 cows) genotyped by various SNP chips and imputed to 79,060 SNPs and a fine-mapping dataset including 27,235 bulls genotyped by 50 K SNP chips and imputed to 3,148,506 sequence variants. The original SNP data of the discovery dataset were from multiple SNP chips with densities ranging from 3 to 50 K [34]. The CDCB and USDA AGIL laboratory routinely process the original genotype data and impute to 79 K common SNPs specifically selected for official evaluations using FindHap program [35]. For the discovery dataset, we applied PLINK 1.9 [36] to remove SNPs with call rates < 95%, minor allele frequencies (MAF) < 0.01, Hardy-Weinberg equilibrium (HWE) P < 10− 6, and to remove animals with > 5% missing genotypes. After this filtering, 73,554 SNPs and 3,520,002 animals (325,905 bulls and 3,194,097 cows) were retained for downstream analyses.

The phenotype data were part of the December 2021 genomic evaluations from the U.S. Council on Dairy Cattle Breeding (CDCB), which routinely calculates predicted transmitting ability (PTA) values for dairy cattle of multiple breeds. We only included Holstein data for this study. We used deregressed PTA values as phenotype in the GWAS of two traits, heifer livability and early first calving [37]. To ensure robustness and accuracy, we excluded animals with low reliability. The majority of filtered animals were young cows without any phenotypic records. Finally, the total number of animals used was 510,318 and 768,645 for heifer livability and early calving, respectively.

For the fine-mapping dataset, we obtained imputed sequence data of 27,235 bulls from previous studies [8]. Briefly, the imputation was conducted with FindHap v3 [35] and 444 Holstein bulls from the Run5 of 1000 Bull Genomes Project as reference. Stringent filtering and removal of intergenic SNPs resulted in an enriched set of 3,148,506 sequence variants. The imputation was highly accurate with an average percentage of consistent genotypes 96.7%. Similarly, we excluded animals with low reliability for deregressed PTA values, retaining 11,562 and 10,700 bulls for heifer livability and early calving, respectively. In this study, we only considered autosomal chromosomes BTA 1–29 from the Bos taurus ARS-UCD1.2 assembly [38].

GWAS analysis

We analyzed the discovery and fine-mapping datasets separately in the GWAS analysis. We performed the GWAS using a linear mixed model approach implemented in the SLEMM program [20]. SLEMM can handle large-scale (up to millions) genome-wide association studies while accounting for genomic relationships. In addition, SLEMM can model differences in the reliability between individual phenotypes using an error weight parameter to account for the variation of deregressed PTAs, which is calculated by 1/r2-1, where r2 is the reliability of deregressed PTAs.

After GWAS analysis, we retrieved genes within 1 Mb of the significant SNPs using BioMart in the Ensembl database (Ensembl Genes 106). We carried out Gene Ontology (GO) and Pathway analysis using KOBAS [24]. GO terms with a False Discovery Rate (FDR) less than 5% were considered statistically significant. Furthermore, we compared the regions within 1 Mb of the significant SNPs with our previous GWAS results [8] and the cattle QTLs in the Animal QTL database [25] to check if any associated genomic regions were previously reported.

Functional enrichment analysis with genome annotations

To evaluate the potential functions of the associated genomic regions, we explored the enrichment of GWAS results in different functional regions using the 27,235 bulls and imputed sequence variants. We performed enrichment analyses via MPH (MINQUE for Partitioning Heritability, https://github.com/jiang18/mph) with the annotations inferred by SnpEff [28] and 14 chromatin states from eight tissues reported by Kern et al. [27]. MPH is designed to partition SNP heritability with genotypes of related individuals or with long-spanning LDs. MPH is comparable to GREML in terms of accuracy, while being much faster and more memory efficient. It can do weighted analyses if residual variances are unequal and use many overlapping functional annotations. This approach included two steps: building genomic relationship matrices (GRMs) based on the different SNP annotation datasets, and partitioning SNP heritability accordingly. We set --min_maf and --min_hwe_pval as 0 and 1e-8 respectively. We calculated standard errors using the Delta method.

Transcriptome-wide association study (TWAS)

We performed TWAS analyses based on the 27 K bull data using S-PrediXcan [39] to link GWAS results with transcriptome data that we assembled in a previous study [30]. For the TWAS analyses, we used the CattleGTEx v.1 eQTL models [36]. For each trait, we imputed and harmonized GWAS summary statistics and then performed TWAS across 24 cattle tissues separately. We considered genes with P < 0.005 as suggestive significant.