Background

Hanwoo cattle, an indigenous breed of Korea, is a testament to the nation’s rich agricultural heritage. Revered for their exceptional qualities and cultural significance, they transcend being mere livestock; they embody a living tradition deeply connected to the country’s history and traditions. This connection reflects the harmonious balance between traditional farming practices and contemporary agricultural methods. Emerging from the lush landscapes of the Korean Peninsula, these cattle symbolize national pride, celebrated for their gentle disposition and adaptability to local climates and, notably, for the superior quality of their beef [1].

Genetic improvement efforts within Hanwoo cattle breeding programs have historically prioritized carcass quality and growth traits, driven by the accessibility of trait information and the simplicity of analytical techniques [2]. This focus is justified by the significant economic importance of these traits [3]. However, a notable shift has occurred in recent years, with increased attention being directed toward the genetic analyses of reproductive traits in Hanwoo cattle. This shift was driven by the recognition that reproductive traits are important in calf productivity. These traits carry substantial economic significance for sustainable food production, particularly in the context of monotocous livestock, such as cattle and buffaloes. Prioritizing fertility enhancement has emerged as the optimal strategy for mitigating culling costs, preserving valuable genetic resources, and augmenting overall farm profitability [4]. Historically, challenges in improving reproductive efficiency have been attributed to factors such as low heritability, the binomial nature of a short-controlled breeding season, and delayed expression of traits throughout an animal’s life [5]. This evolution from carcass to reproductive traits underscores the need for comprehensive and sustainable Hanwoo cattle management practices.

The key reproductive traits under the current study included age at first calving (AFC), calving interval (CI), gestation length (GL), and number of artificial inseminations per conception (NAIPC). These traits play pivotal roles in determining the efficiency and effectiveness of cattle breeding programs. AFC is an important trait representing the age at which a female bovine, typically a heifer, gives birth to its first calf [6]. The optimal age for first calving varies among cattle breeds and is influenced by factors such as genetics, nutrition, and management practices [7]. Generally, AFC strikes a balance between allowing heifers to reach sufficient maturity for a healthy pregnancy and ensuring timely reproductive efficiency. For many cattle breeds, the recommended age for first calving is between 20 and 27 months [8]. Breeders aim to achieve optimal AFC to ensure that heifers are adequately developed and capable of handling the physical demands of pregnancy and lactation. Achieving this requires proper nutrition, well-managed breeding programs, and attention to health and growth factors. Heifers that calve at an appropriate age contribute to the overall efficiency of breeding programs, leading to healthier and more resilient cattle populations [9]. Breeders use careful monitoring and management practices to achieve optimal AFC, thereby promoting long-term success in their cattle breeding endeavors.

Calving interval, a vital reproductive trait, represents the duration between consecutive calvings in a female bovine reproductive cycle. This key factor directly influences the frequency and regularity of calving within a herd, affecting overall reproductive efficiency and productivity. The optimal calving interval varies according to specific breeding goals, management practices, and cattle breed characteristics. Generally, a shorter calving interval, typically ranging from 12 to 13 months, is preferred [10]. Achieving an efficient calving interval is crucial for maintaining a consistent calving pattern, optimizing the herd’s reproductive performance, and ensuring a steady production cycle. Multiple factors influence the calving interval, including the postpartum recovery period, genetics, nutrition, and overall herd management practices [11]. The ability of cows to swiftly recover after calving, combined with proper nutrition and breeding strategies, contributes to shorter calving intervals. Efficient management of the calving interval is essential to sustain productive and economically viable cattle breeding programs. Breeders focus on implementing sound reproductive management practices, such as effective breeding protocols and proper healthcare, to minimize the calving interval and enhance the overall reproductive efficiency of the herd [12]. A well-maintained calving interval ensures a steady supply of calves and contributes to the success and sustainability of cattle breeding.

Gestational length, another vital reproductive parameter in cattle breeding, refers to the duration of pregnancy from conception to the calf’s birth. The optimal gestation duration varies among cattle breeds and generally ranges from 279 to 287 days [13]. Effective management of gestation length is key to ensuring a successful calving process, minimizing the risk of complications, and promoting the overall well-being of both the dam (the pregnant cow) and the newborn. Cattle breeds can exhibit variations in inherent gestation length and selective breeding can be used to achieve the desired traits [14]. Understanding and monitoring gestation length are essential components of successful reproductive management programs for breeders. Proper nutrition during pregnancy, routine veterinary care, and attention to environmental factors contribute to maintaining an optimal gestation length [15]. Effective management of gestation length enhances the chances of a healthy and trouble-free calving process, ensuring the health of the cow and the newborn calf.

The number of artificial inseminations per conception is a key measure in cattle breeding that reflects the efficiency of the artificial insemination (AI) process. This signifies the number of attempts, typically through artificial insemination, required for female bovines to conceive successfully. Achieving optimal NAIPC is essential for maximizing fertility and conception efficiency in cattle breeding programs. A lower NAIPC indicates a higher conception rate, suggesting that fewer insemination attempts are needed to achieve a successful pregnancy [16]. Conversely, a higher NAIPC may suggest challenges in fertility or other factors affecting conception [17]. Several factors influence NAIPC, including the reproductive health of the female, the timing and technique of insemination, the quality of semen used, and overall herd management practices [18]. Breeders aim to minimize NAIPC through careful monitoring of the estrous cycle, proper timing of insemination, and the use of high-quality semen from proven sires [18,19,20]. By focusing on this parameter, breeders can implement targeted strategies to optimize fertility, reduce the number of insemination attempts, and improve the overall effectiveness of cattle breeding programs [21]. The efficient management of NAIPC aligns with the broader goals of sustainable breeding practices and contributes to the long-term success of the cattle industry.

The sequencing of the bovine genome in 2009, along with advancements in information technology, facilitated the development of modern, scientifically designed breeding schemes for enhancing economically important traits. These efforts resulted in a notable increase in both the quality and quantity of milk and meat production per animal [22]. Genome-wide association studies (GWAS) of Hanwoo female reproductive traits are important for addressing future challenges and advancing breeding strategies. Understanding the genetic basis of reproductive performance can help identify key genomic regions associated with important traits, providing valuable information on the underlying genetic factors influencing female reproductive characteristics. The identification of essential genes, haplotypes, and their regulatory mechanisms as markers for quantitative traits has the potential to enhance strategies for selecting beef cattle both currently and in the future [22]. In recent years, there has been a significant focus on conducting GWAS to investigate economic traits in cattle, including fertility traits [23,24,25,26,27,28], production traits [29,30,31], somatic cell score [32,33,34,35], and disease resistance [36,37,38,39]. Numerous significant single nucleotide polymorphisms (SNPs) and biologically relevant genes were identified. However, only a limited number of studies on beef cattle reproductive traits have used sample sizes exceeding ~3000 individuals [24, 27, 40, 41]. Therefore, there is a need to revisit GWAS for reproductive traits using a larger population sample to obtain more reliable and comprehensive results. Despite research on genetic parameter estimations [2] and the accuracy of genomic predictions for reproductive traits [42] in recent years, GWAS in the context of Hanwoo cattle remains an underexplored area, with no notable studies addressing these reproductive traits.
The scarcity of literature addressing the GWAS for Hanwoo reproductive traits prompted the primary objective of our current investigation. In our study, we analyzed a population of over 10,000 individuals, representing a significant improvement in terms of sample size compared with that of previous studies. Exploring the genetic architecture of these reproductive traits within this breed would provide baseline information regarding the quantitative trait loci (QTLs) or genes underlying the traits under investigation. Therefore, our objectives were to identify significant SNPs associated with these traits, explore the genetic architecture and biological relevance of these markers at the whole-genome level, and identify potential candidate genes in Hanwoo cattle.

Results

Phenotypes

Prior to conducting any statistical analysis, the phenotypic data of AFC, CI, and GL were checked for normality through visualization of the data distribution and application of the Kolmogorov-Smirnov normality test (Fig. 1). The results revealed that the data of AFC, CI, and GL followed a normal distribution. Additionally, records of animals with an NAIPC above four were excluded from the dataset, as visualized in Fig. 1. Table 1 summarizes the key reproductive traits in the Hanwoo cattle population considered in our GWAS, focusing on animals with both genotypic and phenotypic records after quality control. The mean AFC was 736.18 days, and the standard deviation was 64.43 days, indicating a moderate level of variability. The CI displayed a mean of 370.56 days and a standard deviation of 40.04 days, suggesting a relatively consistent pattern. The mean GL was 286.62 days, with a narrow standard deviation of 4.91 days, indicating a more homogeneous distribution. The NAIPC, measured on a scale of 1 to 4, had a mean value of 1.44 and a standard deviation of 0.79, demonstrating a moderate level of variability in this reproductive parameter.
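To compare the variability of traits measured on different scales, the reported means and standard deviations can be converted to coefficients of variation (CV). The following is a minimal sketch using only the summary values quoted above; it assumes that the 286.62-day mean refers to GL, and the dictionary keys and function name are illustrative rather than part of the original analysis.

```python
# Means and standard deviations as quoted in the text (not raw data).
# Assumption: the 286.62-day mean is the gestation length (GL).
traits = {
    "AFC": (736.18, 64.43),   # days
    "CI": (370.56, 40.04),    # days
    "GL": (286.62, 4.91),     # days
    "NAIPC": (1.44, 0.79),    # inseminations per conception
}


def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in traits.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}%")
```

Because the CV is unit-free, it places the day-scale traits and the count-scale NAIPC on a common footing.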

Fig. 1

Distribution of AFC, CI, GL, and NAIPC phenotype in Hanwoo cows

Table 1 Summary of the statistics of the Hanwoo reproductive traits used in the GWAS

SNP genotyping

After performing quality control procedures involving 78.30% of the initial SNPs across all 29 Bos taurus autosomes (BTA), a set of 40,807 common SNPs was selected. The distribution of these markers was uneven, with a significant over-representation of specific chromosomes, as shown in Fig. 2. Among these, BTA1 had the highest number of SNPs (2570), covering approximately 158 megabases (Mb), whereas BTA28 had the fewest SNPs (705), spanning approximately 46.1 Mb. Additionally, BTA1, BTA2, BTA3, and BTA6 harbored more than 2000 SNPs each.

Fig. 2

Distribution of SNPs across chromosomes after quality control

Population structure and linkage disequilibrium (LD) decay analysis

This study used principal component analysis (PCA) to explore the genetic structure of Hanwoo cattle using quality-controlled SNP data. This analysis revealed a clear genotype cluster in the dataset, as illustrated in Fig. 3a. The scree plot of the eigenvalues demonstrated that the first two principal components explained 2.3% of the total variability (Fig. 3b). The square of the correlation coefficient between markers at two loci (r²) was used to estimate LD. It is generally expected that LD will decay by half when the r² value falls below 0.2 [43]. The LD decayed to an r² value of 0.08 at approximately 500 kb in the Hanwoo cattle population under investigation (Fig. 3c).
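The LD statistic described above can be illustrated with a short sketch. It computes the squared Pearson correlation between two markers coded as allele dosages (0, 1, or 2), which is a common genotype-based approximation and not necessarily the exact estimator used in this study. The function name and the toy genotype vectors are hypothetical.

```python
from statistics import mean


def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have equal length")
    mean_a, mean_b = mean(geno_a), mean(geno_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)


# Toy genotypes for eight animals (hypothetical values, for illustration only).
snp1 = [0, 1, 2, 1, 0, 2, 1, 0]
snp2 = [2, 1, 0, 1, 2, 0, 1, 2]   # same marker with alleles swapped
snp3 = [0, 0, 2, 2]
snp4 = [0, 2, 0, 2]               # independent of snp3

print(ld_r2(snp1, snp2))  # complete LD regardless of allele coding
print(ld_r2(snp3, snp4))  # no LD
```

An LD decay curve such as the one in Fig. 3c is obtained by computing this statistic for marker pairs and averaging it within bins of inter-marker distance.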

Fig. 3

Analysis of population structure in Hanwoo cows using quality-controlled SNP data. (a) Principal component analysis (PCA) plot. (b) Scree plot illustrating variance accumulation of the top ten principal components (PCs). The x-axis denotes the top ten PCs, while the y-axis represents eigenvalues indicating the amount of variation. Accumulated variance for each PC is marked by an empty circle, and the “elbow” point indicates the number of factors to generate. (c) Genome-wide linkage disequilibrium (LD) decay plot. LD, measured as the squared correlation coefficient (r²) between pairs of polymorphic markers, is plotted against their genetic distance (bp) across the chromosomes. The red line represents the moving average of the ten adjacent markers, with each gray dot signifying the distances between two markers and their corresponding squared correlation coefficient. The blue line marks the LD cutoff of 0.1, while the green line signifies the critical LD at a distance of less than 500 kb

Association analysis

GWAS of reproductive traits revealed 68 significant SNPs distributed across 29 BTA. However, the distribution was uneven, with specific chromosomes exhibiting more SNPs associated with each trait. Notably, BTA14 had the highest number of identified SNPs, totaling 25. Additionally, BTA6, BTA7, BTA8, BTA10, BTA13, BTA17, and BTA20 displayed 8, 5, 5, 3, 8, 2, and 12 significant SNPs, respectively. A comprehensive summary of the GWAS results is provided, including significant SNP IDs for the studied traits, SNP positions on the respective BTAs, allele types (minor/major), minor allele frequency (MAF), p values, and nearby candidate genes.

Marker loci associated with AFC

GWAS results for AFC in Hanwoo cattle revealed a diverse set of SNPs associated with this reproductive trait. In total, 38 SNPs were identified on three chromosomes: BTA8, BTA14, and BTA20 (Table 2; Fig. 4). Notably, BTA14 exhibited the highest number of SNPs (25 variants), indicating a potential genomic hotspot for genetic factors influencing AFC. These SNPs spanned positions from 26.16 Mb to 27.16 Mb, underlining the extensive genetic landscape on BTA14 that influences the age at which Hanwoo cattle experience their first calving. Five SNPs were identified on BTA8, at positions ranging from 59.91 Mb to 60.16 Mb. Additionally, BTA20 carried a distinct set of eight SNPs associated with AFC, positioned between 22.17 Mb and 22.20 Mb.

Table 2 Genome-wide significant SNPs underlying AFC in Hanwoo cows
Fig. 4

Manhattan plot of GWAS for AFC in Hanwoo cows. The y-axis represents -log10 (observed) p-values for genome-wide SNPs against their respective positions on each chromosome (x-axis). The horizontal green line indicates the suggestive (5 × 10⁻⁵) threshold level

The identified SNPs on BTA20 exhibited complete LD and were located within a 38.76 kb haplotype block (20: 22,173,199–22,211,958 bp). This suggested that mutations near the potential QTL may have a significant effect on AFC (Fig. 5). The BTA8 regional association plot and heatmap of LD are presented in Additional file 1: Fig. S1. Among these SNPs, the highest significance was observed at position 26.16 Mb on BTA14 (rs42304778, p = 8.73 × 10⁻¹¹). Apart from rs42304778, other noteworthy SNPs on BTA14 based on the p-value threshold included rs42304792 (p = 1.23 × 10⁻¹⁰), rs41725705 (p = 1.43 × 10⁻¹⁰), rs209439851 (p = 4.52 × 10⁻¹⁰), rs43083563 (p = 1.16 × 10⁻⁹), and rs110634307 (p = 1.46 × 10⁻⁹), located between 26.17 Mb and 26.57 Mb (Fig. 4). The genomic inflation factor (λ) was calculated to assess population stratification, revealing that AFC exhibited a λ value of 0.996 (Additional file 2: Fig S2a). To visually represent the observed versus expected p-values (-log10P) for AFC, we generated quantile-quantile (QQ) plots (Additional file 2: Fig S2a). The QQ plots in this study demonstrated a close alignment between the observed and expected values, indicating that the p-values followed the distribution expected under the null hypothesis. This alignment suggested that population stratification was effectively addressed using the appropriate model, thereby enhancing the likelihood of identifying true associations.
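As an illustration of how a genomic inflation factor is commonly obtained, the sketch below converts p-values to 1-degree-of-freedom chi-square statistics and divides their median by the theoretical median (about 0.4549). This is the conventional definition and is assumed here; the article does not state the exact procedure or software used. The function names and the simulated p-values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; the sign is irrelevant
    return z * z


def genomic_inflation(p_values):
    """Lambda: median observed chi-square over the expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test (lambda close to 1).
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
# Squaring the p-values mimics systematic inflation (lambda above 1).
inflated_p = [p * p for p in null_p]

print(round(genomic_inflation(null_p), 3))
print(round(genomic_inflation(inflated_p), 3))
```

Values of λ near 1, such as the 0.996 reported for AFC, indicate that the bulk of the test statistics behaves as expected in the absence of stratification.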

Fig. 5

Regional association plot (top) showing the distribution of significant loci associated with AFC at various positions on BTA14 and BTA20 and heatmap of LD (bottom). The red horizontal line indicates -log10P = 4.30
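The plotted threshold and the haplotype block length follow directly from values reported in the text, as the short check below shows. It uses the suggestive p-value threshold, the p-value of rs42304778, and the BTA20 block coordinates; the variable names are illustrative.

```python
from math import log10


def neg_log10(p):
    """Transform a p-value to the -log10 scale used in Manhattan plots."""
    return -log10(p)


# Suggestive significance threshold stated in the figure captions.
threshold = neg_log10(5e-5)

# Most significant AFC marker reported in the text (rs42304778 on BTA14).
top_snp = neg_log10(8.73e-11)

# BTA20 haplotype block coordinates reported in the text (bp).
block_start, block_end = 22_173_199, 22_211_958
block_kb = (block_end - block_start) / 1000

print(f"suggestive threshold: {threshold:.2f}")
print(f"rs42304778: {top_snp:.2f}")
print(f"BTA20 haplotype block: {block_kb:.2f} kb")
```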

Marker loci associated with CI

The GWAS revealed five significant chromosomal regions associated with the calving interval of Hanwoo cattle (Table 3; Fig. 6). Specifically, BTA7 featured two SNPs located at 9.56 Mb and 21.57 Mb, respectively. Additionally, BTA10 featured a noteworthy SNP associated with CI at 53.11 Mb. Furthermore, two SNPs on BTA17, positioned at 10.12 Mb and 18.32 Mb, were significantly associated with the calving interval. The significant SNPs associated with this trait were BovineHD0700002480 (p = 4.42 × 10⁻⁵), Hapmap33901-BES9_Contig395_449 (p = 2.58 × 10⁻⁵), BTB-00425619 (p = 4.50 × 10⁻⁵), NIAS_SPC_00646 (p = 4.07 × 10⁻⁵), and ARS-BFGL-NGS-11930 (p = 4.95 × 10⁻⁵). The regional association plot and LD heat map for the BTA7, BTA10, and BTA17 regions are presented in Additional file 1: Fig S1. In addition, the QQ plot revealed a close alignment between the observed and expected p-values. The λ was calculated and found to be 1.021, indicating that potential confounding factors, such as population stratification, were adequately addressed (Additional file 2: Fig S2b).

Table 3 Genome-wide significant SNPs underlying CI in Hanwoo cows
Fig. 6

Manhattan plot of GWAS for CI in Hanwoo cows. The y-axis represents -log10 (observed) p-values for genome-wide SNPs against their respective positions on each chromosome (x-axis). The horizontal green line indicates the suggestive (5 × 10⁻⁵) threshold level

Marker loci associated with GL

The GWAS for GL in Hanwoo cattle revealed 13 significant SNPs (Table 4; Fig. 7). Notably, BTA13 exhibited the highest number of associated SNPs, totaling eight variants spanning positions from 40.78 Mb to 41.53 Mb, suggesting a potential genomic hotspot for genetic factors influencing GL. Three SNPs were identified on BTA7, spanning positions from 62.02 Mb to 69.51 Mb. Additionally, BTA10 featured two SNPs associated with GL, positioned at 0.42 Mb and 0.43 Mb, respectively. The five most significant SNPs associated with GL were BTA-32481-no-rs (p = 3.60 × 10⁻⁷), ARS-BFGL-NGS-60607 (p = 3.80 × 10⁻⁷), BovineHD0700020355 (p = 1.90 × 10⁻⁶), ARS-BFGL-NGS-113658 (p = 3.54 × 10⁻⁶), and ARS-BFGL-NGS-37354 (p = 7.30 × 10⁻⁶). Additional file 1: Fig S1 shows the regional association plot and LD heatmap for the BTA7, BTA10, and BTA13 regions. In the QQ plot for GL, the observed p-values closely followed the expected distribution, with a λ value (1.045) close to 1, indicating effective control of spurious results and a high likelihood of true associations (Additional file 2: Fig S2c).

Table 4 Genome-wide significant SNPs underlying GL in Hanwoo cows
Fig. 7

Manhattan plot of GWAS for GL in Hanwoo cows. The y-axis represents -log10 (observed) p-values for genome-wide SNPs against their respective positions on each chromosome (x-axis). The horizontal green line indicates the suggestive (5 × 10⁻⁵) threshold level

Marker loci associated with NAIPC

The GWAS for NAIPC in Hanwoo revealed eight significant SNPs on BTA6, spanning from 58.39 Mb to 64.58 Mb, and four SNPs on BTA20, covering the region from 23.72 Mb to 23.89 Mb (Table 5; Fig. 8). Notably, the most significant marker associated with NAIPC was identified on BTA6 at 64.58 Mb (BTB-01312166, p = 2.12 × 10⁻⁷). The second-most significant SNP was located on BTA6 at 58.41 Mb (BovineHD4100005053, p = 4.80 × 10⁻⁷). These findings underscore the specific genetic variations associated with NAIPC on chromosomes 6 and 20 in Hanwoo cattle. The association plot and LD heat map for the BTA6 and BTA20 regions are depicted in Additional file 1: Fig S1. Furthermore, in the QQ plot generated for NAIPC, the observed p-values closely followed the expected distribution, with a λ value (1.044) closely approximating 1 (Additional file 2: Fig S2d). This suggested an effective control of spurious results and a high probability of true associations, further validating the GWAS findings.

Table 5 Genome-wide significant SNPs underlying NAIPC in Hanwoo cows
Fig. 8

Manhattan plot of GWAS for NAIPC in Hanwoo cows. The y-axis represents -log10 (observed) p-values for genome-wide SNPs against their respective positions on each chromosome (x-axis). The horizontal green line indicates the suggestive (5 × 10⁻⁵) threshold level

Identification of candidate genes and functional annotation

We explored the genes surrounding our reported significant marker loci associated with reproductive traits in Hanwoo by searching the National Center for Biotechnology Information (NCBI) database on cattle (Bos taurus) based on the Bos_taurus_UMD_3.1.1 genome assembly, considering a search window of ±500 kb around the identified marker loci. Considering the observed LD decay distance in the studied population, it is plausible that the causal mutations and genes were situated within a region spanning 500 kb, both upstream and downstream of the identified GWAS signals (Fig. 3c). A total of 138 unique positional candidate genes were identified within a 1 Mb region centered on the significant marker loci underlying reproductive traits in Hanwoo cattle (Tables 2, 3, 4, and 5). Notably, we identified 37 nearby genes for AFC, 43 genes for CI, 23 genes for GL, and 27 genes for NAIPC.
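The window-based search described above amounts to an interval-overlap query. The following sketch illustrates the idea with a hypothetical in-memory gene list; the actual lookup was performed against the NCBI database. The gene coordinates are the approximate Mb values quoted in the Discussion, rounded to the nearest 10 kb, and the BTA8 marker position of 59.91 Mb is the lower bound of the reported SNP range. The `Gene` type and `genes_in_window` function are illustrative names.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: int
    start: int  # bp
    end: int    # bp


WINDOW_BP = 500_000  # search distance on each side of a significant marker


def genes_in_window(chrom, pos, genes, window=WINDOW_BP):
    """Return names of genes overlapping [pos - window, pos + window] on chrom."""
    lo, hi = pos - window, pos + window
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Approximate coordinates taken from the Discussion (rounded, illustrative).
GENES = [
    Gene("FANCG", 8, 59_750_000, 59_760_000),
    Gene("UNC13B", 8, 59_760_000, 60_030_000),
    Gene("TLN1", 8, 60_280_000, 60_320_000),
    Gene("CREB3", 8, 60_310_000, 60_320_000),
    Gene("MAP3K1", 20, 22_350_000, 22_440_000),
]

bta8_hits = genes_in_window(8, 59_910_000, GENES)
bta20_hits = genes_in_window(20, 22_173_199, GENES)

print(bta8_hits)
print(bta20_hits)
```

Genes are matched on overlap rather than containment, so a gene that only partly falls inside the window is still reported as a positional candidate.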

Functional annotation of Gene Ontology (GO) terms was initially performed to identify the biological meaning and systematic features of the candidate genes using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) and KEGG Orthology-Based Annotation System (KOBAS). The annotation encompassed four categories: the three GO domains, namely Biological Process (BP), Molecular Function (MF), and Cellular Component (CC), together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Candidate genes associated with reproductive traits were significantly enriched for a range of terms (P < 0.05). Specifically, 37, 13, 11, and 8 terms were linked to BP, MF, CC, and KEGG, respectively (Fig. 9 and Additional file 3: Fig S3).
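The logic of a term-enrichment test can be sketched with a plain one-sided hypergeometric test. This is a generic illustration only: DAVID and KOBAS apply their own variants and corrections, which are not reproduced here, and the counts used in the example are hypothetical rather than taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for a hypergeometric draw.

    k: candidate genes annotated with the term
    n: total candidate genes
    K: background genes annotated with the term
    N: total background genes
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical example: 4 of 138 candidate genes carry a term that annotates
# 100 of 20,000 background genes.
p_value = hypergeom_enrichment_p(k=4, n=138, K=100, N=20_000)
print(f"enrichment p-value: {p_value:.4g}")
```

A term is called enriched when this probability falls below the chosen cutoff, here P < 0.05.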

Fig. 9

Bar plot illustrating the -log10 of the p-values for selected GO terms related to biological processes

The pathway enrichment analysis revealed that genes involved in AFC, CI, GL, and NAIPC were enriched in chemical synaptic transmission (GO:0007268), regulation of transcription from RNA polymerase II promoter (GO:0006357), proteasome-mediated ubiquitin-dependent protein catabolic process (GO:0043161), cellular response to DNA damage stimulus (GO:0006974), interstrand cross-link repair (GO:0036297), positive regulation of neuron projection development (GO:0010976), cardiac muscle fiber development (GO:0048739), parathyroid gland development (GO:0060017), cell population proliferation (GO:0008283), cellular response to glucose stimulus (GO:0071333), developmental process (GO:0032502), bone morphogenesis (GO:0060349), ovarian follicle development (GO:0001541), regulation of cell growth (GO:0001558), anatomical structure development (GO:0048856), somitogenesis (GO:0001756), cellular response to mechanical stimulus (GO:0071260), thymus development (GO:0048538), hematopoietic progenitor cell differentiation (GO:0002244), heart development (GO:0007507), angiogenesis (GO:0001525), in utero embryonic development (GO:0001701), transcriptional activator activity (GO:0001228), transcription factor activity (GO:0003700), RNA polymerase II core promoter proximal region sequence-specific DNA binding (GO:0000978), glutathione peroxidase activity (GO:0004602), transcription cofactor binding (GO:0001221), RNA binding (GO:0003723), bHLH transcription factor binding (GO:0043425), RNA polymerase II transcription factor activity (GO:0000981), nucleus (GO:0005634), RNA polymerase II transcription factor complex (GO:0090575), Golgi membrane (GO:0000139), synapse (GO:0045202), neuronal cell body (GO:0043025), thyroid hormone synthesis (bta04918), metabolic pathways (bta01100), longevity regulating pathway (bta04211), relaxin signaling pathway (bta04926), and estrogen signaling pathway (bta04915). 
The functional gene set annotations and enrichment details are presented in Additional file 4: Table S1.

Subsequently, we used the GeneCards and Mouse Genome Informatics databases and conducted an extensive literature review to explore the functional roles of the identified genes. Positional candidate genes with functional biological roles related to reproductive traits, or those previously reported to be associated with reproductive traits, were regarded as promising candidates (Table 6). Consequently, we identified ten candidate genes with potential relevance to age at first calving. These genes (FANCG, UNC13B, TESK1, TLN1, CREB3, FAM110B, UBXN2B, SDCBP, TOX, and MAP3K1), located on BTA8, BTA14, and BTA20, exhibited promising associations with AFC based on their known functions and previous research findings. Furthermore, APBA3, TCF12, and ZFR2, located on BTA7 and BTA10, were involved in the calving interval; PAX1, SGCD, and HAND1, located on BTA7 and BTA13, were associated with gestational length; and RBM47, UBE2K, and GPX8, located on BTA6 and BTA20, were related to the number of artificial inseminations per conception in Hanwoo cows.

Table 6 Promising candidate genes associated with reproductive traits in Hanwoo cows

Discussion

Efficient reproductive traits are pivotal for the success of Hanwoo cattle farming and influence genetic improvement, economic viability, and overall herd productivity. The optimal performance of traits, such as age at first calving, calving interval, gestation length, and number of artificial inseminations per conception, plays a vital role in selective breeding programs, leading to the propagation of desirable genetic characteristics within the Hanwoo population. Their direct impact on productivity, rapid return on investment, and overall sustainability in the Hanwoo cattle industry underscores the economic significance of these traits [44]. Moreover, a keen understanding of reproductive traits will enable farmers to implement effective farm management practices, ensuring timely breeding and proper adaptation to environmental conditions [45].

Candidate gene functions associated with AFC

By annotating gene functions, we identified five candidate genes associated with AFC on BTA8. These genes were FANCG, UNC13B, TESK1, TLN1, and CREB3. The FANCG (Fanconi anemia complementation group G) gene, positioned between 59.75 Mb and 59.76 Mb, encodes a protein expressed in the chromosomal instability syndrome associated with various developmental abnormalities, progressive bone marrow failure, reduced fertility, retarded growth, hyperpigmentation, and a predisposition to cancer [46, 47]. Fanconi anemia (FA) is a genetically heterogeneous, recessive disorder characterized by cytogenetic instability, hypersensitivity to DNA cross-linking agents, increased chromosomal breakage, and defective DNA repair. This gene encodes a protein for complementation group G (https://www.genecards.org/cgi-bin/carddisp.pl?gene=FANCG). As reported by Guitton-Sert et al. [47], analysis of embryos revealed germ cell aplasia occurring in the embryonic stages in the majority of FA mouse models. The earliest mammalian germ cells, known as primordial germ cells (PGCs), undergo extensive epigenetic reprogramming before progressing to meiosis. The onset of embryonic development relies on fundamental interactions between the sperm and oocytes [48]. Prior to this interaction, the sperm acrosome undergoes a series of transformations essential for its fusion with the zona pellucida. This intricate biological process, known as the acrosome reaction, involves acrosomal exocytosis, substructural remodeling, and various biochemical modifications [49]. These sequential processes are indispensable for successfully fusing sperm with the oocyte plasma membrane during a single fertilization event [50]. Positioned in the 59.76 Mb to 60.03 Mb region, the UNC13B (Unc-13 homolog B) gene plays a pivotal role in ensuring the normal functioning of the aforementioned mechanisms, ultimately contributing to fertility success [49]. 
The expression pattern of UNC13B closely aligns with these essential processes and is associated with immune response and cell differentiation. Consequently, UNC13B has emerged as a potential target for unraveling the intricate connection between immune response and fertility [49]. Testis-associated actin remodeling kinase 1 (TESK1) is a member of a conserved gene family known for its widespread functionality across various cellular processes, with particularly high expression in the testis [51]. According to Toshima et al. [52], TESK1 is expressed in different tissues and cell lines, albeit at relatively low levels. This observation led to the assumption that TESK1 might have general cellular functions rather than being testis-specific. Moreover, their research identified differential expression of the TESK1 gene with reproductive and developmental processes. Numerous animal and human studies have associated this gene with fertility and infertility [53]. Notably, studies have confirmed the presence of TESK1 in spermatozoa, suggesting its potential regulatory role during chromosomal repackaging in spermatozoa and post-fertilization in oocytes [54, 55]. Another important candidate gene associated with AFC is TLN1 (Talin 1), located in the genomic region spanning from 60.28 Mb to 60.32 Mb. TLN1 is important in various aspects of embryonic development, physiology, and disease processes. It performs various functions associated with integrins and directly influences diverse aspects of biology and medicine. These encompass critical stages in mammalian pregnancy, particularly implantation of the blastocyst onto the uterine wall, subsequent placentation, and conceptus development [56]. Studies on TLN1 indicate that tamoxifen-induced inactivation of the talin1 gene across the embryo results in an angiogenesis phenotype confined to newly formed blood vessels. 
This phenotype manifests rapidly in early embryos, leading to vessel defects within 48 h and embryo death within 72 h [57]. TLN1 is predominantly expressed in maternal epithelium and trophoblast giant cells (TGC). As pregnancy progresses, TLN1 expression increases in the TGC and the surrounding caruncular epithelial cells while it diminishes in the basal compartment of the caruncular epithelial cells in bovines [58]. This dynamic expression pattern underscores TLN1’s integral involvement at various pregnancy stages. An additional noteworthy candidate gene associated with AFC is CREB3 (cAMP responsive element binding protein 3), situated on BTA8 within the genomic region spanning from 60.31 Mb to 60.32 Mb. This gene is significant in the decidualization of endometrial stromal cells (ESCs) [59]. Additionally, elevated expression levels of CREB3 were observed in decidualized cells on days 6–8 of pregnancy, highlighting the potential regulatory influence of CREB3 on the decidualization pathway [60, 61].

Furthermore, we identified four candidate genes associated with AFC on BTA14, namely FAM110B, UBXN2B, SDCBP, and TOX, along with one gene on BTA20, MAP3K1. The positional candidate genes identified in our study around the significant SNP on BTA14 (26.04 Mb to 26.97 Mb) and associated with AFC in Hanwoo cattle are consistent with previous research. Specifically, a study on Nellore cattle reported an association of the genes UBXN2B (UBX domain protein 2B) and SDCBP (Syndecan binding protein) with AFC [62]. Similarly, in Brahman cattle, the FAM110B (Family with sequence similarity 110 member B) and TOX (Thymocyte selection associated high mobility group box) genes were linked to puberty-related traits, including age at the formation of the first corpus luteum [63]. Notably, FAM110B, UBXN2B, and TOX, located in the genomic region of 26.04–26.97 Mb on chromosome 14, have previously been shown to be associated with various traits such as growth, birth weight, carcass weight, average daily gain, feed intake, meat tenderness, height, and stature across different beef cattle breeds [64, 65, 66]. Additionally, the TOX gene, previously identified in genome-wide association studies on Nellore females, has been linked to reproductive traits [67]. Furthermore, SDCBP, mapped to a conserved region on chromosome 14, has emerged as a candidate gene for determining carcass weight in Hanwoo cattle, given its role in binding to various transmembrane proteins [68, 69, 70].

The gene MAP3K1 (Mitogen-activated protein kinase kinase kinase 1), located on BTA20 spanning from 22.35 Mb to 22.44 Mb, is a key regulator of MAPK activation. This activation is integral to various cellular processes, including gene expression, cell proliferation, migration, survival, and death [71]. Moreover, extensive studies in the human genomic context have revealed a robust correlation between MAP3K1 mutations and sexual development and differentiation [72, 73]. MAP3K1 functions as a dynamic signaling molecule with diverse cell type-specific roles and contributes to the development of the female reproductive tract. Notably, females deficient in the kinase domain of MAP3K1 present with conditions such as imperforate vagina, labor failure, and infertility, highlighting the indispensable role of MAP3K1 in reproductive health [74].

Candidate gene functions associated with CI

The CI in Hanwoo cattle, which denotes the time lapse between successive calving events in an individual cow, is a major reproductive trait with multifaceted implications. It indicates reproductive efficiency by influencing the frequency and yield of offspring within a defined timeframe. Additionally, the calving interval is important in herd management strategies, enabling farmers to strategically plan breeding cycles and allocate resources [75]. Economic ramifications are substantial, as a shorter calving interval enhances turnover rates and contributes to the financial viability of Hanwoo cattle farming [76]. In this study, we identified the presence of the ZFR2 and APBA3 genes on BTA7, whereas the TCF12 gene was identified on BTA10 near the significant SNPs associated with CI in Hanwoo cows. Specifically, the zinc finger RNA binding protein 2 (ZFR2) was located within the genomic range of 21.35 Mb to 21.41 Mb. Notably, mutations in ZFR2 have been implicated as a potential cause of primary ovarian insufficiency in women, demonstrating an association with a complete lack of oocytes and follicles [77]. Additionally, the amyloid beta precursor protein binding family A member 3 (APBA3) gene, situated in the region from 21.44 Mb to 21.45 Mb, has been linked to ovarian hyperstimulation for in vitro fertilization in women [78]. Furthermore, the transcription factor 12 (TCF12) gene, found between 53.05 Mb and 53.52 Mb, has been observed to play a role in muscle development and regeneration, as indicated by loss-of-function studies, and is important in regulating cell growth and differentiation during embryonic development [79].

Candidate gene functions associated with GL

We identified three noteworthy candidate genes associated with gestation length: HAND1 and SGCD on BTA7, spanning from 67.72 Mb to 70.10 Mb, and PAX1 on BTA13, ranging from 41.25 Mb to 41.26 Mb. The HAND1 gene, which encodes heart and neural crest derivatives expressed 1, is linked to congenital heart disease (CHD) and is expressed in placental trophoblasts and endothelial cells in various mouse models [80]. Histological examination of the placenta revealed that the loss of HAND1 in labyrinthine progenitor trophoblasts during early pregnancy significantly affected syncytial layer formation and labyrinthine vasculature development [80]. HAND1, a basic helix-loop-helix transcription factor, is involved in multiple organ systems during embryogenesis. The SGCD gene (sarcoglycan delta) may contribute to the embryonic heart response in animals, and SGCD mutations are associated with autosomal recessive limb-girdle muscular dystrophy and dilated cardiomyopathy [81]. Additionally, the PAX1 gene, which encodes the paired box 1 transcription factor, is important for the development of various tissues during embryogenesis [82], including the thymus [83, 84] and vertebral column [85], and for chondrogenic differentiation and chondrocyte maturation [86]. These findings shed light on the potential roles of HAND1, SGCD, and PAX1 in regulating gestation length and their broader implications in embryonic development.

Candidate gene functions associated with NAIPC

The NAIPC is a metric used in animal reproduction, particularly in livestock breeding. It quantifies the average number of artificial insemination attempts required for successful conception or pregnancy. This metric is valuable for assessing the efficiency of artificial insemination techniques, reproductive performance, and overall breeding management strategies. A lower number indicates higher reproductive efficiency, whereas a higher number suggests challenges or inefficiencies in the artificial insemination process [87]. Monitoring this parameter in Hanwoo cattle breeding programs is essential for optimizing reproductive outcomes, refining breeding strategies, and maintaining overall health in Hanwoo populations. Through gene function annotation, we identified three promising candidate genes associated with NAIPC on BTA6 and BTA20 in Hanwoo cattle. The UBE2K (Ubiquitin-conjugating enzyme E2 K) and RBM47 (RNA binding motif protein 47) genes are situated on BTA6 between 60.41 Mb and 61.34 Mb, whereas GPX8 (glutathione peroxidase 8) is located on BTA20 at 23.98 Mb. UBE2K is associated with reproductive failure, specifically during gametogenesis and embryogenesis [88]. Similarly, RBM47 influences mouse blastocyst development, with transcriptional expression observed during the preimplantation stages. Immunofluorescence analysis indicated that the RBM47 protein was first detected in morula-stage embryos and was primarily localized in the nucleus of blastocyst embryos [89]. Previous studies by Shivalingappa et al. [90] suggested multifunctional roles of RBM47 in processes such as RNA editing and transcriptional activation during blastocyst development. In a rat experiment by Mihalik et al. [91], GPX8 was observed throughout the preimplantation period from unfertilized oocytes to blastocysts.
Notably, GPX8 was detected in the ovary, uterine tube, and uterus of the mother, with the highest protein levels observed on day one of pregnancy, which gradually declined thereafter. Immunohistochemistry revealed GPX8 in Graafian follicles within the ovary, and immunofluorescence confirmed its presence in ovulated oocytes and corona radiata cells of the oviduct. These findings highlight the potential roles of UBE2K, RBM47, and GPX8 in the Hanwoo cow reproductive processes.

Conclusion

GWAS with a large sample size is a powerful tool for uncovering novel genomic regions associated with traits of interest. It is important to note that multiple genes likely influence reproductive traits, each contributing a small proportion of the observed variation. In addition, positional mapping may not always identify genes within or near the significant SNPs. Because no GWAS for reproductive traits in Korean Hanwoo cattle has previously been reported, the current study’s findings serve as a foundational baseline and require further research for validation. Concerns regarding the effects of breed composition and population structure on GWAS results should also be acknowledged. In the present GWAS for key reproductive traits in Korean Hanwoo cattle, we identified chromosomal regions that contribute to a deeper understanding of the genetic and physiological mechanisms regulating these traits, along with candidate genes for investigating causal mutations. In total, 68 significant genome-wide SNPs were detected, with BTA6, BTA7, BTA8, BTA10, BTA13, BTA14, BTA17, and BTA20 emerging as the prominent genomic regions. Through gene function annotation, we identified the most promising candidate genes for each trait: FANCG, UNC13B, TESK1, TLN1, CREB3, FAM110B, UBXN2B, SDCBP, TOX, and MAP3K1 for age at first calving; APBA3, TCF12, and ZFR2 for calving interval; PAX1, SGCD, and HAND1 for gestation length; and RBM47, UBE2K, and GPX8 for the number of artificial inseminations per conception. These findings hold promise for future marker-assisted selection in Hanwoo cattle breeding programs and provide a pathway for enhanced trait selection and genetic improvement of this valuable breed. In conclusion, our GWAS analysis advances the understanding of the genetic basis of reproductive traits in Korean Hanwoo cattle and opens avenues for improved cattle breeding practices.

Materials and methods

Animals and phenotypes

First-parity data were collected from 11,348 Hanwoo cows from nine commercial herds in the South Korean province of Gyeongsangbuk-do. Four female reproductive traits, namely AFC, CI, GL, and NAIPC, were examined. AFC, CI, and GL were recorded in days, whereas NAIPC was recorded as the total number of inseminations.

Genotyping and quality control

A total of 11,348 Hanwoo cows were genotyped using the Illumina Bovine 50K SNP chip (Illumina Inc., San Diego, CA, USA), which contains 53,866 SNPs. To ensure data quality, we first excluded duplicated SNPs and SNPs located on sex chromosomes or at uncertain positions, resulting in the removal of 1750 SNPs. This process yielded 52,116 SNPs for subsequent analysis. For further refinement, several quality control (QC) criteria were applied to filter out low-quality SNPs and samples. Specifically, SNPs with a minor allele frequency (MAF) below 5% (including monomorphic SNPs; 9281 SNPs), SNPs with a call rate below 90% (732 SNPs), individuals with a genotyping call rate below 90% (N = 62), and SNPs showing a significant deviation from Hardy–Weinberg equilibrium (HWE), with a p-value below \(10^{-6}\) (1296 SNPs), were excluded from the dataset. An identity-by-state (IBS) test was conducted to identify genetically similar individuals or potential genotyping errors. Pairs of individuals with a similarity rate exceeding 99% were considered identical or indicative of genotyping errors (N = 48). The IBS and QC processes were performed using the PLINK v1.9 toolset [92]. Additionally, missing alleles were imputed using Beagle v5.4 software [93]. Following the IBS and QC procedures, 11,238 animals with genotypes for 40,807 SNPs remained available for subsequent analysis.
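The filtering steps above can be reconciled with a short bookkeeping sketch. The counts are copied from the text; the assumption that each excluded SNP or animal is counted under exactly one filter is ours, and the helper functions `call_rate` and `minor_allele_frequency` are hypothetical illustrations of the per-SNP criteria, not the PLINK implementation.

```python
# Counts below are taken from the text. Assumption: each excluded SNP or
# animal is counted under exactly one filter (no overlap between filters).
snps_on_chip = 53_866
removed_by_position = 1_750
snps_before_qc = snps_on_chip - removed_by_position

snps_removed = {"maf_below_5pct": 9_281, "call_rate_below_90pct": 732, "hwe": 1_296}
snps_after_qc = snps_before_qc - sum(snps_removed.values())

animals_genotyped = 11_348
animals_removed = {"call_rate_below_90pct": 62, "ibs_above_99pct": 48}
animals_after_qc = animals_genotyped - sum(animals_removed.values())


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


# Hypothetical SNP genotyped in ten animals, one call missing.
toy_snp = [0, 1, 2, 1, None, 0, 0, 1, 0, 0]
keep_toy_snp = call_rate(toy_snp) >= 0.90 and minor_allele_frequency(toy_snp) >= 0.05

print(snps_before_qc, snps_after_qc, animals_after_qc)  # 52116 40807 11238
```

Under these assumptions the reported totals are internally consistent: 52,116 SNPs enter QC, 40,807 remain, and 11,238 animals are retained.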

Estimation of population structure and linkage disequilibrium

Principal component analysis (PCA) and linkage disequilibrium (LD) analyses were conducted on quality-controlled SNPs to explore the population structure of Hanwoo cattle. PCA was performed using the PLINK toolset v1.9 [92]. The average LD decay distance across the entire imputed genome of the Hanwoo population was calculated using TASSEL software v5.2.92 [94]. LD decay was visualized by plotting the distance against the average \(r^{2}\) value using the R package ggplot2 [95].
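As a rough illustration of the LD decay summary, the sketch below computes a genotype-based r-squared for one SNP pair and averages r-squared values within distance bins. It assumes the squared Pearson correlation of 0/1/2 genotype codes as the LD measure, which may differ from the estimator TASSEL applies; the function names, bin width, and toy data are hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two 0/1/2 genotype vectors."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    return cov * cov / (var_a * var_b)


def mean_r2_by_distance(pairs, bin_kb):
    """Average r-squared within consecutive distance bins of width bin_kb."""
    sums, counts = {}, {}
    for distance_kb, r2 in pairs:
        b = int(distance_kb // bin_kb)
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    # Key each bin by its lower edge in kb.
    return {b * bin_kb: sums[b] / counts[b] for b in sorted(sums)}


# Toy data: two SNPs in complete LD and two uncorrelated SNPs.
r2_linked = genotype_r2([0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2])
r2_unlinked = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])

# Hypothetical (distance in kb, r-squared) pairs summarized in 50 kb bins.
decay = mean_r2_by_distance([(10, 0.8), (40, 0.6), (60, 0.2)], bin_kb=50)
```

Plotting the keys of `decay` against its values gives the kind of distance versus average r-squared curve described above.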

GWAS analysis

Reproductive traits were analyzed using the linear mixed model (LMM) implemented in the genome-wide efficient mixed-model analysis (GEMMA) software v0.98.5 [96]. GEMMA computes a genomic relationship matrix (GRM) between individuals to account for population structure. The univariate linear mixed model in GEMMA is described as follows:

$$\text{y}=\text{W}{\upalpha }+\text{X}{\upbeta }+\text{u}+{\upepsilon }$$

where y is the vector of phenotypes; W is the incidence matrix of covariates, including fixed effects of the herd in which the animal was raised and the year and season of birth and calving; α is a vector of the corresponding coefficients, including the intercept; X represents the vector of marker genotypes for the SNP being tested; β represents the effect size of the SNP; \(\text{u} \sim \text{M}\text{V}{\text{N}}_{\text{n}}(0, {\uplambda }{{\uptau }}^{-1}\text{K})\) is an n-vector of animal additive effects; \({\upepsilon } \sim \text{M}\text{V}{\text{N}}_{\text{n}}(0, {{\uptau }}^{-1}{\text{I}}_{\text{n}})\) is an n-vector of errors; \({{\uptau }}^{-1}\) is the variance of the residual errors; λ is the ratio between the two variance components; K represents the genomic relationship matrix (GRM); \({\text{I}}_{\text{n}}\) is an n \(\times\) n identity matrix; and \(\text{M}\text{V}{\text{N}}_{\text{n}}\) represents the n-dimensional multivariate normal distribution. GEMMA calculates the GRM as follows [96]:

$$\text{G}=\frac{1}{\text{p}}\sum _{\text{i}=1}^{\text{p}}{({\text{x}}_{\text{i}}-{1}_{\text{n}}{\stackrel{-}{\text{x}}}_{\text{i}})({\text{x}}_{\text{i}}-{1}_{\text{n}}{\stackrel{-}{\text{x}}}_{\text{i}})}^{\text{T}}$$

where G is the centered GRM (the matrix K in the model above), p is the number of SNPs, \({\text{x}}_{\text{i}}\) is the n-vector of genotypes of the ith SNP (the ith column of the n × p genotype matrix), \({\stackrel{-}{\text{x}}}_{\text{i}}\) is its sample mean, and \({1}_{\text{n}}\) is the n × 1 vector of ones.
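The centered GRM formula can be traced on a toy example. The following is a minimal pure-Python sketch of the equation above, not GEMMA's own code; the function name `centered_grm` and the three-animal, two-SNP genotype matrix are hypothetical.

```python
def centered_grm(genotypes):
    """Centered GRM: G = (1/p) * sum_i (x_i - mean_i)(x_i - mean_i)^T.

    genotypes: list of n animals, each a list of p SNP codes (0/1/2).
    Returns an n x n matrix as a list of lists.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Sample mean of each SNP across animals.
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    # Center every SNP column by its mean.
    centered = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    # Average the per-SNP outer products over the p SNPs.
    return [
        [sum(centered[a][j] * centered[b][j] for j in range(p)) / p for b in range(n)]
        for a in range(n)
    ]


# Toy matrix: three animals genotyped at two SNPs.
toy_genotypes = [[0, 2], [2, 0], [1, 1]]
grm = centered_grm(toy_genotypes)
```

For this toy input, animals 1 and 2 carry opposite genotypes at both SNPs and receive a relationship of -1, while animal 3 sits at the mean of both SNPs and has zero relationship with every animal, including itself.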

Manhattan plots were generated to visualize the genome-wide distribution of significant SNPs. The significance level is represented as the negative base-10 logarithm (\(-\log_{10}\)) of the p-value for each SNP. The genome-wide significance threshold was determined using the Bonferroni correction (0.05/N, where N represents the number of SNPs). Because this criterion is strict, a more lenient threshold of \(5 \times 10^{-5}\) (\(-\log_{10}P = 4.30\)) was used to identify suggestive SNPs, as suggested by the Wellcome Trust Case Control Consortium (https://www.wtccc.org.uk/). The genomic inflation factor, lambda (λ), was calculated to assess population stratification by comparing the median chi-squared test statistic from the GWAS to the expected median of the chi-squared distribution. In our study, p-values from the GWAS results for all traits were used to compute λ using the qchisq() function in R [97]. Ideally, the genomic inflation factor should be close to 1 after adjusting for population stratification [39]. However, a notably high value of the genomic inflation factor suggests that factors beyond population stratification, such as strong linkage disequilibrium, significant associations between phenotypic traits and SNPs, or systematic technical biases, may contribute to the observed inflation [40]. Additionally, QQ plots were drawn to depict observed versus expected p-values (\(-\log_{10}P\)) for each trait.
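The thresholds and the inflation factor described above can be sketched with the Python standard library. This is an illustration rather than the authors' R code: it assumes N equals the 40,807 SNPs retained after QC, converts each p-value to a 1-df chi-squared statistic through the normal quantile (equivalent to `qchisq(p, 1, lower.tail = FALSE)` in R), and uses hypothetical function names and toy p-values.

```python
from math import log10
from statistics import NormalDist, median

N_SNPS = 40_807  # assumption: N is the number of SNPs retained after QC

bonferroni_p = 0.05 / N_SNPS
suggestive_p = 5e-5
bonferroni_line = -log10(bonferroni_p)  # about 5.91 on the -log10 scale
suggestive_line = -log10(suggestive_p)  # about 4.30 on the -log10 scale


def chi2_1df_from_p(p):
    """Chi-squared (1 df) statistic whose upper-tail probability is p (0 < p <= 1)."""
    # A 1-df chi-squared variable is a squared standard normal, so the
    # statistic is the squared normal quantile at p / 2. Using the lower
    # tail avoids rounding 1 - p / 2 to exactly 1 for very small p.
    z = NormalDist().inv_cdf(p / 2)
    return z * z


def genomic_inflation(p_values):
    """Lambda: median observed chi-squared over the expected median under the null."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)  # about 0.455
    return observed / expected


# Toy check: evenly spread (null-like) p-values should give lambda close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_like)
```

A λ well above 1 on real GWAS output would point to inflation of the test statistics, which is the situation the paragraph above discusses.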

Analysis of haplotype block

GWAS often reveals significant SNPs associated with target traits that cluster in putative regions. This clustering may be attributed to high LD, that is, the non-random association of alleles on the chromosome. To investigate these genomic patterns, we used PLINK v1.9 [92] and LDBlockShow software [98] for chromosomal regions in which multiple significant SNPs were clustered around the top SNP. This approach enabled us to examine LD patterns within these regions in detail.

Identification of candidate genes and analysis of functional enrichment

Putative candidate genes within the QTL regions, as well as within 500 kb upstream and downstream of the significant SNPs, were identified based on the bovine genome assembly (Bos_taurus_UMD_3.1.1) [22]. We used online resources, including Genome Data Viewer (https://www.ncbi.nlm.nih.gov/genome/gdv/org=bos-taurus; accessed on November 15, 2023), BovineMine v1.6, an integrated data warehouse for the Bovine Genome Database (http://128.206.116.13:8080/bovinemine/begin.do; accessed on November 15, 2023), and BGVD (Bovine Genome Variation Database and Selective Signatures; http://animal.nwsuaf.edu.cn/code/index.php/BosVar; accessed on November 17, 2023). We conducted KEGG and GO analyses using DAVID [99] and KOBAS v3.0 [100] to investigate the functions of all candidate genes. Enriched terms with a p-value < 0.05 were selected to further explore the genes involved in pathways and biological processes. The functional roles of the identified genes within and near significant SNPs associated with reproductive traits were explored using published reports indexed in the National Center for Biotechnology Information (NCBI) databases, such as PubMed Central (PMC), and other literature surveys. The functional roles of each gene were also obtained from online resources, including human gene functions at GeneCards (www.genecards.org), the Mouse Genome Informatics (MGI) website (https://www.informatics.jax.org/), and Ensembl (www.ensembl.org/biomart/martview), accessed on November 18, 2023. Genes that appeared to be functionally related to the traits of interest were considered promising candidate genes.
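The positional search can be pictured as a simple interval-overlap query. The sketch below is an illustration under stated assumptions, not the annotation pipeline used in the study: gene coordinates are rounded from the Mb positions quoted in the text, while the SNP position and the function name `genes_near_snp` are hypothetical.

```python
WINDOW_BP = 500_000  # flanking distance on each side of the SNP, as stated in the text


def genes_near_snp(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return names of genes overlapping [snp_pos - window, snp_pos + window]."""
    lo, hi = snp_pos - window, snp_pos + window
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start <= hi and end >= lo
    ]


# (gene, chromosome, start bp, end bp); coordinates rounded from the Mb
# values reported in the text.
annotated_genes = [
    ("TLN1", 8, 60_280_000, 60_320_000),
    ("CREB3", 8, 60_310_000, 60_320_000),
    ("MAP3K1", 20, 22_350_000, 22_440_000),
]

# Hypothetical significant SNP on BTA8 at 60.00 Mb (not a reported position).
hits = genes_near_snp(8, 60_000_000, annotated_genes)
print(hits)  # ['TLN1', 'CREB3']
```

Genes returned by such a query would then go on to the functional annotation and enrichment steps described above.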