Background

In livestock breeding, sires have an important effect in disseminating superior genetic merit, particularly in situations where artificial insemination (AI) is used [1,2]. Sires with better fertility guarantee the efficiency of transmission of the alleles with a superior effect. Andrological parameters are also related to the fertility of sires, which is an important selection trait itself. Sires with good andrological parameters are important, because beef cattle conception rates have economic impact in the production system [3,4]. Improved conception rates increase the economic return. Poor semen quality also impacts on the success rates of reproductive biotechnologies [5].

Andrological traits, such as scrotal circumference measured at 12 months and percentage of morphologically normal sperm measured at 24 months, have a moderate and negative genetic correlation with age at puberty in females [6-9] and a moderate and positive genetic correlation with female stayability in the herd [6,8,10,11]. In other words, selection for higher scrotal circumference and/or a higher percentage of normal sperm in young bulls should lead to female progeny that are sexually precocious and more likely to stay in the herd. Female fertility traits are of high relevance for beef cattle production in tropical areas. These traits could be from four to thirteen times more important, economically speaking, than carcass and growth traits [12]. Due to cost, andrological traits other than scrotal circumference are not commonly measured and evaluated in animal breeding programs [13]. The identification of genetic markers associated with these traits could assist animal breeding via genomic selection.

Using a GWAS methodology, QTL regions were identified on the bovine X chromosome that potentially influence andrological traits in cattle [14,15]. The aim of this study was to fine-map the QTL regions, focussing on candidate genes to identify possible causative mutations. In future, these variants may be used to construct a low density chip for improved genetic evaluation with a better cost-benefit [16]. Further, customized chips with causative mutations are likely to have a higher transferability among breeds because predictions derived from them do not depend on linkage disequilibrium between the marker assayed and the causal mutation. A GWAS study confirmed that variants in coding regions explain more of the trait variation than random SNPs, exemplifying the important role of missense mutations in genomic evaluation [17].

Candidate genes in the X chromosome QTL regions were chosen according to their biological role. They are: LOC100138021, CENPI, TAF7L, NXF2, CYLC1, TEX11, AR, UXT and SPACA5. The gene LOC100138021 is a homolog of the TCP11 gene and plays an important role in spermatogenesis and sperm function in humans [18]; CENPI participates in gonadal development and gametogenesis in rats [19]; TAF7L could be spermatogenesis-specific and is related to human male infertility [20]; NXF2 is an mRNA transporter and its inactivation causes bull infertility [21]; CYLC1 is a protein of the spermatozoa head with a cytoskeleton function in cattle and humans [22]; UXT protein participates in the AR transcription regulation in human prostate cells [23] and SPACA5 codes for a protein in the sperm acrosome with lysozyme activity (Gene Ontology). The first four genes are in the QTL associated with percentage of morphologically normal sperm (39 Mb-59 Mb) and the others in the QTL associated with scrotal circumference (68 Mb-93 Mb) [14,15].

SNPs in TEX11 and AR genes were found to be associated with semen and testis traits in cattle [24]. In this study, the aim was to validate these polymorphisms in another population, so their effect might be confirmed. This validation exercise was extended to include two more candidate genes, not related to the above mentioned QTL: PLAG1 and TEKT4. The PLAG1 mutation on BTA14 has a pleiotropic effect in many economically important traits in cattle and other species [25,26] and TEKT4 is a gene associated with spermatozoa motility from a proteomic study in Brahman cattle [27].

A total of eleven genes were chosen as candidates for andrological traits in cattle. The aim was to locate potential causative mutations, defined as non-synonymous SNPs and SNPs or indels in coding and splicing regions, and to verify their association with scrotal circumference (SC) and percentage of normal sperm (PNS) traits in beef bulls. Further, the association of the candidate SNPs with female fertility traits and male growth traits was tested to evaluate pleiotropic effects.

Results and discussion

SNP discovery and genotyping

Based on the 64 bull genomes available, files with SNPs and indels were generated for the target regions. The variants were selected according to their locations (coding regions and splicing sites). For each of seven genes in the SC and PNS QTL regions on chromosome X and for TEKT4 on chromosome 25, one non-synonymous SNP per gene was selected to be genotyped in the entire population. For TEX11 two SNPs were tested.

Details about these genes and selected SNPs, such as position and identification number, variation, position in the coding region (CDS) and in the protein and the amino acid change can be found in Table 1. The usual hypothesis applies: changes in the amino acid composition may change protein activity and affect the associated phenotype.

Table 1 Description of candidate genes and interrogated SNP

Using TaqMan assays, 1,021 male cattle were genotyped for twelve SNPs: eight SNPs in the genes described above and four SNPs studied previously in other populations. The four previously studied SNPs were in the genes TEX11 (Tex11_r38k and Tex11_r696h), AR (AR1_In4) and PLAG1 (rs109231213) [24,25].

The allelic and genotypic frequencies for all these SNPs are described in Tables 2 and 3. All of them were sufficiently polymorphic to be used for association analyses. As expected, for the SNPs located on the X chromosome, heterozygotes were not identified in males.

Table 2 Allelic and genotypic frequencies of the SNPs (1,021 bulls)
Table 3 Allelic and genotypic frequencies of the SNPs (2,024 cows)

Analysis of linkage disequilibrium

The linkage disequilibrium (LD) was estimated and an arbitrary r2 threshold of 0.80 was taken to indicate that SNPs were in strong linkage disequilibrium. For the bulls (Additional file 1: Table S1), the r2 estimates ranged from 0 to 1. However, only two pairs of SNPs had a high estimate of r2. The SNPs Tex11_r38k and Tex11_r696h were completely linked (r2 value of 1). This means that all animals had the same genotypes for both SNPs and it was impossible to differentiate their effects. For the association analyses, the SNP Tex11_r38k was therefore used to represent both. The SNPs located in the LOC100138021 and TAF7L genes also had a high LD (r2 value of 0.981); all other pairwise estimates were lower than 0.80. The effects of these remaining SNPs could therefore be analysed separately.

For the cows (Additional file 2: Table S2 and Additional file 3: Table S3), similar results were obtained. The SNPs Tex11_r38k and Tex11_r696h had a high r2 value (r2 = 0.993 for Brahman cows and r2 = 0.927 for TC cows). SNPs located in the LOC100138021 and TAF7L genes also had a high LD (r2 value of 0.852 for Brahman cows and 0.827 for TC cows); all other pairwise estimates were lower than 0.80.

Usually, the LD of chromosome X is higher than the LD of the autosomes [28]; however, the r2 values obtained here (r2 < 0.80) for most of the SNPs may be explained in terms of population characteristics. The study population consisted of animals of a composite breed and crossbreeds, forming a population where a higher level of recombination is expected.

Association analyses

The allele substitution effect for the named allele of each SNP, standard errors, p-values and the percentage of the additive genetic variance explained were estimated for each studied trait: percentage of normal sperm at 24 months (PNS) and scrotal circumference at 12 months (SC12), 18 months (SC18) and 24 months (SC24) (Table 4). Significant SNPs reported here for SC at three ages and for PNS (Table 4) serve to confirm previously described QTL regions [14,15]. The results show that GWAS is a powerful tool for finding candidate genes. Further, combining the GWAS information with the available genome sequences yields putative causative mutations (non-synonymous and disruptive SNPs) associated with SC and PNS.

Table 4 SNP association analysis in bull population (andrological traits)

The most significant SNPs associated with PNS (p < 0.05) explained from 1.93% to 2.73% of the additive genetic variance in the bull population. The percentage of additive genetic variance explained by the most significant SNPs (p < 0.001) for SC traits varied from 0.65% to 13.47% in this population. The effects of the SNPs in the AR and TEX11 genes for all SC traits, and of those in LOC100138021 and TAF7L for SC12, are worth noting (Table 4). These percentages of explained genetic variance are considered high for individual mutations. For example, known causative mutations such as those in the calpain and calpastatin genes associated with meat quality explain up to 2% of the phenotypic variance [29]. The very high percentage of additive genetic variance explained by some markers could be due to the fact that LD estimates for the X chromosome are higher than for the autosomes. The lower occurrence of cross-overs means that larger DNA fragments are inherited. The SNP associations we observed may therefore reflect the combined effects of more than one marker. An example of this is the complete linkage of the two SNPs in the TEX11 gene in the bull population and the inability to differentiate their effects.

The genes LOC100138021, TAF7L, CENPI and NXF2 were located in the PNS QTL, but they also influence SC traits, indicating a pleiotropic effect for these andrological traits. For these genes and three more (CYLC1, UXT, SPACA5) this study provides the first evidence of an association with male fertility traits in livestock. Polymorphisms in TAF7L, NXF2 and LOC100138021 have been associated with male fertility traits in humans and mice, indicating that these genes have conserved roles among mammals ([20,30,31], [21,32] and [33], respectively). For CYLC1, UXT, SPACA5 and CENPI, this is the first SNP association study to provide evidence of their influence on male fertility traits in mammals.

The SNPs located in AR and TEX11 have been studied before, and their influence on scrotal circumference traits in Brahman and Tropical Composite bulls has been documented [24]. The similar results obtained in the present study validate these findings. The SNP in the AR gene is located in intron 4 and it is in linkage disequilibrium with important variants located in the promoter region of the gene in cattle. These variants are responsible for the creation/absence of binding sites for the SRY gene, the gene that initiates sex differentiation in mammals [24]. For TEX11, the gene with a SNP that has a large effect on the analysed traits, there is some information based on human and mouse studies. It is known that this gene acts in gonad development [34] and as a meiosis-specific factor [30,35,36], and that its loss of function eliminates the spermatocytes. Defects in this gene may cause chromosomal asynapsis and reduction in crossover formation [35]. It has also been shown that it acts in male fertility by competing with estrogen receptor (ERβ) for a specific binding site in the HPIP protein [37]. The non-synonymous SNPs described here change amino acids 38 and 696, and the region of the TEX11 protein that binds the HPIP protein spans amino acids 378 to 947 [37], suggesting that the SNP Tex11_r696h may be the best candidate mutation, since it changes an important protein site.

The significant effect (p < 0.005) of the SNP in PLAG1 for SC12 indicates that this gene also influences scrotal circumference measurements in cattle. This SNP has a pleiotropic effect on a number of growth traits and was associated with age at 26 cm of SC in cattle [25]. The absence of association for the TEKT4 gene (a candidate from a proteomic study of spermatozoa motility in cattle) suggests that post-transcriptional changes unrelated to genotypic variation might be responsible for affecting the phenotype.

The strong associations seen here confirm that genes located on the X chromosome affect male fertility traits and that SNPs on this chromosome should be incorporated in genetic analyses in order to obtain better evaluations and genomic value predictions. A recent study validated the importance of coding SNP variants and confirmed that mapped missense SNPs explain the greatest variance for many traits in cattle [17]. The fine-mapping conducted here also highlights the importance of working with putative causative mutations and the benefits that this might bring to animal breeding and genetics.

The single marker regressions with the top markers fixed are shown in Table 5. The results for PNS and SC traits at different ages indicate that the effects of the top markers are independent in the population. The fixation of the top marker for each trait still allows the significance of other SNPs to be detected. In addition to the LD results shown above, these results indicate that SNPs are segregating separately and that they independently contribute to the traits.

Table 5 Single marker regression fixing the top markers

The significant SNPs found for these andrological traits are good candidates to be included in customized low density chips for cattle evaluation [16]. Further GWAS and causative mutations studies in the autosomes might be done in the future in order to identify more informative variants for these traits.

These SNPs were also analysed for growth traits in the same males (Table 6). Almost all the SNPs were associated with birth weight and some were also associated with weaning and yearling weights, mainly the SNPs in the PLAG1 and TEX11 genes (p < 0.05) (Table 6). This indicates that selection of the favourable alleles for andrological traits may also select for heavier animals. This result is not surprising given the known genetic correlation between weight and SC [7]. The association of the SNP in PLAG1 with growth traits confirms previously reported results [25].

Table 6 SNP association analysis in bull population (growth traits)

Overall, there was no association between the tested SNPs and reproductive traits in females (Table 7). However, the SNP in the AR gene was associated with age at first corpus luteum (AGECL) in Brahman cows (p < 0.05) (Table 7). Selection for this allele may contribute to later cycling cows.

Table 7 SNP association analysis in cow population

The TaqMan assays were also used to genotype 90 Angus cattle in order to verify the origin of the alleles (Table 8). For the SNPs located in the genes LOC100138021, TEX11, AR, NXF2, UXT and TAF7L, one of the alleles is fixed in the Angus population and for the genes CYLC1, CENPI and PLAG1, one allele is close to fixation. Fortes et al. found similar results for the PLAG1 SNP [25]. The Brahman population of the study represented all genotypes for the SNPs. The source of variation for ten of the twelve SNPs studied therefore appears to be the zebu cattle.

Table 8 Allelic frequencies in Angus population (90 animals)

Conclusions

The QTL on chromosome X associated with bull fertility have been confirmed in an independent population. Putative causative mutations in the X chromosome influence the production of normal sperm and scrotal circumference of young bulls in Zebu cattle and their crossbreds. They are good candidate SNPs to be incorporated in low-density chips that could facilitate genetic evaluation. Moreover, the information provided on key genes may serve as basis for further functional experiments. Pleiotropic effects across andrological and growth traits were reported; nevertheless these mutations had no impact on female fertility traits.

Methods

Animals and phenotypic data

Animal Care and Use Committee approval was not required for this study because the data were obtained from existing phenotypic databases and DNA storage banks as described below.

Data from 1,021 bulls whose breeds were Brahman (n = 113), Tropical Composite (n = 741) and crossbreeds (n = 167) from five farms born from 2004 to 2009 were used in the current study. These animals were bred by the Beef CRC and the experimental design as well as the general population description of the CRC were reported previously [7,9]. Importantly, the animals used in this study had not been genotyped for any of the previous CRC studies.

The traits utilized in this study were: scrotal circumference at 12 (SC12), at 18 months (SC18) and at 24 months (SC24) and percentage of normal sperm at 24 months (PNS), birth weight (BW), weight at 200 days (W200), weight at 400 days (W400) and weight at 600 days (W600). All the traits were measured in the same bulls. Details about the measurements of the andrological traits can be found in [24].

Data from 935 Brahman cows and 1,089 Tropical Composite cows, also from the Beef CRC population, were used in the current study. The traits analyzed were: age at first corpus luteum (AGECL) and postpartum anestrus interval (PPAI). More information about the population, breeds and phenotypes can be found in [38].

In order to determine the proportion of Bos taurus alleles, 90 Angus cattle were also genotyped using the same methodology described below and the frequencies were compared.

Bioinformatic analyses

The genomes of 64 bulls (from CSIRO Animal, Food and Health Sciences in St Lucia, Brisbane, QLD, Australia) were used to generate VCF (variant call format) files with variant (SNP and indel) information for the target regions using the software SNVer version 0.4.1 [39]. The breeds of the 64 bulls were: 42 Brahman, 14 Hereford and 8 Senepol.

Variant Effect Predictor (VEP), an online tool from the Ensembl website (http://www.ensembl.org/info/docs/tools/vep/index.html), was used to predict the functional consequences of detected variants. The aim was to find disruptive variants with a major effect on the traits. We therefore searched for non-synonymous SNPs and SNPs/indels in coding regions and splicing sites of the candidate genes listed above.

Genotyping of selected SNPs

Custom TaqMan assays were developed for the novel selected SNPs according to the TaqMan Array Design Tool [40] and are listed in Additional file 4: Table S4. For the SNPs in the TEX11 (Tex11_r38k and Tex11_r696h), AR (AR1_In4) and PLAG1 (rs109231213) genes, primers and probes were used as described in [24] and [26], respectively.

Analysis of linkage disequilibrium

The linkage disequilibrium (r2) was estimated using the Plink program (http://pngu.mgh.harvard.edu/~purcell/plink/, accessed 5 June 2014) to determine which SNPs were more frequently inherited together. Considering two loci with two alleles for each locus (A/a and B/b), the following formula was used:

$$ r^2 = \frac{\left[ f(AB)\, f(ab) - f(Ab)\, f(aB) \right]^2}{f(A)\, f(a)\, f(B)\, f(b)} = \frac{D^2}{f(A)\, f(a)\, f(B)\, f(b)} $$

where f denotes a frequency and \( D = f(AB) - f(A)\, f(B) \).
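The r2 definition above can be illustrated with a minimal Python sketch. It assumes the four haplotype frequencies are already known, which holds directly for X-linked SNPs in males (each bull carries a single haplotype) but not for unphased diploid genotypes, where a program such as Plink has to estimate them. The function name `ld_r2` and the example frequencies are hypothetical and not taken from the study.

```python
def ld_r2(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, r2) for two biallelic loci from their haplotype frequencies.

    Assumption: the four haplotype frequencies are known and sum to 1.
    """
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginals of the haplotype table.
    f_A = f_AB + f_Ab
    f_a = f_aB + f_ab
    f_B = f_AB + f_aB
    f_b = f_Ab + f_ab

    # D = f(AB) f(ab) - f(Ab) f(aB), which equals f(AB) - f(A) f(B).
    D = f_AB * f_ab - f_Ab * f_aB

    denom = f_A * f_a * f_B * f_b
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return D, D * D / denom


# Hypothetical frequencies: complete linkage versus linkage equilibrium.
D_linked, r2_linked = ld_r2(0.5, 0.0, 0.0, 0.5)
D_free, r2_free = ld_r2(0.25, 0.25, 0.25, 0.25)
```

With only the AB and ab haplotypes present, the two SNPs carry the same information and r2 equals 1, which is the situation described for Tex11_r38k and Tex11_r696h in the bulls.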

Statistical analyses

The single marker regression was examined for genotyped animals using a mixed model analysis of variance with ASREML software. The mixed model is described below:

$$ {y}_i = X\beta + Z\mu + {S}_j{a}_j + {e}_i $$

where \( y_i \) represents the phenotypic measurement for the ith animal, X is the incidence matrix relating the fixed effects in β to the observations in y, Z is the incidence matrix relating the random additive polygenic effects of animal in μ to the observations in y, \( S_j \) is the observed animal genotype for the jth SNP (coded as 0, 1 or 2 to represent the number of copies of the B allele), \( a_j \) is the estimated SNP effect and \( e_i \) is the random residual effect. For SNPs located on the X chromosome, genotyped males were coded as 0 or 2, since males have no heterozygous genotypes at these loci. For SC12, SC18, SC24 and PNS, the same fixed effects were used for each trait. These fixed effects included contemporary group (animals born in the same year and raised together), the interaction of year and month of birth and breed. For AGECL and PPAI, the fixed effects were contemporary group (i.e., group of heifers born in the same year and raised together), herd of origin and age of dam. For bull growth traits, the fixed effects included cohort origin and age. The p-values were not corrected for multiple testing.
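The genotype coding used for the SNP covariate can be sketched as follows. This is only an illustration of the 0/1/2 and 0/2 coding rules stated above; the function name `code_genotype`, the string representation of genotype calls and the allele letters are assumptions made for the example, not the format of the study data.

```python
def code_genotype(alleles, counted_allele, x_linked_male=False):
    """Return the SNP covariate: copies of `counted_allele` (0, 1 or 2).

    `alleles` is a hypothetical call string such as "AG", "GG" or, for a
    hemizygous male, "G". X-linked SNPs in males are coded 0 or 2.
    """
    if len(alleles) not in (1, 2):
        raise ValueError("expected one or two allele letters")

    if x_linked_male:
        # A male has one X chromosome, so two different alleles is an error.
        if len(set(alleles)) != 1:
            raise ValueError("heterozygous call for an X-linked SNP in a male")
        return 2 if alleles[0] == counted_allele else 0

    if len(alleles) != 2:
        raise ValueError("a diploid genotype needs two alleles")
    return alleles.count(counted_allele)


# Hypothetical calls: an autosomal heterozygote and a hemizygous bull.
autosomal = code_genotype("AG", "G")
x_male = code_genotype("G", "G", x_linked_male=True)
```

Coding hemizygous males as 0 or 2 keeps the estimated effect on the same scale as the difference between the two homozygous classes in females.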

The percentage of the genetic variance accounted for by the jth SNP was estimated according to the formula \( \%V_j = 100 \cdot \frac{2 p_j q_j a_j^2}{\sigma_g^2} \), where \( p_j \) and \( q_j \) are the allele frequencies for the jth SNP estimated across the entire population, \( a_j \) is the estimated additive effect of the jth SNP on the trait under analysis, and \( \sigma_g^2 \) is the REML estimate of the (poly-)genetic variance for the trait.
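As a worked instance of this formula, the short Python sketch below computes the percentage of genetic variance for one SNP. The function name `percent_genetic_variance` and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from this study.

```python
def percent_genetic_variance(p, a, sigma_g2):
    """Return 100 * 2pq * a^2 / sigma_g2 for one SNP.

    p        : frequency of one allele (q is taken as 1 - p)
    a        : estimated additive (allele substitution) effect
    sigma_g2 : estimate of the additive genetic variance of the trait
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if sigma_g2 <= 0:
        raise ValueError("genetic variance must be positive")
    q = 1.0 - p
    return 100.0 * 2.0 * p * q * a ** 2 / sigma_g2


# Hypothetical inputs: p = 0.5, a = 1.0 trait unit, sigma_g2 = 5.0.
# 2pq = 0.5, so the SNP variance is 0.5 and the share is 100 * 0.5 / 5 = 10%.
example = percent_genetic_variance(0.5, 1.0, 5.0)
```

Because the numerator scales with 2pq, a SNP with a given effect explains the most variance at intermediate allele frequencies and none when one allele is fixed.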

The single marker regression was also repeated with the top marker (highest F statistic) for each trait fitted as a fixed effect. The p-values of the other markers were then recalculated, and the procedure was repeated consecutively until no marker had a significant p-value. The aim of these analyses was to verify the independence of the effects among the markers.
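The logic of this conditional analysis can be sketched in plain Python. The sketch is a deliberate simplification and not the model fitted in ASREML: it uses ordinary least squares with an intercept only, omits the polygenic random effect and the other fixed effects, and stops on an F-statistic threshold instead of a p-value. The names `conditional_scan` and `f_threshold`, and the toy data, are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(v, basis):
    """Remove from v its projection on each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = dot(out, b) / dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def conditional_scan(y, markers, f_threshold):
    """Repeatedly fix the top marker and re-test the rest (OLS simplification)."""
    n = len(y)
    basis = [[1.0] * n]          # intercept; grows as markers are fixed
    fixed = []
    remaining = dict(markers)
    while remaining:
        y_r = residualize(y, basis)
        best = None
        for name, x in remaining.items():
            x_r = residualize(x, basis)
            sxx = dot(x_r, x_r)
            df = n - len(basis) - 1
            # Skip markers fully explained by fixed ones (e.g. complete LD).
            if sxx < 1e-12 or df <= 0:
                continue
            explained = dot(x_r, y_r) ** 2 / sxx
            rss = dot(y_r, y_r) - explained
            f_stat = explained / (rss / df) if rss > 1e-12 else float("inf")
            if best is None or f_stat > best[1]:
                best = (name, f_stat, x_r)
        if best is None or best[1] < f_threshold:
            break
        fixed.append((best[0], best[1]))
        basis.append(best[2])    # already orthogonal to the current basis
        del remaining[best[0]]
    return fixed


# Toy data: m1 and m2 act independently; m3 duplicates m1 (complete LD).
m1 = [0, 0, 0, 0, 2, 2, 2, 2]
m2 = [0, 2, 0, 2, 0, 2, 0, 2]
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [3 * a + 1 * b + e for a, b, e in zip(m1, m2, noise)]
result = conditional_scan(y, {"m1": m1, "m2": m2, "m3": list(m1)}, 10.0)
```

In this toy example m2 only becomes clearly significant once m1 is fixed, while m3 can never be separated from m1, mirroring the distinction drawn in the text between independently segregating SNPs and SNPs in complete linkage.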

Availability of supporting data

Animal genotypes for all markers are available as Additional file 5: Table S5.