Background

Genetic gain focused on production for the dairy goat population in New Zealand is limited largely to within farm selection and some buck transfers between farms. This has resulted in considerable unrealised marker driven selection potential which led to the search for and discovery of quantitative trait loci (QTL) for milk production traits in a subset of the population. In particular, the most significant QTL identified was located on chromosome 19 associated with all three production traits including fat yield, protein yield and milk volume [1]. A QTL at the same location was also reported in the French [2, 3] and British dairy goat populations [4]. A pleiotropic effect for this QTL has been described on a number of udder conformation traits including udder attachment and udder depth [3, 4]. It appears that udders in animals with higher milk production determined by the chromosome 19 locus are characterised by greater udder depth with apparent weaker and narrower udder attachment. It is important to evaluate potential negative pleiotropic effects before large scale genetic selection. There are examples where the potential introduction of alleles may have or have resulted in the introduction of negative characteristics. For example, in cattle the non-synonymous A > C mutation in a splice enhancer region of the bovine Diacylglycerol O-Acyltransferase 1 (DGAT1) gene dramatically reduced milk fat content but caused scouring (non-bloody watery diarrhoea), impaired growth rate and flattened intestinal microvilli [5]. Furthermore, the non-synonymous C636R mutation in bovine Mannose Receptor C Type 2 (MRC2) gene increased muscularity in meat breeds but caused Crooked Tail Syndrome [6, 7].

Goat udder traits and teat morphology play important roles in milk quality (somatic cell count), milking ability/ease of milking [8] and overall milk production [9]. Goat udder circumference [10, 11] and udder depth [12] have been positively correlated with milk volume. Goats with greater udder depth also had a higher somatic cell count, a negative indicator of milk quality, udder health and marker of mastitis [13]. Teat size and shape was likewise indicative of mammary gland health with shorter and narrower teats associated with a lower somatic cell count [13, 14]. These factors influence the animal’s productive lifespan or longevity [15]. Extensive selection pressure for milk production traits has been shown to exert a negative effect on udder and teat traits [3, 4, 16]. It is thereby important to consider these traits in animal selection. Udder health and conformation traits have not been assessed at scale in the New Zealand goat herd. This paper reports for the first time a genome wide investigation for goat udder traits of the New Zealand goat population using Illumina GoatSNP50 BeadChip (Illumina Inc., San Diego, CA, USA) and identifies caprine SNP marker rs268292132 as a viable selection tool for future marker assisted breeding programmes.

Methods

Animals and phenotypic measurements

A total of 1058 animals from 2 farms (439 animals from farm A, 619 animals from farm B) with mixed age (mean age: 4.72 years, range: 2–9 years) and of primiparous or multiparous status (mean parity: 3.72 years, range: 1–8) were scored for udder traits in February and March 2020. The New Zealand goat herd is a mixed breed population comprised of approximately 85% Saanen, 5% Toggenburg, 5% Nubian/Anglo-Nubian, 4.5% Alpine and 0.5% Sable [17]. The udder traits measured included udder depth (UD), fore udder attachment (FUA), rear udder attachment (RUA), udder furrow (UF), teat placement (TP) and teat angle (TA). Udder traits were scored on a point system from 1 to 9 described in Table 1 with definition of traits as used in McLaren et al. [18]. Additionally, the following milk production traits, milk volume, fat yield and protein yield were measured for a subset of 336 animals out of the 1058-animal cohort (141 animals from farm A, 195 animals from farm B) as part of standard herd testing procedures using Fourier transform infrared spectroscopy. Animals were of mixed age (mean age: 6.49, range: 6–9) and primiparous or multiparous status (mean parity: 5.49, range: 5–8). The animals were from the cohort described in Scholtens et al [1]. Milk production traits were modelled by extrapolation to a 305-day lactation length period, restricted to animals with lactation length > 200 days.

Table 1 Goat udder traits definitions

Genotypes

Skin samples from the 1058 phenotyped animals were collected and genotyped with the Illumina GoatSNP50 BeadChip with 53,347 SNP markers (Illumina Inc., San Diego, CA, USA) [19]. Markers with ambiguous or missing genotypes in more than 5% of goats, or with minor allele frequency < 0.01 were omitted. After filtering, 51,477 markers were retained.

Genome-wide association analysis

Genome wide association mapping with an applied mixed linear model accounting for differences in population structure was performed using genome-wide complex trait analysis tool (GCTA) v1.93.2 [20, 21]. The following mixed linear model was used in our study:

$$ y=a+ bx+g-+e $$

y is the phenotype of interest, a is the mean term, b is the fixed (additive) effect of the SNP, x is the SNP genotype, g− is the effect of population structure estimated by calculating genetic relationship matrices (GRM) between all animals [21]. Mixed linear model association analysis, leave one chromosome out (MLMA-LOCO) approach was used where the chromosome on which the significant candidate SNP is located are excluded from the GRM calculations to avoid double fitting of candidate variants. e is the residual effect.

A SNP was considered significantly associated to a trait of interest at the whole genome level if their -log10 P-values exceeded -log10(5 × 10− 8). SNP marker trait association plots against the ARS1 goat reference [22] were constructed using the ggplot R package [23].

Calculation of allelic effect sizes for marker rs268292132

Allelic effect sizes for marker rs268292132 were calculated for production and udder traits in the 1058 animal population using a linear regression model adjusted for age and farm origin. A one-way analysis of variance (ANOVA) test between the means (α = 0.05) was performed to characterise allelic effect differences between rs268292132 genotype groups presented in Fig. 2. The Shapiro-Wilk normality assumption test was performed on all data presented. Multiple pairwise comparisons adjusted for false discovery rate were utilised to highlight statistically significant differences in the data presented.

Whole genome sequencing for variant identification

Whole genome sequencing was performed on 302 goats consisting of 48 animals in the cohort used for GWAS, 45 additional animals originating from farms A and B and 209 animals distributed across four other farms. The animals were selected to capture the genetic diversity of each farm. Skin samples were collected from these animals and sequenced on the Illumina HiSeq XTen platform (Illumina Inc., San Diego, CA, USA). For each animal, 700–1000 million paired end reads with read pair lengths of 150 nucleotides were generated. An average sequencing depth of 35 × was achieved. Read sequence quality was assessed using FastQC (FastQC v0.11.7). Reads were mapped to the ARS1 goat reference genome assembly using Burrows Wheeler aligner [24]. Duplicate read pairs were marked using picard MarkDuplicates (Picard v2.1.0., https://broadinstitute.github.io/picard/). Reads aligned around small insertions/deletions were realigned using the Genome Analysis Toolkit (GATK v 3.8, RealignerTargetCreator and IndelRealigner tools) [25, 26]. Probabilities for single nucleotide and indel variants were discovered individually in each alignment using GATK’s HaplotypeCaller tool [27] and genotyped jointly across the entire cohort with GATK’s GenotypeGVCFs tool. Pairwise linkage disequilibrium R2 values between variants were calculated using PLINK software [28]. The Ensembl Variant Effect Predictor (VEP v90.3) software (https://www.ensembl.org/vep) was used to annotate variants with their calculated variant consequences based on the ARS1 reference General Feature Format (GFF) file (http://ftp.ensembl.org/pub/release-100/gff3/capra_hircus/Capra_hircus.ARS1.100.gff3.gz).

Results

A chromosome 19 QTL associated with goat udder health traits in the NZ goat population

We report a single genome wide significant QTL for several udder health traits including udder depth, fore udder attachment and rear udder attachment (Table 2 & Fig. 1). The QTL extend from chromosome 19 positions 24 - 29 Mb. A QTL signal was also detected for teat angle and udder furrow at the same locus but did not reach genome wide significance. No QTL association was observed for the teat placement phenotype. Fitting the most significant QTL SNP (rs268292132) as a covariate in the association analysis resulted in the ablation of the entire QTL signal, indicative of a single haplotype responsible for the signal.

Table 2 Genome wide significant SNPs for goat udder traits in the New Zealand goat population
Fig. 1
figure 1

Manhattan plots of SNP marker association across chromosomes 1 to 29 plus X for Fore udder attachment (A), Rear udder attachment (B), Teat angle (C), Teat placement (D), Udder depth (E) and Udder furrow (F) for 1058 animals. Population structure was accounted for by estimating genetic relationship matrices (GRM) between all animals. Genome wide significance threshold is indicated by a dashed line at P-value of 5.0 × 10−8. SNP markers of genome wide significance are coloured in orange

An overlapping locus was previously identified to be associated with milk production traits in the French [3], British [4] and New Zealand goat populations [1]. The SNP most significantly associated with milk production in the New Zealand study (rs268292132) was also the most significantly associated SNP with the above udder traits in this study (Table 2 & Fig. 1).

Whole genome sequence analysis revealed two candidate non-synonymous mutations in the proteasome 20S subunit Beta 6 (PSMB6) and the sex hormone binding globulin (SHBG) genes

Whole genome sequence analysis of 302 animals which included 48 animals in the GWAS cohort highlighted 340/40,966 variants within the 24 - 29 Mb chromosome 19 QTL interval with high linkage disequilibrium, R2 > 0.8 with the most significant SNP marker, rs268292132. Of these, no high impact variants with severe consequence on protein sequence including transcript ablation, splice acceptor, splice donor, stop gain, frameshift, stop lost and transcript amplification variants were identified (Full breakdown of the 340 variants listed in Additional file 1 & Additional file 3). Two non-synonmous mutations and one inframe deletion was predicted to alter protein sequence. The in-frame deletion in candidate gene, SENP3 at position 27,036,323 (NC_030826.1:g. 27,036,324_27,036,326del), detected at an alternative allele frequency of 0.553 and linkage disequilibrium R2 value of 0.877 with the most significant SNP marker deletes a conserved glutamate (E89del). The variant has not been reported in other goat populations (Additional file 1 and Table 3). A non-synonymous variant in the candidate gene PSMB6 at position 26,370,181 (NC_030826.1:g. 26,370,181C > T, rs647861802), observed at an alternative allele frequency of 0.566 and linkage disequilibrium R2 value of 0.887 with the most significant SNP marker encodes a substitution of conserved valine 221 to isoleucine (ENSCHIP00000027128.1:p.V222I, GERP conservation score = 0.9). The variant has been reported in the NextGen collaborative research project at an allele frequency of 0.038 (Additional file 1 and Table 3). A G > T transversion at position 27,087,979 in the SHBG gene (NC_030826.1:g. 27,087,979G > T, rs672568929) exists at an alternative allele frequency of 0.555 and linkage disequilibrium R2 value of 0.87 with the most significant SNP marker in the genome-sequenced animals. This variant encodes a substitution of serine 267/277/281/166 (alternative transcripts) to isoleucine (ENSCHIP00000021646.1:p.S267I/ ENSCHIP00000021662.1:p.S277I/ENSCHIP00000021669.1:p.S281I/ENSCHIP00000021683.1:p.S166I/ENSCHIP00000021685.1:p.S166I). This variant has been reported in the NextGen collaborative research project at an allele frequency of 0.028 (Additional file 1 and Table 3).

Table 3 Summary of candidate variants described in the study

An adverse relationship between milk production traits and udder conformation

Allelic effect sizes for marker rs268292132 adjusted for age and farm origin were calculated in our GWAS population. An average difference of 3.66 kg fat (P-value = 3.09 × 10− 4), 4.12 kg protein (P-value = 2.58 × 10− 8) and 159.52 L volume (P-value = 7.22 × 10− 10) was recorded when comparing opposing homozygotes (Fig. 2 & Table 4). The production gains were accompanied by a decrease of 1.14 score points in udder depth (P-value = 9.31 × 10− 18), 0.99 score points in fore udder attachment (P-value = 5.05 × 10− 7) and 1.19 score points in rear udder attachment (P-value = 1.08 × 10− 12) when selecting for the high production T allele (Fig. 2 & Table 4).

Fig. 2
figure 2

Allelic effect sizes for marker rs268292132 genotype groups. Production traits, Milk Fat (A), Milk Protein (B), Milk volume (C) and udder traits, Fore udder attachment (D), Rear Udder Attachment (E), Teat Angle (F), Teat Placement (G), Udder Depth (H), Udder Furrow (I) were plotted as least squared means with 95% confidence intervals. Asterisks indicate pairwise significant differences between genotype groups (P-value < 0.5, multiple pairwise comparisons adjusted for false discovery rate; α = 0.05)

Table 4 Allelic effect sizes for rs268292132 SNP marker (chr19:26,610,610)

Discussion

A genome wide association study with an applied mixed model to account for differences in population structure was performed in 1058 dairy goats in New Zealand. The study highlighted a QTL located on chromosome 19 (24–29 Mb) associated with udder depth, fore udder attachment and rear udder attachment. This QTL was also identified in a previous study of the dairy goat population in New Zealand to be associated with milk production traits; fat yield, protein yield and milk volume (as described in Scholtens et al. [1]). Our findings are in agreement with trait effects of an overlapping QTL previously reported in French dairy goats for milk fat content, milk protein content, udder depth and rear udder attachment [2, 3] and British dairy goats for milk volume, udder depth and udder attachment [4]. The pleiotropic effect of this locus on both production and udder phenotypes combined with a high density of genes within the region (22.5 genes/Mb) has made it difficult to nominate candidate genes for the observed traits. Martin and colleagues suggested several fatty acid and lipid metabolism pathway genes including phospholipase D2 (PLD2), gamma-glutamyltransferase 6 (GGT6) and the arachiodonate lipoxygenase (ALOX) family of genes [3]. Mucha and colleagues also noted a potential role for protein stability and lipid homeostasis gene, asialoglycoprotein receptor 2 (ASGR2) in udder formation [4]. Our analysis from 302 whole genome sequenced animals showed no protein-altering variants for PLD2, GGT6, ASGR1/ASGR2 and ALOX genes.

Whole genome sequence analysis revealed 340 variants within the 5 Mb QTL with linkage disequilibrium R2 > 0.8 with the most significant SNP marker rs268292132. After removing markers not predicted to change protein sequence, two non-synonmous variants and one inframe deletion variant were of interest. We propose two candidate non-synonymous variants including a valine to isoleucine variant (V222I) in the proteasome 20S subunit 6 (PSM6B) gene, and a serine to isoleucine variant (S267I/S277I/S281I/S166I) in the sex hormone binding globulin (SHBG) gene. SIFT scores were predicted at 0.3 and 0.01 respectively suggesting a likely deleterious effect for the SHBG mutation. Genetic variation in human PSM6B was associated with LDL cholesterol and total cholesterol levels in a meta-analysis of 617,303 individuals of multi-ethnic backgrounds with 32 million genotyped and imputed variants [29]. PSMB6 encodes a core component of the 20S proteasome complex involved in intracellular protein degradation. However, its role in lipid metabolism is unclear. In the same GWAS study, genetic variants in SHBG were associated with triglyceride levels [29]. SHBG regulates the plasma metabolic clearance rate of steroid hormones such as oestrogen [30,31,32], although like PSMB6 a strong functional link with lipid metabolism is not yet determined. Additionally, a GAG glutamate inframe deletion (E89del) was detected in the candidate gene Sentrin-specific protease 3 (SENP3). Limited knowledge exists in the literature on the mechanistic role of SENP3 in milk production or morphological udder development. Given its widespread roles in protein stability, chromatin organisation, transcription, DNA repair, autophagy, protein localisation and homeostasis, a number of potential mechanisms maybe of interest [33]. Attributing causality to either of these mutations will require further genotyping of these variants within a study population that contain sufficient meioses to separate out causative haplotypes and appropriate functionality testing. Despite directing the majority of our focus to variants predicted to impact protein sequence as a direct link to functionality, we acknowledge that non-coding and regulatory variants are also of interest. Indeed our whole genome analysis have revealed two splice region variants, one 5′ UTR variant and five 3′ UTR variants within the QTL that exists in high linkage disequilibrium with the most significant QTL SNP marker. Future studies involving these variants in eQTL analyses or chromatin accessibility analyses such as ChIP-seq or ATAC-seq or experimental CRISPR-cas9 variant models could provide additional insight into biological causality. Moreover, future investigation and characterisation of large copy number variants in our dataset could additionally reveal candidate variants not detectable by SNP chip methods.

Selection on the chromosome 19 QTL for milk production traits will hold economic value in dairy industries. Markers capturing the QTL effect may be utilised in marker-assisted breeding programmes to drive selection of the next generation of high-performance animals. Employing selection with the most significant marker, rs268292132 (chr19:26,610,610) was predicted to increase fat yield, protein yield and milk volume in our study population. However, a potential adverse relationship between production traits and udder conformation has been indicated for this locus [3, 4]. Our study replicated the observations of a shift in the negative direction for udder depth, fore udder attachment and rear udder attachment scores. It is unclear whether the negative correlation between production and udder traits are a consequence of increased milk production and udder volume retention giving rise to a perceived increase in udder depth with weak and narrow udder attachment, or a true biological deterioration of udder morphology independent of the production increase. In the future, it will be interesting to determine whether the gene underlying the QTL has a direct effect on mammary gland morphology. Both alleles at the rs268292132 locus are common in the dairy goat population, with the high-milk production allele approaching fixation on some farms after long-term phenotypic selection for both milk production and conformation traits. These observations imply that the magnitude of the udder effects may be well-balanced with the increase in milk production. A comprehensive analysis of udder shape deterioration versus the production increase across the wider population will be required to optimise selection and breeding programmes for marker rs268292132 as these traits affect production [9], milk quality (somatic cell count) [13], milking ability [8] and total number of days in production [15].

The overarching goal of this research is development of a marker-assisted breeding programme that can be implemented to accelerate productivity gain. This study identified use of the rs268292132 SNP as a viable selection marker for marker assisted breeding programmes to increase production.

Conclusion

Genome wide association analyses across the dairy goat population in New Zealand have discovered a 24-29 Mb QTL on caprine chromosome 19 that has a pleiotropic effect on milk production and udder conformation traits. A negative correlation between production and udder conformation traits was observed in the New Zealand goat population which corroborates reports in French and British dairy goat herds. Close monitoring of udder health is recommended for future selection programmes involving this locus. Marker assisted breeding programmes with markers effectively identifying this QTL can be utilised to select the next generation of high-performance animals to accelerate productivity gain.