The accuracies of genomic predictions are often reported to decrease with increasing genetic distance (or meiosis) from the reference population. For example, Habier et al. [1] showed that, in the German Holstein breed, accuracies of genomic predictions of animals that were distantly-related to the reference population declined. Saatchi et al. [2] reported a decline in accuracy of genomic predictions that were derived from a US Hereford population when they were tested in Canadian, Uruguayan or Argentinean Hereford populations. These results suggest that the linkage disequilibrium (LD) between markers and quantitative trait loci (QTL) was different in the validation population compared to the reference or training population. This occurs because LD within a group of related animals may be lost due to recombination in less closely-related animals. Several authors also reported that the accuracy of genomic predictions was poor for a breed not included in the reference (i.e. across-breed genomic predictions) [3,4]. Across-breed prediction is particularly challenging because, in addition to the possible occurrence of inconsistent LD between markers and QTL [5,6], QTL may be breed-specific, which places an upper limit to the accuracy that can be reached in another breed.

This problem of poor prediction for animals not closely-related to the reference population is exacerbated when BLUP (best linear unbiased prediction) is used to derive prediction equations. BLUP (or the mathematical equivalent genomic BLUP, GBLUP) is widely used for genomic prediction because of its computational efficiency and because it performs almost as well as nonlinear methods for within-breed prediction [7,8]. GBLUP assumes that the effects of all markers are drawn from the same normal distribution, which implies that all markers are assumed to have very small effects. In spite of this unrealistic assumption, GBLUP can capture the effects of QTL, even if the effects are moderate to large, by using a linear combination of markers. Since LD can extend over long genomic distances, this linear combination can include markers that are a long distance away from the QTL. For example, long-range LD probably explains why predictions based on 50 K single nucleotide polymorphism (SNP) markers have similar accuracies as predictions based on higher density chips (800 K) for within-breed prediction of Holstein cattle [9,10]. Thus, closely-related animals inherit similar long chromosomal segments to those of the animals in the reference population and hence the same linear combination of markers will predict the effect of QTL. However, if recombination breaks up these long chromosomal segments, the predictive power of the linear combination of markers will decrease [1]. In contrast, non-linear methods, such as BayesB [11], allow the effects of some markers to be large, while many markers have zero (or near-zero) effect. This allows the prediction equation to be driven by markers that are close to the QTL and in strong LD with it. The LD between such markers and their associated QTL is broken down less quickly because the recombination distance between them is small. Although using non-linear alternatives (e.g. BayesA, BayesB, BayesR) is not always superior to GBLUP for within-breed prediction, nonlinear methods are expected to improve the persistency of the accuracy of genomic predictions over future generations [1,10].

Within a single breed, a marker may be in strong LD with a QTL in spite of being some distance away. Therefore, to find markers close to and in LD with QTL in all breeds, using a reference population that includes more than one breed can be advantageous. Combining animals from multiple breeds in a reference dataset will reduce the long-range LD that is present within a breed but may not necessarily increase the accuracy of predictions, particularly if predictions are evaluated in direct offspring of the reference population. Thus, it is not surprising that, in the literature, the reported benefits of using a multi-breed reference dataset are mixed. However, some improvements have been observed for breeds with small (within-breed) reference populations and, in general, results have been more promising for beef cattle than for dairy cattle [4,12-16]. In some cases, the failure of prediction equations to benefit from a multi-breed reference population could be due to the use of medium (50 K) density SNP chips, which are unlikely to have consistent across-breed LD [6].

In this paper, we show that using a large reference population from two breeds, combined with high-density SNP genotypes and a nonlinear method for the analysis (BayesR) increases the accuracy of genomic predictions in a breed that is not included in the reference population. To create a large reference population, we expanded the current Australian reference population of progeny-tested bulls by including cows e.g. [16,17]. Our dataset of bulls and cows is similar to that recently used by Raven et al. [18] for a genome-wide association study. Here, in contrast to [18], we fitted all SNPs simultaneously and extended the BayesR methodology from Erbe et al. [10] to include cow records and estimate fixed effects. The use of cows requires making changes in the evaluation procedure because cows and bulls have different degrees of uncertainty in their measurement, i.e. there is heterogeneous error variance. In addition, if nonlinear methods identify markers that are close to QTL, they should be able to map the QTL with greater precision than alternative methods such as GBLUP. We assessed the ability of BayesR with the multi-breed reference to fine-map QTL by mapping known loci such as DGAT1 [19] and identify new genes that affect dairy traits by combining the BayesR results with differential gene expression of the mammary gland compared to that of 17 other tissue types.




Illumina Bovine HD genotypes (777 K SNPs) were available for 1620 Holstein bulls and cows, 125 Jersey bulls, and 114 Australian Red bulls. After quality control, carried out as in Erbe et al. [10], and removal of non-polymorphic SNPs, 632 002 SNPs remained. A total of 10 311 Holstein, 4738 Jersey and 249 Australian Red bulls and cows were genotyped with the Illumina Bovine SNP array (54 K SNPs) and passed parentage verification. After quality control, 43 425 SNPs remained. All animals had genotypes imputed to the higher density SNP panel using Beagle 3 [20]. The Australian Red animals were used only to evaluate the prediction equations derived from reference populations of Holstein animals, Jersey animals or Holstein plus Jersey animals. Australian Red cattle have a large component of Scandinavian Red ancestry and are genetically distinct from Holstein and Jersey cattle (Figure 1). The average LD between markers for Australian Red cattle is lower than that of either Holstein or Jersey cattle (see Additional file 1: Figure S1).

Figure 1
figure 1

Relationships between Holstein, Jersey and Australian Red dairy cattle. Shown are principal components 1 and 2 for the genomic relationship matrix [24] constructed from a random sample of Holstein (n =334) and Jersey (n =326) animals with the genotyped Australian Red (n =313) animals. Principle components were obtained using the eigen() function in R [50].


Phenotypes for the genotyped animals (trait deviations for cows and daughter trait deviations for bulls) were from the April 2013 genetic evaluations from the Australian Dairy Herd Improvement Scheme (ADHIS) for fat yield (FY), milk yield (MY), protein yield (PY), fertility (FERT), stature (STAT) and survival (SURV). Trait deviations were corrected for herd year season effects, permanent environmental effects, and heterosis. Milk composition traits, i.e. percentage of fat and protein in milk (F% and P%), were calculated using a linear approximation of the milk production and milk solid yield traits. For example, F% was calculated as:

$$ F\%=\frac{F{Y}_P}{M{Y}_P}\left[\frac{FY}{F{Y}_P}-\frac{MY}{M{Y}_P}\right], $$

where FY P is the (within-breed) average fat yield and MY P is the (within-breed) average milk yield. P% was calculated using the same methodology. Values for FY P (kg/lactation in Holstein = 284; Jersey = 522; Australian Red = 256), PY P (kg/lactation in Holstein = 243; Jersey = 193; Australian Red = 216) and MY P (L/lactation in Holstein = 7417; Jersey = 5273; Australian Red = 6254) were from the 2012 ADHIS annual report [21].

Reference and validation datasets

The Holstein and Jersey phenotypes were split into reference and validation datasets for each trait. The reference datasets consisted of six different combinations of up to 11 527 Holstein and 4687 Jersey animals. The six reference sets were: (1) Holstein bulls, (2) Jersey bulls, (3) Holstein and Jersey bulls, (4) Holstein bulls and cows, (5) Jersey bulls and cows or (6) all reference animals (Holstein and Jersey bulls and cows).

The four validation datasets consisted of a minimum of: (1) 251 Holstein bulls, (2) 81 Jersey bulls, (3) 247 Australian Red cows or (4) 114 Australian Red bulls. Validation animals for the Holstein and Jersey breeds were selected on the basis of birth year and cows that were progeny of bulls in the validation set were removed from the reference set. All bulls had more than 20 effective daughter records. The number of animals in the reference and validation populations for each breed and each trait are in Table 1.

Table 1 Number of phenotypic records for each trait in the reference and validation sets

Model fitted to the reference data

Genomic predictions for each trait were estimated for the validation datasets using only animals in the prescribed reference dataset. Two procedures were used to estimate marker effects, either GBLUP or BayesR. The model that was fitted to the reference dataset in both cases included fixed effects (overall mean, breed and sex nested within breed, when appropriate), SNP effects and polygenic effects. The model was:

$$ \mathbf{y}=\mathbf{X}\mathbf{b}+\mathbf{Z}\mathbf{a}+\mathbf{W}\mathbf{v}+\mathbf{e}, $$


y = vector of n trait or daughter deviations (phenotypes) for cows or bulls,

b = vector of p fixed effect solutions,

a = vector of q polygenic breeding values, distributed N(0, A σ 2 a ),

v = vector of m SNP effects,

e = vector of n residual errors, distributed N(0, E σ 2 e ),

X = design matrix allocating phenotypes to fixed effects (X = n by p matrix),

Z = design matrix allocating phenotypes to polygenic breeding values (Z = n by q matrix),

W = design matrix of SNP marker genotypes (W = n by m matrix),

A = numerator relationship matrix,

σ 2 a  = additive genetic variance not explained by the SNPs,

σ 2 e  = error variance.

Constructing the matrix of weights for errors (E)

The analysis aimed to account for the uncertainty in phenotypic records, particularly between bulls and cows but also for bulls with few or many daughters. This uncertainty affects the error variance associated with each record, that is e ~ N(0,E σ 2 e ), where E is a diagonal matrix constructed as diag(1/w i ), where w i is the weighting coefficient for each animal. Weights were scaled such that the error variance for animals with one observation of their own phenotype is σ 2 e . The calculation of the weighting coefficient differs between cows (which have their own records) and bulls (for which phenotypes are daughter deviations) and was done following Garrick et al. [22] i.e.:

$$ {w}_i(bulls)=\frac{d\left(1-{h}^2\right)}{4-{h}^2}, $$
$$ {w}_i(cows)=\frac{r\left(1-{h}^2\right)}{1+\left(r-1\right)t-r{h}^2}, $$

where h 2 is the heritability of a single record of the trait, d is the effective number of daughters, r is the number of records per cow and t is the repeatability of the trait. All variables (h 2, d, r and t) were obtained from ADHIS, and the heritabilities and repeatabilities for each trait are in Table 1.


The GBLUP method was implemented using restricted maximum likelihood in ASReml [23]. GBLUP assumes that all marker effects are drawn from the same distribution [i.e. \( \mathbf{v}\sim N\left(0,\mathbf{I}{\sigma}_v^2\right) \)] and a model equivalent to Equation (ii) was fitted in which Wv = Qg, where Q is a (n x n) design matrix allocating phenotypes to animals and \( \mathbf{g}\sim N\left(0,\mathbf{G}{\sigma}_g^2\right) \)]. G was calculated according to Yang et al. [24] and \( {\sigma}_g^2 \) is the genetic variance explained by all SNPs and was estimated from the data. Solutions for fixed effects (\( \widehat{\mathbf{b}} \)), polygenic breeding values (â) and genomic estimated breeding values (ĝ) for the model y = Xb + Za + Qg + e are the same as for the mixed model equations following [25]:

$$ \left[\begin{array}{ccc}\hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}\hfill & \hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}\hfill \\ {}\hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}+{\mathbf{A}}^{-1}{\sigma}_a^{-2}\hfill & \hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}\hfill \\ {}\hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}\hfill & \hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}+{\mathbf{G}}^{-1}{\sigma}_g^{-2}\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\mathbf{b}}\hfill \\ {}\hfill \widehat{\mathbf{a}}\hfill \\ {}\hfill \widehat{\mathbf{g}}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \\ {}\hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \\ {}\hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \end{array}\right], $$

where R −1 = E −1 σ −2 e , and all other terms are as described above. Following [25], the solutions are:

$$ \widehat{\mathbf{b}}={\left[{\mathbf{X}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{X}\right]}^{-1}\mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{Z}\widehat{\mathbf{a}}-\mathbf{Q}\widehat{\mathbf{g}}\right), $$
$$ \widehat{\mathbf{a}}={\left[{\mathbf{Z}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{Z}+{\mathbf{A}}^{-1}{\sigma}_a^{-2}\right]}^{-1}\mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{b}}-\mathbf{Q}\widehat{\mathbf{g}}\right), $$
$$ \widehat{\mathbf{g}}={\left[{\mathbf{Q}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{Q}+{\mathbf{G}}^{-1}{\sigma}_g^{-2}\right]}^{-1}\mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{b}}-\mathbf{Z}\widehat{\mathbf{a}}\right). $$

In this study, solutions for \( \hat{\mathbf{g}} \) were obtained with ASReml [23] and then back-solved to estimate SNP effects, i.e. to obtain solutions for \( \hat{\mathbf{v}} \), where back-solving was as described by Yang et al. [26].


This paper extends BayesR, following Meuwissen et al. [11], Meuwissen and Goddard [27] and Erbe et al. [10], with modifications to account for the heterogeneous error variance in the phenotypes and to estimate the fixed effects in the model. BayesR [10] assumes that the distribution of SNP effects is a mixture of normal distributions, with the kth component comprising a proportion prk of the mixture and having variance σ 2 k . Similar to the construction of the G matrix [24], SNP alleles in W were standardised prior to analysis in BayesR to have a unit variance (i.e. \( \left[{w}_{i,j}-2 freq\left({w}_j\right)\right]/\sqrt{2 freq\left({w}_j\right)\left(1- freq\left({w}_j\right)\right)} \), where w i,j is the genotype of SNP j for animal i, and freq(w j ) is the allele frequency of j). Note that in the following, the current estimates of the parameters in the Gibbs sampler used in the analysis (e.g. \( \tilde{\mathbf{b}} \)) are distinguished from the final estimates (e.g. \( \widehat{\mathbf{b}} \)) using superscript notation. The model is implemented using the following steps:

  1. (1)

    The error variance was sampled from a scaled inverse chi-squared distribution with mean equal to ẽE − 1 and n – 2 degrees of freedom where \( \tilde{\mathbf{e}}=\mathbf{y}-\mathbf{X}\tilde{\mathbf{b}}-\mathbf{Z}\tilde{\mathbf{a}}-\mathbf{W}\tilde{\mathbf{v}} \), and \( \tilde{\mathbf{b}} \), ã and are the current values of those terms in the model.

  2. (2)

    The fixed effects were sampled from a normal distribution with a mean given by [X ' R − 1 X]− 1 X ' R − 1 y*, following Equation (vi) where y* is the phenotype corrected for the current estimates of all other terms and fixed effects in the model, with variance [X ' R − 1 X]− 1.

  3. (3)

    The polygenic effect was sampled from a normal distribution, with mean for animal i equal to \( {\left[{{\mathbf{Z}}_i}^{\hbox{'}}{\mathbf{R}}_{ii}^{-1}{\mathbf{Z}}_{\mathrm{i}}+{\boldsymbol{A}}_{ii}^{-1}{\upsigma}_a^{-2}\right]}^{-1}{\mathbf{Z}}_i^{\hbox{'}} \), following Equation (vii), where Z i is the row corresponding to animal i in Z and A ii −1 and R ii −1 are the ith diagonal elements of A −1 and R −1, respectively. The variance of the sampling distribution for the polygenic effect for animal i was \( {\left[{{\mathbf{Z}}_i}^{\hbox{'}}{\mathbf{R}}_{ii}^{-1}{\mathbf{Z}}_i+{\mathbf{A}}_{ii}^{-1}{\upsigma}_a^{-2}\right]}^{-1} \). More details on the estimation of the polygenic effects are in the appendix of [28].

  4. (4)

    The polygenic variance was sampled from a scaled inverse chi-squared distribution with mean ãA − 1 ã and n – 2 degrees of freedom.

  5. (5)

    The effect of SNP j was sampled by first sampling a component of the mixture and then drawing j from that distribution. A residual model (i.e. \( {\mathbf{y}}_j^{*}={\mathbf{W}}_{.j}{v}_j+{\mathbf{e}}_{\boldsymbol{j}} \), where y j * is the phenotype corrected for all other effects, excluding the current marker j, W .j is column j from the genotype matrix W, v j is the allelic substitution effect of marker j, and e j is the error) was used to determine the full conditional posterior probability for each distribution k as:

    $$ L\left({v}_{j,k}\Big|{\sigma}_k^2\right)=-0.5\left[ \ln \left(1+{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}{\sigma}_k^2\right)+{\mathbf{y}}_j^{*}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{y}}_j^{*}-{\mathbf{y}}_j^{*}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}{v}_{j,k}\right]+ \ln \left(p{r}_k\right), $$

    where pr k is the current estimate for the proportion of markers from distribution k, v j,k is an estimate of the effect for marker j when sampled from distribution k, and \( {v}_{j,k}={\left[{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}+{\sigma}_k^{-2}\right]}^{-1}{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{y}}^{*} \) (following Equation (vi)). The full conditional posterior probability that marker j is from distribution k was calculated as:

    $$ {\left\{{\displaystyle \sum_{l=1,4}} \exp \left[L\left({v}_{j,l}\Big|{\sigma}_l^2\right)-L\left({v}_{j,k}\Big|{\sigma}_k^2\right)\right]\right\}}^{-1}. $$

    More details for the derivation of the full conditional likelihood function are in Additional file 2 (see Additional file 2). Once the component of the mixture distribution was determined, allele effects were sampled from a normal distribution, using the residual model with a mean v j,k and variance \( {\left[{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}+{\sigma}_k^2\right]}^{-1} \).

  6. (6)

    The prior pr k was updated by sampling from a Dirichlet distribution given by pr k  ~ Dir(α k  + β k ), where α k is the prior counts for markers from distribution k and β k is the current number of markers with effects sampled from distribution k. The prior assumed one marker from each distribution (i.e. α k =1).

BayesR was implemented with a multi-threaded C++ program to improve computing performance. Based on Erbe et al. [10], we defined four possible distributions for σ 2 v,k with variance equal to 0, 0.0001σ 2 a2, 0.001σ 2 a2, and 0.01σ 2 a2, where σ 2 a2 is the additive genetic variance explained by the pedigree, which was determined prior to the analysis by fitting y = Xb + Za + e (following Equation (ii)) in ASReml [23]. The Gibbs sampler used at least 30 000 iterations, with 20 000 iterations discarded as burn-in, and each analysis had five replicate Gibbs sampling chains. Final parameter estimates were the means of the sampled effects in the post burn-in iterations, which were obtained separately for each chain.

Assessment of the accuracy and bias of predictors

The model that was fitted to the reference datasets always included the estimate of marker effects (\( \widehat{\mathbf{v}} \)) and a polygenic (â) term. Thus, predictions for the Holstein and Jersey validation datasets considered only predictions based on genotype (\( {\widehat{\mathbf{y}}}_{\mathbf{v}}=\mathbf{W}\widehat{\mathbf{v}} \)) or prediction of the total genetic merit of the animal (\( \widehat{\mathbf{y}}=\widehat{\mathbf{a}}+\mathbf{W}\widehat{\mathbf{v}} \)). A proxy for the accuracy of prediction was assessed for these two quantities as the correlation between the prediction and the phenotype [i.e. \( r\left(\mathbf{y},\widehat{{\mathbf{y}}_{\mathbf{v}}}\right) \) or r(y, ŷ)] and the bias in the prediction was assessed as the regression coefficient of phenotype on the prediction [i.e. \( b\left(\mathbf{y},\widehat{{\mathbf{y}}_{\mathbf{v}}}\right) \) or b(y, ŷ)]. For BayesR, the accuracy was the average correlation across the five replicate chains.

Accuracies were calculated for many combinations of dataset (bulls or bulls and cows), for each method (GBLUP or BayesR), breed (single or both breeds), and with or without inclusion of the polygenic term in the prediction. Therefore, to summarize the effects of all these factors on accuracy, the accuracies (r) were analysed using the following linear model rm,n,o,p = datasetm + datasetm.methodn + datasetm.breedo + datasetm.polygenicp + em,n,o,p. We did not use this model to test significance of each factor because the accuracies were not independent. Rather, we used the model to estimate the effect of each factor and reported these estimates. Data on bias were analysed in the same way.

Derterminating the precision of QTL mapping

To map QTL, GEBV in sliding windows of 250 kb (i.e. ‘local’ GEBV) [29] were calculated for each animal from the multi-breed bull and cow reference dataset for the milk production traits (FY, MY, PY, F%, P%). Local GEBV were calculated as \( {\mathbf{W}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2}{\widehat{\mathbf{v}}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \), where \( {\mathbf{W}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \) and \( {\widehat{\mathbf{v}}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \) includes all SNP markers within a 250 kb region of the genome. Adjacent 250 kb windows were separated by 50 kb. The variance of the local GEBV was determined for each breed, trait and window. If the variance of a window was greater than 50 times that of an average window, the window was defined as containing a QTL. Windows that contained QTL were examined for possible candidate loci based on QTL reported in the literature and for genes that were over- or under-expressed in the mammary gland (P < 1 × 10−5) compared to 17 other tissue types [30]. To obtain the latter, RNA was extracted from 18 tissues (including mammary gland) in triplicate from a single lactating Holstein cow at one time point. RNA was sequenced on the Illumina HiSeq2000 platform using 100 base paired end reads. After quality control and filtering, approximately 4 × 107 to 1 × 108 reads per tissue were aligned to the Ensembl annotation of the UMD3.1 bovine genome assembly using Tophat2 [31]. A matrix of gene counts by tissue was constructed with HTSeq [32] and the bioconductor ‘edgeR’ package [33] was used to perform tissue-specific expression analysis where the intercept was the mean gene expression across all tissues.

QTL often affected more than one milk production trait. We summarised the pleiotropic pattern of the effects on milk production traits of windows that were identified to contain QTL as follows. The correlation between local GEBV [34] for all pairwise combinations of traits were calculated for windows for which the local GEBV variance for each trait was greater than 3 times that of an average window. Windows with mid-points separated by less than 0.5 Mbp and with similar patterns of effects were assumed to be part of the same QTL and combined into a single region. QTL were allocated to one of nine possible groups, first according to the QTL’s largest effect on a yield trait (FY, MY, PY) and then by the QTL’s pattern of pleiotropic effects on the remaining two yield traits (defined by either a negative (−) or positive (+) correlation, or with no (n) effect. For example, a 'FY-' QTL had its largest effect on FY and a negative correlation with (either one or both) of MY and PY. Similarly, a 'MYn' QTL had its largest effect on MY with no notable effect on either FY or PY. QTL regions affecting only P% were grouped with the MYn QTL as a change in milk composition was assumed to be a sensitive measure of increased milk volume (i.e. an increase in milk volume with no change in milk solids would result in a decreased P%. Hence P% was considered to be potentially more sensitive to changes in MY than to changes in milk volume than L of milk per lactation).


Variance components from GBLUP and BayesR

The proportions of phenotypic variance captured by polygenic effects, SNPs and residuals for each method were investigated to assess differences between the GBLUP and BayesR. The proportion of phenotypic variance captured by genetic terms (i.e. SNP + polygenic) in GBLUP and BayesR differed by less than 5% for most traits (Table 2). The notable exception was F% (Table 2), for which the BayesR estimate of the genetic variance (SNP + polygenic) was 20 to 30% smaller than the GBLUP estimate.

Table 2 Variance components from GBLUP and BayesR for the combined (bull and cow) reference sets

The total genetic variance accounted for less than 5% of the phenotypic variance for FERT and SURV; 20 to 60% of the phenotypic variance for FY, MY, PY and STAT; and more than 70% of the phenotypic variance for F% and P% (Table 2). The variance captured by SNPs was equal to about 70% of the genetic variance for production traits (FY, MY, PY, F%, P%) and about 90% for STAT and FERT. For SURV, the proportion of genetic variance captured by SNPs in Jersey cattle was low (less than 60%) compared to the estimate in either the Holstein or the multi-breed reference datasets (about 75%). BayesR and GBLUP resulted in very similar estimates of the variance captured by SNPs, relative to the total genetic variance, in Jersey cattle for most traits. However, SNPs in the Holstein and the multi-breed Holstein/Jersey reference datasets were estimated to capture 5 to 10% less of the total genetic variance with BayesR than with GBLUP.

The average number of SNPs in each distribution for BayesR indicates that most SNPs (>99%) had no effect on the traits (Table 3). More than 10 SNPs were estimated to come from the distribution with the largest variance (i.e. 0.01σ 2 a2 ) for P% and F% in the Jersey and the multi-breed Holstein/Jersey reference datasets, and for P%, STAT and FERT in the Holstein dataset. Between 5 and 10 SNPs were estimated from the distribution with the largest variance for FY, F% in Holstein cattle; for FY, MY, PY, STAT and FERT in Jersey cattle and for FY, MY, STAT and FERT in the multi-breed Holstein/Jersey reference dataset. SURV had the lowest number of SNPs estimated from the distribution with the largest variance for all traits for both the Holstein and Jersey datasets.

Table 3 Average number of SNPs estimated to be in each distribution by BayesR 1

Assessment of the accuracy and bias of predictions

Averaged across the five milk production traits, the accuracy of Holstein GBLUP predictions using bulls only in the reference dataset was equal to 0.61, compared with 0.52 if only pedigree information (no SNP effects) was used (see Additional file 3: Table S1). Increasing the size of the reference dataset by including cow records had the largest and most consistent effect on improving the accuracy of genomic predictions for milk production traits (FY, MY, PY, F%, P%) and STAT. The accuracy increased by an average of 5.4% in the Holstein and 4.2% in the Jersey breed when cows were added to the reference datasets for these traits. However, there was little or no benefit of adding cows to the reference dataset for FERT in the Holstein breed and for SURV in each breed. The effect of adding Jersey animals into the combined (bull and cow) Holstein reference dataset had little effect on the accuracies for milk production traits in the Holstein breed, but there was a small average increase in accuracy of 1% for milk production traits in the Jersey breed. Genomic predictions for all Jersey and Holstein validation populations are in Additional file 3 (see Additional file 3: Table S2 and Table S3). Table S1 (see Additional file 3: Table S1) summarises the effects of the reference dataset, the method of prediction and the addition of the polygenic term on the accuracy and bias of predictions.

Accuracies obtained with the BayesR method were generally equal to or higher than those with GBLUP (see Additional file 3: Table S1). The average increase in accuracy using BayesR for milk production traits was equal to 6 and 3% for the bull only and the combined (bull and cow) reference datasets for Holstein cattle and about 5% for Jersey cattle. The largest increases in accuracy when using BayesR were observed for F% for both Holstein and Jersey cattle, probably because of the large-effect loci that segregate for this trait [7,8]. This occurs in spite of the apparent underestimation of phenotypic variance by BayesR reported in Table 3.

Genomic predictions for FERT (r ≈ 0.50) and SURV (r ≈ 0.43) in Holstein cattle were little affected by the prediction method or reference dataset used (see Additional file 3: Table S1). In Jersey cattle, accuracies obtained for FERT when using SNP information and only bulls for training were lower than the accuracy of pedigree-based predictions obtained when using the combined (bull and cow) reference dataset. Accuracies of genomic predictions for SURV in Jersey were rarely higher than those based on pedigree data. It is likely that these results for FERT and SURV reflect the low heritability and low accuracy of the records for these traits.

Adding polygenic effects to the prediction model increased the accuracy for milk production traits by on average 1 (Holstein) and 3% (Jersey) when using the combined (bull and cow) reference datasets (see Additional file 3: Table S1). However, adding polygenic effects increased bias by on average 13 and 17% in Holstein and Jersey cattle. The effect of adding polygenic effects on prediction bias was similar for the bull only and combined (bull and cow) reference datasets, and a similar bias was also observed in the pedigree (only) predictions. It seems that the pedigree relationships cause the increase in bias observed when polygenic effects are added to the genomic predictions, and this increase in bias was independent of whether the bull only or combined reference datasets were used (note that national genomic evaluations in ADHIS regress parent averages by approximately 0.6 to account for this bias). When predictions were based on SNP effects only, the overestimation of GEBV was greater in Jersey (average slope =0.94 for milk production traits) than in Holstein cattle (average slope =1.02) but, in general, the slope of the regressions did not differ notably from 1.

Within- and multi-breed genomic predictions for Holstein and Jersey

Using the combined (bull and cow) reference datasets and excluding polygenic effects was found to give the ‘best’ (highest accuracy with least bias) genomic predictions for Holstein and Jersey validation animals. The observed accuracies and bias for these reference datasets when using only SNP effects for prediction are in Table 4 for milk production traits. These results show that BayesR resulted in an average increase in accuracy of 3 and 6% in the Holstein and Jersey single breed reference datasets, compared to GBLUP. There was little effect (±1%) of using the multi-breed reference dataset on prediction accuracies when using BayesR, and a small favourable effect (<2%) for GBLUP.

Table 4 Accuracy and bias of within- and multi-breed genomic predictions for milk production traits

Across-breed genomic predictions

Table 5 gives the accuracy and bias when prediction equations were tested in a breed not included in the reference population. Using predictions from the other breed resulted in a 40% reduction in prediction accuracy for the Holstein and Jersey breeds (Table 5), compared to prediction accuracies when the target breed was included in the reference dataset (Tables 4). The accuracy of prediction for the Australian Red breed was on average 3 and 9% greater when using a reference population that included both Jersey and Holstein animals compared to a reference population that included either Holstein or Jersey animals, respectively (Australian Red animals were never included in the reference population).

Table 5 Accuracy and bias of across-breed genomic predictions for milk production traits

Across-breed predictions showed an overall benefit of using BayesR compared to GBLUP (Table 5). Across various traits and breed combinations, BayesR outperformed GBLUP by on average 7% for all across-breed predictions in milk production traits. BayesR showed a very large (17%) advantage for F%, probably due to the segregation of SNPs with large effects [7,8] and a consistent advantage of 5 to 7% for FY, MY and PY. The combined effect of using BayesR and a multi-breed reference increased the accuracy of genomic predictions for the Australian Red animals by 8 and 17%, compared to the accuracies obtained from a single-breed reference dataset of Holstein or Jersey animals using GBLUP.

Precision of QTL mapping with BayesR and GBLUP

We hypothesized that BayesR results in more accurate across-breed genomic predictions because it locates QTL effects more precisely in the genome. We examined the QTL regions identified by BayesR and GBLUP for QTL previously reported in the literature and identified several regions that contain QTL for milk production traits (e.g. DGAT1 [19], ABCG2 [35], FASN [36], SCD [37], the casein complex, LALBA and PAEP (formally LGB) [38]; GHR [39] and AGPAT6 [40]). In most cases, except for DGAT1 and GHR, the gene in the QTL region that was most over- (or under-) expressed in the mammary gland (compared to the 17 other tissues analysed) matched the loci reported in the literature (Table 6). Although GHR is cited as a candidate gene for the region identified on bovine chromosome BTA20, CCL28 showed higher differential expression (P < 1 × 10−29) and it should be noted that this region is reported to contain other QTL [41].

Table 6 Regions with large variance in local GEBV from BayesR for milk production traits

To investigate the precision of GBLUP versus BayesR in mapping QTL, we specifically investigated the mapping of the PAEP gene. Figure 2 shows the absolute value of the SNP effect estimates in the region. With BayesR, SNPs were identified that have larger effects on milk production traits than most of the surrounding SNPs. In contrast, with GBLUP all the SNPs in the identified region had small effects, although there was possibly a small increase in SNP effect estimates for SNPs for which BayesR also found larger effects. In spite of these small effects, somewhat surprisingly, the local variance in GEBV using GBLUP did find peaks in the region of PAEP (Figure 3). This is probably due to the SNP estimates in the linear combination for local GEBV reinforcing each other in the area of the peak but almost cancelling each other out in other regions. However, a careful inspection showed that, although GBLUP often indicated a region with large GEBV variance near the QTL, the maximum variance was larger and more concentrated near the reported QTL for BayesR than for GBLUP. This is due to the heterogeneous variance assumptions in the BayesR method, which allow SNPs in high LD with the QTL to have larger effects.

Figure 2
figure 2

SNP effects estimated by BayesR and GBLUP for FY, MY and PY near the PAEP gene. Shown is the (mean corrected) absolute value of SNP effect estimates from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of PAEP on BTA11 is marked (*). Note the changed y-axis scale for each graph.

Figure 3
figure 3

Local GEBV variance near the PAEP gene for FY, MY and PY using BayesR and GBLUP . Shown is the (mean corrected) GEBV variance in 250 kb windows for Holstein and Jersey reference animals from SNP effects estimated from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of PAEP on BTA11 is marked (*). Note the changed y-axis scale for each graph.

PAEP is reported to have a large effect on PY and smaller effects on MY and FY, and encodes the primary whey protein of bovine milk [38]. Although GBLUP indicates a region of high GEBV variance that encompasses PAEP, BayesR captured this pattern of effects more accurately and estimated appropriately the SNPs with large effects near PAEP in the analysis of PY. The analysis of FY, MY and PY with GBLUP likely captured the effect of PAEP, but the effect seemed to be dispersed over a large region that covered possibly the entire 5 Mbp region shown in Figure 2.

A second example of QTL mapping is provided in Figures S2 and S3 (see Additional file 1: Figures S2 and S3) for AGPAT6. In agreement with the other reports [40], we observed AGPAT6 to have a large effect on FY with smaller effects on MY and PY. Similar to PAEP, the effect that was estimated for AGPAT6 by GBLUP was spread over a larger genomic region than the effect that was estimated by BayesR. Interestingly, the effects of AGPAT6 on PY estimated by both methods were very similar. It seems that the difference between BayesR and GBLUP declines as the effect size of a locus decreases. In addition, there may be two other QTL near AGPAT6 that affect MY and PY.

All QTL with large effects (defined by a local GEBV variance greater than 50 times that of an average window) identified in the BayesR analysis with the multi-breed bull and cow reference dataset and their pleiotropic effects are presented in Table 6. Several of these regions were also identified by GBLUP and by a previous study on this data using a genome-wide association approach with single-marker regression analysis [18]. We identified QTL from eight of the nine possible groupings for pleiotropic effects. That is, large-effect QTL for milk yield traits could have positive, negative or no (observable) correlation with other milk yield traits.


The accuracy of genomic predictions using GBLUP depends on the size of the reference population [42,43]. Thus, when a large reference population is available for a single breed of dairy cattle, such as Holstein, GBLUP captures most of the potential accuracy for genomic predictions and there seems little benefit in using nonlinear methods for prediction, such as BayesR, or high-density genomic markers [9]. However, when predictions need to be more robust and are used to predict genetic merit of distantly-related animals, such as animals in future generations or animals from different breeds, the benefits of genomic prediction using GBLUP with medium-density SNPs decreases compared to nonlinear methods. This was first pointed out by Habier et al. [1] who reported poor predictions with GBLUP over successive generations compared to BayesB. Here, we show the advantages of nonlinear genomic prediction methods with across-breed predictions. We showed that the accuracy of genomic predictions obtained using BayesR increased by 8 and 17% compared to GBLUP predictions from a single breed when they were estimated for Australian Red animals from a multi-breed Holstein/Jersey reference population. In regions that contain known mutations that affect milk production, we demonstrated that BayesR localises SNP effects to smaller genomic regions than GBLUP. Thus, robust genomic prediction of genetic merit and QTL mapping are related problems, which can both be accomplished by nonlinear methods such as BayesR.

Increasing the size of the reference population by including cows increased accuracy of genomic predictions by 4.2 to 5.4% for traits with moderate to high heritability. Adding cows had little or no effect on the bias of predictions. This is in contrast to the bias introduced when adding cows in French and US studies [44,45], possibly because, in our case, cows were sampled from commercial herds with little potential for bias from preferential treatment and animals were not selectively genotyped based on genetic merit. In our data, adding cows benefitted predictions despite their phenotypic records being less accurate than records on bulls. This is probably because the size of the reference population increased substantially by adding the cows, by almost three times in the Holstein data and five times in the Jersey data. A further benefit of using the Holstein cows was that they were more genetically diverse than the Holstein bull population (see Additional file 1: Figure S4). This diversity is useful to identify SNPs that track causative mutations, and thus contributes to improving the robustness of genomic predictions. Since the cows that were added to the reference population were animals from commercial farms, it is possible that some animals may have been recently admixed with another breed and present varying degrees of traditional ancestry with Australian dairy cattle, such as British Friesians.

The BayesR QTL mapping approach, combined with expression data from mammary gland tissue, was powerful for the identification of many previously reported QTL that are known to be involved in milk production. For the known QTL, the patterns of pleiotropic effects estimated by BayesR matched the reported effects for some mutations. This study suggests that QTL mapping using a nonlinear approach and considering multiple traits may improve the mapping precision. This will be most beneficial for QTL with large effects on one trait and smaller effects on another trait. For example, the large effect of AGPAT6 on FY could help to more precisely map the smaller effects of this locus on PY. We observed little difference between GBLUP and BayesR in the across-breed prediction for P%, presumably because it is controlled by many QTL with small effects. A strategy that uses multiple traits to assist the localisation of QTL may be useful to increase robustness and accuracy of across-breed predictions for traits such as P%.

Our analysis identified several interesting candidate loci for milk production traits that (1) were identified as QTL with both BayesR and GBLUP, (2) were highly over- (or under-) expressed in the mammary gland compared to the other 17 tissue types analysed and (3) have functions in milk synthesis that have been described independently. It may be interesting to further study these loci, which include: SLC37A1 that encodes a glucose-6-phosphate transporter involved in the homeostasis of blood glucose [46]; MUC1 that encodes a glycoprotein that is a component of the surface membrane of fat globules in milk [47] and is also assumed to contribute to epithelial cell defence against bacteria; and CSF2RB, which is involved in the JAK-STAT signalling pathway (the JAK-STAT pathway has a central role in prolactin signalling [48]). Another promising novel candidate gene is TPH1, which is involved in mammary gland development (GO:0067074) and serotonin synthesis [49].


The use of a nonlinear method (such as BayesR) and high-density SNP genotypes, combined with a multi-breed reference population that included cows and bulls, led to the highest accuracies of genomic prediction, especially for a breed that was not included in the reference population. The advantage of BayesR over GBLUP is due to its better use of SNPs that are close to the causal mutation. Thus, the accuracy of GEBV derived using BayesR should be greater than that of GEBV derived using GBLUP for a variety of target populations and across multiple generations. It seems that BayesR is a useful methodology to map genes responsible for variation in quantitative traits.