Background

Selection based on dense markers across the genome [1] has become an important component of dairy cattle breeding programs [2–7]. The accuracy of genomic prediction relies on the amount of information used to derive the prediction equation. In many genomic selection programs, thousands of bulls that have been progeny tested over the last decades have been genotyped and are used as national reference populations. These reference populations have been extended by sharing data across countries, as in the North American cooperation [8], the EuroGenomics project [7], and the joint Brown Swiss project [9]. Generally, genomic predictions are based on the data of all genotyped animals. However, in practice, not all individuals can be genotyped. To make use of as much information as possible for genetic evaluation, it is appealing to blend the genomic predicted breeding values and the traditional estimated breeding values (EBV) into genomically enhanced breeding values (GEBV), or to perform genomic prediction using all available information simultaneously.

Many studies have shown that a linear model which assumes that effects of all single nucleotide polymorphisms (SNP) are normally distributed with equal variance performs as well as variable selection models for most traits in dairy cattle [2, 4]. Because such BLUP models are simple and have low computational requirements, they have become popular approaches for practical genomic prediction. De-regressed proofs (DRP) [10, 11] are generally used as the response variable for genomic prediction since they can be easily derived from the EBV that are usually available.

Several blending strategies, including multi-step and single-step approaches, have been proposed to estimate GEBV [4, 5, 12–18]. The core of a single-step procedure is the integration of the marker-based relationship matrix into the pedigree-based relationship matrix such that information from genotyped and non-genotyped animals is used simultaneously [13–15]. A previous study by Su et al. [18] reported that a single-step procedure resulted in more accurate GEBV than a multi-step procedure.

Some studies [13–15, 18] have reported that the combined relationship matrix in a single-step method may need to be adjusted because the marker- and pedigree-based relationship matrices may not be on the same scale, and different methods to adjust for this have been proposed [19–22]. These adjustments may also benefit genomic prediction using other models that integrate marker- and pedigree-based relationship matrices, such as a GBLUP model with a polygenic effect.

The purpose of this study was to compare single-step blending and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16 traits in the Nordic Holstein population. De-regressed proofs were used as response variables in both GBLUP and the single-step blending methods.

Methods

Data

Data consisted of 5 214 genotyped bulls born between 1974 and 2008 and 9 374 non-genotyped bulls born between 1950 and 2008. The bulls were divided into a training and a validation population by birth date, October 1, 2001. Thus, the training data contained 3 045 genotyped and 8 822 non-genotyped bulls born before this date, and the validation data contained 2 169 genotyped bulls born after this date. Non-genotyped bulls born after October 1, 2001 were not used in training or validation. For the GBLUP methods described below, the training data only included the 3 045 genotyped animals. All 16 traits (sub-indices) in the Nordic Total Merit index were assessed, including yield, conformation, fertility, and health traits. For each trait, the DRP with reliability less than 0.20 were excluded from the training and the validation data. This removed 1.3%, 2.8% and 3.2% of DRP for birth index, fertility and health, respectively, and less than 0.5% for the other traits. The numbers of individuals in the training and validation datasets differed between traits (Table 1).
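
The allocation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_dataset` is hypothetical, and a bull born exactly on the cut-off date is assumed to fall on the validation side, which the text does not specify.

```python
from datetime import date

CUTOFF = date(2001, 10, 1)      # birth-date cut-off stated in the text
MIN_RELIABILITY = 0.20          # DRP below this reliability are excluded

def assign_dataset(birth_date, genotyped, drp_reliability):
    """Return 'training', 'validation', or None (record not used).

    Hypothetical helper; the handling of a birth date equal to CUTOFF
    is an assumption.
    """
    if drp_reliability < MIN_RELIABILITY:
        return None                      # DRP too unreliable for either set
    if birth_date < CUTOFF:
        return "training"                # older bulls, genotyped or not
    if genotyped:
        return "validation"              # young genotyped bulls
    return None                          # young non-genotyped bulls are dropped

print(assign_dataset(date(1998, 3, 12), False, 0.85))
print(assign_dataset(date(2004, 6, 1), True, 0.90))
```

For the GBLUP methods, the training records would additionally be restricted to genotyped bulls.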

Table 1 Heritability ($h^2$) of the traits and number of bulls in the training (Train) and validation (Valid$_{gen}$) datasets for GBLUP and single-step blending

Marker genotypes were obtained using the Illumina Bovine SNP50 BeadChip (Illumina, San Diego, CA). The final marker data included 48 073 SNPs for 5 214 bulls after removing SNPs with a minor allele frequency (MAF) less than 0.01 and a locus average GenCall score less than 0.60.
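
As an illustration of the MAF part of this editing step, the following sketch filters SNP columns by minor allele frequency. It assumes genotypes are coded as 0/1/2 allele counts, which the text does not state, and it omits the GenCall score filter. The function name `filter_by_maf` is hypothetical.

```python
import numpy as np

def filter_by_maf(genotypes, min_maf=0.01):
    """Keep SNP columns whose minor allele frequency is at least min_maf.

    genotypes: (n_animals, n_snps) array of allele counts coded 0/1/2
    (an assumed coding). Returns the filtered matrix and the boolean
    mask of retained SNPs.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0      # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)          # minor allele frequency per SNP
    keep = maf >= min_maf
    return genotypes[:, keep], keep

# Toy data: the first SNP is monomorphic and is removed.
M = np.array([[0, 1, 2, 0],
              [0, 1, 1, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 1]])
filtered, keep = filter_by_maf(M, min_maf=0.01)
print(keep, filtered.shape)
```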

De-regressed proofs (DRP) were used as response variables for genomic prediction in all approaches. Based on EBV data of 14 588 progeny-tested bulls and pedigree data of 42 144 animals, the de-regression was carried out by applying the iterative procedure described in [23, 24] using the MiX99 package [25] and with the heritabilities shown in Table 1, which were those used in Nordic cattle routine genetic evaluation. A detailed description of the Nordic cattle genetic evaluation and standardized procedures of EBV is given in http://www.nordicebv.info/Routine+evaluation/.

Statistical models

Three GBLUP and two single-step blending methods were used. All analyses were performed with the DMU package [26, 27], for estimating both the variance components and breeding values.

Simple GBLUP

The basic GBLUP method [28, 29] used to predict direct genomic breeding values (DGV) was:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{1}$$

where $\mathbf{y}$ is the data vector of DRP of genotyped bulls, $\mu$ is the overall mean, $\mathbf{1}$ is a vector of ones, $\mathbf{Z}$ is a design matrix that allocates records to breeding values, $\mathbf{g}$ is a vector of DGV to be estimated, and $\mathbf{e}$ is a vector of residuals. It was assumed that $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2)$, where $\sigma_g^2$ is the additive genetic variance and $\mathbf{G}$ is the marker-based genomic relationship matrix [28, 29]. Allele frequencies used to construct $\mathbf{G}$ were estimated from the observed genotype data. Random residuals were assumed such that $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and $\mathbf{D}$ is a diagonal matrix with elements $d_{ii} = 1/w_i$. The weights $w_i$ account for heterogeneous residual variances due to differences in reliabilities of DRP. They were defined as $w_i = r_i^2/(1 - r_i^2)$, where $r_i^2$ is the reliability of the DRP. The reliability was calculated as $r_i^2 = \mathrm{EDC}/(\mathrm{EDC} + k)$, where EDC is the effective daughter contribution and $k = (4 - h^2)/h^2$. To avoid possible problems caused by extreme weight values, reliabilities larger than 0.98 were set to 0.98.
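
The weighting scheme can be written out directly. The sketch below follows the formulas given above; the function name `drp_weights` and the toy EDC values are assumptions made for illustration.

```python
import numpy as np

def drp_weights(edc, h2, max_rel=0.98):
    """Reliability and residual weight of each DRP from its EDC.

    r2 = EDC / (EDC + k), with k = (4 - h2) / h2
    w  = r2 / (1 - r2), after capping r2 at max_rel
    """
    edc = np.asarray(edc, dtype=float)
    k = (4.0 - h2) / h2
    rel = edc / (edc + k)
    rel = np.minimum(rel, max_rel)        # avoid extreme weights
    w = rel / (1.0 - rel)
    return rel, w

# Toy example with h2 = 0.25, so k = 15.
rel, w = drp_weights([15.0, 45.0, 10000.0], h2=0.25)
d = 1.0 / w                               # diagonal elements of D
print(rel, w, d)
```

With these inputs the three reliabilities are 0.5, 0.75 and 0.98 (the last one capped), giving weights of about 1, 3 and 49.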

GBLUP with a polygenic effect

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{2}$$

where $\mathbf{u}$ is the vector of residual polygenic effects that are not captured by the SNP.

Here, we used an equivalent approach. Let $\mathbf{g}_\omega = \mathbf{u} + \mathbf{g}$, with $\mathrm{Var}(\mathbf{g}_\omega) = \mathbf{A}\sigma_u^2 + \mathbf{G}\sigma_g^2$, where $\mathbf{A}$ is the pedigree-based relationship matrix. Define $\sigma_{g_\omega}^2 = \sigma_u^2 + \sigma_g^2$ and $\omega = \sigma_u^2/(\sigma_u^2 + \sigma_g^2)$; then $\sigma_u^2 = \omega\sigma_{g_\omega}^2$ and $\sigma_g^2 = (1 - \omega)\sigma_{g_\omega}^2$, such that $\mathrm{Var}(\mathbf{g}_\omega) = [\omega\mathbf{A} + (1 - \omega)\mathbf{G}]\sigma_{g_\omega}^2$, where $\omega$ is the ratio of residual polygenic to total additive genetic variance. Thus, the above model is equivalent to

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g}_\omega + \mathbf{e}. \tag{3}$$

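It was assumed that $\mathbf{g}_\omega \sim N(\mathbf{0}, \mathbf{G}_\omega\sigma_{g_\omega}^2)$, where $\mathbf{G}_\omega$ is a combined relationship matrix, $\mathbf{G}_\omega = \omega\mathbf{A} + (1 - \omega)\mathbf{G}$. The estimates of $\mathbf{g}_\omega$ were defined as DGV$_\omega$ to distinguish them from the predictions of the simple GBLUP and the single-step blending methods.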
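
The equivalence between the two-effect model and the single combined effect can be checked numerically. The sketch below is illustrative only: the text cites [28, 29] for the construction of the genomic relationship matrix without giving the formula, so the centred cross-product form in `vanraden_g` is an assumption, as are the toy genotypes, the toy pedigree matrix and the variance values.

```python
import numpy as np

def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    Assumed form: G = W W' / (2 * sum p(1-p)), with W the genotype matrix
    centred by twice the observed allele frequencies.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0              # observed allele frequencies
    W = M - 2.0 * p                       # centre each SNP column
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def combined_relationship(A, G, omega):
    """G_omega = omega * A + (1 - omega) * G."""
    return omega * A + (1.0 - omega) * G

genotypes = np.array([[0, 1, 2, 1],
                      [1, 1, 0, 2],
                      [2, 0, 1, 1]])
A = np.array([[1.0, 0.5, 0.0],            # toy pedigree relationships
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
G = vanraden_g(genotypes)

sigma_u2, sigma_g2 = 0.2, 0.8             # toy variance components
sigma_gw2 = sigma_u2 + sigma_g2
omega = sigma_u2 / sigma_gw2              # polygenic share of the variance

G_omega = combined_relationship(A, G, omega)
var_two_effects = A * sigma_u2 + G * sigma_g2
var_one_effect = G_omega * sigma_gw2
print(np.allclose(var_two_effects, var_one_effect))
```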

Adjusted GBLUP with a polygenic effect

The model was the same as the above GBLUP method with a polygenic effect, but $\mathbf{G}$ was adjusted to be on the same scale as $\mathbf{A}$. Then, the combined relationship matrix was $\mathbf{G}_\omega^* = \omega\mathbf{A} + (1 - \omega)\mathbf{G}^*$, where $\mathbf{G}^*$ is the adjusted genomic relationship matrix. The adjustment of $\mathbf{G}$ is described below.

Original single-step blending

The original single-step blending method [15, 17, 18] uses information from genotyped and non-genotyped individuals simultaneously by combining the genomic relationship matrix G with the pedigree-based numerator relationship matrix A, using the following model:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{a} + \mathbf{e} \tag{4}$$

where $\mathbf{y}$ is the vector of DRP for both genotyped and non-genotyped bulls, $\mathbf{1}$ is a vector of ones, $\mathbf{Z}$ is a design matrix, and $\mathbf{a}$ is the vector of additive genetic effects, which are the sum of the genomic and the residual polygenic effects. It was assumed that $\mathbf{a} \sim N(\mathbf{0}, \mathbf{H}\sigma_a^2)$, where matrix $\mathbf{H}$ is the modified genetic relationship matrix that combines pedigree-based and marker-based relationship information [13, 15]:

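$$\mathbf{H} = \begin{bmatrix} \mathbf{G}_\omega & \mathbf{G}_\omega\mathbf{A}_{11}^{-1}\mathbf{A}_{12} \\ \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{G}_\omega & \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{G}_\omega\mathbf{A}_{11}^{-1}\mathbf{A}_{12} + \mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12} \end{bmatrix} \tag{5}$$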

where $\mathbf{A}_{11}$ is the sub-matrix of the pedigree-based relationship matrix ($\mathbf{A}$) for genotyped animals, $\mathbf{A}_{22}$ is the sub-matrix of $\mathbf{A}$ for non-genotyped animals, $\mathbf{A}_{12}$ (or $\mathbf{A}_{21}$) is the sub-matrix of $\mathbf{A}$ for relationships between genotyped and non-genotyped animals, and $\mathbf{G}_\omega = (1 - \omega)\mathbf{G} + \omega\mathbf{A}_{11}$, where $\omega$ is a weight (within the range from 0.05 to 0.40 in this study). The $\mathbf{G}$ matrix used in the single-step blending was the same as in the GBLUP method. The inverse of $\mathbf{H}$ [15, 17] is

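$$\mathbf{H}^{-1} = \begin{bmatrix} \mathbf{G}_\omega^{-1} - \mathbf{A}_{11}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \mathbf{A}^{-1} \tag{6}$$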
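
The relationship between equations (5) and (6) can be verified on a small example. The sketch below builds both matrices with NumPy and checks that their product is the identity. The toy pedigree (two genotyped full sibs followed by their two non-genotyped, unrelated parents), the toy G matrix and the function names are assumptions made for illustration. Dense inverses are used only because the example is tiny.

```python
import numpy as np

def build_h(A, G_omega, n_gen):
    """H as in equation (5); the first n_gen rows/columns are genotyped animals."""
    A11, A12 = A[:n_gen, :n_gen], A[:n_gen, n_gen:]
    A21, A22 = A[n_gen:, :n_gen], A[n_gen:, n_gen:]
    A11_inv = np.linalg.inv(A11)
    H12 = G_omega @ A11_inv @ A12
    H21 = A21 @ A11_inv @ G_omega
    H22 = A21 @ A11_inv @ G_omega @ A11_inv @ A12 + A22 - A21 @ A11_inv @ A12
    return np.block([[G_omega, H12], [H21, H22]])

def build_h_inverse(A, G_omega, n_gen):
    """H^-1 as in equation (6): A^-1 plus a correction in the genotyped block."""
    H_inv = np.linalg.inv(A)
    H_inv[:n_gen, :n_gen] += np.linalg.inv(G_omega) - np.linalg.inv(A[:n_gen, :n_gen])
    return H_inv

# Order: full sib 1, full sib 2 (genotyped), sire, dam (non-genotyped).
A = np.array([[1.0, 0.5, 0.5, 0.5],
              [0.5, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.0],
              [0.5, 0.5, 0.0, 1.0]])
G = np.array([[1.02, 0.45],
              [0.45, 0.98]])              # toy genomic relationships
omega, n_gen = 0.2, 2
G_omega = (1.0 - omega) * G + omega * A[:n_gen, :n_gen]

H = build_h(A, G_omega, n_gen)
H_inv = build_h_inverse(A, G_omega, n_gen)
print(np.allclose(H @ H_inv, np.eye(4)))
```

The practical appeal of equation (6) is that only the genotyped block requires inverting dense matrices, while the inverse of the full pedigree matrix can be set up from the pedigree without forming the matrix itself.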

Adjusted single-step blending

In the adjusted single-step blending method, the G matrix was adjusted for the difference between the original genomic relationship matrix and pedigree relationship matrix (A11), as proposed by previous studies [19, 20]. The G matrix was adjusted using two parameters α and β [21], i.e.,

$$\mathbf{G}^* = \mathbf{G}\beta + \alpha, \tag{7}$$

which were derived from the following equations:

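$$\mathrm{Avg.diag}(\mathbf{G})\,\beta + \alpha = \mathrm{Avg.diag}(\mathbf{A}_{11}) \tag{8}$$
$$\mathrm{Avg.offdiag}(\mathbf{G})\,\beta + \alpha = \mathrm{Avg.offdiag}(\mathbf{A}_{11}) \tag{9}$$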

Matrix G* was then used to replace G to construct the combined relationship matrix in the single-step blending method.
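
Equations (8) and (9) form a 2 × 2 linear system in β and α, so the adjustment can be sketched as follows. The function name `adjust_g` and the toy matrices are assumptions. Here α is added to every element of the scaled matrix, which is how equation (7) is read.

```python
import numpy as np

def adjust_g(G, A11):
    """Solve equations (8) and (9) for beta and alpha, then return G* = G*beta + alpha."""
    n = G.shape[0]
    off = ~np.eye(n, dtype=bool)                      # mask of off-diagonal elements
    lhs = np.array([[np.mean(np.diag(G)), 1.0],
                    [G[off].mean(), 1.0]])
    rhs = np.array([np.mean(np.diag(A11)), A11[off].mean()])
    beta, alpha = np.linalg.solve(lhs, rhs)
    return beta * G + alpha, alpha, beta

G = np.array([[1.0, 0.0, -0.1],
              [0.0, 0.9, 0.1],
              [-0.1, 0.1, 1.1]])
A11 = np.array([[1.05, 0.10, 0.05],
                [0.10, 1.00, 0.15],
                [0.05, 0.15, 1.10]])
G_star, alpha, beta = adjust_g(G, A11)
print(round(alpha, 3), round(beta, 3))
```

In this toy case the average diagonal and off-diagonal elements of G are 1 and 0, so α is approximately the average off-diagonal element of the pedigree sub-matrix (0.10) and β is approximately 0.95.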

The weights $\omega$ ranging from 0.05 to 0.40 were used to construct $\mathbf{G}_\omega$ and $\mathbf{G}_\omega^*$ for the single-step blending methods and for the GBLUP methods with a polygenic effect.

Validation

The reliabilities of genomic predictions were measured as squared correlations between the predicted breeding values and DRP for bulls in the validation data, divided by the average reliability of the DRP in validation data. A Hotelling-Williams t-test was used to test the difference between the validation correlations obtained from these five prediction methods [30, 31]. Bias of genomic predictions was measured as the regression of DRP on the genomic predictions [32].
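
The reliability and bias statistics defined above can be sketched as follows. The function name `validation_stats` and the toy numbers are assumptions, and the Hotelling-Williams test is not covered here.

```python
import numpy as np

def validation_stats(drp, pred, drp_reliability):
    """Validation reliability plus intercept and slope of DRP regressed on predictions."""
    drp = np.asarray(drp, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(pred, drp)[0, 1]
    # Squared correlation, scaled by the average reliability of the DRP.
    reliability = r ** 2 / np.mean(drp_reliability)
    # Regression of DRP (y) on the genomic prediction (x).
    slope, intercept = np.polyfit(pred, drp, 1)
    return reliability, intercept, slope

pred = [1.0, 2.0, 3.0, 4.0]
drp = [1.5, 1.0, 3.5, 3.0]
rel = [0.8, 0.8, 0.8, 0.8]
reliability, intercept, slope = validation_stats(drp, pred, rel)
print(reliability, intercept, slope)
```

A slope below 1, as in this toy example, corresponds to predictions whose spread is too large relative to the DRP.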

Results

Genomic predictions using the GBLUP method were improved when a polygenic effect was included (Tables 2 and 3). With a relative weight of 0.2 on the residual polygenic variance, the average reliability of genomic predictions for the 16 traits was 0.363, which was 0.3 percentage points higher than the average reliability from the simple GBLUP. Moreover, the GBLUP method with a polygenic effect reduced bias of genomic predictions. Averaged over the 16 traits, the absolute deviation of the regression coefficient (DRP on genomic prediction) from 1 was 0.093 when using the GBLUP methods with a polygenic effect and 0.107 when using the simple GBLUP method. The GBLUP methods with a polygenic effect also slightly reduced bias in the mean, as the intercept in the regression analysis was closer to 0 than with the simple GBLUP. For the two GBLUP methods with a polygenic effect, adjustment of the genomic relationship matrix had no effect on predictive ability or bias.

Table 2 Reliabilities of genomic predictions using different methods
Table 3 Intercept (INT) and regression coefficient (REG) of DRP on genomic predictions from different methods

Table 4 reports validation reliabilities of GEBV from the two single-step blending methods and DGV$_\omega$ from the GBLUP method with a polygenic effect (the adjusted GBLUP method is shown as an example) for the 16 traits, with a relative weight $\omega$ = 0.20. The adjusted single-step blending led to the highest reliability of genomic predictions, followed by the original single-step blending, and the GBLUP method resulted in the lowest reliability. Reliabilities ranged from 0.206 to 0.503 (average 0.379) for the original single-step blending, from 0.206 to 0.503 (average 0.382) for the adjusted single-step blending, and from 0.183 to 0.481 (average 0.363) for the GBLUP method. In general, single-step blending was better than the GBLUP method, and adjusted single-step blending was better than the original single-step blending, especially for production traits. On average, reliabilities of genomic breeding values predicted using the original single-step blending were 1.6 percentage points higher than reliabilities from the adjusted GBLUP method, but 0.3 percentage points lower than reliabilities from the adjusted single-step blending.

Table 4 Reliabilities of genomic predictions using different methods

The regression coefficients (Table 5) ranged from 0.757 to 1.138 (average absolute deviation from 1 equal to 0.084) for the original single-step blending, from 0.760 to 1.148 (average absolute deviation 0.080) for the adjusted single-step blending, and from 0.752 to 1.176 (average absolute deviation 0.093) for the adjusted GBLUP method. Predictions from the single-step blending methods appeared to have less bias than predictions from GBLUP, and predictions from the adjusted single-step blending had slightly less bias than predictions from the original single-step blending method. In addition, the two single-step blending methods led to a smaller absolute deviation of the intercept from 0 than the adjusted GBLUP method, indicating less bias in the mean.

Table 5 Intercept (INT) and regression coefficient (REG) of DRP on genomic predictions using different methods

Table 6 presents differences between groups of the top 300 bulls based on predictions from the different methods. For all 16 traits, more than 9% of the top 300 bulls based on the adjusted GBLUP method differed from the top 300 bulls based on the two single-step blending methods. Differences between the two single-step blending methods were small, except for production traits, which was in agreement with the small differences in reliabilities of GEBV from the two single-step blending methods.

Table 6 Differences between groups of the top 300 bulls based on genomic prediction using different methods
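
One way to quantify such differences is the percentage of animals in one method's top group that are absent from the other method's top group. The sketch below is an assumption about how this comparison could be computed, not a description of how Table 6 was produced. The function name `top_n_difference` is hypothetical, and higher predicted values are assumed to be better.

```python
import numpy as np

def top_n_difference(pred_a, pred_b, n=300):
    """Percentage of the top-n animals under method A that are not in the top n under method B."""
    top_a = set(np.argsort(pred_a)[::-1][:n].tolist())   # indices of the n largest values
    top_b = set(np.argsort(pred_b)[::-1][:n].tolist())
    return 100.0 * len(top_a - top_b) / n

# Toy example with four animals and a top group of two.
pred_a = np.array([0.9, 0.8, 0.1, 0.5])
pred_b = np.array([0.9, 0.2, 0.1, 0.6])
print(top_n_difference(pred_a, pred_b, n=2))
```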

In order to test the effect of different weighting factors ω in forming Gω and H, eight values of ω between 0.05 and 0.40 were used for the two single-step blending methods and the two GBLUP methods with a polygenic effect. On average, reliabilities varied from 0.356 to 0.363 over the eight scenarios for the two GBLUP methods, from 0.372 to 0.379 for the original single-step blending, and from 0.374 to 0.382 for the adjusted single-step blending (Figure 1). The highest mean reliability was obtained when using a weight of 0.15 or 0.20 for the four methods. The mean absolute deviation of the regression coefficient from 1 varied from 0.080 to 0.104 for the two GBLUP methods, from 0.074 to 0.098 for original single-step blending and from 0.072 to 0.091 for adjusted single-step blending (Figure 2). Mean of absolute deviations tended to decrease with increasing weights.

Figure 1

The impact of different weights on reliability of genomic predictions using different methods. GBLUP with a polygenic effect (GBLUP-AG), adjusted GBLUP with a polygenic effect (GBLUP-AG*), original single-step blending (Single-ori), and adjusted single-step blending (Single-adj).

Figure 2

The impact of different weights on the mean absolute deviation from 1 of the regression coefficient of DRP on prediction using different methods. GBLUP with a polygenic effect (GBLUP-AG), adjusted GBLUP with a polygenic effect (GBLUP-AG*), original single-step blending (Single-ori), and adjusted single-step blending (Single-adj).

Discussion

This study applied three GBLUP and two single-step blending methods for genomic prediction in Nordic Holsteins. Predictive abilities of the five methods were compared in terms of reliability and bias. Results indicated that both the original single-step blending and the adjusted single-step blending were more accurate than the three GBLUP methods because the two single-step blending approaches used much more information to predict breeding values. Similar results were reported by Su et al. [18] for the Nordic Red population. In the current study, the size of the training dataset for the single-step blending methods was almost three times as large as that for the three GBLUP methods (Table 1) since DRP of the non-genotyped animals also provided information through a combined relationship matrix. Including pedigree information may also improve genomic predictions because the SNP may not account for all additive genetic variance. As shown in this study, including a residual polygenic effect in the GBLUP methods led to slightly higher reliability of genomic predictions.

A regression coefficient of DRP on genomic predictions less than 1 indicates overestimation of the variance of genomic predictions (inflation), while a coefficient larger than 1 indicates underestimation (deflation). The two single-step blending methods led to less bias than the three GBLUP methods, and the two GBLUP methods with a polygenic effect resulted in less bias than the simple GBLUP method without a polygenic effect. The problem of inflation of genomic predictions is critical in practice [33–35] as it can give an unfair advantage to juvenile bulls over older progeny-tested bulls [17]. Aguilar et al. [17] showed that this bias was reduced by weighting the G and A matrices, and Liu et al. [36] found that including a polygenic effect in a GBLUP model (random regressions on SNP genotypes) led to less bias in genomic predictions. The present study showed that the weighting factor had an effect on the bias of genomic predictions for all traits in the single-step blending approaches and the GBLUP methods with a polygenic effect. A weight of 0.40 resulted in the smallest mean absolute deviation from 1 for the regression of DRP on GEBV or DGVω, averaged over the 16 traits, but at the cost of a loss in reliability of about 0.8 percentage points compared to a weight of 0.20, which led to the highest average reliability and an acceptable average absolute deviation of the regression coefficient from 1 (Figures 1 and 2).

The adjusted single-step blending method resulted in less bias than the original single-step blending for all settings of the weight factor. In a simulation study, Vitezica et al. [19] also found that the single-step method was less biased and more accurate when the genomic relationship matrix was adjusted by a constant. Using chicken data, Chen et al. [20] showed that unbiased evaluations can be obtained by adding a constant to the G matrix that is based on current allele frequencies and suggested that the optimal G has average of diagonal and off-diagonal elements close to those of A11. Forni et al. [22] also showed that re-scaling the G matrix is a reasonable solution to avoid inflation in pig data. However, in the present study, the adjusted G matrix did not improve genomic predictions in the GBLUP methods with a polygenic effect. This suggests that, based on the present data, adjustment of G has little effect on genomic prediction when only genotyped animals are used, but may be important in other data where there is a large difference in scale between G and A.

The results from the present study indicate that increasing the weighting factor (up to 0.40) reduces bias and that weighting factors around 0.15 to 0.20 give the highest reliability, but the optimal weighting factors differed between traits. Similarly, Liu et al. [36] observed that the optimal residual polygenic variance in a GBLUP model (random regressions on SNP genotypes) with a polygenic effect appears to differ among traits. Therefore, trait-specific weighting factors should be used in the single-step blending methods and the GBLUP methods with a polygenic effect. In the near future, both bulls and heifers may be pre-selected based on genomic EBV. This will lead to biased predictions of breeding values in both conventional and genomic evaluation procedures. In such situations, appropriate methods to correct the bias of predictions are required [37].

Christensen et al. [21] compared the adjusted and original single-step blending methods on pig data. In their study, the improvement in prediction reliability due to adjustment of the G matrix was much larger than in the current study. This may be because there was more inbreeding in the pig data, which resulted in average values of the diagonal and off-diagonal elements of A11 equal to 1.145 and 0.298, and estimates of β and α equal to 0.895 and 0.298, respectively. In the present study, the averages of the diagonal and off-diagonal elements of A11 were 1.060 and 0.085, and the estimates of β and α were 0.976 and 0.085, i.e. closer to one and zero, respectively. This means that the original G matrix was adjusted less in this study than in the study on pig data by Christensen et al. [21].

Conclusions

The single-step blending methods can increase reliability and reduce bias of genomic predictions. The adjusted single-step blending method performed slightly better than the original single-step blending method, both with respect to reliability and bias of genomic predictions. The weighting factor used in these single-step blending methods had a small effect on reliability of genomic prediction but an important effect on bias.