Background

The models applied in this paper to estimate genomic breeding values are derived from a multiple QTL mapping model described by Meuwissen and Goddard [1]. The methods are implemented using variable selection via Gibbs sampling [2], where the variable being selected is the presence or absence of a QTL at a putative QTL position. The applied Bayesian method thus avoids the computationally costly Metropolis-Hastings step that was implemented in the BayesB model of Meuwissen et al. [3].

Methods

Parameterization of the model

The data were analyzed with five different models considering only additive genetic effects. The first model (called 'HAP_POL') was:

$$y_i = \mu + s_i + u_i + \sum_{j=1}^{5994} (q_{ij1} + q_{ij2})\, v_j + e_i$$

where $y_i$ is the phenotype of animal $i$, $\mu$ is the overall mean, $s_i$ is a fixed effect for sex, $u_i$ is the polygenic effect of animal $i$, $v_j$ is the direction of the QTL effects of the haplotypes at putative QTL position $j$, $q_{ij1}$ ($q_{ij2}$) is the size of the QTL effect for the paternal (maternal) haplotype of animal $i$ at putative QTL position $j$, and $e_i$ is the residual term for animal $i$ [1]. Note that the total effect of a haplotype is modeled as $q_{ij\cdot} \times v_j$, and that $q_{ij\cdot}$ and $v_j$ may each take a positive or negative value. The covariance among polygenic effects ($u_\cdot$) was modeled as $A \times \sigma_G^2$, where $A$ is the relationship matrix based on the full pedigree and $\sigma_G^2$ is the polygenic variance. The second model (called 'HAP_NOPOL') was the same as HAP_POL, but omitted the polygenic component. The HAP models assumed a putative QTL at the midpoint of each marker bracket. The covariances among haplotypes at bracket $j$ ($q_{\cdot j \cdot}$) were modeled as $H_j$, the matrix of estimated IBD probabilities among the haplotypes at the midpoint of bracket $j$. The variance of $q_{\cdot j \cdot}$ was assumed to be 1, while $v_j$ is a scale parameter that allows a bracket to have a large effect if a QTL is present and a small effect if it is not.
IBD probabilities between haplotypes were calculated using the algorithm of Meuwissen and Goddard [4], which combines linkage disequilibrium with linkage information and, for each bracket $j$, considers 20 surrounding markers and all available pedigree information. The effective population size was assumed to be 100, and the number of generations since an arbitrary founder population was also assumed to be 100, as in Meuwissen and Goddard [1]. All pairs of base haplotypes (i.e. haplotypes of the first generation of genotyped animals) with an IBD probability above 0.95 were clustered using a hierarchical clustering algorithm. If the matrix of IBD probabilities among base haplotypes was not positive definite after clustering, the matrix was bent by adding |min_eigenval| + 0.01 to all diagonal elements, where |min_eigenval| is the absolute value of the lowest (negative) eigenvalue. The matrix was subsequently inverted by LU decomposition. The elements of $H_j^{-1}$ for the descendant haplotypes were then calculated using the algorithm of Fernando and Grossman (1989) [5]. When the IBD probability of a descendant haplotype with one of its parental haplotypes exceeded 0.95, the descendant haplotype was clustered with this parental haplotype.
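
The bending step can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `bend` and the toy matrix are hypothetical and not taken from the original software. The sketch adds |min_eigenval| + 0.01 to the diagonal only when the matrix is not positive definite, and then inverts the result with `np.linalg.inv`, which relies on an LU factorization.

```python
import numpy as np


def bend(ibd, margin=0.01):
    """Return a positive definite copy of a symmetric IBD matrix.

    If the smallest eigenvalue is not positive, |min eigenvalue| + margin
    is added to every diagonal element (the rule described in the text).
    """
    ibd = np.asarray(ibd, dtype=float)
    min_eig = np.linalg.eigvalsh(ibd).min()  # eigenvalues of a symmetric matrix
    if min_eig > 0:
        return ibd.copy()  # already positive definite, nothing to do
    return ibd + (abs(min_eig) + margin) * np.eye(ibd.shape[0])


# Hypothetical 3 x 3 matrix of IBD probabilities that is not positive definite.
toy = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.9],
                [0.1, 0.9, 1.0]])

bent = bend(toy)
inverse = np.linalg.inv(bent)  # LU-based inversion
```

After bending, the smallest eigenvalue equals the margin (0.01 here), so the matrix is invertible but can still be poorly conditioned when the margin is small.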

The third model (called 'SNP_POL') was:

$$y_i = \mu + s_i + u_i + \sum_{j=1}^{6000} (q_{ij1} + q_{ij2})\, v_j + e_i$$

where $y_i$, $s_i$, $\mu$, and $u_i$ are as in model 1, $v_j$ is the direction of the effects of the alleles at marker locus $j$, $q_{ij1}$ and $q_{ij2}$ are the sizes of the marker effects of animal $i$ at marker locus $j$, and $e_i$ is the residual term for animal $i$. The fourth model (called 'SNP_NOPOL') was the same as SNP_POL, but omitted the polygenic component.
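
To make the structure shared by the HAP and SNP models concrete, the following sketch evaluates the model expectation for a single animal. All function names and numbers are hypothetical and chosen only for illustration; three loci are used instead of the 5994 brackets or 6000 markers in the models above.

```python
def genomic_value(q_paternal, q_maternal, v):
    """Sum over loci j of (q_ij1 + q_ij2) * v_j for one animal."""
    return sum((qp + qm) * vj for qp, qm, vj in zip(q_paternal, q_maternal, v))


def expected_phenotype(mu, sex_effect, polygenic, q_paternal, q_maternal, v):
    """Model expectation for one animal: every term except the residual e_i."""
    return mu + sex_effect + polygenic + genomic_value(q_paternal, q_maternal, v)


# Hypothetical effect sizes for one animal at three loci.
q_pat = [0.5, -1.0, 0.2]   # paternal haplotype or allele effects (q_ij1)
q_mat = [0.5, 1.0, -0.2]   # maternal haplotype or allele effects (q_ij2)
v = [0.3, 0.8, 0.0]        # locus-specific scale parameters (v_j)

g = genomic_value(q_pat, q_mat, v)
y_hat = expected_phenotype(0.1, 0.05, 0.2, q_pat, q_mat, v)
```

Setting the `polygenic` argument to zero corresponds to the NOPOL variants, and dropping the genomic term corresponds to the POL model.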

For comparison, a fifth model (called 'POL') was applied, which included the polygenic effects but omitted the SNP effects.

Solving algorithm

For all models, a Markov chain Monte Carlo method using Gibbs sampling was used to obtain posterior estimates for all the effects in the model [1]. The scale parameter of a putative QTL at locus $j$, $v_j$, was sampled from a normal distribution $N(0, \sigma_V^2)$ if a QTL was present in bracket $j$, and from $N(0, \sigma_V^2/100)$ if no QTL was present in bracket $j$. The variance of $v_j$, $\sigma_V^2$, was sampled from a scaled inverse chi-square distribution with a prior variance of 0.058. This prior variance was calculated as the additive genetic variance, estimated using model 'POL', divided by 30, i.e. assuming 30 additive and unrelated QTL affecting the trait across the 6 chromosomes.
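
The two-component normal distribution for $v_j$ can be sketched as follows. This is only an illustration of the mixture described above, using the prior variance of 0.058 as a stand-in for $\sigma_V^2$; in the actual Gibbs sampler the draws come from the fully conditional distributions given in [1]. The function names and the random seed are assumptions.

```python
import math
import random

PRIOR_VARIANCE = 0.058   # prior variance of v_j, as stated in the text
ASSUMED_QTL = 30         # number of QTL assumed in the text

# Additive genetic variance implied by these two numbers (about 1.74).
implied_additive_variance = PRIOR_VARIANCE * ASSUMED_QTL


def draw_scale_parameter(sigma_v2, qtl_present, rng):
    """Draw v_j from N(0, sigma_v2) if a QTL is present, else N(0, sigma_v2 / 100)."""
    variance = sigma_v2 if qtl_present else sigma_v2 / 100.0
    return rng.gauss(0.0, math.sqrt(variance))  # gauss takes a standard deviation


def sample_variance(draws):
    """Unbiased sample variance of a list of draws."""
    mean = sum(draws) / len(draws)
    return sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)


rng = random.Random(1)  # arbitrary seed, for reproducibility only
with_qtl = [draw_scale_parameter(PRIOR_VARIANCE, True, rng) for _ in range(10000)]
without_qtl = [draw_scale_parameter(PRIOR_VARIANCE, False, rng) for _ in range(10000)]
```

Because the variance without a QTL is 100 times smaller, the corresponding draws of $v_j$ have a standard deviation 10 times smaller, which shrinks the effects of brackets without a QTL towards zero.
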
The presence of a QTL in bracket $j$ was sampled from a Bernoulli distribution with probability equal to

$$\frac{P(v_j \mid \sigma_V^2) \times \Pr_j}{P(v_j \mid \sigma_V^2) \times \Pr_j + P(v_j \mid \sigma_V^2/100) \times (1 - \Pr_j)},$$

where $P(v_j \mid \sigma_V^2)$ is the probability of sampling $v_j$ from $N(0, \sigma_V^2)$, i.e.

$$\frac{1}{\sqrt{2\pi\sigma_V^2}}\, e^{-\frac{v_j^2}{2\sigma_V^2}},$$

and $\Pr_j$ is the prior probability of the presence of a QTL in bracket $j$. $\Pr_j$ was calculated per bracket as five times (i.e. assuming five QTL per chromosome) the length of bracket $j$, divided by the total length of all the brackets on the chromosome. More details on the prior distributions and the fully conditional distributions can be found in Meuwissen and Goddard [1]. The Gibbs sampler was implemented using residual updating, which has been shown to be a computationally efficient way to solve the equations [6]. For all models, the Gibbs sampler was run for 30,000 iterations, of which 3,000 were discarded as burn-in.
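
The Bernoulli probability for the presence of a QTL and the prior probability per bracket can be written out as a short sketch. The function names and the bracket and chromosome lengths are hypothetical; only the formulas follow the description above.

```python
import math


def normal_density(x, variance):
    """Density of N(0, variance) evaluated at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def prior_qtl_probability(bracket_length, chromosome_length, qtl_per_chromosome=5):
    """Prior probability of a QTL in a bracket: 5 * bracket length / total length."""
    return qtl_per_chromosome * bracket_length / chromosome_length


def qtl_presence_probability(v_j, sigma_v2, prior_j):
    """Bernoulli probability that a QTL is present in bracket j, given v_j."""
    with_qtl = normal_density(v_j, sigma_v2) * prior_j
    without_qtl = normal_density(v_j, sigma_v2 / 100.0) * (1.0 - prior_j)
    return with_qtl / (with_qtl + without_qtl)


# Hypothetical bracket of 0.1 length units on a chromosome of 100 length units.
prior = prior_qtl_probability(0.1, 100.0)
p_small = qtl_presence_probability(0.0, 0.058, prior)  # v_j at zero
p_large = qtl_presence_probability(1.0, 0.058, prior)  # v_j far from zero
```

A value of $v_j$ close to zero is better explained by the narrow distribution, so the probability of a QTL falls below the prior, whereas a value far from zero pushes the probability towards 1. For extreme values of $v_j$ both densities can underflow to zero, so a production implementation would more likely work with log densities.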

Results

Estimates for the mean and for both sexes were small in the SNP and HAP models, ranging from 7.3E-05 to 3.2E-02 (results not shown). Accuracies of the estimated breeding values, i.e. their correlations with the simulated breeding values, were calculated for all five models for animals without phenotypic information (Table 1). The accuracy of estimated breeding values was similar across the four genomic models. However, the accuracy of the HAP models decreased across generations, while the accuracy of the SNP models appeared to be constant across generations. Coefficients of the regression of true breeding values on estimated breeding values were 0.85–0.86 for the HAP models and 0.94–0.96 for the SNP models, indicating that the bias of the estimated breeding values was larger for the HAP models than for the SNP models. For animals with phenotypes, the correlations among estimated breeding values (EBVs) and the correlations between EBVs and phenotypes were calculated (Table 2). The correlation between phenotypes and EBVs was largest for the POL model, while for both the SNP and HAP models it was slightly higher when the polygenic effect was included than when it was excluded. The correlations among EBVs of animals without phenotypes were also calculated (Table 3). Correlations among EBVs from the SNP and HAP models were all > 0.94, indicating small differences in predictive ability between those models. The correlations between the POL model and the other models were much lower for animals without phenotypes (0.21–0.23; Table 3) than for animals with phenotypes (0.76–0.78; Table 2).
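
The two summary statistics used here, accuracy and the regression coefficient, can be computed as in the following sketch. The breeding values are made-up numbers for five animals and the function names are hypothetical; they are not the simulated data of this study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _covariance(x, y):
    """Unbiased sample covariance of two equally long lists."""
    mean_x, mean_y = _mean(x), _mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def accuracy(true_bv, est_bv):
    """Correlation between true and estimated breeding values."""
    return _covariance(true_bv, est_bv) / math.sqrt(
        _covariance(true_bv, true_bv) * _covariance(est_bv, est_bv)
    )


def regression_of_true_on_estimated(true_bv, est_bv):
    """Slope of the regression of true on estimated breeding values."""
    return _covariance(true_bv, est_bv) / _covariance(est_bv, est_bv)


# Made-up breeding values for five animals.
ebv = [-2.0, -1.0, 0.0, 1.0, 2.0]
tbv = [-1.7, -1.0, 0.1, 0.8, 1.8]

acc = accuracy(tbv, ebv)
slope = regression_of_true_on_estimated(tbv, ebv)
```

A slope of 1 would indicate that differences in EBVs match differences in true breeding values on average; a slope below 1, as reported above, indicates that the EBVs overstate those differences.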

Table 1 Correlations (reflecting accuracy) between true and estimated breeding values, and coefficients of regression of true breeding values on estimated breeding values (estimated using all five models) for animals without phenotypes in generations 4, 5 and 6.
Table 2 Correlations between phenotypes and estimated breeding values for animals with phenotypes, estimated using all five models.
Table 3 Correlations between estimated breeding values for animals without phenotypes, estimated using all five models.

Posterior QTL probabilities > 0.1 were plotted along the genome for all genomic models (Figure 1). All models were able to detect nearly all QTL that explained at least 0.5% of the total phenotypic variance.

Figure 1 Posterior QTL probabilities along the genome, estimated using HAP_POL, HAP_NOPOL, SNP_POL, and SNP_NOPOL, and the positions of QTL that explained > 0.5% of the phenotypic variance.

Discussion

The presented methods have been applied in multiple studies, where they proved able to detect QTL [1, 7] as well as to estimate genomic breeding values accurately [7–9]. In the present study, differences in accuracy between the EBVs of the HAP and SNP models were small, which agrees with the finding that for $r^2$ values between adjacent markers of ~0.2 the differences in accuracy between the HAP and SNP models are negligible [8]. Apparently, including linkage analysis information in addition to linkage disequilibrium information in the model (i.e. going from the SNP to the HAP model) does not yield additional information with which to estimate effects more accurately.

Interestingly, the POL model yielded a higher correlation between EBV and phenotype than the genomic models. However, for the same animals, the accuracy of the EBVs was 0.93–0.94 with the genomic models but only 0.70 with the POL model (results not shown).

Conclusion

For the provided data set, including a polygenic effect in the genomic model had no effect on the accuracy of the total EBVs or on the prediction of the QTL positions. The SNP models yielded slightly higher accuracies for the total EBVs than the HAP models, while both types of model were able to detect nearly all QTL that explained at least 0.5% of the total phenotypic variance.