Background

Since the availability of dense genotyping panels [1], genomic prediction [2, 3] has become a very successful strategy for the prediction of breeding values of candidates for selection. Genomic prediction methods are based on the evaluation of the additive substitution effects of markers that capture a large part of the dominance and higher-order interaction effects [4]. However, estimating dominance effects may be relevant because their estimates can be used to allocate mates among candidates for selection [5, 6].

Two approaches have been suggested to estimate the effects of dominance in genomic prediction methods. The first [7] directly models the additive (a) and dominance (d) effects, while for the second, Vitezica et al. [8] proposed to include allele substitution (α) and dominance deviation (δ) effects in order to compute appropriate breeding values. However, both these approaches impose a Gaussian regularization of additive and dominance effects that forces a symmetric distribution of the posterior estimates.

Nevertheless, the classical theory of quantitative genetics [9] argues that inbreeding depression and heterosis are based on the presence of directional dominance (i.e., a higher percentage of positive than negative dominance effects) and this contrasts with the assumption of symmetry of the above-described procedures. This discrepancy can be overcome in at least two ways: (1) by assuming that the mean of dominance effects differs from zero, which leads to the inclusion of a covariate for the average individual homozygosity in the statistical model, and (2) by using skewed distributions for the regularization of dominance effects. The first approach can be called regression on genomic inbreeding and it was empirically used by Sun et al. [6], Silió et al. [10], Aliloo et al. [11] and Zeng et al. [12], and proved by Xiang et al. [13]. As far as we are aware, the second approach has never been applied in the field of animal genetics, although it may be of considerable value since it ensures that the most frequent dominance effects are close to zero. In contrast, regression on genomic inbreeding implies that the mean and mode of the dominance effects are equal, although they may differ from zero. In the statistical literature, there is a broad corpus on the specification of skewed distributions [14, 15] and, among them, the family of skew-elliptical distributions defined by Sahu et al. [16] can be easily implemented in Bayesian regression using Markov chain Monte Carlo (MCMC) techniques [17].
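The difference between a shifted mean and a skewed distribution can be illustrated with a small simulation. The sketch below assumes one common construction of a skew Gaussian variable, a Gaussian term plus an asymmetry parameter times a half-Gaussian auxiliary variable. The function names, the `lam` and `sigma` arguments and this parametrization are illustrative assumptions and need not match the specification given in the Appendix.

```python
import numpy as np

def draw_skew_gaussian(n, lam, sigma=1.0, seed=0):
    """Draw n values d = sigma * eps + lam * |z|, with eps, z ~ N(0, 1).

    Assumed, illustrative parametrization: lam = 0 gives a symmetric
    Gaussian, lam > 0 gives more positive than negative values.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = np.abs(rng.standard_normal(n))  # half-Gaussian auxiliary variable
    return sigma * eps + lam * z

def theoretical_mean(lam):
    """Mean of d under the construction above, since E|z| = sqrt(2 / pi)."""
    return lam * np.sqrt(2.0 / np.pi)
```

With `lam > 0` the mean of the simulated effects moves away from zero and more than half of them are positive, which is the behaviour that a skewed prior is meant to allow for dominance effects.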

The objectives of this study were: (1) to develop a genomic best linear unbiased prediction (BLUP) model that uses a prior skewed distribution for dominance effects; (2) to compare it with the model with inclusion of a covariate for inbreeding proposed by Xiang et al. [13]; and (3) to confirm the presence of directional dominance for pig litter size.

Methods

Data

The data used in this study were from two unrelated pig lines provided by Genus plc (Hendersonville, TN, USA). Genotypes for all sows were generated using the Illumina PorcineSNP60 BeadChip (Illumina, San Diego). After quality control, i.e. excluding single nucleotide polymorphisms (SNPs) with a minor allele frequency lower than 0.05 or a call rate lower than 0.95 in each population, 37,900 and 37,011 SNPs remained for lines 1 and 2, respectively. Individuals with a call rate lower than 0.95 were also removed. Finally, the number of sows included in the analysis was 3631 and 2612 for lines 1 and 2, respectively. In total, 13,449 and 11,581 records on litter size (number of piglets born alive) were available for these sows, with an average litter size of 11.7 ± 2.9 and 12.4 ± 3.0 for lines 1 and 2, respectively.
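The quality control thresholds described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the 0/1/2 genotype encoding with `np.nan` for missing calls, the function name `quality_control`, and the choice to compute individual call rates over all SNPs are assumptions.

```python
import numpy as np

def quality_control(geno, maf_min=0.05, call_min=0.95):
    """Filter a genotype matrix (rows = sows, columns = SNPs).

    geno holds allele counts 0/1/2 with np.nan for missing calls
    (an assumed encoding). Returns the filtered matrix and the boolean
    masks of retained sows and retained SNPs.
    """
    geno = np.asarray(geno, dtype=float)
    called = ~np.isnan(geno)

    # SNP-level filters: call rate and minor allele frequency
    snp_call = called.mean(axis=0)
    n_called = np.maximum(called.sum(axis=0), 1)
    freq = np.nansum(geno, axis=0) / (2.0 * n_called)
    maf = np.minimum(freq, 1.0 - freq)
    keep_snp = (snp_call >= call_min) & (maf >= maf_min)

    # Individual-level filter: call rate over all SNPs (assumption)
    keep_ind = called.mean(axis=1) >= call_min

    return geno[np.ix_(keep_ind, keep_snp)], keep_ind, keep_snp
```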

Genomic prediction models

The first step was the definition of a Full Model that included both approaches for directional dominance (i.e. regression on genomic inbreeding and a skewed distribution for the dominance effects of SNPs). Then, a subset of model parameters was set to 0 in order to identify reduced models. The Full Model was:

$${\mathbf{y}} = {\mathbf{1}}\mu + {\mathbf{h}}b + {\mathbf{Xt}} + {\mathbf{Wr}} + {\mathbf{Qc}} + {\mathbf{Za}} + {\mathbf{Kd}} + {\mathbf{e}},$$

where \({\mathbf{y}}\) is the vector of phenotypic records, \(\mu\) is the general mean, \(b\) is the regression coefficient on the average SNP homozygosity, which can be interpreted as inbreeding depression or heterosis, \({\mathbf{t}}\) is a vector of order of parity effects (4 levels − 1st, 2nd, 3rd and > 3rd), \({\mathbf{r}}\) is a vector of farm-year-month of farrowing effects (3163 levels for line 1 and 4293 for line 2), \({\mathbf{c}}\) is a vector of permanent environmental effects (3631 and 2612 levels for lines 1 and 2, respectively), \({\mathbf{a}}\) and \({\mathbf{d}}\) are vectors of additive and dominance effects (37,900 and 37,011 levels for lines 1 and 2, respectively), and \({\mathbf{e}}\) is a vector of residuals. Furthermore, \({\mathbf{h}}\) is a vector of the average SNP homozygosity of the individuals and \({\mathbf{X}}\), \({\mathbf{W}}\), \({\mathbf{Q}}\), \({\mathbf{Z}}\) and \({\mathbf{K}}\) are incidence matrices that link the phenotypic records with \({\mathbf{t}}\), \({\mathbf{r}}\), \({\mathbf{c}}\), \({\mathbf{a}}\) and \({\mathbf{d}}\), respectively. Under the Bayesian paradigm, prior distributions were uniform for \(\mu\), \(b\), and for each element of \({\mathbf{t}}\), univariate Gaussian for each element of \({\mathbf{r}}\), \({\mathbf{c}}\) and \({\mathbf{a}}\), and skew Gaussian for each element of \({\mathbf{d}}\). Finally, prior distributions for the variances of farm-year-month of farrowing (\(\sigma_{r}^{2}\)), permanent environmental (\(\sigma_{c}^{2}\)), additive (\(\sigma_{a}^{2}\)), dominance (\(\sigma_{d}^{2}\)), and residual effects (\(\sigma_{e}^{2}\)) were scaled inverted chi-square (see “Appendix” for a full description of the Bayesian inference). It should be noted that directional dominance comprises two model parameters, one regression coefficient on the average SNP homozygosity (\(b\)) and one asymmetry parameter (\(\lambda\)) that is involved in the skew Gaussian prior distribution of dominance effects.
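As an illustration of the genomic terms of the model, the sketch below computes the average SNP homozygosity of each individual and the contribution \({\mathbf{h}}b + {\mathbf{Za}} + {\mathbf{Kd}}\) for one record per individual. The genotype coding is an assumption made for the example (allele counts 0/1/2 for the additive term and a heterozygote indicator for the dominance term), and the function names are hypothetical; the coding actually used is described in the Appendix.

```python
import numpy as np

def average_homozygosity(geno):
    """Fraction of SNPs at which each individual is homozygous (0 or 2 copies)."""
    geno = np.asarray(geno)
    return (geno != 1).mean(axis=1)

def additive_dominance_codes(geno):
    """Assumed coding: additive = allele count, dominance = heterozygote indicator."""
    geno = np.asarray(geno)
    return geno.astype(float), (geno == 1).astype(float)

def genomic_contribution(geno, b, a, d):
    """Return h * b + Z a + K d, with one row of geno per record."""
    h = average_homozygosity(geno)
    Z, K = additive_dominance_codes(geno)
    return h * b + Z @ np.asarray(a, dtype=float) + K @ np.asarray(d, dtype=float)
```

With a negative `b`, individuals with a higher proportion of homozygous SNPs receive a lower expected phenotype, which is how the covariate expresses inbreeding depression.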

Based on the Full Model, three reduced models were defined as follows:

  • Model SC: symmetric dominance effect with the inbreeding depression covariate, i.e. the asymmetry parameter (\(\lambda\)) was set to zero. This model was equivalent to that defined by Xiang et al. [13].

  • Model AN: asymmetric dominance effect without the inbreeding depression covariate, i.e. covariate \(b\) was set to zero.

  • Model SN: symmetric dominance effect without the inbreeding depression covariate, i.e. both asymmetry parameter \(\lambda\) and covariate \(b\) were set to zero. This model was equivalent to that defined by Su et al. [5].

The four models (Full, SC, AN, SN) were analyzed using a Gibbs sampler [18] (see “Appendix” for a full description) that provided posterior distributions for all unknowns in the model, i.e. individual breeding values (\(s_{a}\)) and dominance deviations (\(s_{d}\)), additive and dominance variances (\({\text{V}}_{\text{A}}\) and \({\text{V}}_{\text{D}}\)), and the expected inbreeding depression per percentage of inbreeding (\({\text{I}}_{\text{D}}\)). Models SC, AN and SN were analyzed by five chains of 75,000 iterations, after discarding the first 25,000. Each chain used a different random seed. As the convergence of the Full Model was clearly the worst, the Gibbs sampler implementation for this model was set to five chains of 250,000 iterations, after a burn-in of the first 50,000. Convergence and effective sample size were checked using the standard procedures [19] with the CODA package [20] and by visual inspection of the chains. Finally, models were compared using the deviance information criterion (DIC) [21] and the logarithm of the conditional predictive ordinate (logCPO) [22] (see “Appendix” for a full description).
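The two model comparison criteria can be computed from MCMC output along the following lines. The sketch uses the standard definitions (DIC as the posterior mean deviance plus the effective number of parameters, and logCPO as the sum over records of the log harmonic mean of the pointwise likelihoods) and assumes a Gaussian likelihood with one fitted mean per record and one residual variance per retained sample. The array layout and function names are assumptions, and the exact expressions used in this study are those of the Appendix.

```python
import numpy as np

def gaussian_loglik(y, mu, sigma2):
    """Pointwise Gaussian log-likelihood; broadcasts over MCMC samples."""
    return -0.5 * (np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)

def dic(y, mu_samples, sigma2_samples):
    """DIC = mean deviance + pD, with pD = mean deviance - deviance at posterior means.

    mu_samples has shape (n_samples, n_records); sigma2_samples has shape (n_samples,).
    """
    ll = gaussian_loglik(y, mu_samples, sigma2_samples[:, None])
    mean_dev = (-2.0 * ll.sum(axis=1)).mean()
    dev_at_mean = -2.0 * gaussian_loglik(
        y, mu_samples.mean(axis=0), sigma2_samples.mean()
    ).sum()
    return 2.0 * mean_dev - dev_at_mean

def log_cpo(y, mu_samples, sigma2_samples):
    """Sum over records of log CPO_i, where CPO_i is the harmonic mean of p(y_i | theta_t)."""
    ll = gaussian_loglik(y, mu_samples, sigma2_samples[:, None])
    n_samples = ll.shape[0]
    # log of mean(1 / p) per record, computed stably with a log-sum-exp shift
    shift = (-ll).max(axis=0)
    log_mean_inv = shift + np.log(np.exp(-ll - shift).sum(axis=0)) - np.log(n_samples)
    return float((-log_mean_inv).sum())
```

Lower DIC and higher logCPO both indicate a better fit.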

Results

The results of the convergence and the effective size of the MCMC chains are presented in Additional file 1: Tables S1 and S2. The average number of iterations required until convergence was computed using the Raftery and Lewis approach [23] and ranged from 93.0 (for parameter \(b\) in the SC Model for line 1) to 9204.0 (\(b\) in the Full Model for line 1). The estimated effective sample size (EFS) of the MCMC chains [24] ranged from 82 (\({\text{d}}^{2}\) in the SN Model for line 2) to 16,510 (\({\text{h}}^{2}\) in the Full Model for line 1). Finally, the required numbers of samples to achieve an accuracy of 0.1 for the 0.5 quantile with a probability of 0.95 were calculated using the Raftery and Lewis approach [23] and ranged from 3210 (\(b\) in the SC Model for line 1) to 290,280 (\(b\) in the Full Model for line 1).
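For readers unfamiliar with the effective sample size, the sketch below shows one simple estimator based on the autocorrelation of a chain. It is an illustration only: the truncation rule (stop at the first non-positive autocorrelation) is an assumption chosen for brevity and differs from the spectral estimator implemented in CODA, so the values it returns would not reproduce those in Tables S1 and S2.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple rule that differs from the estimator used by CODA.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau
```

A strongly autocorrelated chain yields an effective size far below its nominal length, which is why parameters that mix poorly need longer chains.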

Posterior mean estimates (and posterior standard deviations) of variance components, asymmetry parameters, and expected inbreeding depression and results of the comparison of models are in Tables 1 and 2 for lines 1 and 2, respectively. Posterior estimates of the variance of the additive effects (\(\sigma_{a}^{2}\)) under Model SN were equal to \(0.394 \times 10^{-4}\) and \(0.678 \times 10^{-4}\) for lines 1 and 2, respectively. Compared with the SN Model, these estimates were slightly lower for the AN Model (\(0.345 \times 10^{-4}\) and \(0.617 \times 10^{-4}\)), those for the SC Model were moderately higher (\(0.439 \times 10^{-4}\) and \(0.701 \times 10^{-4}\)) (Tables 1 and 2) and those for the Full Model were similar (\(0.381 \times 10^{-4}\) and \(0.615 \times 10^{-4}\)).

Table 1 Posterior mean (and posterior standard deviation) estimates for variance components, asymmetry parameters, inbreeding depression, ratios of additive and dominance variation and criteria for model comparison for line 1
Table 2 Posterior mean (and posterior standard deviation) estimates for variance components, asymmetry parameters, inbreeding depression, ratios of additive and dominance variation and criteria for model comparison for line 2

A different pattern was observed for the variance of dominance effects (\(\sigma_{d}^{2}\)), with posterior mean estimates being equal to \(0.369 \times 10^{-4}\) and \(0.430 \times 10^{-4}\) for lines 1 and 2, respectively, for Model SN (Tables 1 and 2). Models that allowed for asymmetry of dominance effects (AN and Full) provided higher posterior mean estimates of the variance of dominance effects (\(0.769 \times 10^{-4}\) and \(0.536 \times 10^{-4}\) for line 1 and \(0.872 \times 10^{-4}\) and \(0.993 \times 10^{-4}\) for line 2, respectively) than the SC Model (\(0.122 \times 10^{-4}\) and \(0.334 \times 10^{-4}\) for lines 1 and 2, respectively).

Because of the above results, estimates of additive genetic variance (\({\text{V}}_{\text{A}}\)) and narrow sense heritability (\({\text{h}}^{2}\)) were lower for Models AN and Full than for Models SN and SC (Tables 1 and 2). In contrast, posterior mean estimates of the variance of dominance deviations (\({\text{V}}_{\text{D}}\)) and percentage of dominance variation (\({\text{d}}^{2}\)) were higher for Models AN and Full than for Models SN and SC (Tables 1 and 2).

Estimates of the variance of farm-year-month effects (\(\sigma_{r}^{2}\)) and of residuals (\(\sigma_{e}^{2}\)) were consistent between models, ranging from 0.160 to 0.161 for line 1 and from 0.296 to 0.299 for line 2 for the farm-year-month variance and from 6.567 to 6.570 and from 6.630 to 6.635 for the residual variance for lines 1 and 2, respectively. However, the estimates of the variance of permanent environmental effects (\(\sigma_{c}^{2}\)) differed substantially between models (Tables 1 and 2), with posterior mean estimates for the SN and SC Models being the highest (0.478 and 0.572 for line 1 and 0.580 and 0.614 for line 2, respectively) and decreasing when asymmetry was allowed, reaching the lowest estimates for Models AN (0.308 and 0.380 for lines 1 and 2, respectively) and Full (0.394 and 0.333).

Posterior mean estimates of the asymmetry parameter for dominance effects (\(\lambda\)) were all positive (Tables 1 and 2 and Fig. 1) and ranged from 0.135 (line 1 and Model Full) to 0.380 (line 1 and Model AN). However, it should also be noted that posterior probabilities of a positive value for \(\lambda\) were higher than 0.999 for Model AN, while the highest posterior density regions at 95% (HPD95) for \(\lambda\) included zero for the Full Model for both lines.
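Both summaries used here, the posterior probability of a positive value and the highest posterior density region, are computed directly from the retained Gibbs samples. The sketch below is a minimal illustration that assumes a unimodal posterior and a one-dimensional array of samples; the function names are hypothetical.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))  # number of samples inside the interval
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))
    return x[start], x[start + k - 1]

def prob_positive(samples):
    """Share of samples above zero, i.e. the posterior probability of a positive value."""
    return float((np.asarray(samples) > 0).mean())
```

A parameter whose HPD95 region excludes zero, as for \(\lambda\) under Model AN, is supported by the data; a region that includes zero, as under the Full Model, is not conclusive.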

Fig. 1 Posterior distribution of the asymmetry parameter (λ) under Models AN and Full for lines 1 and 2

The regression coefficient on individual homozygosity (\(b\)) was estimated with Models SC and Full (Tables 1 and 2 and Fig. 2). With the SC Model, posterior mean estimates of \(b\) were clearly negative (− 12.15 and − 6.48 for lines 1 and 2, respectively), but equal to − 5.72 and 1.73 for lines 1 and 2 for the Full Model. It should also be noted that posterior standard deviations were higher for the Full than for the SC Model. The HPD95 regions for \(b\) included zero for the Full Model, but posterior probabilities of negative values were always higher than 0.99 for Model SC.

Fig. 2 Posterior distribution of the covariate for individual homozygosity (b) under Models SC and Full for lines 1 and 2

Results for the expected inbreeding depression (\({\text{I}}_{\text{D}}\)) per percentage of inbreeding are in Tables 1 and 2 and Fig. 3. Posterior mean (and posterior standard deviation) estimates of \({\text{I}}_{\text{D}}\) for the SN Model were − 0.016 (0.005) piglets for line 1 and − 0.008 (0.005) piglets for line 2. However, posterior mean (and posterior standard deviation) estimates for the remaining models (AN, SC and Full) were markedly more negative, being − 0.044 (0.006), − 0.045 (0.006), and − 0.045 (0.006) for line 1 and − 0.028 (0.008), − 0.025 (0.008) and − 0.029 (0.008) for line 2.

Fig. 3 Posterior distribution of the expected inbreeding depression for an inbreeding level of 0.10 for lines 1 and 2

Correlations of estimates for the SNP additive (\(a\)) and dominance (\(d\)) effects and for breeding values (\(s_{a}\)) and dominance deviations (\(s_{d}\)) between the four models of analysis are in Additional file 2: Tables S3 and S4. Correlations of estimates of the additive and dominance effects between models were always higher than 0.990 and correlations of estimates of breeding values and dominance deviations between models were also close to 1. However, it should be noted that the correlations between the estimated breeding values from the SN Model and the dominance deviations from the AN Model with the estimates from the remaining models were remarkably lower than those from the other models. In the first case, they ranged from 0.933 to 0.944 in line 1 and from 0.794 to 0.842 in line 2 and in the second case, from 0.769 to 0.944 in line 1 and from 0.702 to 0.857 in line 2.

Results of the model comparison tests (logCPO and DIC) are also in Tables 1 and 2. In both lines, the model with the best fit for both tests was the SC Model, followed by the SN and AN Models. The Full Model had the worst fit.

Discussion

The advent of dense genotyping information has allowed the development of models for genomic evaluation [3] that have revolutionized the field of animal breeding during the last decade. Most models for genomic evaluation are designed to deal with the classical statistical problem of large \(p\) and small \(n\), because the number of parameters to evaluate is frequently larger than the number of phenotypic data. The most common approach for dealing with this problem is the use of some kind of regularization of the effects of SNPs [25]. Several approaches have been suggested, ranging from simple Gaussian regularization [2] to more complex models that involve t-shaped [2], double exponential [26, 27], or mixtures of distributions [2, 28, 29]. However, all these methods of regularization use symmetric distributions that, from a Bayesian perspective, imply that marker effects are centered at zero. This assumption seems reasonable for the additive or substitution effects, but it is not so clear for dominance effects. In fact, the classical theory of quantitative genetics attributes the phenomenon of inbreeding depression (or heterosis) to the presence of directional dominance or, in other words, a positive average of dominance effects, jointly with a decrease (or increase) in the degree of heterozygosity [9]. In this study, we considered two approaches to model directional dominance in genomic evaluation methods. The first assumed a prior distribution for dominance effects that allowed a mean that was different from 0, i.e. Model SC, following the work of Xiang et al. [13]; the second assumed that dominance effects followed a skew Gaussian distribution that has a higher probability of positive (or negative) effects, i.e. Model AN. Finally, both approaches were combined into a Full Model.

All models were implemented using a Gibbs sampler. The analysis of the MCMC chains indicated that convergence was achieved with the proposed burn-in for all models and both lines (25,000 iterations for AN, SC and SN Models and 50,000 for the Full Model). Nevertheless, the EFS was heterogeneous across parameters and models. In general, the EFS of the variance of dominance effects was smaller than that of the variance of additive effects, and the EFS of the parameters related to directional dominance (\(b\) and \(\lambda\)) were very large for the AN and SC Models and remarkably smaller for the Full Model. Nevertheless, the sizes of the five Gibbs sampler chains (5 × 75,000 iterations for AN, SN and SC Models and 5 × 200,000 for the Full Model) were always larger than the length required for estimation of the 0.5 quantile of the posterior distributions with an accuracy of 0.1 and with a probability of 0.95, based on the Raftery and Lewis approach [23].

Evidence of directional dominance

Results from Models SC and AN provided clear evidence of directional dominance for both lines (Figs. 1 and 2); posterior distributions of the regression coefficient on individual homozygosity (\(b\) in Model SC) and the asymmetry parameter (\(\lambda\) in Model AN) did not include zero in the highest posterior density at 99%. These results confirm the presence of directional dominance for litter size in pigs and they are in line with extensive reports on positive estimates for inbreeding depression and heterosis in the literature [30, 31]. However, results from the Full Model were not so clear because it suffered from some degree of statistical confounding of \(b\) and \(\lambda\), as observed in the strong posterior correlation (0.91) between the Gibbs samples of \(b\) and \(\lambda\) (see Additional file 3: Figures S1 and S2). As a consequence, their posterior distributions were wider and they included zero in the HPD at 95% for \(b\) and \(\lambda\) (Figs. 1 and 2) and convergence and EFS for both these parameters were worse than with the SC and AN Models (see Additional file 1: Tables S1 and S2).

Models that allow the presence of directional dominance (SC, AN, Full) were able to predict the expected inbreeding depression (\({\text{I}}_{\text{D}}\)) in populations that had a low range of levels of genealogical inbreeding. This approximation uses the classical additive model of inbreeding depression [9] but replaces the dominance effects of causal polymorphisms with the dominance effects of SNPs. In this approach, a linear relationship between inbreeding and inbreeding depression is assumed. Results were presented as the expected inbreeding depression per percentage of inbreeding. In the analyzed populations, the expected inbreeding depression coefficients under these models were around − 0.045 and between − 0.025 and − 0.029 piglets in lines 1 and 2, respectively. These results concur with those of Vitezica et al. [32], who also reported larger estimates of inbreeding depression in line 1 than in line 2 and they are close to the estimates of inbreeding depression for litter size in other pig populations [33,34,35]. In contrast, the estimates provided by the SN Model were substantially closer to 0, i.e. − 0.016 and − 0.008 piglets for lines 1 and 2, respectively. This may indicate that models that do not allow for directional dominance, such as the SN Model, cannot predict the magnitude of inbreeding depression (or heterosis) correctly and, thus, lead to biases if they are used for the prediction of future mate performance and mate allocation [5].
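The linear relationship between inbreeding and inbreeding depression can be sketched as follows. Under the classical model, an inbreeding coefficient \(F\) changes the population mean by \(-2F\sum p_i q_i d_i\). The additional term for the covariate assumes that the expected homozygosity rises by \(F\) times the mean of \(2 p_i q_i\), and therefore shifts the mean by \(b\) times that amount. The function name, its arguments and the way the two terms are combined are assumptions made for illustration; the estimator actually used for \({\text{I}}_{\text{D}}\) is given in the Appendix.

```python
import numpy as np

def expected_inbreeding_depression(p, d, b=0.0, F=0.01):
    """Expected change in the mean for an inbreeding coefficient F.

    p: allele frequencies of the SNPs; d: SNP dominance effects;
    b: regression coefficient on average homozygosity (0 if not fitted).
    Both terms are linear in F; F = 0.01 gives the change per percentage
    of inbreeding.
    """
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    het = 2.0 * p * (1.0 - p)             # expected heterozygosity per SNP
    from_effects = -F * np.sum(het * d)   # classical term, -2F * sum(p q d)
    from_covariate = F * b * het.mean()   # homozygosity rises by F * mean(2pq)
    return from_effects + from_covariate
```

In this sketch, a model with symmetric dominance effects and a negative `b` and a model with predominantly positive dominance effects and no covariate can produce similar expected depression, in line with the similar \({\text{I}}_{\text{D}}\) estimates obtained with Models SC and AN.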

Nevertheless, there were some remarkable differences between the results obtained for the two lines, which are interesting to analyze further. Evidence of directional dominance was larger for line 1 than for line 2 for both approaches (estimates of − 12.15 vs. − 6.48 for \(b\) in Model SC and of 0.38 vs. 0.25 for \(\lambda\) in Model AN), although posterior estimates of the dominance variance were lower for line 1 for all models. This suggests that the magnitude of directional dominance (or inbreeding depression) is not necessarily related to the amount of dominance variance estimated from resemblance between relatives. In fact, in the presence of inbreeding, the total genetic variance is split into five components [36, 37]: the additive and dominance genetic variances in the base population, the dominance genetic variance between homozygous individuals, the covariance between additive and dominance effects between homozygous individuals, and the square of the inbreeding depression. Traditional approaches to estimate dominance variance using genealogical [38, 39] or genomic dominance relationships [32] only take the additive and dominance variance in the base population into account and ignore the remaining variance components. It is possible that the presence of directional dominance also allows some of the other variance components that are not considered under the assumption of multivariate normality to be captured.

Of particular significance is the fact that the estimates of the variance of dominance effects differed substantially between models. Lower estimates were obtained with the SC Model, whereas estimates from Models SN, AN and Full were higher. The cause of the inflation of dominance effects under the last three models may be the restrictions imposed by the assumed prior distributions. Under Model SN, the prior distribution forced the mean and mode of effects to be centered at zero. Thus, if directional dominance exists, specific estimates of the effects of SNPs would attempt to accommodate this, which would lead to an increase in the variance of dominance effects. Model AN allowed the presence of more positive (or negative) dominance effects but it forced the mode of the distributions to be close to zero. Estimates of the effects of SNPs for Model AN may be even larger than for Model SN, but as the prior distribution forced them to have a mode close to zero, the variance of dominance effects was also inflated in Model AN. Furthermore, the increase in the variance of dominance effects in Models SN, AN and Full with respect to Model SC was compensated by a corresponding decrease in the permanent environmental variance, as pointed out in other studies [40, 41]. Thus, the estimate of the permanent environmental variance was the largest for Model SC for both lines.

The differences between models were also reflected in the correlations of estimates of breeding values and dominance deviations between models. Although the correlations for estimates of SNP additive and dominance effects between models were very high (see Additional file 2: Tables S3 and S4), the correlations for estimates of breeding values and dominance deviations provided some exceptions. For breeding values, estimates from the model that did not consider directional dominance (Model SN) had lower correlations with estimates from the other models (SC, AN and Full). This suggests that the inclusion of directional dominance with either of the two approaches would result in substantial changes in the ranking of individuals based on estimates of breeding values, which may have consequences for breeding decisions. In addition, the correlations between estimates of dominance deviations from Model AN with those from the other models were also lower (0.70–0.94), which may imply that the use of skewed prior distributions affects estimation of dominance deviations and the prediction of performance of future individuals (or crosses).

Comparison of models

The best model based on the two criteria used for comparison of models was the SC Model, followed by the SN and AN Models; the Full Model provided the worst fit for both lines. Model SN does not consider directional dominance and thus, it was penalized relative to Model SC. Models AN and Full were equally able to capture directional dominance since they led to similar estimates of inbreeding depression. However, they were penalized because the number of unknowns in these models is larger than in Model SC; they estimate \(\lambda\) and one auxiliary variable for the dominance effect of each SNP.

In the light of these results, the main finding of our study is that Model SC, as defined by Xiang et al. [13], is recommended for the analysis of traits when directional dominance (or inbreeding depression) is expected and when resulting estimates of dominance effects are used for prediction of performance of future mates and mate allocation [5]. This recommendation is strengthened by the ease with which the SC Model can be formulated based on the genomic dominance relationship matrix [8], which helps to reduce the computational burden and directly provides predictions of additive and dominance effects for each individual.
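A minimal sketch of the relationship matrices on which such a formulation relies is given below. It assumes complete genotypes coded as counts (0/1/2) of a reference allele, allele frequencies estimated from the same sample, and the dominance deviation coding described for reference [8] (−2q², 2pq and −2p² for two, one and zero copies of the reference allele). The function name is hypothetical and the code is an illustration, not the implementation used in this study.

```python
import numpy as np

def genomic_relationship_matrices(geno):
    """Return (G, D) from a complete 0/1/2 genotype matrix (rows = individuals)."""
    M = np.asarray(geno, dtype=float)
    p = M.mean(axis=0) / 2.0   # frequency of the counted allele
    q = 1.0 - p

    # Additive relationships: centered allele counts
    Z = M - 2.0 * p
    G = Z @ Z.T / np.sum(2.0 * p * q)

    # Dominance deviation coding: -2q^2 (two copies), 2pq (one), -2p^2 (none)
    W = np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    D = W @ W.T / np.sum((2.0 * p * q) ** 2)
    return G, D
```

With `G` and `D` available, breeding values and dominance deviations can be predicted per individual rather than per SNP, which reduces the number of unknowns from the number of SNPs to the number of genotyped sows.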

However, the application of skewed distributions should not be completely discarded for new lines of research. First, we assumed that the additive and dominance effects were independent, although it is possible to use multivariate asymmetric distributions [16], as in the models of Wellmann and Bennewitz [42], which consider a relationship between the magnitudes of additive and dominance effects. Second, the assumption of Gaussian distributions can be replaced by the asymmetric version of any other distribution, such as the t-shaped or double exponential distribution, leading to asymmetric versions of the Bayes B [3] or Bayesian Lasso [26] approaches. These approaches may avoid the large increase in the variance of dominance effects since most of the estimates of the dominance effects of SNPs will be forced to be zero [3] or closer to zero [26] than with a prior Gaussian distribution.

Finally, all the approaches described here assume that directional dominance is homogeneous along the genome. However, there is evidence in the literature of local differences in the causes of inbreeding depression across the genome [43, 44]. Further research is needed to investigate this phenomenon and, also, to model additional causes of inbreeding depression (or heterosis), such as epistatic interactions [45].

Conclusions

The results of our study confirm the presence of positive directional dominance for litter size in two lines of pigs. Ignoring this in genomic evaluation models with dominance effects alters the prediction of breeding values and may cause bias in the prediction of inbreeding depression (or heterosis) and of the performance of future mates. These effects can be avoided by using two alternative models, one that includes a non-zero mean of dominance effects and another that uses skewed prior distributions for them, with the former providing a better fit. Thus, the model with a non-zero mean of dominance effects should be recommended for modeling dominance effects, at least for datasets that have similar features as those analyzed here.