Background

Commercial pig producers generally use a three-breed terminal crossbreeding system. In this system, F1 sows from two maternal breeds are mated to purebred boars from a breed with high performance for production traits (growth, leanness, feed efficiency) to produce pigs for slaughter. In Europe, the boar breeds are commonly Duroc and Pietrain, and the sows are commonly crosses between Large White and Landrace. Genetic evaluation is usually done within each of these breeds, based on phenotypes recorded on purebred animals. Ideally, however, genetic evaluation of purebreds should incorporate phenotypes of interest recorded on crossbreds, and breeding values for performance in the three-way cross should be estimated.

Many pig breeding organisations have started to use genomic selection [1], for which genetic evaluation is often done with single-step methods [2–4] because only a fraction of the animals are genotyped. In single-step methods, the pedigree-based additive genetic relationship matrix is replaced by a combined relationship matrix based on both marker genotypes and pedigree. Genomic selection is implemented for purebreds, but it also offers opportunities for incorporating information from crossbreds and selecting for crossbred performance [5–7].

Table 1 Example pedigree

For two-way terminal crossbreeding (two breeds named \(\mathcal {A}\) and \(\mathcal {B}\), and all crossbred animals \(\mathcal {A}\mathcal {B}\) have known purebred parents), Wei and van der Werf [8] proposed the following trivariate model:

$$\begin{aligned} \mathbf {y}_{\mathcal {A}}&= \mathbf {X}_{\mathcal {A}} \varvec{\beta }_{\mathcal {A}} + \mathbf {Z}_{\mathcal {A}} \mathbf {a}_{\mathcal {A}} + \mathbf {e}_{\mathcal {A}} , \nonumber \\ \mathbf {y}_{\mathcal {B}}&= \mathbf {X}_{\mathcal {B}} \varvec{\beta }_{\mathcal {B}} + \mathbf {Z}_{\mathcal {B}}\mathbf {a}_{\mathcal {B}} + \mathbf {e}_{\mathcal {B}} , \nonumber \\ \mathbf {y}_{\mathcal {A}\mathcal {B}}&= \mathbf {X}_{\mathcal {A}\mathcal {B}} \varvec{\beta }_{\mathcal {A}\mathcal {B}} + \mathbf {g}_{\mathcal {A}\mathcal {B}} + \mathbf {e}_{\mathcal {A}\mathcal {B}} , \end{aligned}$$
(1)

where the vectors \(\mathbf {y}_{\mathcal {A}}\), \(\mathbf {y}_{\mathcal {B}}\) and \(\mathbf {y}_{\mathcal {A}\mathcal {B}}\) contain phenotypes on animals from breeds \(\mathcal {A}\) and \(\mathcal {B}\) and from the cross \(\mathcal {A}\mathcal {B}\), respectively; for the three populations \(\mathcal {A}\), \(\mathcal {B}\) and \(\mathcal {A}\mathcal {B}\), the vectors \(\varvec{\beta }_{\mathcal {A}}\), \(\varvec{\beta }_{\mathcal {B}}\) and \(\varvec{\beta }_{\mathcal {A}\mathcal {B}}\) contain fixed effects (note that intercepts should always be included); and \(\mathbf {e}_{\mathcal {A}} \sim N(\mathbf {0}, \sigma ^2_{e,\mathcal {A}} \mathbf {I})\), \(\mathbf {e}_{\mathcal {B}} \sim N(\mathbf {0}, \sigma ^2_{e,\mathcal {B}} \mathbf {I})\) and \(\mathbf {e}_{\mathcal {A}\mathcal {B}} \sim N(\mathbf {0}, \sigma ^2_{e,\mathcal {A}\mathcal {B}} \mathbf {I})\) are the residual error vectors. The vectors \(\mathbf {a}_{\mathcal {A}}\) and \(\mathbf {a}_{\mathcal {B}}\) contain breeding values for purebred performance (mating within breed) for breeds \(\mathcal {A}\) and \(\mathcal {B}\), respectively. The vector of genetic values on the crossbreds, \(\mathbf {g}_{\mathcal {A}\mathcal {B}}\), is related by additive pedigree-based relationships to the vectors of breeding values on purebred animals for crossbred performance (mating with the other breed), \(\mathbf {g}_{\mathcal {A}}\) and \(\mathbf {g}_{\mathcal {B}}\). Throughout this paper, additive genetic effects for purebred performance and for crossbred performance are denoted by \(\mathbf {a}\) and \(\mathbf {g}\), respectively. Each purebred animal therefore has two correlated breeding values: one related to mating within breed (e.g. \(\mathbf {a}_{\mathcal {A}}\)) and another related to mating with the other breed to produce the cross (e.g. \(\mathbf {g}_{\mathcal {A}}\)).

Genetic correlations less than 1 between purebred and crossbred performance arise from non-additive gene action in combination with different allele frequencies in the two breeds [9, 10], but also from genotype by environment interactions. The model also allows for different genetic variances in the two pure breeds, which is often the case in practice. Christensen et al. [11] reformulated the model using partial relationship matrices (see below) and constructed these from a combination of marker genotypes and pedigree in such a way that the model could be fitted with standard animal breeding software, i.e. they developed a single-step method.

Table 2 Breed \(\mathcal {A}\) specific partial relationship matrix \(\mathbf {A}^{\mathcal {A}}\) for the pedigree in Table 1

The aim of this work was to develop models for three-way terminal crosses that handle both pedigree-based and marker-based relationships, as well as combined relationship matrices based on both pedigree and marker genotypes. As indicated above, an essential part of the model is the specification of relationships such that the model can be fitted by using standard animal breeding software.

Methods

We present a specific scenario with records on all three pure breeds and on three-way crossbred production pigs, but not on two-way crossbred sows, having in mind production traits such as daily gain, leanness or feed efficiency. However, since we specify relationships across all five populations (\(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {C}\), \(\mathcal {A}\mathcal {B}\) and \(\mathcal {C}(\mathcal {A}\mathcal {B})\)), it is straightforward to generalise to scenarios with records on other populations.

Table 3 Breed \(\mathcal {B}\) specific partial relationship matrix \(\mathbf {A}^{\mathcal {B}}\) for the pedigree in Table 1

The model for this three-way terminal crossbreeding system is in principle a straightforward generalisation of the Wei and van der Werf model [8] to the following four-variate model:

$$\begin{aligned} \mathbf {y}_{\mathcal {A}}&= \mathbf {X}_{\mathcal {A}} \varvec{\beta }_{\mathcal {A}} + \mathbf {Z}_{\mathcal {A}} \mathbf {a}_{\mathcal {A}} + \mathbf {e}_{\mathcal {A}} , \nonumber \\ \mathbf {y}_{\mathcal {B}}&= \mathbf {X}_{\mathcal {B}} \varvec{\beta }_{\mathcal {B}} + \mathbf {Z}_{\mathcal {B}}\mathbf {a}_{\mathcal {B}} + \mathbf {e}_{\mathcal {B}} , \nonumber \\ \mathbf {y}_{\mathcal {C}}&= \mathbf {X}_{\mathcal {C}} \varvec{\beta }_{\mathcal {C}} + \mathbf {Z}_{\mathcal {C}}\mathbf {a}_{\mathcal {C}} + \mathbf {e}_{\mathcal {C}} , \nonumber \\ \mathbf {y}_{\mathcal {C}(\mathcal {A}\mathcal {B})}&= \mathbf {X}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \varvec{\beta }_{\mathcal {C}(\mathcal {A}\mathcal {B})} + \mathbf {g}_{\mathcal {C}(\mathcal {A}\mathcal {B})} + \mathbf {e}_{\mathcal {C}(\mathcal {A}\mathcal {B})} , \end{aligned}$$
(2)

where notation is defined as for Eq. (1), and it is assumed that all \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals have a purebred \(\mathcal {C}\) father and a crossbred \(\mathcal {A}\mathcal {B}\) mother, and that these \(\mathcal {A}\mathcal {B}\) animals all have purebred parents. Breed \(\mathcal {C}\) animals have two correlated breeding values: \(\mathbf {a}_{\mathcal {C}}\) for purebred performance (mating within breed) and \(\mathbf {g}_{\mathcal {C}}\) for \(\mathcal {C}(\mathcal {A}\mathcal {B})\) crossbred performance (mating of a breed \(\mathcal {C}\) male with an \(\mathcal {A}\mathcal {B}\) crossbred female). Breed \(\mathcal {A}\) animals also have two breeding values: \(\mathbf {a}_{\mathcal {A}}\) for purebred performance (mating within breed) and \(\mathbf {g}_{\mathcal {A}}\) for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance (mating with a breed \(\mathcal {B}\) animal whose female \(\mathcal {A}\mathcal {B}\) crossbred offspring are mated with a breed \(\mathcal {C}\) male). Finally, breed \(\mathcal {B}\) animals have two breeding values, \(\mathbf {a}_{\mathcal {B}}\) and \(\mathbf {g}_{\mathcal {B}}\), defined similarly to those for breed \(\mathcal {A}\). For each breed, the association between breeding values for purebred and crossbred performance is determined by a \(2\times 2\) genetic variance-covariance matrix. An essential part of the model is the specification of the additive relationships between genetic values for crossbred performance on crossbred animals and on purebred animals, and in particular of marker-based versions of these relationships that are consistent with the pedigree-based ones. These relationships should also be specified such that the model can be formulated using Kronecker products, which allows it to be fitted with Henderson’s mixed model equations and standard animal breeding software.
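The Kronecker structure can be illustrated for a single breed. If the purebred and crossbred breeding values of the same animals are stacked, their joint variance-covariance matrix is the \(2\times 2\) genetic variance-covariance matrix Kronecker-multiplied by a relationship matrix, and its inverse factorises in the same way, which is the property that makes Henderson’s mixed model equations convenient to set up. The minimal Python/NumPy sketch below uses hypothetical values for both matrices; in the full model the relationship matrix would be one of those developed in the following sections.

```python
import numpy as np

# Hypothetical 2x2 genetic (co)variance matrix for one breed:
# index 0 = purebred performance (a), index 1 = crossbred performance (g).
G0 = np.array([[1.0, 0.6],
               [0.6, 0.8]])

# Hypothetical pedigree relationship matrix for three animals of that breed:
# animals 0 and 1 are unrelated, animal 2 is their offspring.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

# Var([a; g]) with all a-values stacked before all g-values.
V = np.kron(G0, A)

# Implied genetic correlation between purebred and crossbred performance.
r_pc = G0[0, 1] / np.sqrt(G0[0, 0] * G0[1, 1])

# The inverse needed in the mixed model equations factorises the same way.
V_inv = np.kron(np.linalg.inv(G0), np.linalg.inv(A))
```

The off-diagonal block of `V` is simply \(0.6\) times the relationship matrix, i.e. the covariance between the two breeding values of related animals is scaled by their relationship.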
Additive relationships are relationships between gene substitution effects and these can be defined either within populations or across populations [12]. These two approaches will be called “partial genetic” and “common genetic” approaches in the following.

Table 4 Breed \(\mathcal {C}\) specific partial relationship matrix \(\mathbf {A}^{\mathcal {C}}\) for the pedigree in Table 1

Lo et al. [13] derived the following recursive formulas for the variance and covariance of genotypic values for animals composed of multiple breeds under an additive model. Let the genotypic value of individual i be \(g_i\), then the additive variance is:

$$\begin{aligned} {\text {Var}}(g_i)&= \sum _b f^b_i \sigma ^2_{g,b} + {\text {Cov}}(g_{f(i)},g_{m(i)})/2 \nonumber \\ & \quad + 2 \sum _b\sum _{b^{\prime }>b} (f^b_{f(i)}f^{b^{\prime }}_{f(i)}+f^b_{m(i)}f^{b^{\prime }}_{m(i)})\sigma ^2_{g,b,b^{\prime }} \end{aligned}$$
(3)

where b and \(b^{\prime }\) denote breeds, \(f^b_i\) is the breed b content of individual i, \(\sigma ^2_{g,b}\) is the breed b genetic variance, \(g_{f(i)}\) and \(g_{m(i)}\) are the additive genetic values of parents f(i) and m(i), respectively, and \(\sigma ^2_{g,b,b^{\prime }}\) is the breed b and breed \(b^{\prime }\) segregation genetic variance. The additive covariance between genotypic values of individuals i and \(i^{\prime }\) is:

$$\begin{aligned} {\text {Cov}}(g_i, g_{i^{\prime }}) =({\text {Cov}}(g_{f(i)},g_{i^{\prime }}) + {\text {Cov}}(g_{m(i)},g_{i^{\prime }}))/2 , \end{aligned}$$
(4)

when \(i^{\prime }\ne i\) is not a descendant of i.
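To make the recursions concrete, the following minimal Python/NumPy sketch computes the total genetic variance-covariance matrix from Eqs. (3) and (4). It is an illustration rather than an efficient implementation, and it rests on assumptions of our own: the pedigree is ordered so that parents precede offspring, every animal has either both or neither parent known, base animals are purebred and unrelated, and the function name `lo_covariance` and all variance values are hypothetical. The parental covariance enters the variance with weight 1/2, matching the breed-specific recursions for \(\mathbf {A}^b\) given later.

```python
import itertools
import numpy as np

BREEDS = ("A", "B", "C")

def lo_covariance(ped, var_b, var_seg):
    """Total genetic (co)variance matrix via the recursions of Eqs. (3)-(4).
    ped: list of (sire, dam, breed), parents before offspring; base animals have
    sire = dam = None and a breed label and are purebred and unrelated.
    var_b: breed -> sigma^2_{g,b}; var_seg: (b, b') -> sigma^2_{g,b,b'}."""
    n = len(ped)
    f = np.zeros((n, len(BREEDS)))   # breed proportions f_i^b
    K = np.zeros((n, n))
    for i, (s, d, breed) in enumerate(ped):
        if s is None:
            f[i, BREEDS.index(breed)] = 1.0
            K[i, i] = var_b[breed]
            continue
        f[i] = (f[s] + f[d]) / 2
        # Eq. (4): covariance with every earlier (hence non-descendant) animal
        for j in range(i):
            K[i, j] = K[j, i] = (K[s, j] + K[d, j]) / 2
        # Eq. (3): breed-specific terms plus half the parental covariance
        v = sum(f[i, k] * var_b[b] for k, b in enumerate(BREEDS)) + K[s, d] / 2
        # ... plus segregation terms for every pair of breeds b < b'
        for (k, b), (k2, b2) in itertools.combinations(enumerate(BREEDS), 2):
            v += 2 * (f[s, k] * f[s, k2] + f[d, k] * f[d, k2]) * var_seg.get((b, b2), 0.0)
        K[i, i] = v
    return f, K

# Base A, B, C animals (0, 1, 2); an AB sow (3); two full-sib C(AB) pigs (4, 5).
ped = [(None, None, "A"), (None, None, "B"), (None, None, "C"),
       (0, 1, None), (2, 3, None), (2, 3, None)]
f, K = lo_covariance(ped, {"A": 1.0, "B": 2.0, "C": 1.5}, {("A", "B"): 0.4})
```

For the two full-sib \(\mathcal {C}(\mathcal {A}\mathcal {B})\) pigs this yields a variance of 1.7 (\(= 0.5 \times 1.5 + 0.25 \times 1.0 + 0.25 \times 2.0 + 0.5 \times 0.4\)) and a covariance of 0.75.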

Table 5 Breed \(\mathcal {A}\mathcal {B}\) segregation partial relationship matrix \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\) for the pedigree in Table 1

García-Cortés and Toro [14] showed that Eqs. (3) and (4) can be expressed in matrix notation as:

$$\begin{aligned} {\text {Var}}(\mathbf {g})= \sum _b \sigma ^2_{g,b}\mathbf {A}^b + \sum _b\sum _{b^{\prime }>b} \sigma ^2_{g,b,b^{\prime }}\mathbf {A}^{b,b^{\prime }} \end{aligned}$$

where the \(\mathbf {A}^b\) and \(\mathbf {A}^{b,b^{\prime }}\) matrices are separately defined using recursions, and that this provides a partition of the vector of genotypic values into:

$$\begin{aligned} \mathbf {g}= \sum _b \mathbf {g}^b + \sum _b\sum _{b^{\prime }>b} \mathbf {g}^{b,b^{\prime }} , \end{aligned}$$

where all the \(\mathbf {g}^b\) and \(\mathbf {g}^{b,b^{\prime }}\) vectors are independent, \({\text {Var}}(\mathbf {g}^b)=\sigma ^2_{g,b}\mathbf {A}^b\) and \({\text {Var}}(\mathbf {g}^{b,b^{\prime }})= \sigma ^2_{g,b,b^{\prime }}\mathbf {A}^{b,b^{\prime }}\). They termed \(\mathbf {A}^b\) the breed b specific partial relationship matrix and \(\mathbf {A}^{b,b^{\prime }}\) the breed b and breed \(b^{\prime }\) segregation partial relationship matrix. Correspondingly, \(\mathbf {g}^b\) is the breed b specific partial genetic vector and \(\mathbf {g}^{b,b^{\prime }}\) is the breed b and breed \(b^{\prime }\) segregation partial genetic vector; both are defined by the breed of origin of the genes. Matrices \(\mathbf {A}^b\) and \(\mathbf {A}^{b,b^{\prime }}\) have sparse inverses that can be computed using the usual methods for the additive relationship matrix (see [14]). In this paper, the approach that partitions the genetic effects into independent terms is called the partial genetic approach.

Table 6 Common relationship matrix \(\mathbf {A}(\varvec{\Gamma })\) for the pedigree in Table 1

Legarra et al. [15] proposed that pedigree relationships should be specified across all animals, and that for base animals in the pedigree, the relationships within and across breeds, and the inbreeding, should be estimated from observed marker genotypes. This approach is incompatible with the García-Cortés and Toro [14] approach described above, since it violates the assumption that the \(\mathbf {g}^b\) and \(\mathbf {g}^{b,b^{\prime }}\) vectors are independent. The approach in which relationships are specified across breeds is called the common genetic approach.

Table 7 Pedigree in Table 1 with metafounders

First, the partial genetic and common genetic approaches for constructing pedigree-based relationships are presented; then the two corresponding ways of constructing marker-based relationships are presented; finally, the genetic variances and covariances in model (2) are shown for the two approaches. Detailed derivations are in the “Appendix”.

Additive genetic model for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance: partial genetic approach

For the three-way crossbreeding system, the decomposition of the additive genetic effects by García-Cortés and Toro [14] is as follows. For a \(\mathcal {C}(\mathcal {A}\mathcal {B})\) crossbred animal,

$$\begin{aligned} g_i= g^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),i} + g^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),i} + g^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),i} + g^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),i}, \end{aligned}$$

where the terms \(\mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\), \(\mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) and \(\mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) are breed-of-origin specific partial genetic effects and \(\mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) is a breed-segregation term. For an \(\mathcal {A}\mathcal {B}\) crossbred sow,

$$\begin{aligned} g_i = g^{\mathcal {A}}_{\mathcal {A}\mathcal {B},i} + g^{\mathcal {B}}_{\mathcal {A}\mathcal {B},i}, \end{aligned}$$

with terms \(\mathbf {g}^{\mathcal {A}}_{\mathcal {A}\mathcal {B}}\) and \(\mathbf {g}^{\mathcal {B}}_{\mathcal {A}\mathcal {B}}\) being breed of origin partial genetic effects. Finally, for purebred animals, the three vectors of breeding values for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance, \(\mathbf {g}_{\mathcal {A}}\), \(\mathbf {g}_{\mathcal {B}}\) and \(\mathbf {g}_{\mathcal {C}}\), are defined as being equal to the genotypic values.

In this way, a breed-specific partial genetic effect is defined for all animals that carry genes from the specific breed, and a breed-segregation partial genetic effect is defined for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals. Assuming that base individuals in the three breeds are not related across breeds implies that:

$$\begin{aligned} \mathbf {g}^{\mathcal {A}}=\left[ \begin{array}{c} \mathbf {g}_{\mathcal {A}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] , \, \mathbf {g}^{\mathcal {B}}=\left[ \begin{array}{c} \mathbf {g}_{\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] , \, \mathbf {g}^{\mathcal {C}}=\left[ \begin{array}{c} \mathbf {g}_{\mathcal {C}} \\ \mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] \end{aligned}$$

are independent. In addition, for a crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) individual the fact that it inherits either a breed \(\mathcal {A}\) or \(\mathcal {B}\) allele is independent of what particular alleles the \(\mathcal {A}\mathcal {B}\) mother has and what alleles all other \(\mathcal {A}\mathcal {B}\) individuals have, and hence \(\mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) is independent of the vectors above.

The variance-covariance matrices of the partial genetic effects become (García-Cortés and Toro [14]):

$$\begin{aligned} {\text {Var}}\left[ \begin{array}{c} \mathbf {g}_{\mathcal {A}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \sigma _{g,\mathcal {A}}^2 \mathbf {A}^{\mathcal {A}}, {\text {Var}}\left[ \begin{array}{c} \mathbf {g}_{\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \sigma _{g,\mathcal {B}}^2 \mathbf {A}^{\mathcal {B}} , \end{aligned}$$
$$\begin{aligned} {\text {Var}}\left[ \begin{array}{c} \mathbf {g}_{\mathcal {C}} \\ \mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \sigma _{g,\mathcal {C}}^2 \mathbf {A}^{\mathcal {C}}, {\text {Var}}\left[ \mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \right] = \sigma _{g,\mathcal {A}\mathcal {B}}^2 \mathbf {A}^{\mathcal {A}\mathcal {B}}, \end{aligned}$$

where the breed-specific partial relationship matrices are defined by the recursive formulas:

$$\begin{aligned} A^{b}_{ii}&= f^{b}_i + A^{b}_{f(i)m(i)}/2, \\ A^{b}_{ii^{\prime }}&= (A^{b}_{f(i)i^{\prime }}+A^{b}_{m(i)i^{\prime }})/2, \end{aligned}$$

for breed \(b=\mathcal {A},\mathcal {B},\mathcal {C}\), with \(f_i^b\) denoting the breed b proportion of individual i, and the breed-segregation partial relationship matrix is defined by the recursive formulas:

$$\begin{aligned} A^{\mathcal {A}\mathcal {B}}_{ii}&= 2 (f^{\mathcal {A}}_{f(i)} f^{\mathcal {B}}_{f(i)} + f^{\mathcal {A}}_{m(i)} f^{\mathcal {B}}_{m(i)}) + A^{\mathcal {A}\mathcal {B}}_{f(i)m(i)}/2, \\ A^{\mathcal {A}\mathcal {B}}_{ii^{\prime }}&= (A^{\mathcal {A}\mathcal {B}}_{f(i)i^{\prime }}+A^{\mathcal {A}\mathcal {B}}_{m(i)i^{\prime }})/2, \end{aligned}$$

where in both cases animals that do not contribute (i.e. whose rows and columns would be zero) are excluded from the resulting matrices. It follows that \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals are the only animals contributing to matrix \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\): for these animals \(f^{\mathcal {A}}_{f(i)} =f^{\mathcal {B}}_{f(i)}=0\) and \(f^{\mathcal {A}}_{m(i)} =f^{\mathcal {B}}_{m(i)}=1/2\), and their parents do not contribute to \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\), so the matrix is diagonal with diagonal elements equal to \(2\times 1/2\times 1/2 = 1/2\). This specification of additive relationships using partial relationship matrices is equivalent to the specification in Eqs. (3) and (4).
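The recursive definitions of the partial relationship matrices can be sketched in the same way. The Python/NumPy sketch below is an illustration under assumptions of our own: the pedigree is ordered so that parents precede offspring, each animal has both or neither parent known, base animals are purebred and unrelated, and the function name `partial_relationships` and the variance values are hypothetical. For simplicity it keeps all animals in every matrix, so non-contributing animals appear as zero rows and columns that would be dropped in practice. It shows that \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\) is diagonal with 1/2 for the \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals, and that the weighted sum of the partial matrices gives the total genetic (co)variance, e.g. \(\tfrac12\sigma ^2_{g,\mathcal {C}} + \tfrac14\sigma ^2_{g,\mathcal {A}} + \tfrac14\sigma ^2_{g,\mathcal {B}} + \tfrac12\sigma ^2_{g,\mathcal {A}\mathcal {B}}\) for a \(\mathcal {C}(\mathcal {A}\mathcal {B})\) pig.

```python
import numpy as np

BREEDS = ("A", "B", "C")

def partial_relationships(ped):
    """ped: list of (sire, dam, breed); parents precede offspring, base animals
    have sire = dam = None and a breed label, other animals have breed None."""
    n = len(ped)
    f = np.zeros((n, len(BREEDS)))               # breed proportions f_i^b
    A = {b: np.zeros((n, n)) for b in BREEDS}    # breed-specific A^b
    A_seg = np.zeros((n, n))                     # A x B segregation A^{AB}
    ia, ib = BREEDS.index("A"), BREEDS.index("B")
    for i, (s, d, breed) in enumerate(ped):
        if s is None:                            # unrelated purebred base animal
            f[i, BREEDS.index(breed)] = 1.0
            for k, b in enumerate(BREEDS):
                A[b][i, i] = f[i, k]
            continue
        f[i] = (f[s] + f[d]) / 2
        for M in list(A.values()) + [A_seg]:     # off-diagonal recursion
            for j in range(i):
                M[i, j] = M[j, i] = (M[s, j] + M[d, j]) / 2
        for k, b in enumerate(BREEDS):           # A^b_ii = f_i^b + A^b_{f(i)m(i)}/2
            A[b][i, i] = f[i, k] + A[b][s, d] / 2
        A_seg[i, i] = (2 * (f[s, ia] * f[s, ib] + f[d, ia] * f[d, ib])
                       + A_seg[s, d] / 2)
    return f, A, A_seg

# Base A, B, C animals (0, 1, 2); an AB sow (3); two full-sib C(AB) pigs (4, 5).
ped = [(None, None, "A"), (None, None, "B"), (None, None, "C"),
       (0, 1, None), (2, 3, None), (2, 3, None)]
f, A, A_seg = partial_relationships(ped)

var_b = {"A": 1.0, "B": 2.0, "C": 1.5}           # hypothetical sigma^2_{g,b}
var_seg = 0.4                                    # hypothetical sigma^2_{g,AB}
V = sum(var_b[b] * A[b] for b in BREEDS) + var_seg * A_seg
```

With these hypothetical variances, `V[4, 4]` equals 1.7 and `V[4, 5]` equals 0.75, the same values that a direct implementation of Eqs. (3) and (4) gives for this pedigree.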

To illustrate the different partial relationship matrices, we analysed the small pedigree in Table 1. Tables 2, 3, 4 and 5 show the partial relationship matrices for this example.

Wei and van der Werf [8] presented a reduced form of the two-way crossbreeding model (1) in which the Mendelian sampling term of the genetic effect on crossbred animals was included in the residual error term. A reduced model can also be formulated for the three-way crossbreeding model by expressing:

$$\begin{aligned} g_i&= 0.5g_{f(i)} + 0.5g_{m(i)} + \Phi _{\mathcal {C}(\mathcal {A}\mathcal {B}),i} \\&= 0.5g_{\mathcal {C},f(i)} + 0.5 g^{\mathcal {A}}_{\mathcal {A}\mathcal {B},m(i)} + 0.5g^{\mathcal {B}}_{\mathcal {A}\mathcal {B},m(i)} + \Phi _{\mathcal {C}(\mathcal {A}\mathcal {B}),i}, \end{aligned}$$

for a \(\mathcal {C}(\mathcal {A}\mathcal {B})\) crossbred animal i, where \(\Phi _{\mathcal {C}(\mathcal {A}\mathcal {B}),i}\) is the Mendelian sampling term. The Mendelian sampling terms are independent among the \(\mathcal {C}(\mathcal {A}\mathcal {B})\) crossbred animals, and their variance is constant if, as an approximation, father f(i) is assumed not to be inbred (mother m(i) is not inbred, since its \(\mathcal {A}\) and \(\mathcal {B}\) parents are unrelated). The Mendelian sampling term can therefore be absorbed into the residual error term \(\mathbf {e}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) in model (2), and the model can be formulated using three breed-specific partial relationship matrices defined on the \(\mathcal {A},\mathcal {B},\mathcal {C}\) and \(\mathcal {A}\mathcal {B}\) animals. However, as explained in Christensen et al. [11], such a reduced model cannot be extended to incorporate marker genotypes, since these provide information about the Mendelian sampling term. Therefore, we did not pursue the reduced form of the model any further.
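As a worked illustration of why this variance is constant, take a \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animal i with a non-inbred father. From the partial relationship recursions, \(A^{\mathcal {C}}_{ii}=1/2\), \(A^{\mathcal {A}}_{ii}=A^{\mathcal {B}}_{ii}=1/4\) and \(A^{\mathcal {A}\mathcal {B}}_{ii}=1/2\), while \(A^{\mathcal {C}}_{f(i)f(i)}=1\) and \(A^{\mathcal {A}}_{m(i)m(i)}=A^{\mathcal {B}}_{m(i)m(i)}=1/2\). Since the three parental terms are independent of each other and of \(\Phi _{\mathcal {C}(\mathcal {A}\mathcal {B}),i}\), the Mendelian sampling variance follows by subtraction:

$$\begin{aligned} {\text {Var}}(g_i)&= \tfrac{1}{2}\sigma ^2_{g,\mathcal {C}} + \tfrac{1}{4}\sigma ^2_{g,\mathcal {A}} + \tfrac{1}{4}\sigma ^2_{g,\mathcal {B}} + \tfrac{1}{2}\sigma ^2_{g,\mathcal {A}\mathcal {B}}, \\ {\text {Var}}(\Phi _{\mathcal {C}(\mathcal {A}\mathcal {B}),i})&= {\text {Var}}(g_i) - \tfrac{1}{4}{\text {Var}}(g_{\mathcal {C},f(i)}) - \tfrac{1}{4}{\text {Var}}(g^{\mathcal {A}}_{\mathcal {A}\mathcal {B},m(i)}) - \tfrac{1}{4}{\text {Var}}(g^{\mathcal {B}}_{\mathcal {A}\mathcal {B},m(i)}) \\&= {\text {Var}}(g_i) - \tfrac{1}{4}\sigma ^2_{g,\mathcal {C}} - \tfrac{1}{8}\sigma ^2_{g,\mathcal {A}} - \tfrac{1}{8}\sigma ^2_{g,\mathcal {B}} \\&= \tfrac{1}{4}\sigma ^2_{g,\mathcal {C}} + \tfrac{1}{8}\sigma ^2_{g,\mathcal {A}} + \tfrac{1}{8}\sigma ^2_{g,\mathcal {B}} + \tfrac{1}{2}\sigma ^2_{g,\mathcal {A}\mathcal {B}}, \end{aligned}$$

which depends only on the variance parameters and not on the individual animal; note that the whole breed-segregation term is Mendelian sampling.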

Note that model (2) with relationships as presented here is the most obvious generalisation of the Wei and van der Werf model in Eq. (1) from two to three breeds since base individuals are assumed unrelated. Without a formulation using partial relationship matrices, it would be difficult to estimate parameters in this model using standard animal breeding software.

Additive genetic model for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance: common genetic approach

In the previous subsection, base animals were assumed to be unrelated. An alternative proposed by Legarra et al. [15] is to assume that base animals are related and inbred within breeds and related between breeds with relationships determined by:

$$\begin{aligned} \varvec{\Gamma }= \left[ \begin{array}{ccc} \gamma _{\mathcal {A}} &{} \gamma _{\mathcal {A},\mathcal {B}} &{} \gamma _{\mathcal {A},\mathcal {C}} \\ \gamma _{\mathcal {A},\mathcal {B}} &{} \gamma _{\mathcal {B}} &{} \gamma _{\mathcal {B},\mathcal {C}} \\ \gamma _{\mathcal {A},\mathcal {C}} &{} \gamma _{\mathcal {B},\mathcal {C}} &{} \gamma _{\mathcal {C}} \\ \end{array} \right] . \end{aligned}$$

Among the base animals, the genetic variances and covariances are then as follows. Within breed,

$$\begin{aligned} {\text {Var}}(g_i)= \sigma _g^2(1 + \gamma _b/2), \end{aligned}$$

for an individual in breed b, and

$$\begin{aligned} {\text {Cov}}(g_i,g_{i^{\prime }})= \sigma _g^2\gamma _b, \end{aligned}$$

for two individuals in breed b, i.e. base animals are inbred with coefficient \(\gamma _b/2\) and related with relationship coefficient \(\gamma _b\). Furthermore,

$$\begin{aligned} {\text {Cov}}(g_i,g_{i^{\prime }})= \sigma _g^2 \gamma _{b,b^{\prime }}, \end{aligned}$$

for two individuals in different breeds b and \(b^{\prime }\), i.e. base animals in different breeds are related. Therefore, a joint relationship matrix is specified among all base animals, and by applying the usual recursive definition:

$$\begin{aligned} A(\varvec{\Gamma })_{ii}&= 1 + A(\varvec{\Gamma })_{f(i)m(i)}/2, \\ A(\varvec{\Gamma })_{ii^{\prime }}&= (A(\varvec{\Gamma })_{f(i)i^{\prime }}+A(\varvec{\Gamma })_{m(i)i^{\prime }})/2, \end{aligned}$$

an additive relationship matrix \(\mathbf {A}(\varvec{\Gamma })\) is defined across all animals with relationships among the three base populations \(\mathcal {A},\mathcal {B}\) and \(\mathcal {C}\) defined by matrix \(\varvec{\Gamma }\). The variance-covariance of genetic effects is therefore determined by

$$\begin{aligned} {\text {Var}}\left[ \begin{array}{c} \mathbf {g}_{\mathcal {A}} \\ \mathbf {g}_{\mathcal {B}} \\ \mathbf {g}_{\mathcal {C}} \\ \mathbf {g}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \sigma ^2_g \mathbf {A}(\varvec{\Gamma }). \end{aligned}$$
(5)

Table 6 shows the common relationship matrix for the pedigree in Table 1.

Legarra et al. [15] suggested a framework in which individuals in the base population of the pedigree are related because they originate from overlapping ancestral populations of finite size, and they termed each of these ancestral populations a metafounder, to be included in the pedigree. Here, \(\mathcal {A},\mathcal {B},\mathcal {C}\) are metafounders, and each base individual in the pedigree has the metafounder of its breed as both parents; see the example in Table 7. When the pedigree and the matrix \(\mathbf {A}(\varvec{\Gamma })\) are extended with these metafounders, Legarra et al. [15] showed that the usual algorithms apply: the sparse inverse \(\mathbf {A}(\varvec{\Gamma })^{-1}\) can be computed directly as in Henderson [16], and submatrices of \(\mathbf {A}(\varvec{\Gamma })\) by the Colleau algorithm [17].
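A small dense version of \(\mathbf {A}(\varvec{\Gamma })\) can be built with the tabular method by placing the metafounders first in the pedigree and seeding their block with \(\varvec{\Gamma }\). The Python/NumPy sketch below is for illustration only (practical implementations use the sparse inverse and the Colleau algorithm); the function name `relationship_with_metafounders`, the \(\varvec{\Gamma }\) values and the pedigree are hypothetical, and every animal is assumed to have both parents known, base animals having their breed's metafounder as both parents.

```python
import numpy as np

def relationship_with_metafounders(gamma, ped):
    """Dense A(Gamma) by the tabular method (illustration only).
    gamma: k x k matrix Gamma; metafounders take indices 0..k-1.
    ped: list of (sire, dam) for the remaining animals, indexed from k on,
    parents before offspring; base animals have their metafounder as both parents."""
    k = gamma.shape[0]
    n = k + len(ped)
    A = np.zeros((n, n))
    A[:k, :k] = gamma                              # relationships among metafounders
    for t, (s, d) in enumerate(ped):
        i = k + t
        for j in range(i):
            A[i, j] = A[j, i] = (A[s, j] + A[d, j]) / 2
        A[i, i] = 1 + A[s, d] / 2                  # usual diagonal recursion
    return A

# Hypothetical Gamma for metafounders A (index 0), B (1) and C (2).
gamma = np.array([[0.60, 0.10, 0.05],
                  [0.10, 0.50, 0.08],
                  [0.05, 0.08, 0.70]])
ped = [(0, 0), (0, 0),   # 3, 4: base breed-A animals
       (1, 1),           # 5: base breed-B animal
       (2, 2),           # 6: base breed-C animal
       (3, 5),           # 7: AB sow
       (6, 7)]           # 8: C(AB) pig
A_full = relationship_with_metafounders(gamma, ped)
A_gamma = A_full[3:, 3:]                           # drop metafounder rows/cols
```

The base breed-\(\mathcal {A}\) animals get diagonal \(1+\gamma _{\mathcal {A}}/2 = 1.3\) and mutual relationship \(\gamma _{\mathcal {A}} = 0.6\), and a base \(\mathcal {A}\) and base \(\mathcal {B}\) animal are related by \(\gamma _{\mathcal {A},\mathcal {B}} = 0.1\), as required by the base-population variances and covariances above.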

The parameter \(\sigma _g^2\) in Eq. (5) does not correspond to the usual genetic variance, which is the variance among unrelated individuals in the base population. As explained in Legarra et al. [15], \(\sigma _g^2(1 - \gamma _b/2)\) corresponds to the variance among unrelated breed b animals, and therefore the genetic variances for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance are \(\sigma _g^2(1 - \gamma _{\mathcal {A}}/2)\), \(\sigma _g^2(1 - \gamma _{\mathcal {B}}/2)\) and \(\sigma _g^2(1 - \gamma _{\mathcal {C}}/2)\), corresponding to \(\sigma _{g,\mathcal {A}}^2\), \(\sigma _{g,\mathcal {B}}^2\) and \(\sigma _{g,\mathcal {C}}^2\) in the previous section, respectively. In addition, Legarra et al. [15] explained that the breed-segregation variance is \(\sigma ^2_g((\gamma _{\mathcal {A}}+\gamma _{\mathcal {B}})/2-\gamma _{\mathcal {A},\mathcal {B}})/4\), which corresponds to \(\sigma _{g,\mathcal {A}\mathcal {B}}^2\) in the previous section.
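The correspondences above amount to simple arithmetic on \(\sigma ^2_g\) and \(\varvec{\Gamma }\). A minimal Python sketch, with a hypothetical function name and hypothetical parameter values and with \(\varvec{\Gamma }\) assumed to be ordered \((\mathcal {A},\mathcal {B},\mathcal {C})\):

```python
import numpy as np

def equivalent_partial_variances(sigma2_g, gamma):
    """Map (sigma^2_g, Gamma), Gamma ordered (A, B, C), to the breed-specific
    variances sigma^2_{g,b} = sigma^2_g (1 - gamma_b / 2) and the A x B
    segregation variance sigma^2_g ((gamma_A + gamma_B) / 2 - gamma_{A,B}) / 4."""
    g = np.asarray(gamma, dtype=float)
    var_breed = sigma2_g * (1 - np.diag(g) / 2)
    var_seg_ab = sigma2_g * ((g[0, 0] + g[1, 1]) / 2 - g[0, 1]) / 4
    return var_breed, var_seg_ab

gamma = [[0.60, 0.10, 0.05],
         [0.10, 0.50, 0.08],
         [0.05, 0.08, 0.70]]                       # hypothetical Gamma
var_breed, var_seg_ab = equivalent_partial_variances(2.0, gamma)
```

With \(\sigma ^2_g = 2\), this gives breed-specific variances 1.4, 1.5 and 1.3 and a segregation variance of 0.225.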

Genomic model for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance: partial genetic approach

Marker-based partial relationship matrices are constructed by tracing the breed of origin of alleles and defining relationships according to breed of origin. Assume that the breed of origin of alleles can be determined for all animals, and define breed-specific allele content matrices as follows: matrix \(\mathbf {m}^b\) with entries 0, 1, 2 for purebred b animals; matrices \(\mathbf {z}^{\mathcal {A}}\) and \(\mathbf {z}^{\mathcal {B}}\) with entries 0, 1 for the paternal and maternal alleles, respectively, of crossbred \(\mathcal {A}\mathcal {B}\) animals; matrix \(\mathbf {z}^{\mathcal {C}}\) with entries 0, 1 for the paternal allele of crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals; and finally matrices \(\mathbf {z}_p^{\mathcal {A}}\) and \(\mathbf {z}_p^{\mathcal {B}}\) for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals, with entries 0, 1 when the inherited allele is of breed \(\mathcal {A}\) or breed \(\mathcal {B}\) origin, respectively, and zero otherwise. This means that the breed of origin of each allele needs to be traced, usually with phasing software [18].

Marker-based breed-specific partial relationship matrices are constructed as follows (details can be found in the “Appendix”). For breed \(\mathcal {A}\), the marker-based breed \(\mathcal {A}\) specific partial relationship matrix \(\mathbf {G}^{\mathcal {A}}\) is divided into submatrices with indices denoting genotyped breed \(\mathcal {A}\), crossbred \(\mathcal {A}\mathcal {B}\) and crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals,

$$\begin{aligned} \mathbf {G}^{\mathcal {A}} = \left[ \begin{array}{ccc} \mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {A}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {A}\mathcal {B}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \mathbf {G}^{\mathcal {A}}_{\mathcal {A}\mathcal {B},\mathcal {A}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {A}\mathcal {B},\mathcal {A}\mathcal {B}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {A}\mathcal {B},\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \mathbf {G}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {A}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {A}\mathcal {B}} &{} \mathbf {G}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {C}(\mathcal {A}\mathcal {B})} \end{array} \right] , \end{aligned}$$

which are defined as

$$\begin{aligned}&\mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {A}}=\frac{(\mathbf {m}^{\mathcal {A}}-2\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {m}^{\mathcal {A}}-2\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}} , \\&\mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {A}\mathcal {B}}=\frac{(\mathbf {m}^{\mathcal {A}}-2\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}^{\mathcal {A}}-\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}} , \\&\mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {C}(\mathcal {A}\mathcal {B})}=\frac{(\mathbf {m}^{\mathcal {A}}-2\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}_p^{\mathcal {A}}-\mathbf {p}_p^{\mathcal {A}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}} , \\&\mathbf {G}^{\mathcal {A}}_{\mathcal {A}\mathcal {B},\mathcal {A}\mathcal {B}}=\frac{(\mathbf {z}^{\mathcal {A}}-\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}^{\mathcal {A}}-\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}}, \\&\mathbf {G}^{\mathcal {A}}_{\mathcal {A}\mathcal {B},\mathcal {C}(\mathcal {A}\mathcal {B})}=\frac{(\mathbf {z}^{\mathcal {A}}-\mathbf {p}^{\mathcal {A}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}_p^{\mathcal {A}}-\mathbf {p}_p^{\mathcal {A}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}} , \\&\mathbf {G}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {C}(\mathcal {A}\mathcal {B})}=\frac{(\mathbf {z}_p^{\mathcal {A}}-\mathbf {p}_p^{\mathcal {A}})(\mathbf {z}_p^{\mathcal {A}}-\mathbf {p}_p^{\mathcal {A}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {A}}} , \end{aligned}$$

where the vector \(\mathbf {p}^{\mathcal {A}}\) contains breed \(\mathcal {A}\) specific allele frequencies, matrix \(\mathbf {p}_p^{\mathcal {A}}\) has element (ij) equal to \(p^{\mathcal {A}}_j\) when crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) individual i inherited a breed \(\mathcal {A}\) specific allele at marker j and zero otherwise, and \(s^{\mathcal {A}}\) is a scaling parameter. The marker-based breed \(\mathcal {B}\) specific partial relationship matrix \(\mathbf {G}^{\mathcal {B}}\) is defined similarly to \(\mathbf {G}^{\mathcal {A}}\), and the marker-based breed \(\mathcal {C}\) specific partial relationship matrix is

$$\begin{aligned} \mathbf {G}^{\mathcal {C}} = \left[ \begin{array}{cc} \mathbf {G}^{\mathcal {C}}_{\mathcal {C},\mathcal {C}} &{} \mathbf {G}^{\mathcal {C}}_{\mathcal {C},\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \mathbf {G}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {C}} &{} \mathbf {G}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {C}(\mathcal {A}\mathcal {B})} \end{array} \right] , \end{aligned}$$

where submatrices are defined as

$$\begin{aligned}&\mathbf {G}^{\mathcal {C}}_{\mathcal {C},\mathcal {C}}=\frac{(\mathbf {m}^{\mathcal {C}}-2\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {m}^{\mathcal {C}}-2\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {C}}} , \\&\mathbf {G}^{\mathcal {C}}_{\mathcal {C},\mathcal {C}(\mathcal {A}\mathcal {B})}=\frac{(\mathbf {m}^{\mathcal {C}}-2\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}^{\mathcal {C}}-\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {C}}} , \\&\mathbf {G}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B}),\mathcal {C}(\mathcal {A}\mathcal {B})}=\frac{(\mathbf {z}^{\mathcal {C}}-\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {z}^{\mathcal {C}}-\mathbf {p}^{\mathcal {C}}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s^{\mathcal {C}}} , \end{aligned}$$

where the vector \(\mathbf {p}^{\mathcal {C}}\) contains estimated breed \(\mathcal {C}\) specific allele frequencies and \(s^{\mathcal {C}}\) is a scaling parameter.
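The block structure of \(\mathbf {G}^{\mathcal {A}}\) can be illustrated by stacking the centred breed-\(\mathcal {A}\) allele contents of the three groups of animals and taking a single cross-product. In the Python/NumPy sketch below, rows are animals and columns are markers, so centring subtracts the frequencies from each row; the genotypes, allele frequencies and breed-of-origin indicators are simulated rather than obtained from phasing, and the scaling \(s^{\mathcal {A}} = 2\sum _j p^{\mathcal {A}}_j(1-p^{\mathcal {A}}_j)\) is our assumption, since the text leaves \(s^{\mathcal {A}}\) open. \(\mathbf {G}^{\mathcal {B}}\) would be built analogously, and \(\mathbf {G}^{\mathcal {C}}\) from \(\mathbf {m}^{\mathcal {C}}\) and \(\mathbf {z}^{\mathcal {C}}\) centred by \(\mathbf {p}^{\mathcal {C}}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 50
p_A = rng.uniform(0.1, 0.9, n_markers)             # simulated breed-A allele freqs

m_A = rng.binomial(2, p_A, size=(4, n_markers))    # 4 purebred A animals (0/1/2)
z_A = rng.binomial(1, p_A, size=(3, n_markers))    # A-origin allele of 3 AB sows (0/1)
# 2 C(AB) pigs: which inherited alleles are of breed-A origin, and their values
from_A = rng.integers(0, 2, size=(2, n_markers)).astype(bool)
z_pA = np.where(from_A, rng.binomial(1, p_A, size=(2, n_markers)), 0)
p_pA = np.where(from_A, p_A, 0.0)                  # p_j where A-origin, else 0

s_A = 2 * np.sum(p_A * (1 - p_A))  # ASSUMED scaling; the text leaves s^A open

# Centred breed-A allele content, stacked as [A animals; AB sows; C(AB) pigs]
W = np.vstack([m_A - 2 * p_A,      # purebred A: subtract 2p
               z_A - p_A,          # AB: one A-origin allele, subtract p
               z_pA - p_pA])       # C(AB): subtract p only where allele is A-origin
G_A = W @ W.T / s_A                # blocks G^A_{A,A}, G^A_{A,AB}, ... in one matrix
```

Rows 0 to 3, 4 to 6 and 7 to 8 of `G_A` index the breed \(\mathcal {A}\), \(\mathcal {A}\mathcal {B}\) and \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals, so for instance `G_A[:4, 4:7]` is \(\mathbf {G}^{\mathcal {A}}_{\mathcal {A},\mathcal {A}\mathcal {B}}\).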

The breed-segregation partial relationship matrix is defined as:

$$\begin{aligned} G^{\mathcal {A}\mathcal {B}}_{i,i^{\prime }} = \sum _j r_{d^j_i} r_{d^j_{i^{\prime }}}/(2n), \end{aligned}$$

where \(r_{d^j_i}=1\) when \(d^j_i\in \mathcal {A}\) and \(r_{d^j_i}=-1\) when \(d^j_i\in \mathcal {B}\), \(r_{d^j_{i^{\prime }}}\) is defined similarly to \(r_{d^j_i}\), and n is the number of markers. Because \(r_{d^j_i}^2=1\) for every marker, the diagonal elements of \(\mathbf {G}^{\mathcal {A}\mathcal {B}}\) equal n/(2n) = 1/2, i.e. the diagonal elements of \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\). The off-diagonal elements of \(\mathbf {G}^{\mathcal {A}\mathcal {B}}\) measure whether pairs of individuals share more alleles from a particular parental breed (\(\mathcal {A}\) or \(\mathcal {B}\)) than expected, and their expected values equal the off-diagonal elements of \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\) (i.e. 0).
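The breed-segregation matrix depends only on the breed of origin of the maternal alleles, not on the alleles themselves. A minimal sketch, assuming that each crossbred's breed-of-origin calls are available as one "A"/"B" code per marker for the allele inherited from its \(\mathcal {A}\mathcal {B}\) dam (the helper name and toy data are hypothetical):

```python
# Sketch: breed-segregation partial relationship matrix G^{AB}.
# origins[i][j] is "A" or "B": assumed here to be the breed of origin of the
# allele that crossbred i inherited from its AB dam at marker j.

SIGN = {"A": 1, "B": -1}  # r = +1 for breed A origin, -1 for breed B origin

def breed_segregation_matrix(origins):
    n = len(origins[0])  # number of markers
    r = [[SIGN[d] for d in row] for row in origins]  # KeyError on other codes
    # G[i][k] = sum_j r_ij * r_kj / (2n)
    return [[sum(a * b for a, b in zip(ri, rk)) / (2 * n) for rk in r] for ri in r]

G_AB = breed_segregation_matrix([
    ["A", "A", "B", "B"],
    ["A", "B", "B", "A"],
    ["A", "A", "B", "B"],
])
# Diagonal elements are n / (2n) = 1/2 regardless of the origins.
```

Swapping every code of one individual (A for B and vice versa) flips the sign of its off-diagonal elements but leaves its diagonal element at 1/2.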

Relationship matrices that combine pedigree and marker information [2, 4] can then be constructed. Below, indices 1 and 2 in submatrices denote non-genotyped and genotyped animals, respectively. The breed \(b=\mathcal {A},\mathcal {B},\mathcal {C}\) specific combined relationship matrices are given by their sparse inverses

$$\begin{aligned} (\mathbf {H}^{b})^{-1} = \left[ \begin{array}{cc} \mathbf {0}&{} \mathbf {0}\\ \mathbf {0}&{} (\mathbf {G}^{b})^{-1} - (\mathbf {A}^{b}_{22})^{-1} \end{array} \right] + (\mathbf {A}^{b})^{-1}, \end{aligned}$$
(6)

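for \(b=\mathcal {A},\mathcal {B},\mathcal {C}\). Because \(\mathbf {A}^{\mathcal {A}\mathcal {B}}=\mathbf {I}/2\), there are no pedigree-based covariances through which marker information on genotyped individuals could propagate to non-genotyped individuals, so the breed-segregation combined relationship matrix is simply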

$$\begin{aligned} \mathbf {H}^{\mathcal {A}\mathcal {B}} = \left[ \begin{array}{cc} \mathbf {I}/2 &{} \mathbf {0}\\ \mathbf {0}&{} \mathbf {G}^{\mathcal {A}\mathcal {B}} \end{array} \right] . \end{aligned}$$
(7)

Matrices \((\mathbf {A}^{\mathcal {A}})^{-1}\), \((\mathbf {A}^{\mathcal {B}})^{-1}\) and \((\mathbf {A}^{\mathcal {C}})^{-1}\) can be computed directly in sparse format and matrices \(\mathbf {A}^{\mathcal {A}}_{22}\), \(\mathbf {A}^{\mathcal {B}}_{22}\) and \(\mathbf {A}^{\mathcal {C}}_{22}\) can be computed by the Colleau algorithm [17]; see Christensen et al. [11].
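To make Eq. (6) concrete, the sketch below builds \(\mathbf {H}^{-1}\) densely for a toy pedigree: it inverts \(\mathbf {A}\), extracts \(\mathbf {A}_{22}\) for the genotyped animals, and adds \(\mathbf {G}^{-1}-\mathbf {A}_{22}^{-1}\) to the genotyped rows and columns. It is only an illustration under small-matrix assumptions: it uses neither sparse pedigree inversion nor the Colleau algorithm, and the function names, toy pedigree and value of \(\mathbf {G}\) are hypothetical.

```python
# Sketch: dense single-step H^{-1} = A^{-1} + [[0, 0], [0, G^{-1} - A22^{-1}]].
# Illustration only: in practice A^{-1} is built directly in sparse form
# and A22 via the Colleau algorithm; here everything is small and dense.

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def single_step_h_inverse(A, G, genotyped):
    """`genotyped` lists the indices (in A) of genotyped animals, in the order of G."""
    H_inv = inverse(A)
    A22 = [[A[i][j] for j in genotyped] for i in genotyped]
    G_inv, A22_inv = inverse(G), inverse(A22)
    for a, i in enumerate(genotyped):
        for b, j in enumerate(genotyped):
            H_inv[i][j] += G_inv[a][b] - A22_inv[a][b]
    return H_inv

# Two unrelated parents (0, 1) and their genotyped offspring (2).
A = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
H_inv = single_step_h_inverse(A, G=[[1.25]], genotyped=[2])
```

A useful check is that the genotyped block of \(\mathbf {H}=(\mathbf {H}^{-1})^{-1}\) reproduces \(\mathbf {G}\) (here 1.25). The breed-segregation case of Eq. (7) needs no such construction because \(\mathbf {H}^{\mathcal {A}\mathcal {B}}\) is block diagonal.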

The breed-specific partial marker-based relationship matrices above require estimates of breed-specific allele frequencies, which can be obtained from the marker genotypes of purebred animals and from the breed-specific marker alleles of crossbred animals. Furthermore, these matrices need to be adjusted to be compatible with the partial pedigree-based relationship matrices, similar to Christensen et al. [11, 19], i.e. \(\mathbf {G}^b_a=\mathbf {G}^b\beta _b+\alpha _b\mathbf {J}^b\), where \(\alpha _b\) and \(\beta _b\) are parameters and \(\mathbf {J}^b\) is a matrix with entries \(\mathbf {J}^b_{i,i^{\prime }}=f^b_i f^b_{i^{\prime }}\). The scaling parameters \(s^b\) in the marker-based relationship matrices \(\mathbf {G}^{b}\), b = \(\mathcal {A},\mathcal {B},\mathcal {C}\), were left unspecified above: because the compatibility adjustment includes a breed-specific scaling parameter \(\beta _b\), any choice of \(s^b\) is absorbed into \(\beta _b\). In contrast, matrix \(\mathbf {G}^{\mathcal {A}\mathcal {B}}\) needs no adjustment, because its diagonal elements already equal those of \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\) and the expected values of its off-diagonal elements equal those of \(\mathbf {A}^{\mathcal {A}\mathcal {B}}\).
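The adjustment \(\mathbf {G}^b_a=\mathbf {G}^b\beta _b+\alpha _b\mathbf {J}^b\) has two free parameters, so two matching conditions are needed. The sketch below assumes, by analogy with the usual compatibility adjustment, that \(\alpha _b\) and \(\beta _b\) are chosen so that \(\mathbf {G}^b_a\) matches \(\mathbf {A}^b_{22}\) on the average diagonal element and on the average of all elements; the exact conditions used in [11, 19] may differ. The values \(f^b_i\) are taken as given, and all names and toy numbers are hypothetical.

```python
# Sketch: compatibility adjustment G_a = beta * G + alpha * J, J[i][k] = f[i] * f[k].
# ASSUMPTION: alpha and beta are chosen so that G_a matches A22 on (1) the
# average diagonal element and (2) the average of all elements; the exact
# conditions used in the cited papers may differ. f is taken as given.

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def compatibility_parameters(G, A22, f):
    idx = range(len(G))
    g_diag, g_all = mean(G[i][i] for i in idx), mean(G[i][k] for i in idx for k in idx)
    j_diag, j_all = mean(f[i] ** 2 for i in idx), mean(f[i] * f[k] for i in idx for k in idx)
    a_diag, a_all = mean(A22[i][i] for i in idx), mean(A22[i][k] for i in idx for k in idx)
    # Solve beta*g_diag + alpha*j_diag = a_diag and beta*g_all + alpha*j_all = a_all.
    det = g_diag * j_all - j_diag * g_all
    if abs(det) < 1e-12:
        raise ValueError("matching conditions do not identify alpha and beta")
    beta = (a_diag * j_all - j_diag * a_all) / det
    alpha = (g_diag * a_all - a_diag * g_all) / det
    return alpha, beta

def adjust(G, f, alpha, beta):
    return [[beta * G[i][k] + alpha * f[i] * f[k] for k in range(len(G))]
            for i in range(len(G))]

# Toy example with f = 1 for both animals (J is then a matrix of ones).
G = [[2.0, -2.0], [-2.0, 2.0]]
A22 = [[1.0, 0.0], [0.0, 1.0]]
f = [1.0, 1.0]
alpha, beta = compatibility_parameters(G, A22, f)
G_a = adjust(G, f, alpha, beta)
```

Because \(\beta _b\) rescales \(\mathbf {G}^b\) freely, any constant choice of \(s^b\) only changes the fitted \(\beta _b\), which is why \(s^b\) can be left arbitrary.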

Finally, to account for the fact that marker genotypes capture only a fraction of the genetic effects, the partial marker-based relationship matrices \(\mathbf {G}^{b}\), \(b\in \{\mathcal {A},\mathcal {B},\mathcal {C}\}\), and \(\mathbf {G}^{\mathcal {A}\mathcal {B}}\) above may be replaced by the matrices \(\mathbf {G}^{b}_{\omega }=\mathbf {G}^{b}(1-\omega )+\mathbf {A}^{b}\omega\), \(b\in \{\mathcal {A},\mathcal {B},\mathcal {C}\}\), and \(\mathbf {G}^{\mathcal {A}\mathcal {B}}(1-\omega )+\mathbf {A}^{\mathcal {A}\mathcal {B}}\omega\), respectively, where \(\omega\) is the fraction of genetic variance not captured by marker genotypes [4].
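The \(\omega\)-weighting is an elementwise convex combination of a marker-based matrix and its pedigree-based counterpart. A minimal sketch, assuming both matrices refer to the same genotyped animals in the same order; the value \(\omega =0.2\), the toy \(\mathbf {G}^{\mathcal {A}\mathcal {B}}\) and the function name are hypothetical:

```python
def blend(G, A, omega):
    """Return G*(1 - omega) + A*omega elementwise; omega is the fraction of
    genetic variance assumed not to be captured by the markers."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return [[g * (1 - omega) + a * omega for g, a in zip(g_row, a_row)]
            for g_row, a_row in zip(G, A)]

# Breed-segregation case, where the pedigree counterpart is A^{AB} = I/2.
G_AB = [[0.5, 0.1], [0.1, 0.5]]
G_AB_omega = blend(G_AB, [[0.5, 0.0], [0.0, 0.5]], omega=0.2)
```

Since both diagonals equal 1/2 in the breed-segregation case, blending only shrinks the off-diagonal elements towards 0.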

Genomic model for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance: common genetic approach

The marker-based relationship matrix is constructed as usual across all genotyped animals:

$$\begin{aligned} \mathbf {G}=\frac{(\mathbf {m}-\mathbf {1}\mathbf {1}^\mathrm{{\scriptstyle {T}}})(\mathbf {m}-\mathbf {1}\mathbf {1}^\mathrm{{\scriptstyle {T}}})^\mathrm{{\scriptstyle {T}}}}{s} , \end{aligned}$$

where \(\mathbf {m}\) is the gene content matrix with entries 0, 1 and 2, and s is a scaling parameter. As in Christensen [20] and Legarra et al. [15], we use common allele frequencies \(p_j=0.5\) for all markers (so that \(2p_j=1\) in the centring above), and then determine the parameters in matrix \(\varvec{\Gamma }\) and the parameter s such that the pedigree-based and marker-based relationship matrices are compatible. The parameters in matrix \(\varvec{\Gamma }\) and the scaling parameter s can be estimated by matching \(\mathbf {A}(\varvec{\Gamma })\) and \(\mathbf {G}\) for purebred individuals; see Legarra et al. [15]. For example, if animals are genotyped in each of the three pure breeds, the following system of equations can be used to determine the parameters:

$$\begin{aligned} \bar{G}_{\mathcal {A},\mathcal {B}}/s&=\gamma _{\mathcal {A},\mathcal {B}}, \ \ \bar{G}_{\mathcal {A},\mathcal {C}}/s=\gamma _{\mathcal {A},\mathcal {C}}, \ \ \bar{G}_{\mathcal {B},\mathcal {C}}/s=\gamma _{\mathcal {B},\mathcal {C}}, \\ \bar{G}_{\mathcal {A},\mathcal {A}}/s&=\bar{A}_{\mathcal {A},\mathcal {A}}(1-\gamma _{\mathcal {A}}/2) + \gamma _{\mathcal {A}} , \\ \bar{G}_{\mathcal {B},\mathcal {B}}/s&= \bar{A}_{\mathcal {B},\mathcal {B}}(1-\gamma _{\mathcal {B}}/2) + \gamma _{\mathcal {B}}, \\ \bar{G}_{\mathcal {C},\mathcal {C}}/s&= \bar{A}_{\mathcal {C},\mathcal {C}}(1-\gamma _{\mathcal {C}}/2) + \gamma _{\mathcal {C}}, \\ \bar{G}/s&=(\overline{\text {diag}(A_{\mathcal {A},\mathcal {A}})}(1-\gamma _{\mathcal {A}}/2) + \gamma _{\mathcal {A}})/3 \\&\quad +(\overline{\text {diag}(A_{\mathcal {B},\mathcal {B}})}(1-\gamma _{\mathcal {B}}/2) + \gamma _{\mathcal {B}})/3 \\&\quad +(\overline{\text {diag}(A_{\mathcal {C},\mathcal {C}})}(1-\gamma _{\mathcal {C}}/2) + \gamma _{\mathcal {C}})/3, \end{aligned}$$

where \(\bar{G}_{\mathcal {A},\mathcal {A}}\), \(\bar{G}_{\mathcal {A},\mathcal {B}}\), \(\bar{G}_{\mathcal {A},\mathcal {C}}\), \(\bar{G}_{\mathcal {B},\mathcal {B}}\), \(\bar{G}_{\mathcal {B},\mathcal {C}}\) and \(\bar{G}_{\mathcal {C},\mathcal {C}}\) denote averages of the elements in the corresponding submatrices of \(\mathbf {G}\); \(\bar{G}=(\overline{\text {diag}(G_{\mathcal {A},\mathcal {A}})}+\overline{\text {diag}(G_{\mathcal {B},\mathcal {B}})}+\overline{\text {diag}(G_{\mathcal {C},\mathcal {C}})})/3\) is the average over breeds of the average diagonal elements of \(\mathbf {G}\), which corresponds to the diagonal averages of \(\mathbf {A}_{22}\) on the right-hand side of the last equation; \(\bar{A}_{\mathcal {A},\mathcal {A}}\), \(\bar{A}_{\mathcal {A},\mathcal {B}}\), \(\bar{A}_{\mathcal {A},\mathcal {C}}\), \(\bar{A}_{\mathcal {B},\mathcal {B}}\), \(\bar{A}_{\mathcal {B},\mathcal {C}}\) and \(\bar{A}_{\mathcal {C},\mathcal {C}}\) denote averages of the elements in the corresponding submatrices of \(\mathbf {A}_{22}\); and \(\overline{\text {diag}(A_{\mathcal {A},\mathcal {A}})}\), \(\overline{\text {diag}(A_{\mathcal {B},\mathcal {B}})}\) and \(\overline{\text {diag}(A_{\mathcal {C},\mathcal {C}})}\) denote averages of the diagonal elements in submatrices of \(\mathbf {A}_{22}\). This is a linear system of 7 equations in the 7 unknowns \(\gamma _{\mathcal {A}}, \gamma _{\mathcal {B}}, \gamma _{\mathcal {C}}, \gamma _{\mathcal {A},\mathcal {B}}, \gamma _{\mathcal {A},\mathcal {C}}, \gamma _{\mathcal {B},\mathcal {C}}\) and 1/s, so it can be solved directly to obtain estimates.
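Because every equation in the system is linear in \(\gamma _{\mathcal {A}},\dots ,\gamma _{\mathcal {B},\mathcal {C}}\) and 1/s, it can be assembled as a 7×7 matrix and solved directly. The sketch below does this with plain Gaussian elimination, writing the within-breed equation for breed \(\mathcal {C}\) analogously to those for \(\mathcal {A}\) and \(\mathcal {B}\). It assumes that all averages are precomputed; in particular, `Gbar_all` is passed in as the number \(\bar{G}\) on the left-hand side of the last equation, computed according to its definition in the text. The function names and the synthetic numbers are hypothetical.

```python
def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting; returns x such that M x = rhs."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("system is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def gamma_and_scale(Gbar, Abar, dbar, Gbar_all):
    """Unknowns x = [g_A, g_B, g_C, g_AB, g_AC, g_BC, t] with t = 1/s.
    Gbar: averages of G blocks ('AA', 'BB', 'CC', 'AB', 'AC', 'BC');
    Abar: averages of A22 blocks ('AA', 'BB', 'CC');
    dbar: averages of the diagonals of the A22 blocks ('A', 'B', 'C')."""
    rows, rhs = [], []
    for k, pair in enumerate(("AB", "AC", "BC")):  # Gbar_XY * t - gamma_XY = 0
        row = [0.0] * 7
        row[3 + k], row[6] = -1.0, Gbar[pair]
        rows.append(row)
        rhs.append(0.0)
    for k, b in enumerate("ABC"):  # Gbar_bb * t - gamma_b (1 - Abar_bb / 2) = Abar_bb
        row = [0.0] * 7
        row[k], row[6] = -(1 - Abar[b + b] / 2), Gbar[b + b]
        rows.append(row)
        rhs.append(Abar[b + b])
    # Gbar_all * t - sum_b gamma_b (1 - dbar_b / 2) / 3 = sum_b dbar_b / 3
    rows.append([-(1 - dbar[b] / 2) / 3 for b in "ABC"] + [0.0, 0.0, 0.0, Gbar_all])
    rhs.append(sum(dbar[b] for b in "ABC") / 3)
    x = solve_linear(rows, rhs)
    return dict(zip(("A", "B", "C", "AB", "AC", "BC"), x[:6])), 1.0 / x[6]

# Synthetic check (hypothetical numbers): build averages from known Gamma and s = 2.
true_gamma = {"A": 0.2, "B": 0.3, "C": 0.1, "AB": 0.05, "AC": 0.04, "BC": 0.06}
t = 0.5
Abar = {"AA": 1.02, "BB": 1.05, "CC": 1.01}
dbar = {"A": 1.03, "B": 1.04, "C": 1.02}
Gbar = {p: true_gamma[p] / t for p in ("AB", "AC", "BC")}
Gbar.update({b + b: (Abar[b + b] * (1 - true_gamma[b] / 2) + true_gamma[b]) / t for b in "ABC"})
Gbar_all = sum(dbar[b] * (1 - true_gamma[b] / 2) + true_gamma[b] for b in "ABC") / (3 * t)
gamma_hat, s_hat = gamma_and_scale(Gbar, Abar, dbar, Gbar_all)
```

Feeding the solver averages generated from known values of \(\varvec{\Gamma }\) and s recovers those values, which is a convenient sanity check before applying it to real genotype data.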

The relationship matrix that combines pedigree and marker information becomes

$$\begin{aligned} \mathbf {H}(\varvec{\Gamma })^{-1} = \left[ \begin{array}{cc} \mathbf {0}&{} \mathbf {0}\\ \mathbf {0}&{} \mathbf {G}^{-1} - (\mathbf {A}(\varvec{\Gamma })_{22})^{-1} \end{array} \right] + \mathbf {A}(\varvec{\Gamma })^{-1} . \end{aligned}$$
(8)

Finally, as in the previous section, the marker-based relationship matrix \(\mathbf {G}\) above may be replaced by \(\mathbf {G}_{\omega }=\mathbf {G}(1-\omega )+\mathbf {A}(\varvec{\Gamma })\omega\), where \(\omega\) is the fraction of genetic variance that is not captured by marker genotypes.

Genetic models for both purebred and crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performances

In the previous sections, partial genetic and common genetic models for additive genetic effects for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance were presented, and in both cases genomic versions of the models and combined relationship matrices were shown. We now show what the genetic variances and covariances for the model in Eq. (2) look like in the two cases.

For the partial genetic case, the vector of genetic effects on crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) individuals equals \(\mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} + \mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} + \mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} + \mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\). Through the breed-specific partial relationships, the components \(\mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\), \(\mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) and \(\mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) define the breeding values for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance of purebred animals, \(\mathbf {g}_{\mathcal {A}}\), \(\mathbf {g}_{\mathcal {B}}\) and \(\mathbf {g}_{\mathcal {C}}\), respectively. Combining these effects with the breeding values for purebred performances, \(\mathbf {a}_{\mathcal {A}}\), \(\mathbf {a}_{\mathcal {B}}\) and \(\mathbf {a}_{\mathcal {C}}\), gives the following variance-covariance structure of the genetic effects:

$$\begin{aligned}&{\text {Var}}\left[ \begin{array}{c} \mathbf {a}_{\mathcal {A}} \\ \star \\ \star \\ \text{- - - - -} \\ \mathbf {g}_{\mathcal {A}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \varvec{\Sigma }_{\mathcal {A}}\bigotimes \mathbf {H}^{\mathcal {A}}, \\&{\text {Var}}\left[ \begin{array}{c} \mathbf {a}_{\mathcal {B}} \\ \star \\ \star \\ \text{- - - - -} \\ \mathbf {g}_{\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \varvec{\Sigma }_{\mathcal {B}} \bigotimes \mathbf {H}^{\mathcal {B}} , \\&{\text {Var}}\left[ \begin{array}{c} \mathbf {a}_{\mathcal {C}} \\ \star \\ \text{- - - - -} \\ \mathbf {g}_{\mathcal {C}} \\ \mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \varvec{\Sigma }_{\mathcal {C}} \bigotimes \mathbf {H}^{\mathcal {C}}, \\&{\text {Var}}\left[ \mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \right] = \sigma _{g,\mathcal {A}\mathcal {B}}^2 \mathbf {H}^{\mathcal {A}\mathcal {B}}, \end{aligned}$$

with the four vectors being independent. Here, \(\bigotimes\) denotes the Kronecker product, \(\star\) denotes artificial random vectors such that the genetic variance-covariance matrices can be expressed using Kronecker products and matrices

$$\begin{aligned} \varvec{\Sigma }_b = \left[ \begin{array}{cc} \sigma ^2_{a,b} &{} \sigma _{ag,b} \\ \sigma _{ag,b} &{} \sigma ^2_{g,b} \end{array} \right] , \end{aligned}$$

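for \(b=\mathcal {A},\mathcal {B}, \mathcal {C}\), are the \(2\times 2\) variance-covariance matrices containing the genetic variances of the purebred and crossbred breeding values and the covariance between them. Thus, using partial relationship matrices provides a formulation of the model in Eq. (2) with Kronecker products, so that parameters can be estimated and breeding values predicted using standard animal breeding software. This model contains 10 genetic parameters (three in each \(\varvec{\Sigma }_b\) plus \(\sigma _{g,\mathcal {A}\mathcal {B}}^2\)) and \(2(n_{\mathcal {A}}+ n_{\mathcal {B}} + n_{\mathcal {C}} + n_{\mathcal {A}\mathcal {B}}) + 4 n_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) genetic values (the four components \(\mathbf {g}^{\mathcal {A}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\), \(\mathbf {g}^{\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\), \(\mathbf {g}^{\mathcal {C}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) and \(\mathbf {g}^{\mathcal {A}\mathcal {B}}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) each have one value per crossbred), where \(n_{\mathcal {X}}\) is the number of individuals in population \(\mathcal {X}\).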
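The Kronecker form \(\varvec{\Sigma }_b\bigotimes \mathbf {H}^{b}\) means that the covariance between the purebred breeding value of one animal and the crossbred breeding value of another is \(\sigma _{ag,b}\) times their partial relationship. A minimal sketch with hypothetical values for \(\varvec{\Sigma }_b\) and a toy 2×2 \(\mathbf {H}^{b}\), ordering the stacked vector as purebred values first and crossbred values second (as in the dashed partition above):

```python
def kron(S, H):
    """Kronecker product S (x) H for dense matrices stored as lists of rows."""
    return [[s * h for s in s_row for h in h_row] for s_row in S for h_row in H]

# Hypothetical values: Sigma_b = [[var_a, cov_ag], [cov_ag, var_g]].
Sigma_b = [[10.0, 3.0], [3.0, 8.0]]
H_b = [[1.0, 0.5], [0.5, 1.0]]  # toy partial relationship matrix for two animals
V = kron(Sigma_b, H_b)  # covariance of the stacked vector [a_b; g_b]
```

The top-left block of `V` is \(\sigma ^2_{a,b}\mathbf {H}^b\), the bottom-right block is \(\sigma ^2_{g,b}\mathbf {H}^b\) and the off-diagonal blocks are \(\sigma _{ag,b}\mathbf {H}^b\), which is why only three parameters per breed are needed.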

For the common genetic case, all individuals are related, and breeding values for crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) performance on purebred animals, \(\mathbf {g}_{\mathcal {A}}\), \(\mathbf {g}_{\mathcal {B}}\) and \(\mathbf {g}_{\mathcal {C}}\), are defined by additive relationships to the genetic effects on crossbreds, \(\mathbf {g}_{\mathcal {C}(\mathcal {A}\mathcal {B})}\). Combining these effects with the breeding values for purebred performances, \(\mathbf {a}_{\mathcal {A}}\), \(\mathbf {a}_{\mathcal {B}}\) and \(\mathbf {a}_{\mathcal {C}}\), the variance-covariance of genetic effects equals:

$$\begin{aligned} {\text {Var}}\left[ \begin{array}{c} \mathbf {a}_{\mathcal {A}} \\ \star \\ \star \\ \star \\ \star \\ \text{- - - - -} \\ \star \\ \mathbf {a}_{\mathcal {B}} \\ \star \\ \star \\ \star \\ \text{- - - - -} \\ \star \\ \star \\ \mathbf {a}_{\mathcal {C}} \\ \star \\ \star \\ \text{- - - - -} \\ \mathbf {g}_{\mathcal {A}} \\ \mathbf {g}_{\mathcal {B}} \\ \mathbf {g}_{\mathcal {C}} \\ \mathbf {g}_{\mathcal {A}\mathcal {B}} \\ \mathbf {g}_{\mathcal {C}(\mathcal {A}\mathcal {B})} \\ \end{array} \right] = \varvec{\Sigma }\bigotimes \mathbf {H}(\varvec{\Gamma }), \end{aligned}$$

where \(\star\) denotes artificial random vectors and \(\varvec{\Sigma }\) is the \(4\times 4\) genetic variance-covariance matrix:

$$\begin{aligned} \varvec{\Sigma }= \left[ \begin{array}{cccc} \sigma ^2_{a,\mathcal {A}} &{} \sigma _{a,\mathcal {A},\mathcal {B}} &{} \sigma _{a,\mathcal {A},\mathcal {C}} &{} \sigma _{ag,\mathcal {A}} \\ \sigma _{a,\mathcal {A},\mathcal {B}} &{} \sigma ^2_{a,\mathcal {B}} &{} \sigma _{a,\mathcal {B},\mathcal {C}} &{} \sigma _{ag,\mathcal {B}} \\ \sigma _{a,\mathcal {A},\mathcal {C}} &{} \sigma _{a,\mathcal {B},\mathcal {C}} &{} \sigma ^2_{a,\mathcal {C}} &{} \sigma _{ag,\mathcal {C}} \\ \sigma _{ag,\mathcal {A}} &{} \sigma _{ag,\mathcal {B}} &{} \sigma _{ag,\mathcal {C}} &{} \sigma ^2_g \\ \end{array} \right] . \end{aligned}$$

The formulation of the model in Eq. (2) using Kronecker products implies that parameters can be estimated and breeding values predicted using standard animal breeding software. This model contains 10 genetic parameters (the distinct elements of the symmetric \(4\times 4\) matrix \(\varvec{\Sigma }\)) and \(2(n_{\mathcal {A}}+ n_{\mathcal {B}} + n_{\mathcal {C}}) + n_{\mathcal {A}\mathcal {B}} + n_{\mathcal {C}(\mathcal {A}\mathcal {B})}\) genetic values, and in addition 6 parameters in matrix \(\varvec{\Gamma }\).

In the common genetic case, there are three parameters, \(\sigma _{a,\mathcal {A},\mathcal {B}}\), \(\sigma _{a,\mathcal {A},\mathcal {C}}\) and \(\sigma _{a,\mathcal {B},\mathcal {C}}\), which are genetic covariances between purebred performances; these parameters are not present in the partial genetic case because they would not be identifiable there, since the partial genetic case does not specify relationships across breeds. In the common genetic case, the identifiability of \(\sigma _{a,\mathcal {A},\mathcal {B}}\), \(\sigma _{a,\mathcal {A},\mathcal {C}}\) and \(\sigma _{a,\mathcal {B},\mathcal {C}}\) relies on the genomic relationships between pairs of animals in different breeds. In the partial genetic case, there are four genetic variance parameters for crossbred performance, \(\sigma _{g,\mathcal {A}}^2\), \(\sigma _{g,\mathcal {B}}^2\), \(\sigma _{g,\mathcal {C}}^2\) and \(\sigma _{g,\mathcal {A}\mathcal {B}}^2\), each scaling one of the four partial relationship matrices, whereas in the common genetic case there is only one such parameter, \(\sigma _g^2\). As explained in a previous section, these parameters correspond to each other via the parameters in matrix \(\varvec{\Gamma }\): \(\sigma _{g,b}^2=\sigma _g^2(1 - \gamma _b/2)\), \(b=\mathcal {A},\mathcal {B},\mathcal {C}\), and \(\sigma ^2_{g,\mathcal {A}\mathcal {B}}=\sigma ^2_g((\gamma _{\mathcal {A}}+\gamma _{\mathcal {B}})/2-\gamma _{\mathcal {A},\mathcal {B}})/4\). Note, however, that estimating \(\sigma _{g,\mathcal {A}}^2\), \(\sigma _{g,\mathcal {B}}^2\), \(\sigma _{g,\mathcal {C}}^2\) and \(\sigma _{g,\mathcal {A}\mathcal {B}}^2\) from phenotypes, as in the partial genetic case, differs from deriving them from a single \(\sigma ^2_g\) and the parameters in \(\varvec{\Gamma }\), which are estimated from marker genotypes, as in the common genetic case.
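The correspondence between the two parameterizations is simple arithmetic. The sketch below evaluates it for hypothetical values of \(\sigma ^2_g\) and \(\varvec{\Gamma }\) (not estimates from any data set); the function name is illustrative only.

```python
def implied_partial_variances(sigma2_g, gamma):
    """Partial-model crossbred variances implied by the common-model sigma^2_g and Gamma."""
    variances = {b: sigma2_g * (1 - gamma[b] / 2) for b in ("A", "B", "C")}
    variances["AB"] = sigma2_g * ((gamma["A"] + gamma["B"]) / 2 - gamma["AB"]) / 4
    return variances

# Hypothetical inputs, not estimates from any data set.
implied = implied_partial_variances(100.0, {"A": 0.2, "B": 0.3, "C": 0.1, "AB": 0.05})
# Approximately {'A': 90.0, 'B': 85.0, 'C': 95.0, 'AB': 5.0}
```

Comparing such implied values with variances estimated directly from phenotypes under the partial genetic model is one way to examine how the two approaches differ on the same data.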

Discussion

For three-way crossbreeding, we presented models based on pedigree-based, marker-based and combined relationships. Using combined relationship matrices results in a model for genetic evaluation in which pedigree and marker genotypes are used simultaneously, i.e. a single-step method for genomic evaluation. This paper provides the models and mathematical formulas, but a numerical implementation is needed before the methods are ready for use in practice. Such methods make it possible to incorporate phenotypes and genotypes of crossbreds into an existing genetic evaluation system, provided that this system is based on a single-step method.

The models for three-way crossbreeding investigated in this paper were four-variate models in which each variable was measured in a specific population, \(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {C}\) or \(\mathcal {C}(\mathcal {A}\mathcal {B})\). The main scenario we have in mind is one in which the four variables represent the same biological trait measured in four different genetic backgrounds and possibly different environments, but in principle the four variables could also be different biological traits. Extending the model to a situation where multiple biological traits are measured in each of the four populations is in principle straightforward, since the additive relationship matrices are the same, although in practice it may require estimating a very large number of genetic parameters. Extending the approaches to other types of models implemented in standard animal breeding software, such as threshold models, models with indirect genetic effects or models for test-day records, is also straightforward in principle. Finally, the models can readily be modified for other data-recording scenarios, for example with records on \(\mathcal {A}\mathcal {B}\) individuals or without records on one of the pure breeds. In general, designing data recording for these complex models is a challenge; for example, to obtain precise estimates of the genetic correlation parameters, crossbred animals with records should be closely related to purebred animals with records.

Two approaches for constructing additive relationships were presented, based on different assumptions about the allele substitution effects of causal loci or SNPs. In the partial genetic approach, the allele substitution effects of SNPs were assumed to be independent between breeds, whereas in the common genetic approach, they were assumed to be the same in different breeds. The partial genetic approach requires that alleles are traced to their breed of origin, which is feasible in some scenarios but may be difficult to do with sufficient accuracy in others. In particular, when crossbred \(\mathcal {C}(\mathcal {A}\mathcal {B})\) animals are genotyped, a reasonable requirement is that their breed \(\mathcal {C}\) sires are also genotyped, which makes tracing of the paternal breed \(\mathcal {C}\) allele feasible. Tracing the breed of origin (\(\mathcal {A}\) or \(\mathcal {B}\)) of the maternal allele may be more uncertain: it depends on whether the \(\mathcal {A}\mathcal {B}\) dams are genotyped (which may not be the case for logistical reasons), and on whether the maternal grandsires and maternal granddams are genotyped (the latter may be difficult to obtain if they are kept in multiplier herds). An advantage of the common genetic approach is that the marker-based relationship matrix is easier to construct, because the breed of origin of alleles does not need to be traced, but a disadvantage may be the computational burden of using a larger relationship matrix. In addition, the parameters in matrix \(\varvec{\Gamma }\) need to be estimated, and the sensitivity of genetic evaluation to these estimates is unknown. Future research using simulated and real data is needed to clarify the differences between the two approaches.

Other terminal crossbreeding systems are also of interest in pig production. Models for two-way crossbreeding are relevant for sow traits measured on animals from breeds \(\mathcal {A}\) and \(\mathcal {B}\) and from the cross \(\mathcal {A}\mathcal {B}\); such models were presented in Christensen et al. [11] using partial genetic relationship matrices, and the common genetic approach presented here would be an alternative. The four-way crossbreeding system, in which crossbred \(\mathcal {C}\mathcal {D}\) sires are mated to \(\mathcal {A}\mathcal {B}\) dams to produce \((\mathcal {C}\mathcal {D})(\mathcal {A}\mathcal {B})\) pigs for slaughter, is also used in pig production. The approaches in this paper can be extended to such a system, and the resulting model would be a five-variate model. Using the partial genetic approach, there would be four breed-specific partial relationship matrices and two breed-segregation partial relationship matrices, and the corresponding model for purebred and crossbred performances would contain 14 genetic parameters, whereas using the common genetic approach, the model for purebred and crossbred performances would contain 15 genetic parameters.
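These parameter counts follow a simple rule: in the partial genetic approach each pure breed contributes one \(2\times 2\) matrix \(\varvec{\Sigma }_b\) (three parameters) and each breed-segregation matrix one variance, whereas the common genetic approach uses a single \(k\times k\) matrix with k(k+1)/2 parameters for k recorded populations. A small check, assuming performance is recorded in every pure breed and in the final cross and excluding the parameters in \(\varvec{\Gamma }\) (the function name is illustrative):

```python
def genetic_parameter_counts(n_pure_breeds, n_segregation_matrices):
    """Return (partial, common) counts of genetic (co)variance parameters,
    assuming records in every pure breed and in the final cross; Gamma excluded."""
    partial = 3 * n_pure_breeds + n_segregation_matrices  # one 2x2 Sigma_b per breed
    k = n_pure_breeds + 1  # purebred traits plus one crossbred trait
    common = k * (k + 1) // 2  # distinct elements of a symmetric k x k Sigma
    return partial, common

print(genetic_parameter_counts(3, 1))  # three-way: (10, 10)
print(genetic_parameter_counts(4, 2))  # four-way: (14, 15)
```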

Many papers have reported genetic correlations between purebred and crossbred performances [21–26]. The reported estimates ranged from 0.38 to 0.946, depending on the trait and on differences in environment, and generally had relatively large standard errors. The higher the genetic correlation, the less is gained by including crossbred data in the genetic evaluation system. All these results are from two-way crosses, and the authors are not aware of publications based on data from three-way crossbreeding in which data in purebred and crossbred populations are treated as different traits. The models presented in this paper should be useful for analysing such data from three-way crossbreeding.

Conclusion

Models for genetic evaluation in the three-way crossbreeding system are presented. These models provide estimated breeding values for both purebred and crossbred performances, and can use pedigree-based or marker-based relationships, or combined relationships based on both pedigree and marker information. This provides a framework that allows information from three-way crossbred animals to be incorporated into a genetic evaluation system.